Sample records for throughput sequencing chip-seq

  1. Measuring Sister Chromatid Cohesion Protein Genome Occupancy in Drosophila melanogaster by ChIP-seq.

    PubMed

    Dorsett, Dale; Misulovin, Ziva

    2017-01-01

    This chapter presents methods to conduct and analyze genome-wide chromatin immunoprecipitation of the cohesin complex and the Nipped-B cohesin loading factor in Drosophila cells using high-throughput DNA sequencing (ChIP-seq). Procedures for isolation of chromatin, immunoprecipitation, and construction of sequencing libraries for the Ion Torrent Proton high throughput sequencer are detailed, and computational methods to calculate occupancy as input-normalized fold-enrichment are described. The results obtained by ChIP-seq are compared to those obtained by ChIP-chip (genomic ChIP using tiling microarrays), and the effects of sequencing depth on the accuracy are analyzed. ChIP-seq provides similar sensitivity and reproducibility as ChIP-chip, and identifies the same broad regions of occupancy. The locations of enrichment peaks, however, can differ between ChIP-chip and ChIP-seq, and low sequencing depth can splinter broad regions of occupancy into distinct peaks.

  2. The ChIP-exo Method: Identifying Protein-DNA Interactions with Near Base Pair Precision.

    PubMed

    Perreault, Andrea A; Venters, Bryan J

    2016-12-23

    Chromatin immunoprecipitation (ChIP) is an indispensable tool in the fields of epigenetics and gene regulation that isolates specific protein-DNA interactions. ChIP coupled to high throughput sequencing (ChIP-seq) is commonly used to determine the genomic location of proteins that interact with chromatin. However, ChIP-seq is hampered by relatively low mapping resolution of several hundred base pairs and high background signal. The ChIP-exo method is a refined version of ChIP-seq that substantially improves upon both resolution and noise. The key distinction of the ChIP-exo methodology is the incorporation of lambda exonuclease digestion in the library preparation workflow to effectively footprint the left and right 5' DNA borders of the protein-DNA crosslink site. The ChIP-exo libraries are then subjected to high throughput sequencing. The resulting data can be leveraged to provide unique and ultra-high resolution insights into the functional organization of the genome. Here, we describe the ChIP-exo method that we have optimized and streamlined for mammalian systems and next-generation sequencing-by-synthesis platform.

  3. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.

    PubMed

    Landt, Stephen G; Marinov, Georgi K; Kundaje, Anshul; Kheradpour, Pouya; Pauli, Florencia; Batzoglou, Serafim; Bernstein, Bradley E; Bickel, Peter; Brown, James B; Cayting, Philip; Chen, Yiwen; DeSalvo, Gilberto; Epstein, Charles; Fisher-Aylor, Katherine I; Euskirchen, Ghia; Gerstein, Mark; Gertz, Jason; Hartemink, Alexander J; Hoffman, Michael M; Iyer, Vishwanath R; Jung, Youngsook L; Karmakar, Subhradip; Kellis, Manolis; Kharchenko, Peter V; Li, Qunhua; Liu, Tao; Liu, X Shirley; Ma, Lijia; Milosavljevic, Aleksandar; Myers, Richard M; Park, Peter J; Pazin, Michael J; Perry, Marc D; Raha, Debasish; Reddy, Timothy E; Rozowsky, Joel; Shoresh, Noam; Sidow, Arend; Slattery, Matthew; Stamatoyannopoulos, John A; Tolstorukov, Michael Y; White, Kevin P; Xi, Simon; Farnham, Peggy J; Lieb, Jason D; Wold, Barbara J; Snyder, Michael

    2012-09-01

    Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.

  4. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia

    PubMed Central

    Landt, Stephen G.; Marinov, Georgi K.; Kundaje, Anshul; Kheradpour, Pouya; Pauli, Florencia; Batzoglou, Serafim; Bernstein, Bradley E.; Bickel, Peter; Brown, James B.; Cayting, Philip; Chen, Yiwen; DeSalvo, Gilberto; Epstein, Charles; Fisher-Aylor, Katherine I.; Euskirchen, Ghia; Gerstein, Mark; Gertz, Jason; Hartemink, Alexander J.; Hoffman, Michael M.; Iyer, Vishwanath R.; Jung, Youngsook L.; Karmakar, Subhradip; Kellis, Manolis; Kharchenko, Peter V.; Li, Qunhua; Liu, Tao; Liu, X. Shirley; Ma, Lijia; Milosavljevic, Aleksandar; Myers, Richard M.; Park, Peter J.; Pazin, Michael J.; Perry, Marc D.; Raha, Debasish; Reddy, Timothy E.; Rozowsky, Joel; Shoresh, Noam; Sidow, Arend; Slattery, Matthew; Stamatoyannopoulos, John A.; Tolstorukov, Michael Y.; White, Kevin P.; Xi, Simon; Farnham, Peggy J.; Lieb, Jason D.; Wold, Barbara J.; Snyder, Michael

    2012-01-01

    Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals. PMID:22955991

  5. Preparation of Low-Input and Ligation-Free ChIP-seq Libraries Using Template-Switching Technology.

    PubMed

    Bolduc, Nathalie; Lehman, Alisa P; Farmer, Andrew

    2016-10-10

    Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) has become the gold standard for mapping of transcription factors and histone modifications throughout the genome. However, for ChIP experiments involving few cells or targeting low-abundance transcription factors, the small amount of DNA recovered makes ligation of adapters very challenging. In this unit, we describe a ChIP-seq workflow that can be applied to small cell numbers, including a robust single-tube and ligation-free method for preparation of sequencing libraries from sub-nanogram amounts of ChIP DNA. An example ChIP protocol is first presented, resulting in selective enrichment of DNA-binding proteins and cross-linked DNA fragments immobilized on beads via an antibody bridge. This is followed by a protocol for fast and easy cross-linking reversal and DNA recovery. Finally, we describe a fast, ligation-free library preparation protocol, featuring DNA SMART technology, resulting in samples ready for Illumina sequencing. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  6. Multiplexed ChIP-Seq Using Direct Nucleosome Barcoding: A Tool for High-Throughput Chromatin Analysis.

    PubMed

    Chabbert, Christophe D; Adjalley, Sophie H; Steinmetz, Lars M; Pelechano, Vicent

    2018-01-01

    Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) or microarray hybridization (ChIP-on-chip) are standard methods for the study of transcription factor binding sites and histone chemical modifications. However, these approaches only allow profiling of a single factor or protein modification at a time.In this chapter, we present Bar-ChIP, a higher throughput version of ChIP-Seq that relies on the direct ligation of molecular barcodes to chromatin fragments. Bar-ChIP enables the concurrent profiling of multiple DNA-protein interactions and is therefore amenable to experimental scale-up, without the need for any robotic instrumentation.

  7. Global Analysis of Transcription Factor-Binding Sites in Yeast Using ChIP-Seq

    PubMed Central

    Lefrançois, Philippe; Gallagher, Jennifer E. G.; Snyder, Michael

    2016-01-01

    Transcription factors influence gene expression through their ability to bind DNA at specific regulatory elements. Specific DNA-protein interactions can be isolated through the chromatin immunoprecipitation (ChIP) procedure, in which DNA fragments bound by the protein of interest are recovered. ChIP is followed by high-throughput DNA sequencing (Seq) to determine the genomic provenance of ChIP DNA fragments and their relative abundance in the sample. This chapter describes a ChIP-Seq strategy adapted for budding yeast to enable the genome-wide characterization of binding sites of transcription factors (TFs) and other DNA-binding proteins in an efficient and cost-effective way. Yeast strains with epitope-tagged TFs are most commonly used for ChIP-Seq, along with their matching untagged control strains. The initial step of ChIP involves the cross-linking of DNA and proteins. Next, yeast cells are lysed and sonicated to shear chromatin into smaller fragments. An antibody against an epitope-tagged TF is used to pull down chromatin complexes containing DNA and the TF of interest. DNA is then purified and proteins degraded. Specific barcoded adapters for multiplex DNA sequencing are ligated to ChIP DNA. Short DNA sequence reads (28–36 base pairs) are parsed according to the barcode and aligned against the yeast reference genome, thus generating a nucleotide-resolution map of transcription factor-binding sites and their occupancy. PMID:25213249

  8. ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis

    PubMed Central

    2011-01-01

    Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform, ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108

  9. Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates: a Perspective on Experimental Design, Data Analysis, and Open Problems.

    PubMed

    Wei, Yingying; Wu, George; Ji, Hongkai

    2013-05-01

    Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.

  10. Chromatin analyses of Zymoseptoria tritici: Methods for chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq).

    PubMed

    Soyer, Jessica L; Möller, Mareike; Schotanus, Klaas; Connolly, Lanelle R; Galazka, Jonathan M; Freitag, Michael; Stukenbrock, Eva H

    2015-06-01

    The presence or absence of specific transcription factors, chromatin remodeling machineries, chromatin modification enzymes, post-translational histone modifications and histone variants all play crucial roles in the regulation of pathogenicity genes. Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) provides an important tool to study genome-wide protein-DNA interactions to help understand gene regulation in the context of native chromatin. ChIP-seq is a convenient in vivo technique to identify, map and characterize occupancy of specific DNA fragments with proteins against which specific antibodies exist or which can be epitope-tagged in vivo. We optimized existing ChIP protocols for use in the wheat pathogen Zymoseptoria tritici and closely related sister species. Here, we provide a detailed method, underscoring which aspects of the technique are organism-specific. Library preparation for Illumina sequencing is described, as this is currently the most widely used ChIP-seq method. One approach for the analysis and visualization of representative sequence is described; improved tools for these analyses are constantly being developed. Using ChIP-seq with antibodies against H3K4me2, which is considered a mark for euchromatin or H3K9me3 and H3K27me3, which are considered marks for heterochromatin, the overall distribution of euchromatin and heterochromatin in the genome of Z. tritici can be determined. Our ChIP-seq protocol was also successfully applied to Z. tritici strains with high levels of melanization or aberrant colony morphology, and to different species of the genus (Z. ardabiliae and Z. pseudotritici), suggesting that our technique is robust. The methods described here provide a powerful framework to study new aspects of chromatin biology and gene regulation in this prominent wheat pathogen. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. Purification of nanogram-range immunoprecipitated DNA in ChIP-seq application.

    PubMed

    Zhong, Jian; Ye, Zhenqing; Lenz, Samuel W; Clark, Chad R; Bharucha, Adil; Farrugia, Gianrico; Robertson, Keith D; Zhang, Zhiguo; Ordog, Tamas; Lee, Jeong-Heon

    2017-12-21

    Chromatin immunoprecipitation-sequencing (ChIP-seq) is a widely used epigenetic approach for investigating genome-wide protein-DNA interactions in cells and tissues. The approach has been relatively well established but several key steps still require further improvement. As a part of the procedure, immnoprecipitated DNA must undergo purification and library preparation for subsequent high-throughput sequencing. Current ChIP protocols typically yield nanogram quantities of immunoprecipitated DNA mainly depending on the target of interest and starting chromatin input amount. However, little information exists on the performance of reagents used for the purification of such minute amounts of immunoprecipitated DNA in ChIP elution buffer and their effects on ChIP-seq data. Here, we compared DNA recovery, library preparation efficiency, and ChIP-seq results obtained with several commercial DNA purification reagents applied to 1 ng ChIP DNA and also investigated the impact of conditions under which ChIP DNA is stored. We compared DNA recovery of ten commercial DNA purification reagents and phenol/chloroform extraction from 1 to 50 ng of immunopreciptated DNA in ChIP elution buffer. The recovery yield was significantly different with 1 ng of DNA while similar in higher DNA amounts. We also observed that the low nanogram range of purified DNA is prone to loss during storage depending on the type of polypropylene tube used. The immunoprecipitated DNA equivalent to 1 ng of purified DNA was subject to DNA purification and library preparation to evaluate the performance of four better performing purification reagents in ChIP-seq applications. Quantification of library DNAs indicated the selected purification kits have a negligible impact on the efficiency of library preparation. The resulting ChIP-seq data were comparable with the dataset generated by ENCODE consortium and were highly correlated between the data from different purification reagents. This study provides comparative data on commercial DNA purification reagents applied to nanogram-range immunopreciptated ChIP DNA and evidence for the importance of storage conditions of low nanogram-range purified DNA. We verified consistent high performance of a subset of the tested reagents. These results will facilitate the improvement of ChIP-seq methodology for low-input applications.

  12. ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data

    PubMed Central

    2010-01-01

    Background Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database. PMID:20459804

  13. Accurate Prediction of Inducible Transcription Factor Binding Intensities In Vivo

    PubMed Central

    Siepel, Adam; Lis, John T.

    2012-01-01

    DNA sequence and local chromatin landscape act jointly to determine transcription factor (TF) binding intensity profiles. To disentangle these influences, we developed an experimental approach, called protein/DNA binding followed by high-throughput sequencing (PB–seq), that allows the binding energy landscape to be characterized genome-wide in the absence of chromatin. We applied our methods to the Drosophila Heat Shock Factor (HSF), which inducibly binds a target DNA sequence element (HSE) following heat shock stress. PB–seq involves incubating sheared naked genomic DNA with recombinant HSF, partitioning the HSF–bound and HSF–free DNA, and then detecting HSF–bound DNA by high-throughput sequencing. We compared PB–seq binding profiles with ones observed in vivo by ChIP–seq and developed statistical models to predict the observed departures from idealized binding patterns based on covariates describing the local chromatin environment. We found that DNase I hypersensitivity and tetra-acetylation of H4 were the most influential covariates in predicting changes in HSF binding affinity. We also investigated the extent to which DNA accessibility, as measured by digital DNase I footprinting data, could be predicted from MNase–seq data and the ChIP–chip profiles for many histone modifications and TFs, and found GAGA element associated factor (GAF), tetra-acetylation of H4, and H4K16 acetylation to be the most predictive covariates. Lastly, we generated an unbiased model of HSF binding sequences, which revealed distinct biophysical properties of the HSF/HSE interaction and a previously unrecognized substructure within the HSE. These findings provide new insights into the interplay between the genomic sequence and the chromatin landscape in determining transcription factor binding intensity. PMID:22479205

  14. ChIP-seq.

    PubMed

    Kim, Tae Hoon; Dekker, Job

    2018-05-01

    Owing to its digital nature, ChIP-seq has become the standard method for genome-wide ChIP analysis. Using next-generation sequencing platforms (notably the Illumina Genome Analyzer), millions of short sequence reads can be obtained. The densities of recovered ChIP sequence reads along the genome are used to determine the binding sites of the protein. Although a relatively small amount of ChIP DNA is required for ChIP-seq, the current sequencing platforms still require amplification of the ChIP DNA by ligation-mediated PCR (LM-PCR). This protocol, which involves linker ligation followed by size selection, is the standard ChIP-seq protocol using an Illumina Genome Analyzer. The size-selected ChIP DNA is amplified by LM-PCR and size-selected for the second time. The purified ChIP DNA is then loaded into the Genome Analyzer. The ChIP DNA can also be processed in parallel for ChIP-chip results. © 2018 Cold Spring Harbor Laboratory Press.

  15. Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq

    PubMed Central

    Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim

    2014-01-01

    The advent of both microarray and massively parallel sequencing have revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.

  16. Limitations and possibilities of low cell number ChIP-seq

    PubMed Central

    2012-01-01

    Background Chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq) offers high resolution, genome-wide analysis of DNA-protein interactions. However, current standard methods require abundant starting material in the range of 1–20 million cells per immunoprecipitation, and remain a bottleneck to the acquisition of biologically relevant epigenetic data. Using a ChIP-seq protocol optimised for low cell numbers (down to 100,000 cells / IP), we examined the performance of the ChIP-seq technique on a series of decreasing cell numbers. Results We present an enhanced native ChIP-seq method tailored to low cell numbers that represents a 200-fold reduction in input requirements over existing protocols. The protocol was tested over a range of starting cell numbers covering three orders of magnitude, enabling determination of the lower limit of the technique. At low input cell numbers, increased levels of unmapped and duplicate reads reduce the number of unique reads generated, and can drive up sequencing costs and affect sensitivity if ChIP is attempted from too few cells. Conclusions The optimised method presented here considerably reduces the input requirements for performing native ChIP-seq. It extends the applicability of the technique to isolated primary cells and rare cell populations (e.g. biobank samples, stem cells), and in many cases will alleviate the need for cell culture and any associated alteration of epigenetic marks. However, this study highlights a challenge inherent to ChIP-seq from low cell numbers: as cell input numbers fall, levels of unmapped sequence reads and PCR-generated duplicate reads rise. We discuss a number of solutions to overcome the effects of reducing cell number that may aid further improvements to ChIP performance. PMID:23171294

  17. Position-specific binding of FUS to nascent RNA regulates mRNA length

    PubMed Central

    Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen

    2015-01-01

    More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189

  18. A comparative study of ChIP-seq sequencing library preparation methods.

    PubMed

    Sundaram, Arvind Y M; Hughes, Timothy; Biondi, Shea; Bolduc, Nathalie; Bowman, Sarah K; Camilli, Andrew; Chew, Yap C; Couture, Catherine; Farmer, Andrew; Jerome, John P; Lazinski, David W; McUsic, Andrew; Peng, Xu; Shazand, Kamran; Xu, Feng; Lyle, Robert; Gilfillan, Gregor D

    2016-10-21

    ChIP-seq is the primary technique used to investigate genome-wide protein-DNA interactions. As part of this procedure, immunoprecipitated DNA must undergo "library preparation" to enable subsequent high-throughput sequencing. To facilitate the analysis of biopsy samples and rare cell populations, there has been a recent proliferation of methods allowing sequencing library preparation from low-input DNA amounts. However, little information exists on the relative merits, performance, comparability and biases inherent to these procedures. Notably, recently developed single-cell ChIP procedures employing microfluidics must also employ library preparation reagents to allow downstream sequencing. In this study, seven methods designed for low-input DNA/ChIP-seq sample preparation (Accel-NGS® 2S, Bowman-method, HTML-PCR, SeqPlex™, DNA SMART™, TELP and ThruPLEX®) were performed on five replicates of 1 ng and 0.1 ng input H3K4me3 ChIP material, and compared to a "gold standard" reference PCR-free dataset. The performance of each method was examined for the prevalence of unmappable reads, amplification-derived duplicate reads, reproducibility, and for the sensitivity and specificity of peak calling. We identified consistent high performance in a subset of the tested reagents, which should aid researchers in choosing the most appropriate reagents for their studies. Furthermore, we expect this work to drive future advances by identifying and encouraging use of the most promising methods and reagents. The results may also aid judgements on how comparable are existing datasets that have been prepared with different sample library preparation reagents.

  19. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads.

    PubMed

    Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi

    2018-03-09

    High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.

  20. Development of a High-Throughput Resequencing Array for the Detection of Pathogenic Mutations in Osteogenesis Imperfecta

    PubMed Central

    Wang, Yao; Cui, Yazhou; Zhou, Xiaoyan; Han, Jinxiang

    2015-01-01

    Objective Osteogenesis imperfecta (OI) is a rare inherited skeletal disease, characterized by bone fragility and low bone density. The mutations in this disorder have been widely reported to be on various exonal hotspots of the candidate genes, including COL1A1, COL1A2, CRTAP, LEPRE1, and FKBP10, thus creating a great demand for precise genetic tests. However, large genome sizes make the process daunting and the analyses, inefficient and expensive. Therefore, we aimed at developing a fast, accurate, efficient, and cheaper sequencing platform for OI diagnosis; and to this end, use of an advanced array-based technique was proposed. Method A CustomSeq Affymetrix Resequencing Array was established for high-throughput sequencing of five genes simultaneously. Genomic DNA extraction from 13 OI patients and 85 normal controls and amplification using long-range PCR (LR-PCR) were followed by DNA fragmentation and chip hybridization, according to standard Affymetrix protocols. Hybridization signals were determined using GeneChip Sequence Analysis Software (GSEQ). To examine the feasibility, the outcome from new resequencing approach was validated by conventional capillary sequencing method. Result Overall call rates using resequencing array was 96–98% and the agreement between microarray and capillary sequencing was 99.99%. 11 out of 13 OI patients with pathogenic mutations were successfully detected by the chip analysis without adjustment, and one mutation could also be identified using manual visual inspection. Conclusion A high-throughput resequencing array was developed that detects the disease-associated mutations in OI, providing a potential tool to facilitate large-scale genetic screening for OI patients. Through this method, a novel mutation was also found. PMID:25742658

  1. Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

    PubMed Central

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

    2015-01-01

    ABSTRACT Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644

  2. Discovering transcription factor binding sites in highly repetitive regions of genomes with multi-read analysis of ChIP-Seq data.

    PubMed

    Chung, Dongjun; Kuan, Pei Fen; Li, Bo; Sanalkumar, Rajendran; Liang, Kun; Bresnick, Emery H; Dewey, Colin; Keleş, Sündüz

    2011-07-01

    Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is rapidly replacing chromatin immunoprecipitation combined with genome-wide tiling array analysis (ChIP-chip) as the preferred approach for mapping transcription-factor binding sites and chromatin modifications. The state of the art for analyzing ChIP-seq data relies on using only reads that map uniquely to a relevant reference genome (uni-reads). This can lead to the omission of up to 30% of alignable reads. We describe a general approach for utilizing reads that map to multiple locations on the reference genome (multi-reads). Our approach is based on allocating multi-reads as fractional counts using a weighted alignment scheme. Using human STAT1 and mouse GATA1 ChIP-seq datasets, we illustrate that incorporation of multi-reads significantly increases sequencing depths, leads to detection of novel peaks that are not otherwise identifiable with uni-reads, and improves detection of peaks in mappable regions. We investigate various genome-wide characteristics of peaks detected only by utilization of multi-reads via computational experiments. Overall, peaks from multi-read analysis have similar characteristics to peaks that are identified by uni-reads except that the majority of them reside in segmental duplications. We further validate a number of GATA1 multi-read only peaks by independent quantitative real-time ChIP analysis and identify novel target genes of GATA1. These computational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-seq experiments.

  3. Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.

    PubMed

    Raghav, Sunil Kumar; Deplancke, Bart

    2012-01-01

    Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.

  4. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs

    PubMed Central

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore

    2017-01-01

    Abstract Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659

  5. Parallel factor ChIP provides essential internal control for quantitative differential ChIP-seq.

    PubMed

    Guertin, Michael J; Cullen, Amy E; Markowetz, Florian; Holding, Andrew N

    2018-04-17

    A key challenge in quantitative ChIP combined with high-throughput sequencing (ChIP-seq) is the normalization of data in the presence of genome-wide changes in occupancy. Analysis-based normalization methods were developed for transcriptomic data and these are dependent on the underlying assumption that total transcription does not change between conditions. For genome-wide changes in transcription factor (TF) binding, these assumptions do not hold true. The challenges in normalization are confounded by experimental variability during sample preparation, processing and recovery. We present a novel normalization strategy utilizing an internal standard of unchanged peaks for reference. Our method can be readily applied to monitor genome-wide changes by ChIP-seq that are otherwise lost or misrepresented through analytical normalization. We compare our approach to normalization by total read depth and two alternative methods that utilize external experimental controls to study TF binding. We successfully resolve the key challenges in quantitative ChIP-seq analysis and demonstrate its application by monitoring the loss of Estrogen Receptor-alpha (ER) binding upon fulvestrant treatment, ER binding in response to estrodiol, ER mediated change in H4K12 acetylation and profiling ER binding in patient-derived xenographs. This is supported by an adaptable pipeline to normalize and quantify differential TF binding genome-wide and generate metrics for differential binding at individual sites.

  6. Diff-seq: A high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery

    PubMed Central

    Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin

    2018-01-01

    Abstract Much of the within species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-Seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139

  7. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    PubMed

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare . However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.

  8. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    PubMed Central

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096

  9. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.

    PubMed

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei

    2017-05-19

    Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Analysis of ChIP-seq Data in R/Bioconductor.

    PubMed

    de Santiago, Ines; Carroll, Thomas

    2018-01-01

    The development of novel high-throughput sequencing methods for ChIP (chromatin immunoprecipitation) has provided a very powerful tool to study gene regulation in multiple conditions at unprecedented resolution and scale. Proactive quality-control and appropriate data analysis techniques are of critical importance to extract the most meaningful results from the data. Over the last years, an array of R/Bioconductor tools has been developed allowing researchers to process and analyze ChIP-seq data. This chapter provides an overview of the methods available to analyze ChIP-seq data based primarily on software packages from the open-source Bioconductor project. Protocols described in this chapter cover basic steps including data alignment, peak calling, quality control and data visualization, as well as more complex methods such as the identification of differentially bound regions and functional analyses to annotate regulatory regions. The steps in the data analysis process were demonstrated on publicly available data sets and will serve as a demonstration of the computational procedures routinely used for the analysis of ChIP-seq data in R/Bioconductor, from which readers can construct their own analysis pipelines.

  11. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGES

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with anymore » transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.« less

  12. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with anymore » transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.« less

  13. HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

    PubMed

    David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

  14. HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

    PubMed Central

    David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

    2014-01-01

    The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057

  15. Characteristics of microbial community functional structure of a biological coking wastewater treatment system.

    PubMed

    Joshi, Dev Raj; Zhang, Yu; Zhang, Hong; Gao, Yingxin; Yang, Min

    2018-01-01

    Nitrogenous heterocyclic compounds are key pollutants in coking wastewater; however, the functional potential of microbial communities for biodegradation of such contaminants during biological treatment is still elusive. Herein, a high throughput functional gene array (GeoChip 5.0) in combination with Illumina HiSeq2500 sequencing was used to compare and characterize the microbial community functional structure in a long run (500days) bench scale bioreactor treating coking wastewater, with a control system treating synthetic wastewater. Despite the inhibitory toxic pollutants, GeoChip 5.0 detected almost all key functional gene (average 61,940 genes) categories in the coking wastewater sludge. With higher abundance, aromatic ring cleavage dioxygenase genes including multi ring1,2diox; one ring2,3diox; catechol represented significant functional potential for degradation of aromatic pollutants which was further confirmed by Illumina HiSeq2500 analysis results. Response ratio analysis revealed that three nitrogenous compound degrading genes- nbzA (nitro-aromatics), tdnB (aniline), and scnABC (thiocyanate) were unique for coking wastewater treatment, which might be strong cause to increase ammonia level during the aerobic process. Additionally, HiSeq2500 elucidated carbozole and isoquinoline degradation genes in the system. These findings expanded our understanding on functional potential of microbial communities to remove organic nitrogenous pollutants; hence it will be useful in optimization strategies for biological treatment of coking wastewater. Copyright © 2017. Published by Elsevier B.V.

  16. Systems biology of cancer biomarker detection.

    PubMed

    Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas

    2013-01-01

    Cancer systems-biology is an ever-growing area of research due to explosion of data; how to mine these data and extract useful information is the problem. To have an insight on carcinogenesis one need to systematically mine several resources, such as databases, microarray and next-generation sequences. This review encompasses management and analysis of cancer data, databases construction and data deposition, whole transcriptome and genome comparison, analysing results from high throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq can be applied to get epigenetic information transformed into a high-throughput endeavour to which systems biology and bioinformatics are making significant inroads. The data from ENCODE and GENCODE projects available through UCSC genome browser can be considered as benchmark for comparison and meta-analysis. A pipeline for integrating next generation sequencing data, microarray data, and putting them together with the existing database is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To give a better understanding of utilizing available resources' we have chosen oral cancer to show how and what kind of analysis can be done. This review is a computational genomic primer that provides a bird's eye view of computational and bioinformatics' tools currently available to perform integrated genomic and system biology analyses of several carcinoma.

  17. Picking ChIP-seq peak detectors for analyzing chromatin modification experiments

    PubMed Central

    Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D.; Kluger, Yuval

    2012-01-01

    Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to other 14 ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development. PMID:22307239

  18. Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.

    PubMed

    Micsinai, Mariann; Parisi, Fabio; Strino, Francesco; Asp, Patrik; Dynlacht, Brian D; Kluger, Yuval

    2012-05-01

    Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to other 14 ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.

  19. Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data.

    PubMed

    Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha

    2016-11-01

    Better protocols and decreasing costs have made high-throughput sequencing experiments now accessible even to small experimental laboratories. However, comparing one or few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user, to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly CONTACT: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  20. High-throughput full-length single-cell mRNA-seq of rare cells.

    PubMed

    Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X

    2017-01-01

    Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single cell suspensions with cell densities on the order of 107 per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data is typically only used for gene expression analysis of transcriptomic data, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells), and mutational profile (H1650 versus H1975 cells), in a single sequencing run. This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.

  1. The promise and challenge of high-throughput sequencing of the antibody repertoire

    PubMed Central

    Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R

    2014-01-01

    Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474

  2. High-throughput illumina strand-specific RNA sequencing library preparation

    USDA-ARS?s Scientific Manuscript database

    Conventional Illumina RNA-Seq does not have the resolution to decode the complex eukaryote transcriptome due to the lack of RNA polarity information. Strand-specific RNA sequencing (ssRNA-Seq) can overcome these limitations and as such is better suited for genome annotation, de novo transcriptome as...

  3. SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read

    PubMed Central

    2010-01-01

    Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148

  4. CRISPR-Cas9-Edited Site Sequencing (CRES-Seq): An Efficient and High-Throughput Method for the Selection of CRISPR-Cas9-Edited Clones.

    PubMed

    Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel

    2018-01-16

    The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plants cells). Here we described a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run with an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening of clones generated from multiallelic genes/polyploid cells. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.

  5. A fast one-chip event-preprocessor and sequencer for the Simbol-X Low Energy Detector

    NASA Astrophysics Data System (ADS)

    Schanz, T.; Tenzer, C.; Maier, D.; Kendziorra, E.; Santangelo, A.

    2010-12-01

    We present an FPGA-based digital camera electronics consisting of an Event-Preprocessor (EPP) for on-board data preprocessing and a related Sequencer (SEQ) to generate the necessary signals to control the readout of the detector. The device has been originally designed for the Simbol-X low energy detector (LED). The EPP operates on 64×64 pixel images and has a real-time processing capability of more than 8000 frames per second. The already working releases of the EPP and the SEQ are now combined into one Digital-Camera-Controller-Chip (D3C).

  6. Simultaneous identification and molecular characterization of viruses associated with an apple tree with mosaic symptom

    USDA-ARS?s Scientific Manuscript database

    We conducted genomic sequencing to identify viruses associated with mosaic disease of an apple tree using the high-throughput sequencing (HTS) Illumina RNA-seq platform. The objective was to examine if rapid identification and characterization of viruses could be effectively achieved by RNA-seq anal...

  7. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing

    PubMed Central

    2014-01-01

    Background RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program’s functionality. Conclusions eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312

  8. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    PubMed

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  9. ChIP-seq: advantages and challenges of a maturing technology.

    PubMed

    Park, Peter J

    2009-10-01

    Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. Owing to the tremendous progress in next-generation sequencing technology, ChIP-seq offers higher resolution, less noise and greater coverage than its array-based predecessor ChIP-chip. With the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. In this Review, I describe the benefits and challenges in harnessing this technique with an emphasis on issues related to experimental design and data analysis. ChIP-seq experiments generate large quantities of data, and effective computational analysis will be crucial for uncovering biological mechanisms.

  10. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

    PubMed Central

    Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior

    2012-01-01

    Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time. PMID:22383036

  11. Library construction for next-generation sequencing: Overviews and challenges

    PubMed Central

    Head, Steven R.; Komori, H. Kiyomi; LaMere, Sarah A.; Whisenant, Thomas; Van Nieuwerburgh, Filip; Salomon, Daniel R.; Ordoukhanian, Phillip

    2014-01-01

    High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed. PMID:24502796

  12. Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.

    PubMed

    Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart

    2014-01-15

    High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high-sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package abs filter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter

  13. Illumina GA IIx& HiSeq 2000 Production Sequenccing and QC Analysis Pipelines at the DOE Joint Genome Institute

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daum, Christopher; Zane, Matthew; Han, James

    2011-01-31

    The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sesequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increases ample throughput, and improving the overall quality of the sequence generated. A sequence QC analysismore » pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.« less

  14. Introduction to Single-Cell RNA Sequencing.

    PubMed

    Olsen, Thale Kristin; Baryawno, Ninib

    2018-04-01

    During the last decade, high-throughput sequencing methods have revolutionized the entire field of biology. The opportunity to study entire transcriptomes in great detail using RNA sequencing (RNA-seq) has fueled many important discoveries and is now a routine method in biomedical research. However, RNA-seq is typically performed in "bulk," and the data represent an average of gene expression patterns across thousands to millions of cells; this might obscure biologically relevant differences between cells. Single-cell RNA-seq (scRNA-seq) represents an approach to overcome this problem. By isolating single cells, capturing their transcripts, and generating sequencing libraries in which the transcripts are mapped to individual cells, scRNA-seq allows assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. Here, we present the most common scRNA-seq protocols in use today and the basics of data analysis and discuss factors that are important to consider before planning and designing an scRNA-seq project. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.

  15. oPOSSUM-3: Advanced Analysis of Regulatory Motif Over-Representation Across Genes or ChIP-Seq Datasets

    PubMed Central

    Kwon, Andrew T.; Arenillas, David J.; Hunt, Rebecca Worsley; Wasserman, Wyeth W.

    2012-01-01

    oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca. PMID:22973536

  16. oPOSSUM-3: advanced analysis of regulatory motif over-representation across genes or ChIP-Seq datasets.

    PubMed

    Kwon, Andrew T; Arenillas, David J; Worsley Hunt, Rebecca; Wasserman, Wyeth W

    2012-09-01

    oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.

  17. Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing

    PubMed Central

    Tourlousse, Dieter M.; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro

    2017-01-01

    Abstract High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. PMID:27980100

  18. A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

    PubMed

    Bansal, Vikas

    2017-03-14

    PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore, over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates .

  19. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput.

    PubMed

    Gierahn, Todd M; Wadsworth, Marc H; Hughes, Travis K; Bryson, Bryan D; Butler, Andrew; Satija, Rahul; Fortune, Sarah; Love, J Christopher; Shalek, Alex K

    2017-04-01

    Single-cell RNA-seq can precisely resolve cellular states, but applying this method to low-input samples is challenging. Here, we present Seq-Well, a portable, low-cost platform for massively parallel single-cell RNA-seq. Barcoded mRNA capture beads and single cells are sealed in an array of subnanoliter wells using a semipermeable membrane, enabling efficient cell lysis and transcript capture. We use Seq-Well to profile thousands of primary human macrophages exposed to Mycobacterium tuberculosis.

  20. Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits.

    PubMed

    Gong, Haibiao; Do, Devin; Ramakrishnan, Ramesh

    2018-01-01

    Single-cell mRNA-seq is a valuable tool to dissect expression profiles and to understand the regulatory network of genes. Microfluidics is well suited for single-cell analysis owing both to the small volume of the reaction chambers and easiness of automation. Here we describe the workflow of single-cell mRNA-seq using C1 IFC, which can isolate and process up to 96 cells. Both on-chip procedure (lysis, reverse transcription, and preamplification PCR) and off-chip sequencing library preparation protocols are described. The workflow generates full-length mRNA information, which is more valuable compared to 3' end counting method for many applications.

  1. Towards Clinical Molecular Diagnosis of Inherited Cardiac Conditions: A Comparison of Bench-Top Genome DNA Sequencers

    PubMed Central

    Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.

    2013-01-01

    Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798

  2. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    PubMed Central

    2011-01-01

    Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484

  3. S-MART, a software toolbox to aid RNA-Seq data analysis.

    PubMed

    Zytnicki, Matthias; Quesneville, Hadi

    2011-01-01

    High-throughput sequencing is now routinely performed in many experiments. But the analysis of the millions of sequences generated, is often beyond the expertise of the wet labs who have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by all of the biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci.

  4. S-MART, A Software Toolbox to Aid RNA-seq Data Analysis

    PubMed Central

    Zytnicki, Matthias; Quesneville, Hadi

    2011-01-01

    High-throughput sequencing is now routinely performed in many experiments. But the analysis of the millions of sequences generated, is often beyond the expertise of the wet labs who have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by all of the biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci. PMID:21998740

  5. GenomicTools: a computational platform for developing high-throughput analytics in genomics.

    PubMed

    Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo

    2012-01-15

    Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.

  6. Chromatin Immunoprecipitation (ChIP) Protocol for Low-abundance Embryonic Samples.

    PubMed

    Rehimi, Rizwan; Bartusel, Michaela; Solinas, Francesca; Altmüller, Janine; Rada-Iglesias, Alvaro

    2017-08-29

    Chromatin immunoprecipitation (ChIP) is a widely-used technique for mapping the localization of post-translationally modified histones, histone variants, transcription factors, or chromatin-modifying enzymes at a given locus or on a genome-wide scale. The combination of ChIP assays with next-generation sequencing (i.e., ChIP-Seq) is a powerful approach to globally uncover gene regulatory networks and to improve the functional annotation of genomes, especially of non-coding regulatory sequences. ChIP protocols normally require large amounts of cellular material, thus precluding the applicability of this method to investigating rare cell types or small tissue biopsies. In order to make the ChIP assay compatible with the amount of biological material that can typically be obtained in vivo during early vertebrate embryogenesis, we describe here a simplified ChIP protocol in which the number of steps required to complete the assay were reduced to minimize sample loss. This ChIP protocol has been successfully used to investigate different histone modifications in various embryonic chicken and adult mouse tissues using low to medium cell numbers (5 x 10 4 - 5 x 10 5 cells). Importantly, this protocol is compatible with ChIP-seq technology using standard library preparation methods, thus providing global epigenomic maps in highly relevant embryonic tissues.

  7. Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing.

    PubMed

    Tourlousse, Dieter M; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro; Sekiguchi, Yuji

    2017-02-28

    High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Genome-wide mapping of DNase I hypersensitive sites in rare cell populations using single-cell DNase sequencing.

    PubMed

    Cooper, James; Ding, Yi; Song, Jiuzhou; Zhao, Keji

    2017-11-01

    Increased chromatin accessibility is a feature of cell-type-specific cis-regulatory elements; therefore, mapping of DNase I hypersensitive sites (DHSs) enables the detection of active regulatory elements of transcription, including promoters, enhancers, insulators and locus-control regions. Single-cell DNase sequencing (scDNase-seq) is a method of detecting genome-wide DHSs when starting with either single cells or <1,000 cells from primary cell sources. This technique enables genome-wide mapping of hypersensitive sites in a wide range of cell populations that cannot be analyzed using conventional DNase I sequencing because of the requirement for millions of starting cells. Fresh cells, formaldehyde-cross-linked cells or cells recovered from formalin-fixed paraffin-embedded (FFPE) tissue slides are suitable for scDNase-seq assays. To generate scDNase-seq libraries, cells are lysed and then digested with DNase I. Circular carrier plasmid DNA is included during subsequent DNA purification and library preparation steps to prevent loss of the small quantity of DHS DNA. Libraries are generated for high-throughput sequencing on the Illumina platform using standard methods. Preparation of scDNase-seq libraries requires only 2 d. The materials and molecular biology techniques described in this protocol should be accessible to any general molecular biology laboratory. Processing of high-throughput sequencing data requires basic bioinformatics skills and uses publicly available bioinformatics software.

  9. Evaluation of sequencing approaches for high-throughput ...

    EPA Pesticide Factsheets

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platforms for potential application to high-throughput screening: 1. TempO-Seq utilizing custom designed paired probes per gene; 2. Targeted sequencing (TSQ) utilizing Illumina’s TruSeq RNA Access Library Prep Kit containing tiled exon-specific probe sets; 3. Low coverage whole transcriptome sequencing (LSQ) using Illumina’s TruSeq Stranded mRNA Kit. Each platform was required to cover the ~20,000 genes of the full transcriptome, operate directly with cell lysates, and be automatable with 384-well plates. Technical reproducibility was assessed using MAQC control RNA samples A and B, while functional utility for chemical screening was evaluated using six treatments at a single concentration after 6 hr in MCF7 breast cancer cells: 10 µM chlorpromazine, 10 µM ciclopriox, 10 µM genistein, 100 nM sirolimus, 1 µM tanespimycin, and 1 µM trichostatin A. All RNA samples and chemical treatments were run with 5 technical replicates. The three platforms achieved different read depths, with the TempO-Seq having ~34M mapped reads per sample, while TSQ and LSQ averaged 20M and 11M aligned reads per sample, respectively. Inter-replicate correlation averaged ≥0.95 for raw log2 expression values i

  10. Impact of artifact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data

    PubMed Central

    Carroll, Thomas S.; Liang, Ziwei; Salama, Rafik; Stark, Rory; de Santiago, Ines

    2014-01-01

    With the advent of ChIP-seq multiplexing technologies and the subsequent increase in ChIP-seq throughput, the development of working standards for the quality assessment of ChIP-seq studies has received significant attention. The ENCODE consortium's large scale analysis of transcription factor binding and epigenetic marks as well as concordant work on ChIP-seq by other laboratories has established a new generation of ChIP-seq quality control measures. The use of these metrics alongside common processing steps has however not been evaluated. In this study, we investigate the effects of blacklisting and removal of duplicated reads on established metrics of ChIP-seq quality and show that the interpretation of these metrics is highly dependent on the ChIP-seq preprocessing steps applied. Further to this we perform the first investigation of the use of these metrics for ChIP-exo data and make recommendations for the adaptation of the NSC statistic to allow for the assessment of ChIP-exo efficiency. PMID:24782889

  11. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

  12. RIPiT-Seq: A high-throughput approach for footprinting RNA:protein complexes

    PubMed Central

    Singh, Guramrit; Ricci, Emiliano P.; Moore, Melissa J.

    2013-01-01

    Development of high-throughput approaches to map the RNA interaction sites of individual RNA binding proteins (RBPs) transcriptome-wide is rapidly transforming our understanding of post-transcriptional gene regulatory mechanisms. Here we describe a ribonucleoprotein (RNP) footprinting approach we recently developed for identifying occupancy sites of both individual RBPs and multi-subunit RNP complexes. RNA:protein immunoprecipitation in tandem (RIPiT) yields highly specific RNA footprints of cellular RNPs isolated via two sequential purifications; the resulting RNA footprints can then be identified by high-throughput sequencing (Seq). RIPiT-Seq is broadly applicable to all RBPs regardless of their RNA binding mode and thus provides a means to map the RNA binding sites of RBPs with poor inherent ultraviolet (UV) crosslinkability. Further, among current high-throughput approaches, RIPiT has the unique capacity to differentiate binding sites of RNPs with overlapping protein composition. It is therefore particularly suited for studying dynamic RNP assemblages whose composition evolves as gene expression proceeds. PMID:24096052

  13. Genome-wide analysis of replication timing by next-generation sequencing with E/L Repli-seq.

    PubMed

    Marchal, Claire; Sasaki, Takayo; Vera, Daniel; Wilson, Korey; Sima, Jiao; Rivera-Mulia, Juan Carlos; Trevilla-García, Claudia; Nogues, Coralin; Nafie, Ebtesam; Gilbert, David M

    2018-05-01

    This protocol is an extension to: Nat. Protoc. 6, 870-895 (2014); doi:10.1038/nprot.2011.328; published online 02 June 2011Cycling cells duplicate their DNA content during S phase, following a defined program called replication timing (RT). Early- and late-replicating regions differ in terms of mutation rates, transcriptional activity, chromatin marks and subnuclear position. Moreover, RT is regulated during development and is altered in diseases. Here, we describe E/L Repli-seq, an extension of our Repli-chip protocol. E/L Repli-seq is a rapid, robust and relatively inexpensive protocol for analyzing RT by next-generation sequencing (NGS), allowing genome-wide assessment of how cellular processes are linked to RT. Briefly, cells are pulse-labeled with BrdU, and early and late S-phase fractions are sorted by flow cytometry. Labeled nascent DNA is immunoprecipitated from both fractions and sequenced. Data processing leads to a single bedGraph file containing the ratio of nascent DNA from early versus late S-phase fractions. The results are comparable to those of Repli-chip, with the additional benefits of genome-wide sequence information and an increased dynamic range. We also provide computational pipelines for downstream analyses, for parsing phased genomes using single-nucleotide polymorphisms (SNPs) to analyze RT allelic asynchrony, and for direct comparison to Repli-chip data. This protocol can be performed in up to 3 d before sequencing, and requires basic cellular and molecular biology skills, as well as a basic understanding of Unix and R.

  14. How B-Cell Receptor Repertoire Sequencing Can Be Enriched with Structural Antibody Data

    PubMed Central

    Kovaltsuk, Aleksandr; Krawczyk, Konrad; Galson, Jacob D.; Kelly, Dominic F.; Deane, Charlotte M.; Trück, Johannes

    2017-01-01

    Next-generation sequencing of immunoglobulin gene repertoires (Ig-seq) allows the investigation of large-scale antibody dynamics at a sequence level. However, structural information, a crucial descriptor of antibody binding capability, is not collected in Ig-seq protocols. Developing systematic relationships between the antibody sequence information gathered from Ig-seq and low-throughput techniques such as X-ray crystallography could radically improve our understanding of antibodies. The mapping of Ig-seq datasets to known antibody structures can indicate structurally, and perhaps functionally, uncharted areas. Furthermore, contrasting naïve and antigenically challenged datasets using structural antibody descriptors should provide insights into antibody maturation. As the number of antibody structures steadily increases and more and more Ig-seq datasets become available, the opportunities that arise from combining the two types of information increase as well. Here, we review how these data types enrich one another and show potential for advancing our knowledge of the immune system and improving antibody engineering. PMID:29276518

  15. Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease

    DTIC Science & Technology

    2016-10-01

    deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3

  16. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    PubMed

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  17. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.

    PubMed

    Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo

    2011-12-15

    High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy eduardo.eyras@upf.edu Supplementary data are available at Bioinformatics online.

  18. Chromatin immunoprecipitation with fixed animal tissues and preparation for high-throughput sequencing.

    PubMed

    Cotney, Justin L; Noonan, James P

    2015-02-02

    Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) is a powerful method used to identify genome-wide binding patterns of transcription factors and distribution of various histone modifications associated with different chromatin states. In most published studies, ChIP-Seq has been performed on cultured cells grown under controlled conditions, allowing generation of large amounts of material in a homogeneous biological state. Although such studies have provided great insight into the dynamic landscapes of animal genomes, they do not allow the examination of transcription factor binding and chromatin states in adult tissues, developing embryonic structures, or tumors. Such knowledge is critical to understanding the information required to create and maintain a complex biological tissue and to identify noncoding regions of the genome directly involved in tissues affected by complex diseases such as autism. Studying these tissue types with ChIP-Seq can be challenging due to the limited availability of tissues and the lack of complex biological states able to be achieved in culture. These inherent differences require alterations of standard cross-linking and chromatin extraction typically used in cell culture. Here we describe a general approach for using small amounts of animal tissue to perform ChIP-Seq directed at histone modifications and transcription factors. Tissue is homogenized before treatment with formaldehyde to ensure proper cross-linking, and a two-step nuclear isolation is performed to increase extraction of soluble chromatin. Small amounts of soluble chromatin are then used for immunoprecipitation (IP) and prepared for multiplexed high-throughput sequencing. © 2015 Cold Spring Harbor Laboratory Press.

  19. Methods for processing high-throughput RNA sequencing data.

    PubMed

    Ares, Manuel

    2014-11-03

    High-throughput sequencing (HTS) methods for analyzing RNA populations (RNA-Seq) are gaining rapid application to many experimental situations. The steps in an RNA-Seq experiment require thought and planning, especially because the expense in time and materials is currently higher and the protocols are far less routine than those used for other high-throughput methods, such as microarrays. As always, good experimental design will make analysis and interpretation easier. Having a clear biological question, an idea about the best way to do the experiment, and an understanding of the number of replicates needed will make the entire process more satisfying. Whether the goal is capturing transcriptome complexity from a tissue or identifying small fragments of RNA cross-linked to a protein of interest, conversion of the RNA to cDNA followed by direct sequencing using the latest methods is a developing practice, with new technical modifications and applications appearing every day. Even more rapid are the development and improvement of methods for analysis of the very large amounts of data that arrive at the end of an RNA-Seq experiment, making considerations regarding reproducibility, validation, visualization, and interpretation increasingly important. This introduction is designed to review and emphasize a pathway of analysis from experimental design through data presentation that is likely to be successful, with the recognition that better methods are right around the corner. © 2014 Cold Spring Harbor Laboratory Press.

  20. Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.

    PubMed

    Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho

    2015-10-28

    Recently, rapid improvements in technology and decrease in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. In this paper, we compared eight non-abundance (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq of 35- and 76-nucleotide sequences produced in the MAQC project and simulation reads. Reads were mapped with human genome obtained from UCSC Genome Browser Database. For precise evaluation, we investigated Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq of a 35-nucleotide sequence, RPKM showed the highest correlation results, but for RNA-Seq of a 76-nucleotide sequence, least correlation was observed than the other methods. ERPKM did not improve results than RPKM. Between two abundance estimation normalization methods, for RNA-Seq of a 35-nucleotide sequence, higher correlation was obtained with Sailfish than that with RSEM, which was better than without using abundance estimation methods. However, for RNA-Seq of a 76-nucleotide sequence, the results achieved by RSEM were similar to without applying abundance estimation methods, and were much better than with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results. Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. And we suggest ignoring poly-A tail during differential gene expression analysis.

  1. Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.

    PubMed

    Fang, Chao; Zhong, Huanzi; Lin, Yuxiang; Chen, Bing; Han, Mo; Ren, Huahui; Lu, Haorong; Luber, Jacob M; Xia, Min; Li, Wangsheng; Stein, Shayna; Xu, Xun; Zhang, Wenwei; Drmanac, Radoje; Wang, Jian; Yang, Huanming; Hammarström, Lennart; Kostic, Aleksandar D; Kristiansen, Karsten; Li, Junhua

    2018-03-01

    More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms. Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms. Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.

  2. RNA-Seq Technology and Its Application in Fish Transcriptomics

    PubMed Central

    Ba, Yi; Zhuang, Qianfeng

    2014-01-01

    Abstract High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing method. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptome but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species. PMID:24380445

  3. Protein Interaction Profile Sequencing (PIP-seq).

    PubMed

    Foley, Shawn W; Gregory, Brian D

    2016-10-10

    Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  4. High-Throughput Single-Cell RNA Sequencing and Data Analysis.

    PubMed

    Sagar; Herman, Josip Stefan; Pospisilik, John Andrew; Grün, Dominic

    2018-01-01

    Understanding biological systems at a single cell resolution may reveal several novel insights which remain masked by the conventional population-based techniques providing an average readout of the behavior of cells. Single-cell transcriptome sequencing holds the potential to identify novel cell types and characterize the cellular composition of any organ or tissue in health and disease. Here, we describe a customized high-throughput protocol for single-cell RNA-sequencing (scRNA-seq) combining flow cytometry and a nanoliter-scale robotic system. Since scRNA-seq requires amplification of a low amount of endogenous cellular RNA, leading to substantial technical noise in the dataset, downstream data filtering and analysis require special care. Therefore, we also briefly describe in-house state-of-the-art data analysis algorithms developed to identify cellular subpopulations including rare cell types as well as to derive lineage trees by ordering the identified subpopulations of cells along the inferred differentiation trajectories.

  5. Comparison of the performance of Ion Torrent chips in noninvasive prenatal trisomy detection.

    PubMed

    Wang, Yanlin; Wen, Zujia; Shen, Jiawei; Cheng, Weiwei; Li, Jun; Qin, Xiaolan; Ma, Duan; Shi, Yongyong

    2014-07-01

    Semiconductor high-throughput sequencing, represented by Ion Torrent PGM/Proton, proves to be feasible in the noninvasive prenatal diagnosis of fetal aneuploidies. It is commendable that, with less data and relevant cost also, an accurate result can be achieved owing to the high sensitivity and specificity of such kind of technology. We conducted a comparative analysis of the performance of four different Ion chips in detecting fetal chromosomal aneuploidies. Eight maternal plasma DNA samples, including four pregnancies with normal fetuses and four with trisomy 21 fetuses, were sequenced on Ion Torrent 314/316/318/PI chips, respectively. Results such as read mapped ratio, correlation coefficient and phred quality score were calculated and parallelly compared. All samples were correctly classified even with low-throughput chip, and, among the four chips, the 316 chip had the highest read mapped ratio, correlation coefficient, mean read length and phred quality score. All chips were well consistent with each other. Our results showed that all Ion chips are applicable in noninvasive prenatal fetal aneuploidy diagnosis. We recommend researchers or clinicians to use the appropriate chip with barcoding technology on the basis of the sample number.

  6. Quantitative phenotyping via deep barcode sequencing

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793

  7. Gene expression profiling of human breast tissue samples using SAGE-Seq.

    PubMed

    Wu, Zhenhua Jeremy; Meyer, Clifford A; Choudhury, Sibgat; Shipitsin, Michail; Maruyama, Reo; Bessarabova, Marina; Nikolskaya, Tatiana; Sukumar, Saraswati; Schwartzman, Armin; Liu, Jun S; Polyak, Kornelia; Liu, X Shirley

    2010-12-01

    We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization between libraries, and the identification of differentially expressed genes. We find that the diversity of cancer transcriptomes is significantly higher than that of normal cells. Our analysis indicates that transcript discovery plateaus at 10 million reads/sample, and suggests a minimum desired sequencing depth around five million reads. Comparison of SAGE-Seq and traditional SAGE on normal and cancerous breast tissues reveals higher sensitivity of SAGE-Seq to detect less-abundant genes, including those encoding for known breast cancer-related transcription factors and G protein-coupled receptors (GPCRs). SAGE-Seq is able to identify genes and pathways abnormally activated in breast cancer that traditional SAGE failed to call. SAGE-Seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease.

  8. YeATS- a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut

    USDA-ARS?s Scientific Manuscript database

    The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves exist...

  9. CapZyme-Seq Comprehensively Defines Promoter-Sequence Determinants for RNA 5' Capping with NAD.

    PubMed

    Vvedenskaya, Irina O; Bird, Jeremy G; Zhang, Yuanchao; Zhang, Yu; Jiao, Xinfu; Barvík, Ivan; Krásný, Libor; Kiledjian, Megerditch; Taylor, Deanne M; Ebright, Richard H; Nickels, Bryce E

    2018-05-03

    Nucleoside-containing metabolites such as NAD + can be incorporated as 5' caps on RNA by serving as non-canonical initiating nucleotides (NCINs) for transcription initiation by RNA polymerase (RNAP). Here, we report CapZyme-seq, a high-throughput-sequencing method that employs NCIN-decapping enzymes NudC and Rai1 to detect and quantify NCIN-capped RNA. By combining CapZyme-seq with multiplexed transcriptomics, we determine efficiencies of NAD + capping by Escherichia coli RNAP for ∼16,000 promoter sequences. The results define preferred transcription start site (TSS) positions for NAD + capping and define a consensus promoter sequence for NAD + capping: HRRASWW (TSS underlined). By applying CapZyme-seq to E. coli total cellular RNA, we establish that sequence determinants for NCIN capping in vivo match the NAD + -capping consensus defined in vitro, and we identify and quantify NCIN-capped small RNAs (sRNAs). Our findings define the promoter-sequence determinants for NCIN capping with NAD + and provide a general method for analysis of NCIN capping in vitro and in vivo. Copyright © 2018 Elsevier Inc. All rights reserved.

  10. Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

    PubMed

    Reid-Bayliss, Kate S; Loeb, Lawrence A

    2017-08-29

    Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.

  11. Analysis of Protein-DNA Interaction by Chromatin Immunoprecipitation and DNA Tiling Microarray (ChIP-on-chip).

    PubMed

    Gao, Hui; Zhao, Chunyan

    2018-01-01

    Chromatin immunoprecipitation (ChIP) has become the most effective and widely used tool to study the interactions between specific proteins or modified forms of proteins and a genomic DNA region. Combined with genome-wide profiling technologies, such as microarray hybridization (ChIP-on-chip) or massively parallel sequencing (ChIP-seq), ChIP could provide a genome-wide mapping of in vivo protein-DNA interactions in various organisms. Here, we describe a protocol of ChIP-on-chip that uses tiling microarray to obtain a genome-wide profiling of ChIPed DNA.

  12. Progress in ion torrent semiconductor chip based sequencing.

    PubMed

    Merriman, Barry; Rothberg, Jonathan M

    2012-12-01

    In order for next-generation sequencing to become widely used as a diagnostic in the healthcare industry, sequencing instrumentation will need to be mass produced with a high degree of quality and economy. One way to achieve this is to recast DNA sequencing in a format that fully leverages the manufacturing base created for computer chips, complementary metal-oxide semiconductor chip fabrication, which is the current pinnacle of large scale, high quality, low-cost manufacturing of high technology. To achieve this, ideally the entire sensory apparatus of the sequencer would be embodied in a standard semiconductor chip, manufactured in the same fab facilities used for logic and memory chips. Recently, such a sequencing chip, and the associated sequencing platform, has been developed and commercialized by Ion Torrent, a division of Life Technologies, Inc. Here we provide an overview of this semiconductor chip based sequencing technology, and summarize the progress made since its commercial introduction. We described in detail the progress in chip scaling, sequencing throughput, read length, and accuracy. We also summarize the enhancements in the associated platform, including sample preparation, data processing, and engagement of the broader development community through open source and crowdsourcing initiatives. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  13. Uniform, optimal signal processing of mapped deep-sequencing data.

    PubMed

    Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam

    2013-07-01

    Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.

  14. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases

    PubMed Central

    Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. PMID:26554030

  15. Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data

    PubMed Central

    Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo

    2011-01-01

    Motivation: High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein–DNA and protein–RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. Results: We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Availability: Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy Contact: eduardo.eyras@upf.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21994224

  16. Transcription profile of boar spermatozoa as revealed by RNA-sequencing

    USDA-ARS?s Scientific Manuscript database

    High-throughput RNA sequencing (RNA-Seq) overcomes the limitations of the current hybridization-based techniques to detect the actual pool of RNA transcripts in spermatozoa. The application of this technology in livestock can speed the discovery of potential predictors of male fertility. As a first ...

  17. TSSAR: TSS annotation regime for dRNA-seq data.

    PubMed

    Amman, Fabian; Wolfinger, Michael T; Lorenz, Ronny; Hofacker, Ivo L; Stadler, Peter F; Findeiß, Sven

    2014-03-27

    Differential RNA sequencing (dRNA-seq) is a high-throughput screening technique designed to examine the architecture of bacterial operons in general and the precise position of transcription start sites (TSS) in particular. Hitherto, dRNA-seq data were analyzed by visualizing the sequencing reads mapped to the reference genome and manually annotating reliable positions. This is very labor intensive and, due to the subjectivity, biased. Here, we present TSSAR, a tool for automated de novo TSS annotation from dRNA-seq data that respects the statistics of dRNA-seq libraries. TSSAR uses the premise that the number of sequencing reads starting at a certain genomic position within a transcriptional active region follows a Poisson distribution with a parameter that depends on the local strength of expression. The differences of two dRNA-seq library counts thus follow a Skellam distribution. This provides a statistical basis to identify significantly enriched primary transcripts.We assessed the performance by analyzing a publicly available dRNA-seq data set using TSSAR and two simple approaches that utilize user-defined score cutoffs. We evaluated the power of reproducing the manual TSS annotation. Furthermore, the same data set was used to reproduce 74 experimentally validated TSS in H. pylori from reliable techniques such as RACE or primer extension. Both analyses showed that TSSAR outperforms the static cutoff-dependent approaches. Having an automated and efficient tool for analyzing dRNA-seq data facilitates the use of the dRNA-seq technique and promotes its application to more sophisticated analysis. For instance, monitoring the plasticity and dynamics of the transcriptomal architecture triggered by different stimuli and growth conditions becomes possible.The main asset of a novel tool for dRNA-seq analysis that reaches out to a broad user community is usability. As such, we provide TSSAR both as intuitive RESTful Web service ( http://rna.tbi.univie.ac.at/TSSAR) together with a set of post-processing and analysis tools, as well as a stand-alone version for use in high-throughput dRNA-seq data analysis pipelines.

  18. Advances in single-cell RNA sequencing and its applications in cancer research.

    PubMed

    Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming

    2017-08-08

    Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years' development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise target therapy by identifying druggable subclones. Indeed, progresses have been made regarding solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances and progresses of single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5.

  19. Advances in single-cell RNA sequencing and its applications in cancer research

    PubMed Central

    Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming

    2017-01-01

    Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years’ development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise target therapy by identifying druggable subclones. Indeed, progresses have been made regarding solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances and progresses of single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5. Perspectives PMID:28881849

  20. Bimodal imprint chips for peptide screening: integration of high-throughput sequencing by MS and affinity analyses by surface plasmon resonance imaging.

    PubMed

    Wang, Weizhi; Li, Menglin; Wei, Zewen; Wang, Zihua; Bu, Xiangli; Lai, Wenjia; Yang, Shu; Gong, He; Zheng, Hui; Wang, Yuqiao; Liu, Ying; Li, Qin; Fang, Qiaojun; Hu, Zhiyuan

    2014-04-15

    Peptide probes and drugs have widespread applications in disease diagnostics and therapy. The demand for peptides ligands with high affinity and high specificity toward various targets has surged in the biomedical field in recent years. The traditional peptide screening procedure involves selection, sequencing, and characterization steps, and each step is manual and tedious. Herein, we developed a bimodal imprint microarray system to embrace the whole peptide screening process. Silver-sputtered silicon chip fabricated with microwell array can trap and pattern the candidate peptide beads in a one-well-one-bead manner. Peptides on beads were photocleaved in situ. A portion of the peptide in each well was transferred to a gold-coated chip to print the peptide array for high-throughput affinity analyses by surface plasmon resonance imaging (SPRi), and the peptide left in the silver-sputtered chip was ready for in situ single bead sequencing by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS). Using the bimodal imprint chip system, affinity peptides toward AHA were efficiently screened out from the 7 × 10(4) peptide library. The method provides a solution for high efficiency peptide screening.

  1. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  2. A quantitative and qualitative comparison of illumina MiSeq and 454 amplicon sequencing for genotyping the highly polymorphic major histocompatibility complex (MHC) in a non-model species.

    PubMed

    Razali, Haslina; O'Connor, Emily; Drews, Anna; Burke, Terry; Westerdahl, Helena

    2017-07-28

    High-throughput sequencing enables high-resolution genotyping of extremely duplicated genes. 454 amplicon sequencing (454) has become the standard technique for genotyping the major histocompatibility complex (MHC) genes in non-model organisms. However, illumina MiSeq amplicon sequencing (MiSeq), which offers a much higher read depth, is now superseding 454. The aim of this study was to quantitatively and qualitatively evaluate the performance of MiSeq in relation to 454 for genotyping MHC class I alleles using a house sparrow (Passer domesticus) dataset with pedigree information. House sparrows provide a good study system for this comparison as their MHC class I genes have been studied previously and, consequently, we had prior expectations concerning the number of alleles per individual. We found that 454 and MiSeq performed equally well in genotyping amplicons with low diversity, i.e. amplicons from individuals that had fewer than 6 alleles. Although there was a higher rate of failure in the 454 dataset in resolving amplicons with higher diversity (6-9 alleles), the same genotypes were identified by both 454 and MiSeq in 98% of cases. We conclude that low diversity amplicons are equally well genotyped using either 454 or MiSeq, but the higher coverage afforded by MiSeq can lead to this approach outperforming 454 in amplicons with higher diversity.

  3. iTAR: a web server for identifying target genes of transcription factors using ChIP-seq or ChIP-chip data.

    PubMed

    Yang, Chia-Chun; Andrews, Erik H; Chen, Min-Hsuan; Wang, Wan-Yu; Chen, Jeremy J W; Gerstein, Mark; Liu, Chun-Chi; Cheng, Chao

    2016-08-12

    Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) or microarray hybridization (ChIP-chip) has been widely used to determine the genomic occupation of transcription factors (TFs). We have previously developed a probabilistic method, called TIP (Target Identification from Profiles), to identify TF target genes using ChIP-seq/ChIP-chip data. To achieve high specificity, TIP applies a conservative method to estimate significance of target genes, with the trade-off being a relatively low sensitivity of target gene identification compared to other methods. Additionally, TIP's output does not render binding-peak locations or intensity, information highly useful for visualization and general experimental biological use, while the variability of ChIP-seq/ChIP-chip file formats has made input into TIP more difficult than desired. To improve upon these facets, here we present are fined TIP with key extensions. First, it implements a Gaussian mixture model for p-value estimation, increasing target gene identification sensitivity and more accurately capturing the shape of TF binding profile distributions. Second, it enables the incorporation of TF binding-peak data by identifying their locations in significant target gene promoter regions and quantifies their strengths. Finally, for full ease of implementation we have incorporated it into a web server ( http://syslab3.nchu.edu.tw/iTAR/ ) that enables flexibility of input file format, can be used across multiple species and genome assembly versions, and is freely available for public use. The web server additionally performs GO enrichment analysis for the identified target genes to reveal the potential function of the corresponding TF. The iTAR web server provides a user-friendly interface and supports target gene identification in seven species, ranging from yeast to human. To facilitate investigating the quality of ChIP-seq/ChIP-chip data, the web server generates the chart of the characteristic binding profiles and the density plot of normalized regulatory scores. The iTAR web server is a useful tool in identifying TF target genes from ChIP-seq/ChIP-chip data and discovering biological insights.

  4. Missing data and technical variability in single-cell RNA-sequencing experiments.

    PubMed

    Hicks, Stephanie C; Townes, F William; Teng, Mingxiang; Irizarry, Rafael A

    2017-11-06

    Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. Finally, we demonstrate and discuss how batch-effects and confounded experiments can intensify the problem. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  5. Mapping RNA-seq Reads with STAR

    PubMed Central

    Dobin, Alexander; Gingeras, Thomas R.

    2015-01-01

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920

  6. Mapping RNA-seq Reads with STAR.

    PubMed

    Dobin, Alexander; Gingeras, Thomas R

    2015-09-03

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.

  7. SPAR: small RNA-seq portal for analysis of sequencing experiments.

    PubMed

    Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee

    2018-05-04

    The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.

  8. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.

    PubMed

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.

  9. Performance comparison of SNP detection tools with illumina exome sequencing data—an assessment using both family pedigree information and sample-matched SNP array data

    PubMed Central

    Yi, Ming; Zhao, Yongmei; Jia, Li; He, Mei; Kebebew, Electron; Stephens, Robert M.

    2014-01-01

    To apply exome-seq-derived variants in the clinical setting, there is an urgent need to identify the best variant caller(s) from a large collection of available options. We have used an Illumina exome-seq dataset as a benchmark, with two validation scenarios—family pedigree information and SNP array data for the same samples, permitting global high-throughput cross-validation, to evaluate the quality of SNP calls derived from several popular variant discovery tools from both the open-source and commercial communities using a set of designated quality metrics. To the best of our knowledge, this is the first large-scale performance comparison of exome-seq variant discovery tools using high-throughput validation with both Mendelian inheritance checking and SNP array data, which allows us to gain insights into the accuracy of SNP calling through such high-throughput validation in an unprecedented way, whereas the previously reported comparison studies have only assessed concordance of these tools without directly assessing the quality of the derived SNPs. More importantly, the main purpose of our study was to establish a reusable procedure that applies high-throughput validation to compare the quality of SNP discovery tools with a focus on exome-seq, which can be used to compare any forthcoming tool(s) of interest. PMID:24831545

  10. Whole-transcriptome, high-throughput RNA sequence analysis of the bovine macrophage response to Mycobacterium bovis infection in vitro.

    PubMed

    Nalpas, Nicolas C; Park, Stephen D E; Magee, David A; Taraktsoglou, Maria; Browne, John A; Conlon, Kevin M; Rue-Albrecht, Kévin; Killick, Kate E; Hokamp, Karsten; Lohan, Amanda J; Loftus, Brendan J; Gormley, Eamonn; Gordon, Stephen V; MacHugh, David E

    2013-04-08

    Mycobacterium bovis, the causative agent of bovine tuberculosis, is an intracellular pathogen that can persist inside host macrophages during infection via a diverse range of mechanisms that subvert the host immune response. In the current study, we have analysed and compared the transcriptomes of M. bovis-infected monocyte-derived macrophages (MDM) purified from six Holstein-Friesian females with the transcriptomes of non-infected control MDM from the same animals over a 24 h period using strand-specific RNA sequencing (RNA-seq). In addition, we compare gene expression profiles generated using RNA-seq with those previously generated by us using the high-density Affymetrix® GeneChip® Bovine Genome Array platform from the same MDM-extracted RNA. A mean of 7.2 million reads from each MDM sample mapped uniquely and unambiguously to single Bos taurus reference genome locations. Analysis of these mapped reads showed 2,584 genes (1,392 upregulated; 1,192 downregulated) and 757 putative natural antisense transcripts (558 upregulated; 119 downregulated) that were differentially expressed based on sense and antisense strand data, respectively (adjusted P-value ≤ 0.05). Of the differentially expressed genes, 694 were common to both the sense and antisense data sets, with the direction of expression (i.e. up- or downregulation) positively correlated for 693 genes and negatively correlated for the remaining gene. Gene ontology analysis of the differentially expressed genes revealed an enrichment of immune, apoptotic and cell signalling genes. Notably, the number of differentially expressed genes identified from RNA-seq sense strand analysis was greater than the number of differentially expressed genes detected from microarray analysis (2,584 genes versus 2,015 genes). Furthermore, our data reveal a greater dynamic range in the detection and quantification of gene transcripts for RNA-seq compared to microarray technology. This study highlights the value of RNA-seq in identifying novel immunomodulatory mechanisms that underlie host-mycobacterial pathogen interactions during infection, including possible complex post-transcriptional regulation of host gene expression involving antisense RNA.

  11. Full-length single-cell RNA-seq applied to a viral human cancer: applications to HPV expression and splicing analysis in HeLa S3 cells.

    PubMed

    Wu, Liang; Zhang, Xiaolong; Zhao, Zhikun; Wang, Ling; Li, Bo; Li, Guibo; Dean, Michael; Yu, Qichao; Wang, Yanhui; Lin, Xinxin; Rao, Weijian; Mei, Zhanlong; Li, Yang; Jiang, Runze; Yang, Huan; Li, Fuqiang; Xie, Guoyun; Xu, Liqin; Wu, Kui; Zhang, Jie; Chen, Jianghao; Wang, Ting; Kristiansen, Karsten; Zhang, Xiuqing; Li, Yingrui; Yang, Huanming; Wang, Jian; Hou, Yong; Xu, Xun

    2015-01-01

    Viral infection causes multiple forms of human cancer, and HPV infection is the primary factor in cervical carcinomas. Recent single-cell RNA-seq studies highlight the tumor heterogeneity present in most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line. We developed a new high throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells and 40 of them were randomly selected to perform single-cell RNA sequencing. Based on these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in gene expression, alternative splicing and fusions. Furthermore, we identified a high diversity of HPV-18 expression and splicing at the single-cell level. By co-expression analysis we identified 283 E6, E7 co-regulated genes, including CDC25, PCNA, PLK4, BUB1B and IRF1 known to interact with HPV viral proteins. Our results reveal the heterogeneity of a virus-infected cell line. It not only provides a transcriptome characterization of HeLa S3 cells at the single cell level, but is a demonstration of the power of single cell RNA-seq analysis of virally infected cells and cancers.

  12. Quick genome sequencing of “Candidatus Liberibacter” strains by use of Enrichment-Enlargement-Next generation sequencing (EEN)

    USDA-ARS?s Scientific Manuscript database

    Members of “Candidatus Liberibacter” are associated with several important plant diseases such as citrus Huanglongbing (HLB) and potato zebra chip (ZC) disease. Inability to culture and low titers in infected hosts have been major obstacles for research on these bacteria. The use of whole genome seq...

  13. Nitrogen Cycle Evaluation (NiCE) Chip for the Simultaneous Analysis of Multiple N-Cycle Associated Genes.

    PubMed

    Oshiki, Mamoru; Segawa, Takahiro; Ishii, Satoshi

    2018-02-02

    Various microorganisms play key roles in the Nitrogen (N) cycle. Quantitative PCR (qPCR) and PCR-amplicon sequencing of the N cycle functional genes allow us to analyze the abundance and diversity of microbes responsible in the N transforming reactions in various environmental samples. However, analysis of multiple target genes can be cumbersome and expensive. PCR-independent analysis, such as metagenomics and metatranscriptomics, is useful but expensive especially when we analyze multiple samples and try to detect N cycle functional genes present at relatively low abundance. Here, we present the application of microfluidic qPCR chip technology to simultaneously quantify and prepare amplicon sequence libraries for multiple N cycle functional genes as well as taxon-specific 16S rRNA gene markers for many samples. This approach, named as N cycle evaluation (NiCE) chip, was evaluated by using DNA from pure and artificially mixed bacterial cultures and by comparing the results with those obtained by conventional qPCR and amplicon sequencing methods. Quantitative results obtained by the NiCE chip were comparable to those obtained by conventional qPCR. In addition, the NiCE chip was successfully applied to examine abundance and diversity of N cycle functional genes in wastewater samples. Although non-specific amplification was detected on the NiCE chip, this could be overcome by optimizing the primer sequences in the future. As the NiCE chip can provide high-throughput format to quantify and prepare sequence libraries for multiple N cycle functional genes, this tool should advance our ability to explore N cycling in various samples. Importance. We report a novel approach, namely Nitrogen Cycle Evaluation (NiCE) chip by using microfluidic qPCR chip technology. By sequencing the amplicons recovered from the NiCE chip, we can assess diversities of the N cycle functional genes. The NiCE chip technology is applicable to analyze the temporal dynamics of the N cycle gene transcriptions in wastewater treatment bioreactors. The NiCE chip can provide high-throughput format to quantify and prepare sequence libraries for multiple N cycle functional genes. While there is a room for future improvement, this tool should significantly advance our ability to explore the N cycle in various environmental samples. Copyright © 2018 American Society for Microbiology.

  14. High-Throughput Amplicon-Based Copy Number Detection of 11 Genes in Formalin-Fixed Paraffin-Embedded Ovarian Tumour Samples by MLPA-Seq

    PubMed Central

    Kondrashova, Olga; Love, Clare J.; Lunke, Sebastian; Hsu, Arthur L.; Waring, Paul M.; Taylor, Graham R.

    2015-01-01

    Whilst next generation sequencing can report point mutations in fixed tissue tumour samples reliably, the accurate determination of copy number is more challenging. The conventional Multiplex Ligation-dependent Probe Amplification (MLPA) assay is an effective tool for measurement of gene dosage, but is restricted to around 50 targets due to size resolution of the MLPA probes. By switching from a size-resolved format, to a sequence-resolved format we developed a scalable, high-throughput, quantitative assay. MLPA-seq is capable of detecting deletions, duplications, and amplifications in as little as 5ng of genomic DNA, including from formalin-fixed paraffin-embedded (FFPE) tumour samples. We show that this method can detect BRCA1, BRCA2, ERBB2 and CCNE1 copy number changes in DNA extracted from snap-frozen and FFPE tumour tissue, with 100% sensitivity and >99.5% specificity. PMID:26569395

  15. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  16. Inertial-ordering-assisted droplet microfluidics for high-throughput single-cell RNA-sequencing.

    PubMed

    Moon, Hui-Sung; Je, Kwanghwi; Min, Jae-Woong; Park, Donghyun; Han, Kyung-Yeon; Shin, Seung-Ho; Park, Woong-Yang; Yoo, Chang Eun; Kim, Shin-Hyun

    2018-02-27

    Single-cell RNA-seq reveals the cellular heterogeneity inherent in the population of cells, which is very important in many clinical and research applications. Recent advances in droplet microfluidics have achieved the automatic isolation, lysis, and labeling of single cells in droplet compartments without complex instrumentation. However, barcoding errors occurring in the cell encapsulation process because of the multiple-beads-in-droplet and insufficient throughput because of the low concentration of beads for avoiding multiple-beads-in-a-droplet remain important challenges for precise and efficient expression profiling of single cells. In this study, we developed a new droplet-based microfluidic platform that significantly improved the throughput while reducing barcoding errors through deterministic encapsulation of inertially ordered beads. Highly concentrated beads containing oligonucleotide barcodes were spontaneously ordered in a spiral channel by an inertial effect, which were in turn encapsulated in droplets one-by-one, while cells were simultaneously encapsulated in the droplets. The deterministic encapsulation of beads resulted in a high fraction of single-bead-in-a-droplet and rare multiple-beads-in-a-droplet although the bead concentration increased to 1000 μl -1 , which diminished barcoding errors and enabled accurate high-throughput barcoding. We successfully validated our device with single-cell RNA-seq. In addition, we found that multiple-beads-in-a-droplet, generated using a normal Drop-Seq device with a high concentration of beads, underestimated transcript numbers and overestimated cell numbers. This accurate high-throughput platform can expand the capability and practicality of Drop-Seq in single-cell analysis.

  17. Antibody performance in ChIP-sequencing assays: From quality scores of public data sets to quantitative certification.

    PubMed

    Mendoza-Parra, Marco-Antonio; Saravaki, Vincent; Cholley, Pierre-Etienne; Blum, Matthias; Billoré, Benjamin; Gronemeyer, Hinrich

    2016-01-01

    We have established a certification system for antibodies to be used in chromatin immunoprecipitation assays coupled to massive parallel sequencing (ChIP-seq). This certification comprises a standardized ChIP procedure and the attribution of a numerical quality control indicator (QCi) to biological replicate experiments. The QCi computation is based on a universally applicable quality assessment that quantitates the global deviation of randomly sampled subsets of ChIP-seq dataset with the original genome-aligned sequence reads. Comparison with a QCi database for >28,000 ChIP-seq assays were used to attribute quality grades (ranging from 'AAA' to 'DDD') to a given dataset. In the present report we used the numerical QC system to assess the factors influencing the quality of ChIP-seq assays, including the nature of the target, the sequencing depth and the commercial source of the antibody.  We have used this approach specifically to certify mono and polyclonal antibodies obtained from Active Motif directed against the histone modification marks H3K4me3, H3K27ac and H3K9ac for ChIP-seq. The antibodies received the grades AAA to BBC ( www.ngs-qc.org). We propose to attribute such quantitative grading of all antibodies attributed with the label "ChIP-seq grade".

  18. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.

    PubMed

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-03-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' performs trimming of the high-throughput sequencing data. 'QuasR' and 'mosaics' perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, 'ChIPseeker' performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git.

  19. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages

    PubMed Central

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-01-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git. PMID:28416945

  20. An interactive environment for agile analysis and visualization of ChIP-sequencing data.

    PubMed

    Lerdrup, Mads; Johansen, Jens Vilstrup; Agrawal-Singh, Shuchi; Hansen, Klaus

    2016-04-01

    To empower experimentalists with a means for fast and comprehensive chromatin immunoprecipitation sequencing (ChIP-seq) data analyses, we introduce an integrated computational environment, EaSeq. The software combines the exploratory power of genome browsers with an extensive set of interactive and user-friendly tools for genome-wide abstraction and visualization. It enables experimentalists to easily extract information and generate hypotheses from their own data and public genome-wide datasets. For demonstration purposes, we performed meta-analyses of public Polycomb ChIP-seq data and established a new screening approach to analyze more than 900 datasets from mouse embryonic stem cells for factors potentially associated with Polycomb recruitment. EaSeq, which is freely available and works on a standard personal computer, can substantially increase the throughput of many analysis workflows, facilitate transparency and reproducibility by automatically documenting and organizing analyses, and enable a broader group of scientists to gain insights from ChIP-seq data.

  1. Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

    PubMed Central

    Azab, Marwa Mohamed; Fayyad, Dalia Mukhtar

    2018-01-01

    The use of high throughput next generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections using Illumina MiSeq sequencing platform in Egyptian patients. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile # 15K file and paper points. DNA was extracted using Mo Bio power soil DNA isolation extraction kit followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable region of the 16S rRNA gene by using paired-end sequencing on Illumina MiSeq device. MOTHUR software was used in sequence filtration and analysis of sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes were predominant in all samples. At genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials. PMID:29849646

  2. rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

    PubMed

    Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui

    2015-07-01

    High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to access statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package with its source code and documentation are freely available at http://www-personal.umich.edu/∼jianghui/rseqnp/. jianghui@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.

    PubMed

    Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R

    2017-07-01

    The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.

  4. Genome wide approaches to identify protein-DNA interactions.

    PubMed

    Ma, Tao; Ye, Zhenqing; Wang, Liguo

    2018-05-29

    Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential to identify their target genes and understand the regulatory network. Genome-wide identification of their binding sites became feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites. This review aims to provide an overview of these three techniques including their experiment procedures, computational approaches, and popular analytic tools. ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques to study genome-wide in vivo protein-DNA interaction. Due to the rapid development of next-generation sequencing technology, array-based ChIP-chip is deprecated and ChIP-seq has become the most widely used technique to identify transcription factor binding sites in genome-wide. The newly developed ChIP-exo further improves the spatial resolution to single nucleotide. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms thus each will inherently include its own set of statistical assumption and bias. So choosing the most appropriate analytic program for a given experiment needs careful considerations. Moreover, most programs only have command line interface so their installation and usage will require basic computation expertise in Unix/Linux. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  5. HSA: a heuristic splice alignment tool.

    PubMed

    Bu, Jingde; Chi, Xuebin; Jin, Zhong

    2013-01-01

    RNA-Seq methodology is a revolutionary transcriptomics sequencing technology, which is the representative of Next generation Sequencing (NGS). With the high throughput sequencing of RNA-Seq, we can acquire much more information like differential expression and novel splice variants from deep sequence analysis and data mining. But the short read length brings a great challenge to alignment, especially when the reads span two or more exons. A two steps heuristic splice alignment tool is generated in this investigation. First, map raw reads to reference with unspliced aligner--BWA; second, split initial unmapped reads into three equal short reads (seeds), align each seed to the reference, filter hits, search possible split position of read and extend hits to a complete match. Compare with other splice alignment tools like SOAPsplice and Tophat2, HSA has a better performance in call rate and efficiency, but its results do not as accurate as the other software to some extent. HSA is an effective spliced aligner of RNA-Seq reads mapping, which is available at https://github.com/vlcc/HSA.

  6. Gene expression and splicing alterations analyzed by high throughput RNA sequencing of chronic lymphocytic leukemia specimens.

    PubMed

    Liao, Wei; Jordaan, Gwen; Nham, Phillipp; Phan, Ryan T; Pelegrini, Matteo; Sharma, Sanjai

    2015-10-16

    To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia specimens a high throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Ten CLL specimens and five normal peripheral blood CD19+ B cells were analyzed by HTS RNA-seq. The library preparation was performed with Illumina TrueSeq RNA kit and analyzed by Illumina HiSeq 2000 sequencing system. An average of 48.5 million reads for B cells, and 50.6 million reads for CLL specimens were obtained with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEG) between B cells and CLL specimens based on FPKM (fragments per kilobase of transcript per million reads and false discovery rate, FDR q < 0.05, fold change >2) were identified. Expression of selected DEGs (n = 32) with up regulated and down regulated expression in CLL from RNA-seq data were also analyzed by qRT-PCR in a test cohort of CLL specimens. Even though there was a variation in fold expression of DEG genes between RNA-seq and qRT-PCR; more than 90 % of analyzed genes were validated by qRT-PCR analysis. Analysis of RNA-seq data for splicing alterations in CLL and B cells was performed by Multivariate Analysis of Transcript Splicing (MATS analysis). Skipped exon was the most frequent splicing alteration in CLL specimens with 128 significant events (P-value <0.05, minimum inclusion level difference >0.1). The RNA-seq analysis of CLL specimens identifies novel DEG and alternatively spliced genes that are potential prognostic markers and therapeutic targets. High level of validation by qRT-PCR for a number of DEG genes supports the accuracy of this analysis. Global comparison of transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL specimens (M-CLL) with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes but the M-CLL and U-CLL transcriptomes were indistinguishable. The analysis of HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome wide analysis.

  7. BlackOPs: increasing confidence in variant detection through mappability filtering.

    PubMed

    Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil

    2013-10-01

    Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.

  8. An improved ChIP-seq peak detection system for simultaneously identifying post-translational modified transcription factors by combinatorial fusion, using SUMOylation as an example.

    PubMed

    Cheng, Chia-Yang; Chu, Chia-Han; Hsu, Hung-Wei; Hsu, Fang-Rong; Tang, Chung Yi; Wang, Wen-Ching; Kung, Hsing-Jien; Chang, Pei-Ching

    2014-01-01

    Post-translational modification (PTM) of transcriptional factors and chromatin remodelling proteins is recognized as a major mechanism by which transcriptional regulation occurs. Chromatin immunoprecipitation (ChIP) in combination with high-throughput sequencing (ChIP-seq) is being applied as a gold standard when studying the genome-wide binding sites of transcription factor (TFs). This has greatly improved our understanding of protein-DNA interactions on a genomic-wide scale. However, current ChIP-seq peak calling tools are not sufficiently sensitive and are unable to simultaneously identify post-translational modified TFs based on ChIP-seq analysis; this is largely due to the wide-spread presence of multiple modified TFs. Using SUMO-1 modification as an example; we describe here an improved approach that allows the simultaneous identification of the particular genomic binding regions of all TFs with SUMO-1 modification. Traditional peak calling methods are inadequate when identifying multiple TF binding sites that involve long genomic regions and therefore we designed a ChIP-seq processing pipeline for the detection of peaks via a combinatorial fusion method. Then, we annotate the peaks with known transcription factor binding sites (TFBS) using the Transfac Matrix Database (v7.0), which predicts potential SUMOylated TFs. Next, the peak calling result was further analyzed based on the promoter proximity, TFBS annotation, a literature review, and was validated by ChIP-real-time quantitative PCR (qPCR) and ChIP-reChIP real-time qPCR. The results show clearly that SUMOylated TFs are able to be pinpointed using our pipeline. A methodology is presented that analyzes SUMO-1 ChIP-seq patterns and predicts related TFs. Our analysis uses three peak calling tools. The fusion of these different tools increases the precision of the peak calling results. TFBS annotation method is able to predict potential SUMOylated TFs. Here, we offer a new approach that enhances ChIP-seq data analysis and allows the identification of multiple SUMOylated TF binding sites simultaneously, which can then be utilized for other functional PTM binding site prediction in future.

  9. Analysis, annotation, and profiling of the oat seed transcriptome

    USDA-ARS?s Scientific Manuscript database

    Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...

  10. Classifying next-generation sequencing data using a zero-inflated Poisson model.

    PubMed

    Zhou, Yan; Wan, Xiang; Zhang, Baoxue; Tong, Tiejun

    2018-04-15

    With the development of high-throughput techniques, RNA-sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNAs profiling and classification. Identifying which type of diseases a new patient belongs to with RNA-seq data has been recognized as a vital problem in medical research. As RNA-seq data are discrete, statistical methods developed for classifying microarray data cannot be readily applied for RNA-seq data classification. Witten proposed a Poisson linear discriminant analysis (PLDA) to classify the RNA-seq data in 2011. Note, however, that the count datasets are frequently characterized by excess zeros in real RNA-seq or microRNA sequence data (i.e. when the sequence depth is not enough or small RNAs with the length of 18-30 nucleotides). Therefore, it is desired to develop a new model to analyze RNA-seq data with an excess of zeros. In this paper, we propose a Zero-Inflated Poisson Logistic Discriminant Analysis (ZIPLDA) for RNA-seq data with an excess of zeros. The new method assumes that the data are from a mixture of two distributions: one is a point mass at zero, and the other follows a Poisson distribution. We then consider a logistic relation between the probability of observing zeros and the mean of the genes and the sequencing depth in the model. Simulation studies show that the proposed method performs better than, or at least as well as, the existing methods in a wide range of settings. Two real datasets including a breast cancer RNA-seq dataset and a microRNA-seq dataset are also analyzed, and they coincide with the simulation results that our proposed method outperforms the existing competitors. The software is available at http://www.math.hkbu.edu.hk/∼tongt. xwan@comp.hkbu.edu.hk or tongt@hkbu.edu.hk. Supplementary data are available at Bioinformatics online.

  11. Integrating RNA sequencing into neuro-oncology practice.

    PubMed

    Rogawski, David S; Vitanza, Nicholas A; Gauthier, Angela C; Ramaswamy, Vijay; Koschmann, Carl

    2017-11-01

    Malignant tumors of the central nervous system (CNS) cause substantial morbidity and mortality, yet efforts to optimize chemo- and radiotherapy have largely failed to improve dismal prognoses. Over the past decade, RNA sequencing (RNA-seq) has emerged as a powerful tool to comprehensively characterize the transcriptome of CNS tumor cells in one high-throughput step, leading to improved understanding of CNS tumor biology and suggesting new routes for targeted therapies. RNA-seq has been instrumental in improving the diagnostic classification of brain tumors, characterizing oncogenic fusion genes, and shedding light on intratumor heterogeneity. Currently, RNA-seq is beginning to be incorporated into regular neuro-oncology practice in the form of precision neuro-oncology programs, which use information from tumor sequencing to guide implementation of personalized targeted therapies. These programs show great promise in improving patient outcomes for tumors where single agent trials have been ineffective. As RNA-seq is a relatively new technique, many further applications yielding new advances in CNS tumor research and management are expected in the coming years. Copyright © 2017 Elsevier Inc. All rights reserved.

  12. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles

    PubMed Central

    Portales-Casamar, Elodie; Thongjuea, Supat; Kwon, Andrew T.; Arenillas, David; Zhao, Xiaobei; Valen, Eivind; Yusuf, Dimas; Lenhard, Boris; Wasserman, Wyeth W.; Sandelin, Albin

    2010-01-01

    JASPAR (http://jaspar.genereg.net) is the leading open-access database of matrix profiles describing the DNA-binding patterns of transcription factors (TFs) and other proteins interacting with DNA in a sequence-specific manner. Its fourth major release is the largest expansion of the core database to date: the database now holds 457 non-redundant, curated profiles. The new entries include the first batch of profiles derived from ChIP-seq and ChIP-chip whole-genome binding experiments, and 177 yeast TF binding profiles. The introduction of a yeast division brings the convenience of JASPAR to an active research community. As binding models are refined by newer data, the JASPAR database now uses versioning of matrices: in this release, 12% of the older models were updated to improved versions. Classification of TF families has been improved by adopting a new DNA-binding domain nomenclature. A curated catalog of mammalian TFs is provided, extending the use of the JASPAR profiles to additional TFs belonging to the same structural family. The changes in the database set the system ready for more rapid acquisition of new high-throughput data sources. Additionally, three new special collections provide matrix profile data produced by recent alternative high-throughput approaches. PMID:19906716

  13. Meta-analysis of RNA-Seq data across cohorts in a multi-season feed efficiency study of crossbred beef steers accounts for biological and technical variability within season

    USDA-ARS?s Scientific Manuscript database

    High-throughput sequencing is often used for studies of the transcriptome, particularly for comparisons between experimental conditions. Due to sequencing costs, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential ex...

  14. Next Generation MUT-MAP, a High-Sensitivity High-Throughput Microfluidics Chip-Based Mutation Analysis Panel

    PubMed Central

    Patel, Rajesh; Tsan, Alison; Sumiyoshi, Teiko; Fu, Ling; Desai, Rupal; Schoenbrunner, Nancy; Myers, Thomas W.; Bauer, Keith; Smith, Edward; Raja, Rajiv

    2014-01-01

    Molecular profiling of tumor tissue to detect alterations, such as oncogenic mutations, plays a vital role in determining treatment options in oncology. Hence, there is an increasing need for a robust and high-throughput technology to detect oncogenic hotspot mutations. Although commercial assays are available to detect genetic alterations in single genes, only a limited amount of tissue is often available from patients, requiring multiplexing to allow for simultaneous detection of mutations in many genes using low DNA input. Even though next-generation sequencing (NGS) platforms provide powerful tools for this purpose, they face challenges such as high cost, large DNA input requirement, complex data analysis, and long turnaround times, limiting their use in clinical settings. We report the development of the next generation mutation multi-analyte panel (MUT-MAP), a high-throughput microfluidic, panel for detecting 120 somatic mutations across eleven genes of therapeutic interest (AKT1, BRAF, EGFR, FGFR3, FLT3, HRAS, KIT, KRAS, MET, NRAS, and PIK3CA) using allele-specific PCR (AS-PCR) and Taqman technology. This mutation panel requires as little as 2 ng of high quality DNA from fresh frozen or 100 ng of DNA from formalin-fixed paraffin-embedded (FFPE) tissues. Mutation calls, including an automated data analysis process, have been implemented to run 88 samples per day. Validation of this platform using plasmids showed robust signal and low cross-reactivity in all of the newly added assays and mutation calls in cell line samples were found to be consistent with the Catalogue of Somatic Mutations in Cancer (COSMIC) database allowing for direct comparison of our platform to Sanger sequencing. High correlation with NGS when compared to the SuraSeq500 panel run on the Ion Torrent platform in a FFPE dilution experiment showed assay sensitivity down to 0.45%. This multiplexed mutation panel is a valuable tool for high-throughput biomarker discovery in personalized medicine and cancer drug development. PMID:24658394

  15. Microfluidic single-cell whole-transcriptome sequencing.

    PubMed

    Streets, Aaron M; Zhang, Xiannian; Cao, Chen; Pang, Yuhong; Wu, Xinglong; Xiong, Liang; Yang, Lu; Fu, Yusi; Zhao, Liang; Tang, Fuchou; Huang, Yanyi

    2014-05-13

    Single-cell whole-transcriptome analysis is a powerful tool for quantifying gene expression heterogeneity in populations of cells. Many techniques have, thus, been recently developed to perform transcriptome sequencing (RNA-Seq) on individual cells. To probe subtle biological variation between samples with limiting amounts of RNA, more precise and sensitive methods are still required. We adapted a previously developed strategy for single-cell RNA-Seq that has shown promise for superior sensitivity and implemented the chemistry in a microfluidic platform for single-cell whole-transcriptome analysis. In this approach, single cells are captured and lysed in a microfluidic device, where mRNAs with poly(A) tails are reverse-transcribed into cDNA. Double-stranded cDNA is then collected and sequenced using a next generation sequencing platform. We prepared 94 libraries consisting of single mouse embryonic cells and technical replicates of extracted RNA and thoroughly characterized the performance of this technology. Microfluidic implementation increased mRNA detection sensitivity as well as improved measurement precision compared with tube-based protocols. With 0.2 M reads per cell, we were able to reconstruct a majority of the bulk transcriptome with 10 single cells. We also quantified variation between and within different types of mouse embryonic cells and found that enhanced measurement precision, detection sensitivity, and experimental throughput aided the distinction between biological variability and technical noise. With this work, we validated the advantages of an early approach to single-cell RNA-Seq and showed that the benefits of combining microfluidic technology with high-throughput sequencing will be valuable for large-scale efforts in single-cell transcriptome analysis.

  16. RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations

    PubMed Central

    Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.

    2014-01-01

    Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449

  17. An Alternative Approach to ChIP-Seq Normalization Enables Detection of Genome-Wide Changes in Histone H3 Lysine 27 Trimethylation upon EZH2 Inhibition

    PubMed Central

    Yuan, Chih-Chi; Craske, Madeleine Lisa; Labhart, Paul; Guler, Gulfem D.; Arnott, David; Maile, Tobias M.; Busby, Jennifer; Henry, Chisato; Kelly, Theresa K.; Tindell, Charles A.; Jhunjhunwala, Suchit; Zhao, Feng; Hatton, Charlie; Bryant, Barbara M.

    2016-01-01

    Chromatin immunoprecipitation and DNA sequencing (ChIP-seq) has been instrumental in inferring the roles of histone post-translational modifications in the regulation of transcription, chromatin compaction and other cellular processes that require modulation of chromatin structure. However, analysis of ChIP-seq data is challenging when the manipulation of a chromatin-modifying enzyme significantly affects global levels of histone post-translational modifications. For example, small molecule inhibition of the methyltransferase EZH2 reduces global levels of histone H3 lysine 27 trimethylation (H3K27me3). However, standard ChIP-seq normalization and analysis methods fail to detect a decrease upon EZH2 inhibitor treatment. We overcome this challenge by employing an alternative normalization approach that is based on the addition of Drosophila melanogaster chromatin and a D. melanogaster-specific antibody into standard ChIP reactions. Specifically, the use of an antibody that exclusively recognizes the D. melanogaster histone variant H2Av enables precipitation of D. melanogaster chromatin as a minor fraction of the total ChIP DNA. The D. melanogaster ChIP-seq tags are used to normalize the human ChIP-seq data from DMSO and EZH2 inhibitor-treated samples. Employing this strategy, a substantial reduction in H3K27me3 signal is now observed in ChIP-seq data from EZH2 inhibitor treated samples. PMID:27875550

  18. An Alternative Approach to ChIP-Seq Normalization Enables Detection of Genome-Wide Changes in Histone H3 Lysine 27 Trimethylation upon EZH2 Inhibition.

    PubMed

    Egan, Brian; Yuan, Chih-Chi; Craske, Madeleine Lisa; Labhart, Paul; Guler, Gulfem D; Arnott, David; Maile, Tobias M; Busby, Jennifer; Henry, Chisato; Kelly, Theresa K; Tindell, Charles A; Jhunjhunwala, Suchit; Zhao, Feng; Hatton, Charlie; Bryant, Barbara M; Classon, Marie; Trojer, Patrick

    2016-01-01

    Chromatin immunoprecipitation and DNA sequencing (ChIP-seq) has been instrumental in inferring the roles of histone post-translational modifications in the regulation of transcription, chromatin compaction and other cellular processes that require modulation of chromatin structure. However, analysis of ChIP-seq data is challenging when the manipulation of a chromatin-modifying enzyme significantly affects global levels of histone post-translational modifications. For example, small molecule inhibition of the methyltransferase EZH2 reduces global levels of histone H3 lysine 27 trimethylation (H3K27me3). However, standard ChIP-seq normalization and analysis methods fail to detect a decrease upon EZH2 inhibitor treatment. We overcome this challenge by employing an alternative normalization approach that is based on the addition of Drosophila melanogaster chromatin and a D. melanogaster-specific antibody into standard ChIP reactions. Specifically, the use of an antibody that exclusively recognizes the D. melanogaster histone variant H2Av enables precipitation of D. melanogaster chromatin as a minor fraction of the total ChIP DNA. The D. melanogaster ChIP-seq tags are used to normalize the human ChIP-seq data from DMSO and EZH2 inhibitor-treated samples. Employing this strategy, a substantial reduction in H3K27me3 signal is now observed in ChIP-seq data from EZH2 inhibitor treated samples.

  19. ScaffoldSeq: Software for characterization of directed evolution populations.

    PubMed

    Woldring, Daniel R; Holec, Patrick V; Hackel, Benjamin J

    2016-07-01

    ScaffoldSeq is software designed for the numerous applications-including directed evolution analysis-in which a user generates a population of DNA sequences encoding for partially diverse proteins with related functions and would like to characterize the single site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. Proteins 2016; 84:869-874. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  20. Retrotransposon Capture Sequencing (RC-Seq): A Targeted, High-Throughput Approach to Resolve Somatic L1 Retrotransposition in Humans.

    PubMed

    Sanchez-Luque, Francisco J; Richardson, Sandra R; Faulkner, Geoffrey J

    2016-01-01

    Mobile genetic elements (MGEs) are of critical importance in genomics and developmental biology. Polymorphic and somatic MGE insertions have the potential to impact the phenotype of an individual, depending on their genomic locations and functional consequences. However, the identification of polymorphic and somatic insertions among the plethora of copies residing in the genome presents a formidable technical challenge. Whole genome sequencing has the potential to address this problem; however, its efficacy depends on the abundance of cells carrying the new insertion. Robust detection of somatic insertions present in only a subset of cells within a given sample can also be prohibitively expensive due to a requirement for high sequencing depth. Here, we describe retrotransposon capture sequencing (RC-seq), a sequence capture approach in which Illumina libraries are enriched for fragments containing the 5' and 3' termini of specific MGEs. RC-seq allows the detection of known polymorphic insertions present in an individual, as well as the identification of rare or private germline insertions not previously described. Furthermore, RC-seq can be used to detect and characterize somatic insertions, providing a valuable tool to elucidate the extent and characteristics of MGE activity in healthy tissues and in various disease states.

  1. Accurate Sample Assignment in a Multiplexed, Ultrasensitive, High-Throughput Sequencing Assay for Minimal Residual Disease.

    PubMed

    Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike

    2016-07-01

    High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine the Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based, demultiplexing pipeline Error-Aware Demultiplexer that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10(6) copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.

  2. Single-Cell Sequencing for Drug Discovery and Drug Development.

    PubMed

    Wu, Hongjin; Wang, Charles; Wu, Shixiu

    2017-01-01

    Next-generation sequencing (NGS), particularly single-cell sequencing, has revolutionized the scale and scope of genomic and biomedical research. Recent technological advances in NGS and singlecell studies have made the deep whole-genome (DNA-seq), whole epigenome and whole-transcriptome sequencing (RNA-seq) at single-cell level feasible. NGS at the single-cell level expands our view of genome, epigenome and transcriptome and allows the genome, epigenome and transcriptome of any organism to be explored without a priori assumptions and with unprecedented throughput. And it does so with single-nucleotide resolution. NGS is also a very powerful tool for drug discovery and drug development. In this review, we describe the current state of single-cell sequencing techniques, which can provide a new, more powerful and precise approach for analyzing effects of drugs on treated cells and tissues. Our review discusses single-cell whole genome/exome sequencing (scWGS/scWES), single-cell transcriptome sequencing (scRNA-seq), single-cell bisulfite sequencing (scBS), and multiple omics of single-cell sequencing. We also highlight the advantages and challenges of each of these approaches. Finally, we describe, elaborate and speculate the potential applications of single-cell sequencing for drug discovery and drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  3. Combining multiple ChIP-seq peak detection systems using combinatorial fusion.

    PubMed

    Schweikert, Christina; Brown, Stuart; Tang, Zuojian; Smith, Phillip R; Hsu, D Frank

    2012-01-01

    Due to the recent rapid development in ChIP-seq technologies, which uses high-throughput next-generation DNA sequencing to identify the targets of Chromatin Immunoprecipitation, there is an increasing amount of sequencing data being generated that provides us with greater opportunity to analyze genome-wide protein-DNA interactions. In particular, we are interested in evaluating and enhancing computational and statistical techniques for locating protein binding sites. Many peak detection systems have been developed; in this study, we utilize the following six: CisGenome, MACS, PeakSeq, QuEST, SISSRs, and TRLocator. We define two methods to merge and rescore the regions of two peak detection systems and analyze the performance based on average precision and coverage of transcription start sites. The results indicate that ChIP-seq peak detection can be improved by fusion using score or rank combination. Our method of combination and fusion analysis would provide a means for generic assessment of available technologies and systems and assist researchers in choosing an appropriate system (or fusion method) for analyzing ChIP-seq data. This analysis offers an alternate approach for increasing true positive rates, while decreasing false positive rates and hence improving the ChIP-seq peak identification process.

  4. SeqAPASS: Sequence alignment to predict across-species susceptibility

    EPA Science Inventory

    Efforts to shift the toxicity testing paradigm from whole organism studies to those focused on the initiation of toxicity and relevant pathways have led to increased utilization of in vitro and in silico methods. Hence the emergence of high through-put screening (HTS) programs, s...

  5. Recovering complete mitochondrial genome sequences from RNA-Seq: A case study of Polytomella non-photosynthetic green algae.

    PubMed

    Tian, Yao; Smith, David Roy

    2016-05-01

    Thousands of mitochondrial genomes have been sequenced, but there are comparatively few available mitochondrial transcriptomes. This might soon be changing. High-throughput RNA sequencing (RNA-Seq) techniques have made it fast and cheap to generate massive amounts of mitochondrial transcriptomic data. Here, we explore the utility of RNA-Seq for assembling mitochondrial genomes and studying their expression patterns. Specifically, we investigate the mitochondrial transcriptomes from Polytomella non-photosynthetic green algae, which have among the smallest, most reduced mitochondrial genomes from the Archaeplastida as well as fragmented rRNA-coding regions, palindromic genes, and linear chromosomes with telomeres. Isolation of whole genomic RNA from the four known Polytomella species followed by Illumina paired-end sequencing generated enough mitochondrial-derived reads to easily recover almost-entire mitochondrial genome sequences. Read-mapping and coverage statistics also gave insights into Polytomella mitochondrial transcriptional architecture, revealing polycistronic transcripts and the expression of telomeres and palindromic genes. Ultimately, RNA-Seq is a promising, cost-effective technique for studying mitochondrial genetics, but it does have drawbacks, which are discussed. One of its greatest potentials, as shown here, is that it can be used to generate near-complete mitochondrial genome sequences, which could be particularly useful in situations where there is a lack of available mtDNA data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  6. MiSNPDb: a web-based genomic resources of tropical ecology fruit mango (Mangifera indica L.) for phylogeography and varietal differentiation.

    PubMed

    Iquebal, M A; Jaiswal, Sarika; Mahato, Ajay Kumar; Jayaswal, Pawan K; Angadi, U B; Kumar, Neeraj; Sharma, Nimisha; Singh, Anand K; Srivastav, Manish; Prakash, Jai; Singh, S K; Khan, Kasim; Mishra, Rupesh K; Rajan, Shailendra; Bajpai, Anju; Sandhya, B S; Nischita, Puttaraju; Ravishankar, K V; Dinesh, M R; Rai, Anil; Kumar, Dinesh; Sharma, Tilak R; Singh, Nagendra K

    2017-11-02

    Mango is one of the most important fruits of tropical ecological region of the world, well known for its nutritive value, aroma and taste. Its world production is >45MT worth >200 billion US dollars. Genomic resources are required for improvement in productivity and management of mango germplasm. There is no web-based genomic resources available for mango. Hence rapid and cost-effective high throughput putative marker discovery is required to develop such resources. RAD-based marker discovery can cater this urgent need till whole genome sequence of mango becomes available. Using a panel of 84 mango varieties, a total of 28.6 Gb data was generated by ddRAD-Seq approach on Illumina HiSeq 2000 platform. A total of 1.25 million SNPs were discovered. Phylogenetic tree using 749 common SNPs across these varieties revealed three major lineages which was compared with geographical locations. A web genomic resources MiSNPDb, available at http://webtom.cabgrid.res.in/mangosnps/ is based on 3-tier architecture, developed using PHP, MySQL and Javascript. This web genomic resources can be of immense use in the development of high density linkage map, QTL discovery, varietal differentiation, traceability, genome finishing and SNP chip development for future GWAS in genomic selection program. We report here world's first web-based genomic resources for genetic improvement and germplasm management of mango.

  7. ChIP-Seq Analysis for Identifying Genome-Wide Histone Modifications Associated with Stress-Responsive Genes in Plants.

    PubMed

    Li, Guosheng; Jagadeeswaran, Guru; Mort, Andrew; Sunkar, Ramanjulu

    2017-01-01

    Histone modifications represent the crux of epigenetic gene regulation essential for most biological processes including abiotic stress responses in plants. Thus, identification of histone modifications at the genome-scale can provide clues for how some genes are 'turned-on' while some others are "turned-off" in response to stress. This chapter details a step-by-step protocol for identifying genome-wide histone modifications associated with stress-responsive gene regulation using chromatin immunoprecipitation (ChIP) followed by sequencing of the DNA (ChIP-seq).

  8. Elucidating the 16S rRNA 3' boundaries and defining optimal SD/aSD pairing in Escherichia coli and Bacillus subtilis using RNA-Seq data.

    PubMed

    Wei, Yulong; Silke, Jordan R; Xia, Xuhua

    2017-12-15

    Bacterial translation initiation is influenced by base pairing between the Shine-Dalgarno (SD) sequence in the 5' UTR of mRNA and the anti-SD (aSD) sequence at the free 3' end of the 16S rRNA (3' TAIL) due to: 1) the SD/aSD sequence binding location and 2) SD/aSD binding affinity. In order to understand what makes an SD/aSD interaction optimal, we must define: 1) terminus of the 3' TAIL and 2) extent of the core aSD sequence within the 3' TAIL. Our approach to characterize these components in Escherichia coli and Bacillus subtilis involves 1) mapping the 3' boundary of the mature 16S rRNA using high-throughput RNA sequencing (RNA-Seq), and 2) identifying the segment within the 3' TAIL that is strongly preferred in SD/aSD pairing. Using RNA-Seq data, we resolve previous discrepancies in the reported 3' TAIL in B. subtilis and recovered the established 3' TAIL in E. coli. Furthermore, we extend previous studies to suggest that both highly and lowly expressed genes favor SD sequences with intermediate binding affinity, but this trend is exclusive to SD sequences that complement the core aSD sequences defined herein.

  9. Information transduction capacity reduces the uncertainties in annotation-free isoform discovery and quantification

    PubMed Central

    Deng, Yue; Bao, Feng; Yang, Yang; Ji, Xiangyang; Du, Mulong; Zhang, Zhengdong

    2017-01-01

    Abstract The automated transcript discovery and quantification of high-throughput RNA sequencing (RNA-seq) data are important tasks of next-generation sequencing (NGS) research. However, these tasks are challenging due to the uncertainties that arise in the inference of complete splicing isoform variants from partially observed short reads. Here, we address this problem by explicitly reducing the inherent uncertainties in a biological system caused by missing information. In our approach, the RNA-seq procedure for transforming transcripts into short reads is considered an information transmission process. Consequently, the data uncertainties are substantially reduced by exploiting the information transduction capacity of information theory. The experimental results obtained from the analyses of simulated datasets and RNA-seq datasets from cell lines and tissues demonstrate the advantages of our method over state-of-the-art competitors. Our algorithm is an open-source implementation of MaxInfo. PMID:28911101

  10. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    PubMed

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary data are available at Bioinformatics online.

  11. A Pipeline for High-Throughput Concentration Response Modeling of Gene Expression for Toxicogenomics

    PubMed Central

    House, John S.; Grimm, Fabian A.; Jima, Dereje D.; Zhou, Yi-Hui; Rusyn, Ivan; Wright, Fred A.

    2017-01-01

    Cell-based assays are an attractive option to measure gene expression response to exposure, but the cost of whole-transcriptome RNA sequencing has been a barrier to the use of gene expression profiling for in vitro toxicity screening. In addition, standard RNA sequencing adds variability due to variable transcript length and amplification. Targeted probe-sequencing technologies such as TempO-Seq, with transcriptomic representation that can vary from hundreds of genes to the entire transcriptome, may reduce some components of variation. Analyses of high-throughput toxicogenomics data require renewed attention to read-calling algorithms and simplified dose–response modeling for datasets with relatively few samples. Using data from induced pluripotent stem cell-derived cardiomyocytes treated with chemicals at varying concentrations, we describe here and make available a pipeline for handling expression data generated by TempO-Seq to align reads, clean and normalize raw count data, identify differentially expressed genes, and calculate transcriptomic concentration–response points of departure. The methods are extensible to other forms of concentration–response gene-expression data, and we discuss the utility of the methods for assessing variation in susceptibility and the diseased cellular state. PMID:29163636

  12. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    PubMed

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  13. Normalization, bias correction, and peak calling for ChIP-seq

    PubMed Central

    Diaz, Aaron; Park, Kiyoub; Lim, Daniel A.; Song, Jun S.

    2012-01-01

    Next-generation sequencing is rapidly transforming our ability to profile the transcriptional, genetic, and epigenetic states of a cell. In particular, sequencing DNA from the immunoprecipitation of protein-DNA complexes (ChIP-seq) and methylated DNA (MeDIP-seq) can reveal the locations of protein binding sites and epigenetic modifications. These approaches contain numerous biases which may significantly influence the interpretation of the resulting data. Rigorous computational methods for detecting and removing such biases are still lacking. Also, multi-sample normalization still remains an important open problem. This theoretical paper systematically characterizes the biases and properties of ChIP-seq data by comparing 62 separate publicly available datasets, using rigorous statistical models and signal processing techniques. Statistical methods for separating ChIP-seq signal from background noise, as well as correcting enrichment test statistics for sequence-dependent and sonication biases, are presented. Our method effectively separates reads into signal and background components prior to normalization, improving the signal-to-noise ratio. Moreover, most peak callers currently use a generic null model which suffers from low specificity at the sensitivity level requisite for detecting subtle, but true, ChIP enrichment. The proposed method of determining a cell type-specific null model, which accounts for cell type-specific biases, is shown to be capable of achieving a lower false discovery rate at a given significance threshold than current methods. PMID:22499706

  14. Issues with RNA-seq analysis in non-model organisms: A salmonid example.

    PubMed

    Sundaram, Arvind; Tengs, Torstein; Grimholt, Unni

    2017-10-01

    High throughput sequencing (HTS) is useful for many purposes as exemplified by the other topics included in this special issue. The purpose of this paper is to look into the unique challenges of using this technology in non-model organisms where resources such as genomes, functional genome annotations or genome complexity provide obstacles not met in model organisms. To describe these challenges, we narrow our scope to RNA sequencing used to study differential gene expression in response to pathogen challenge. As a demonstration species we chose Atlantic salmon, which has a sequenced genome with poor annotation and an added complexity due to many duplicated genes. We find that our RNA-seq analysis pipeline deciphers between duplicates despite high sequence identity. However, annotation issues provide problems in linking differentially expressed genes to pathways. Also, comparing results between approaches and species are complicated due to lack of standardized annotation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. zUMIs - A fast and flexible pipeline to process RNA sequencing data with UMIs.

    PubMed

    Parekh, Swati; Ziegenhain, Christoph; Vieth, Beate; Enard, Wolfgang; Hellmann, Ines

    2018-06-01

    Single-cell RNA-sequencing (scRNA-seq) experiments typically analyze hundreds or thousands of cells after amplification of the cDNA. The high throughput is made possible by the early introduction of sample-specific bar codes (BCs), and the amplification bias is alleviated by unique molecular identifiers (UMIs). Thus, the ideal analysis pipeline for scRNA-seq data needs to efficiently tabulate reads according to both BC and UMI. zUMIs is a pipeline that can handle both known and random BCs and also efficiently collapse UMIs, either just for exon mapping reads or for both exon and intron mapping reads. If BC annotation is missing, zUMIs can accurately detect intact cells from the distribution of sequencing reads. Another unique feature of zUMIs is the adaptive downsampling function that facilitates dealing with hugely varying library sizes but also allows the user to evaluate whether the library has been sequenced to saturation. To illustrate the utility of zUMIs, we analyzed a single-nucleus RNA-seq dataset and show that more than 35% of all reads map to introns. Also, we show that these intronic reads are informative about expression levels, significantly increasing the number of detected genes and improving the cluster resolution. zUMIs flexibility makes if possible to accommodate data generated with any of the major scRNA-seq protocols that use BCs and UMIs and is the most feature-rich, fast, and user-friendly pipeline to process such scRNA-seq data.

  16. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    PubMed

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.

  17. Deciphering the glycosaminoglycan code with the help of microarrays.

    PubMed

    de Paz, Jose L; Seeberger, Peter H

    2008-07-01

    Carbohydrate microarrays have become a powerful tool to elucidate the biological role of complex sugars. Microarrays are particularly useful for the study of glycosaminoglycans (GAGs), a key class of carbohydrates. The high-throughput chip format enables rapid screening of large numbers of potential GAG sequences produced via a complex biosynthesis while consuming very little sample. Here, we briefly highlight the most recent advances involving GAG microarrays built with synthetic or naturally derived oligosaccharides. These chips are powerful tools for characterizing GAG-protein interactions and determining structure-activity relationships for specific sequences. Thereby, they contribute to decoding the information contained in specific GAG sequences.

  18. FUNDAMENTALS OF VITAMIN D HORMONE-REGULATED GENE EXPRESSION

    PubMed Central

    Pike, J. Wesley; Meyer, Mark B.

    2014-01-01

    Initial research focused upon several known genetic targets provided early insight into the mechanism of action of the vitamin D hormone (1,25-dihydroxyvitamin D3 (1,25(OH)2D3)). Recently, however, a series of technical advances involving the coupling of chromatin immunoprecipitation (ChIP) to unbiased methodologies that initially involved tiled DNA microarrays (ChIP-chip analysis) and now Next Generation DNA Sequencing techniques (ChIP-Seq analysis) has opened new avenues of research into the mechanisms through which 1,25(OH)2D3 regulates gene expression. In this review, we summarize briefly the results of this early work and then focus on more recent studies in which ChIP-chip and ChIP-seq analyses have been used to explore the mechanisms of 1,25(OH)2D3 action on a genome-wide scale providing specific target genes as examples. The results of this work have advanced our understanding of the mechanisms involved at both genetic and epigenetic levels and have revealed a series of new principles through which the vitamin D hormone functions to control the expression of genes. PMID:24239506

  19. A normalization strategy for comparing tag count data

    PubMed Central

    2012-01-01

    Background High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data. Results We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset. Conclusion Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can widely be applied to other types of tag count data and to microarray data. PMID:22475125

  20. Cross-species extrapolation of mammalian-based ToxCast Data using Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS)

    EPA Science Inventory

    In vitro high-throughput screening (HTS) and in silico technologies have emerged as 21st century tools for chemical hazard identification. In 2007 the U.S. Environmental Protection Agency (EPA) launched the ToxCast Program, which has screened thousands of chemicals in hundreds of...

  1. A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data

    PubMed Central

    Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.

    2011-01-01

    Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452

  2. Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles.

    PubMed

    Gadala-Maria, Daniel; Yaari, Gur; Uduman, Mohamed; Kleinstein, Steven H

    2015-02-24

    Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large-scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.

  3. SALP, a new single-stranded DNA library preparation method especially useful for the high-throughput characterization of chromatin openness states.

    PubMed

    Wu, Jian; Dai, Wei; Wu, Lin; Wang, Jinke

    2018-02-13

    Next-generation sequencing (NGS) is fundamental to the current biological and biomedical research. Construction of sequencing library is a key step of NGS. Therefore, various library construction methods have been explored. However, the current methods are still limited by some shortcomings. This study developed a new NGS library construction method, Single strand Adaptor Library Preparation (SALP), by using a novel single strand adaptor (SSA). SSA is a double-stranded oligonucleotide with a 3' overhang of 3 random nucleotides, which can be efficiently ligated to the 3' end of single strand DNA by T4 DNA ligase. SALP can be started with any denatured DNA fragments such as those sheared by Tn5 tagmentation, enzyme digestion and sonication. When started with Tn5-tagmented chromatin, SALP can overcome a key limitation of ATAC-seq and become a high-throughput NGS library construction method, SALP-seq, which can be used to comparatively characterize the chromatin openness state of multiple cells unbiasly. In this way, this study successfully characterized the comparative chromatin openness states of four different cell lines, including GM12878, HepG2, HeLa and 293T, with SALP-seq. Similarly, this study also successfully characterized the chromatin openness states of HepG2 cells with SALP-seq by using 10 5 to 500 cells. This study developed a new NGS library construction method, SALP, by using a novel kind of single strand adaptor (SSA), which should has wide applications in the future due to its unique performance.

  4. The Impact of Normalization Methods on RNA-Seq Data Analysis

    PubMed Central

    Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.

    2015-01-01

    High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014

  5. DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data.

    PubMed

    Nakato, Ryuichiro; Itoh, Tahehiko; Shirahige, Katsuhiko

    2013-07-01

    Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.

  6. SEQADAPT: an adaptable system for the tracking, storage and analysis of high throughput sequencing experiments.

    PubMed

    Burdick, David B; Cavnor, Chris C; Handcock, Jeremy; Killcoyne, Sarah; Lin, Jake; Marzolf, Bruz; Ramsey, Stephen A; Rovira, Hector; Bressler, Ryan; Shmulevich, Ilya; Boyle, John

    2010-07-14

    High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Existing software solutions provide static and well-established algorithms in a restrictive package. However as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.

  7. SEQADAPT: an adaptable system for the tracking, storage and analysis of high throughput sequencing experiments

    PubMed Central

    2010-01-01

    Background High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Results Existing software solutions provide static and well-established algorithms in a restrictive package. However as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. Conclusion The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services. PMID:20630057

  8. A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform.

    PubMed

    de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M

    2017-07-06

    Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower c ost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorpo rating three barcodes to each sample, with the possibility to add a fourth-index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.

  9. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

    PubMed Central

    2017-01-01

    Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package. PMID:28100584

  10. Capturing the Alternative Cleavage and Polyadenylation Sites of 14 NAC Genes in Populus Using a Combination of 3'-RACE and High-Throughput Sequencing.

    PubMed

    Wang, Haoran; Wang, Mingxiu; Cheng, Qiang

    2018-03-08

    Detection of complex splice sites (SSs) and polyadenylation sites (PASs) of eukaryotic genes is essential for the elucidation of gene regulatory mechanisms. Transcriptome-wide studies using high-throughput sequencing (HTS) have revealed prevalent alternative splicing (AS) and alternative polyadenylation (APA) in plants. However, small-scale and high-depth HTS aimed at detecting genes or gene families are very few and limited. We explored a convenient and flexible method for profiling SSs and PASs, which combines rapid amplification of 3'-cDNA ends (3'-RACE) and HTS. Fourteen NAC (NAM, ATAF1/2, CUC2) transcription factor genes of Populus trichocarpa were analyzed by 3'-RACE-seq. Based on experimental reproducibility, boundary sequence analysis and reverse transcription PCR (RT-PCR) verification, only canonical SSs were considered to be authentic. Based on stringent criteria, candidate PASs without any internal priming features were chosen as authentic PASs and assumed to be PAS-rich markers. Thirty-four novel canonical SSs, six intronic/internal exons and thirty 3'-UTR PAS-rich markers were revealed by 3'-RACE-seq. Using 3'-RACE and real-time PCR, we confirmed that three APA transcripts ending in/around PAS-rich markers were differentially regulated in response to plant hormones. Our results indicate that 3'-RACE-seq is a robust and cost-effective method to discover SSs and label active regions subjected to APA for genes or gene families. The method is suitable for small-scale AS and APA research in the initial stage.

  11. metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA.

    PubMed

    Dale, Ryan K; Matzat, Leah H; Lei, Elissa P

    2014-08-01

    Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA-protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. Better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP- and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data. Based on the metaseq-enabled analysis presented here, we propose a model where Shep associates with chromatin cotranscriptionally, then is recruited to insulator complexes in trans where it plays a negative role in insulator activity. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  12. Optimal use of tandem biotin and V5 tags in ChIP assays

    PubMed Central

    Kolodziej, Katarzyna E; Pourfarzad, Farzin; de Boer, Ernie; Krpic, Sanja; Grosveld, Frank; Strouboulis, John

    2009-01-01

    Background Chromatin immunoprecipitation (ChIP) assays coupled to genome arrays (Chip-on-chip) or massive parallel sequencing (ChIP-seq) lead to the genome wide identification of binding sites of chromatin associated proteins. However, the highly variable quality of antibodies and the availability of epitopes in crosslinked chromatin can compromise genomic ChIP outcomes. Epitope tags have often been used as more reliable alternatives. In addition, we have employed protein in vivo biotinylation tagging as a very high affinity alternative to antibodies. In this paper we describe the optimization of biotinylation tagging for ChIP and its coupling to a known epitope tag in providing a reliable and efficient alternative to antibodies. Results Using the biotin tagged erythroid transcription factor GATA-1 as example, we describe several optimization steps for the application of the high affinity biotin streptavidin system in ChIP. We find that the omission of SDS during sonication, the use of fish skin gelatin as blocking agent and choice of streptavidin beads can lead to significantly improved ChIP enrichments and lower background compared to antibodies. We also show that the V5 epitope tag performs equally well under the conditions worked out for streptavidin ChIP and that it may suffer less from the effects of formaldehyde crosslinking. Conclusion The combined use of the very high affinity biotin tag with the less sensitive to crosslinking V5 tag provides for a flexible ChIP platform with potential implications in ChIP sequencing outcomes. PMID:19196479

  13. Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome

    PubMed Central

    Chaudhuri, Roy R.; Yu, Lu; Kanji, Alpa; Perkins, Timothy T.; Gardner, Paul P.; Choudhary, Jyoti; Maskell, Duncan J.

    2011-01-01

    Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community. PMID:21816880

  14. A scalable double-barcode sequencing platform for characterization of dynamic protein-protein interactions.

    PubMed

    Schlecht, Ulrich; Liu, Zhimin; Blundell, Jamie R; St Onge, Robert P; Levy, Sasha F

    2017-05-25

    Several large-scale efforts have systematically catalogued protein-protein interactions (PPIs) of a cell in a single environment. However, little is known about how the protein interactome changes across environmental perturbations. Current technologies, which assay one PPI at a time, are too low throughput to make it practical to study protein interactome dynamics. Here, we develop a highly parallel protein-protein interaction sequencing (PPiSeq) platform that uses a novel double barcoding system in conjunction with the dihydrofolate reductase protein-fragment complementation assay in Saccharomyces cerevisiae. PPiSeq detects PPIs at a rate that is on par with current assays and, in contrast with current methods, quantitatively scores PPIs with enough accuracy and sensitivity to detect changes across environments. Both PPI scoring and the bulk of strain construction can be performed with cell pools, making the assay scalable and easily reproduced across environments. PPiSeq is therefore a powerful new tool for large-scale investigations of dynamic PPIs.

  15. beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types

    PubMed Central

    Pagès, Hervé

    2018-01-01

    Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set. PMID:29723188

  16. beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types.

    PubMed

    Lun, Aaron T L; Pagès, Hervé; Smith, Mike L

    2018-05-01

    Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.

  17. Dissecting Cell-Type Composition and Activity-Dependent Transcriptional State in Mammalian Brains by Massively Parallel Single-Nucleus RNA-Seq.

    PubMed

    Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao

    2017-12-07

    Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Tn5Prime, a Tn5 based 5' capture method for single cell RNA-seq.

    PubMed

    Cole, Charles; Byrne, Ashley; Beaudin, Anna E; Forsberg, E Camilla; Vollmers, Christopher

    2018-06-01

    RNA-sequencing (RNA-seq) is a powerful technique to investigate and quantify entire transcriptomes. Recent advances in the field have made it possible to explore the transcriptomes of single cells. However, most widely used RNA-seq protocols fail to provide crucial information regarding transcription start sites. Here we present a protocol, Tn5Prime, that takes advantage of the Tn5 transposase-based Smart-seq2 protocol to create RNA-seq libraries that capture the 5' end of transcripts. The Tn5Prime method dramatically streamlines the 5' capture process and is both cost effective and reliable. By applying Tn5Prime to bulk RNA and single cell samples, we were able to define transcription start sites as well as quantify transcriptomes at high accuracy and reproducibility. Additionally, similar to 3' end-based high-throughput methods like Drop-seq and 10× Genomics Chromium, the 5' capture Tn5Prime method allows the introduction of cellular identifiers during reverse transcription, simplifying the analysis of large numbers of single cells. In contrast to 3' end-based methods, Tn5Prime also enables the assembly of the variable 5' ends of the antibody sequences present in single B-cell data. Therefore, Tn5Prime presents a robust tool for both basic and applied research into the adaptive immune system and beyond.

  19. Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome

    PubMed Central

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José

    2016-01-01

    RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude. PMID:27377755

  20. Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome.

    PubMed

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José

    2016-07-05

    RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude.

  1. Identification and correction of systematic error in high-throughput sequence data

    PubMed Central

    2011-01-01

    Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972

  2. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    PubMed

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting.

    PubMed

    Khan, Tarik A; Friedensohn, Simon; Gorter de Vries, Arthur R; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T

    2016-03-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion-the intraclonal diversity index-which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology.

  4. Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

    PubMed Central

    Khan, Tarik A.; Friedensohn, Simon; de Vries, Arthur R. Gorter; Straszewski, Jakub; Ruscheweyh, Hans-Joachim; Reddy, Sai T.

    2016-01-01

    High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification, which enabled tagging of transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion—the intraclonal diversity index—which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence but only when using MAF error and bias corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology. PMID:26998518

  5. CNV-seq, a new method to detect copy number variation using high-throughput sequencing.

    PubMed

    Xie, Chao; Tammi, Martti T

    2009-03-06

    DNA copy number variation (CNV) has been recognized as an important source of genetic variation. Array comparative genomic hybridization (aCGH) is commonly used for CNV detection, but the microarray platform has a number of inherent limitations. Here, we describe a method to detect copy number variation using shotgun sequencing, CNV-seq. The method is based on a robust statistical model that describes the complete analysis procedure and allows the computation of essential confidence values for detection of CNV. Our results show that the number of reads, not the length of the reads is the key factor determining the resolution of detection. This favors the next-generation sequencing methods that rapidly produce large amount of short reads. Simulation of various sequencing methods with coverage between 0.1x to 8x show overall specificity between 91.7 - 99.9%, and sensitivity between 72.2 - 96.5%. We also show the results for assessment of CNV between two individual human genomes.

  6. Chromatin immunoprecipitation (ChIP) method for non-model fruit flies (Diptera: Tephritidae) and evidence of histone modifications.

    PubMed

    Nagalingam, Kumaran; Lorenc, Michał T; Manoli, Sahana; Cameron, Stephen L; Clarke, Anthony R; Dudley, Kevin J

    2018-01-01

    Interactions between DNA and proteins located in the cell nucleus play an important role in controlling physiological processes by specifying, augmenting and regulating context-specific transcription events. Chromatin immunoprecipitation (ChIP) is a widely used methodology to study DNA-protein interactions and has been successfully used in various cell types for over three decades. More recently, by combining ChIP with genomic screening technologies and Next Generation Sequencing (e.g. ChIP-seq), it has become possible to profile DNA-protein interactions (including covalent histone modifications) across entire genomes. However, the applicability of ChIP-chip and ChIP-seq has rarely been extended to non-model species because of a number of technical challenges. Here we report a method that can be used to identify genome wide covalent histone modifications in a group of non-model fruit fly species (Diptera: Tephritidae). The method was developed by testing and refining protocols that have been used in model organisms, including Drosophila melanogaster. We demonstrate that this method is suitable for a group of economically important pest fruit fly species, viz., Bactrocera dorsalis, Ceratitis capitata, Zeugodacus cucurbitae and Bactrocera tryoni. We also report an example ChIP-seq dataset for B. tryoni, providing evidence for histone modifications in the genome of a tephritid fruit fly for the first time. Since tephritids are major agricultural pests globally, this methodology will be a valuable resource to study taxa-specific evolutionary questions and to assist with pest management. It also provides a basis for researchers working with other non-model species to undertake genome wide DNA-protein interaction studies.

  7. Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.

    PubMed

    Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian

    2016-01-01

    The B and T lymphocytes are white blood cells playing a key role in the adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte, and enables recognition of specific antigenes. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture population of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database and a visualization for the analysis of clonotypes along the time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.

  8. Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing

    PubMed Central

    Duez, Marc; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian

    2016-01-01

    Background The B and T lymphocytes are white blood cells playing a key role in the adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte, and enables recognition of specific antigenes. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture population of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Methods and Results Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database and a visualization for the analysis of clonotypes along the time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications. PMID:27835690

  9. Digital gene expression analysis with sample multiplexing and PCR duplicate detection: A straightforward protocol.

    PubMed

    Rozenberg, Andrey; Leese, Florian; Weiss, Linda C; Tollrian, Ralph

    2016-01-01

    Tag-Seq is a high-throughput approach used for discovering SNPs and characterizing gene expression. In comparison to RNA-Seq, Tag-Seq eases data processing and allows detection of rare mRNA species using only one tag per transcript molecule. However, reduced library complexity raises the issue of PCR duplicates, which distort gene expression levels. Here we present a novel Tag-Seq protocol that uses the least biased methods for RNA library preparation combined with a novel approach for joint PCR template and sample labeling. In our protocol, input RNA is fragmented by hydrolysis, and poly(A)-bearing RNAs are selected and directly ligated to mixed DNA-RNA P5 adapters. The P5 adapters contain i5 barcodes composed of sample-specific (moderately) degenerate base regions (mDBRs), which later allow detection of PCR duplicates. The P7 adapter is attached via reverse transcription with individual i7 barcodes added during the amplification step. The resulting libraries can be sequenced on an Illumina sequencer. After sample demultiplexing and PCR duplicate removal with a free software tool we designed, the data are ready for downstream analysis. Our protocol was tested on RNA samples from predator-induced and control Daphnia microcrustaceans.

  10. A new fungal large subunit ribosomal RNA primer for high throughput sequencing surveys

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mueller, Rebecca C.; Gallegos-Graves, La Verne; Kuske, Cheryl R.

    The inclusion of phylogenetic metrics in community ecology has provided insights into important ecological processes, particularly when combined with high-throughput sequencing methods; however, these approaches have not been widely used in studies of fungal communities relative to other microbial groups. Two obstacles have been considered: (1) the internal transcribed spacer (ITS) region has limited utility for constructing phylogenies and (2) most PCR primers that target the large subunit (LSU) ribosomal unit generate amplicons that exceed current limits of high-throughput sequencing platforms. We designed and tested a PCR primer (LR22R) to target approximately 300–400 bp region of the D2 hypervariable regionmore » of the fungal LSU for use with the Illumina MiSeq platform. Both in silico and empirical analyses showed that the LR22R–LR3 pair captured a broad range of fungal taxonomic groups with a small fraction of non-fungal groups. Phylogenetic placement of publically available LSU D2 sequences showed broad agreement with taxonomic classification. Comparisons of the LSU D2 and the ITS2 ribosomal regions from environmental samples and known communities showed similar discriminatory abilities of the two primer sets. Altogether, these findings show that the LR22R–LR3 primer pair has utility for phylogenetic analyses of fungal communities using high-throughput sequencing methods.« less

  11. A new fungal large subunit ribosomal RNA primer for high throughput sequencing surveys

    DOE PAGES

    Mueller, Rebecca C.; Gallegos-Graves, La Verne; Kuske, Cheryl R.

    2015-12-09

    The inclusion of phylogenetic metrics in community ecology has provided insights into important ecological processes, particularly when combined with high-throughput sequencing methods; however, these approaches have not been widely used in studies of fungal communities relative to other microbial groups. Two obstacles have been considered: (1) the internal transcribed spacer (ITS) region has limited utility for constructing phylogenies and (2) most PCR primers that target the large subunit (LSU) ribosomal unit generate amplicons that exceed current limits of high-throughput sequencing platforms. We designed and tested a PCR primer (LR22R) to target approximately 300–400 bp region of the D2 hypervariable regionmore » of the fungal LSU for use with the Illumina MiSeq platform. Both in silico and empirical analyses showed that the LR22R–LR3 pair captured a broad range of fungal taxonomic groups with a small fraction of non-fungal groups. Phylogenetic placement of publically available LSU D2 sequences showed broad agreement with taxonomic classification. Comparisons of the LSU D2 and the ITS2 ribosomal regions from environmental samples and known communities showed similar discriminatory abilities of the two primer sets. Altogether, these findings show that the LR22R–LR3 primer pair has utility for phylogenetic analyses of fungal communities using high-throughput sequencing methods.« less

  12. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis

    PubMed Central

    Down, Thomas A.; Rakyan, Vardhman K.; Turner, Daniel J.; Flicek, Paul; Li, Heng; Kulesha, Eugene; Gräf, Stefan; Johnson, Nathan; Herrero, Javier; Tomazou, Eleni M.; Thorne, Natalie P.; Bäckdahl, Liselotte; Herberth, Marlis; Howe, Kevin L.; Jackson, David K.; Miretti, Marcos M.; Marioni, John C.; Birney, Ewan; Hubbard, Tim J. P.; Durbin, Richard; Tavaré, Simon; Beck, Stephan

    2009-01-01

    DNA methylation is an indispensible epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation. PMID:18612301

  13. High-throughput detection of RNA processing in bacteria.

    PubMed

    Gill, Erin E; Chan, Luisa S; Winsor, Geoffrey L; Dobson, Neil; Lo, Raymond; Ho Sui, Shannan J; Dhillon, Bhavjinder K; Taylor, Patrick K; Shrestha, Raunak; Spencer, Cory; Hancock, Robert E W; Unrau, Peter J; Brinkman, Fiona S L

    2018-03-27

    Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely - 10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions. This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using Jbrowse on www.pseudomonas.com .

  14. High-sensitivity HLA typing by Saturated Tiling Capture Sequencing (STC-Seq).

    PubMed

    Jiao, Yang; Li, Ran; Wu, Chao; Ding, Yibin; Liu, Yanning; Jia, Danmei; Wang, Lifeng; Xu, Xiang; Zhu, Jing; Zheng, Min; Jia, Junling

    2018-01-15

    Highly polymorphic human leukocyte antigen (HLA) genes are responsible for fine-tuning the adaptive immune system. High-resolution HLA typing is important for the treatment of autoimmune and infectious diseases. Additionally, it is routinely performed for identifying matched donors in transplantation medicine. Although many HLA typing approaches have been developed, the complexity, low-efficiency and high-cost of current HLA-typing assays limit their application in population-based high-throughput HLA typing for donors, which is required for creating large-scale databases for transplantation and precision medicine. Here, we present a cost-efficient Saturated Tiling Capture Sequencing (STC-Seq) approach to capturing 14 HLA class I and II genes. The highly efficient capture (an approximately 23,000-fold enrichment) of these genes allows for simplified allele calling. Tests on five genes (HLA-A/B/C/DRB1/DQB1) from 31 human samples and 351 datasets using STC-Seq showed results that were 98% consistent with the known two sets of digitals (field1 and field2) genotypes. Additionally, STC can capture genomic DNA fragments longer than 3 kb from HLA loci, making the library compatible with the third-generation sequencing. STC-Seq is a highly accurate and cost-efficient method for HLA typing which can be used to facilitate the establishment of population-based HLA databases for the precision and transplantation medicine.

  15. Research Associate | Center for Cancer Research

    Cancer.gov

    The Basic Science Program (BSP) at the Frederick National Laboratory for Cancer Research (FNLCR) pursues independent, multidisciplinary research programs in basic or applied molecular biology, immunology, retrovirology, cancer biology or human genetics. As part of the BSP, the Microbiome and Genetics Core (the Core) characterizes microbiomes by next-generation sequencing to determine their composition and variation, as influenced by immune, genetic, and host health factors. The Core provides support across a spectrum of processes, from nucleic acid isolation through bioinformatics and statistical analysis. KEY ROLES/RESPONSIBILITIES The Research Associate II will provide support in the areas of automated isolation, preparation, PCR and sequencing of DNA on next generation platforms (Illumina MiSeq and NextSeq). An opportunity exists to join the Core’s team of highly trained experimentalists and bioinformaticians working to characterize microbiome samples. The following represent requirements of the position: A minimum of five (5) years related of biomedical experience. Experience with high-throughput nucleic acid (DNA/RNA) extraction. Experience in performing PCR amplification (including quantitative real-time PCR). Experience or familiarity with robotic liquid handling protocols (especially on the Eppendorf epMotion 5073 or 5075 platforms). Experience in operating and maintaining benchtop Illumina sequencers (MiSeq and NextSeq). Ability to evaluate experimental quality and to troubleshoot molecular biology protocols. Experience with sample tracking, inventory management and biobanking. Ability to operate and communicate effectively in a team-oriented work environment.

  16. Impact of sequencing depth in ChIP-seq experiments

    PubMed Central

    Jung, Youngsook L.; Luquette, Lovelace J.; Ho, Joshua W.K.; Ferrari, Francesco; Tolstorukov, Michael; Minoda, Aki; Issner, Robbyn; Epstein, Charles B.; Karpen, Gary H.; Kuroda, Mitzi I.; Park, Peter J.

    2014-01-01

    In a chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiment, an important consideration in experimental design is the minimum number of sequenced reads required to obtain statistically significant results. We present an extensive evaluation of the impact of sequencing depth on identification of enriched regions for key histone modifications (H3K4me3, H3K36me3, H3K27me3 and H3K9me2/me3) using deep-sequenced datasets in human and fly. We propose to define sufficient sequencing depth as the number of reads at which detected enrichment regions increase <1% for an additional million reads. Although the required depth depends on the nature of the mark and the state of the cell in each experiment, we observe that sufficient depth is often reached at <20 million reads for fly. For human, there are no clear saturation points for the examined datasets, but our analysis suggests 40–50 million reads as a practical minimum for most marks. We also devise a mathematical model to estimate the sufficient depth and total genomic coverage of a mark. Lastly, we find that the five algorithms tested do not agree well for broad enrichment profiles, especially at lower depths. Our findings suggest that sufficient sequencing depth and an appropriate peak-calling algorithm are essential for ensuring robustness of conclusions derived from ChIP-seq data. PMID:24598259

  17. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes.

    PubMed

    Fredlake, Christopher P; Hert, Daniel G; Kan, Cheuk-Wai; Chiesl, Thomas N; Root, Brian E; Forster, Ryan E; Barron, Annelise E

    2008-01-15

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require approximately 70 min to deliver approximately 650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered "hybrid" mechanism of DNA electromigration, in which DNA molecules alternate rapidly between repeating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs.

  18. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes

    PubMed Central

    Fredlake, Christopher P.; Hert, Daniel G.; Kan, Cheuk-Wai; Chiesl, Thomas N.; Root, Brian E.; Forster, Ryan E.; Barron, Annelise E.

    2008-01-01

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require ≈70 min to deliver ≈650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered “hybrid” mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs. PMID:18184818

  19. Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies

    PubMed Central

    Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland

    2013-01-01

    The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other platforms. This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689

  20. RiboGalaxy: A browser based platform for the alignment, analysis and visualization of ribosome profiling data

    PubMed Central

    Michel, Audrey M.; Mullan, James P. A.; Velayudhan, Vimalkumar; O'Connor, Patrick B. F.; Donohue, Claire A.; Baranov, Pavel V.

    2016-01-01

    ABSTRACT Ribosome profiling (ribo-seq) is a technique that uses high-throughput sequencing to reveal the exact locations and densities of translating ribosomes at the entire transcriptome level. The technique has become very popular since its inception in 2009. Yet experimentalists who generate ribo-seq data often have to rely on bioinformaticians to process and analyze their data. We present RiboGalaxy (http://ribogalaxy.ucc.ie), a freely available Galaxy-based web server for processing and analyzing ribosome profiling data with the visualization functionality provided by GWIPS-viz (http://gwips.ucc.ie). RiboGalaxy offers researchers a suite of tools specifically tailored for processing ribo-seq and corresponding mRNA-seq data. Researchers can take advantage of the published workflows which reduce the multi-step alignment process to a minimum of inputs from the user. Users can then explore their own aligned data as custom tracks in GWIPS-viz and compare their ribosome profiles to existing ribo-seq tracks from published studies. In addition, users can assess the quality of their ribo-seq data, determine the strength of the triplet periodicity signal, generate meta-gene ribosome profiles as well as analyze the relative impact of mRNA sequence features on local read density. RiboGalaxy is accompanied by extensive documentation and tips for helping users. In addition we provide a forum (http://gwips.ucc.ie/Forum) where we encourage users to post their questions and feedback to improve the overall RiboGalaxy service. PMID:26821742

  1. Genome-wide ChIP-seq mapping and analysis of butyrate-induced H3K9 and H3K27 acetylation and epigenomic landscape alteration in bovine cells

    USDA-ARS?s Scientific Manuscript database

    Utilizing next-generation sequencing technology, combined with ChIP (Chromatin Immunoprecipitation) technology, we analyzed histone modification (acetylation) induced by butyrate and the large-scale mapping of the epigenomic landscape of normal histone H3 and acetylated histone H3K9 and H3K27. To d...

  2. Repliscan: a tool for classifying replication timing regions.

    PubMed

    Zynda, Gregory J; Song, Jawon; Concia, Lorenzo; Wear, Emily E; Hanley-Bowdoin, Linda; Thompson, William F; Vaughn, Matthew W

    2017-08-07

    Replication timing experiments that use label incorporation and high throughput sequencing produce peaked data similar to ChIP-Seq experiments. However, the differences in experimental design, coverage density, and possible results make traditional ChIP-Seq analysis methods inappropriate for use with replication timing. To accurately detect and classify regions of replication across the genome, we present Repliscan. Repliscan robustly normalizes, automatically removes outlying and uninformative data points, and classifies Repli-seq signals into discrete combinations of replication signatures. The quality control steps and self-fitting methods make Repliscan generally applicable and more robust than previous methods that classify regions based on thresholds. Repliscan is simple and effective to use on organisms with different genome sizes. Even with analysis window sizes as small as 1 kilobase, reliable profiles can be generated with as little as 2.4x coverage.

  3. HTSeq--a Python framework to work with high-throughput sequencing data.

    PubMed

    Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang

    2015-01-15

    A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.

  4. Quantification of differential gene expression by multiplexed targeted resequencing of cDNA

    PubMed Central

    Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.

    2017-01-01

    Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677

  5. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.

    PubMed

    Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory

    2017-12-01

    Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete-many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.

  6. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  7. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.

    PubMed

    Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J

    2015-11-15

    High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.

  8. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

    PubMed Central

    Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.

    2015-01-01

    Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307

  9. Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea

    DOE PAGES

    Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai; ...

    2015-10-28

    We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC).more » To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.« less

  10. Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai

    We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC).more » To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.« less

  11. High-throughput sequencing of the chloroplast and mitochondrion of Chlamydomonas reinhardtii to generate improved de novo assemblies, analyze expression patterns and transcript speciation, and evaluate diversity among laboratory strains and wild isolates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela

    Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Gen-ome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genomemore » and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar tran-scripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3’-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellargenomes, which is 1/10th as much genetic diversity as is found in the nucleus.« less

  12. Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.

    PubMed

    Paulson, Joseph N; Chen, Cho-Yi; Lopes-Ramos, Camila M; Kuijjer, Marieke L; Platig, John; Sonawane, Abhijeet R; Fagny, Maud; Glass, Kimberly; Quackenbush, John

    2017-10-03

    Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies typically profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data - critical first steps for any subsequent analysis. We find that analysis of large RNA-Seq data sets requires both careful quality control and the need to account for sparsity due to the heterogeneity intrinsic in multi-group studies. We developed Yet Another RNA Normalization software pipeline (YARN), that includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project. An R package instantiating YARN is available at http://bioconductor.org/packages/yarn .

  13. Bayesian Correlation Analysis for Sequence Count Data

    PubMed Central

    Lau, Nelson; Perkins, Theodore J.

    2016-01-01

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449

  14. The micro-Petri dish, a million-well growth chip for the culture and high-throughput screening of microorganisms.

    PubMed

    Ingham, Colin J; Sprenkels, Ad; Bomer, Johan; Molenaar, Douwe; van den Berg, Albert; van Hylckama Vlieg, Johan E T; de Vos, Willem M

    2007-11-13

    A miniaturized, disposable microbial culture chip has been fabricated by microengineering a highly porous ceramic sheet with up to one million growth compartments. This versatile culture format, with discrete compartments as small as 7 x 7 mum, allowed the growth of segregated microbial samples at an unprecedented density. The chip has been used for four complementary applications in microbiology. (i) As a fast viable counting system that showed a dynamic range of over 10,000, a low degree of bias, and a high culturing efficiency. (ii) In high-throughput screening, with the recovery of 1 fluorescent microcolony in 10,000. (iii) In screening for an enzyme-based, nondominant phenotype by the targeted recovery of Escherichia coli transformed with the plasmid pUC18, based on expression of the lacZ reporter gene without antibiotic-resistance selection. The ease of rapid, successive changes in the environment of the organisms on the chip, needed for detection of beta-galactosidase activity, highlights an advantageous feature that was also used to screen a metagenomic library for the same activity. (iv) In high-throughput screening of >200,000 isolates from Rhine water based on metabolism of a fluorogenic organophosphate compound, resulting in the recovery of 22 microcolonies with the desired phenotype. These isolates were predicted, on the basis of rRNA sequence, to include six new species. These four applications suggest that the potential for such simple, readily manufactured chips to impact microbial culture is extensive and may facilitate the full automation and multiplexing of microbial culturing, screening, counting, and selection.

  15. Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants

    PubMed Central

    Llauro, Christel; Jobet, Edouard; Robakowska-Hyzorek, Dagmara; Lasserre, Eric; Ghesquière, Alain; Panaud, Olivier

    2017-01-01

    Retrotransposons are mobile genetic elements abundant in plant and animal genomes. While efficiently silenced by the epigenetic machinery, they can be reactivated upon stress or during development. Their level of transcription not reflecting their transposition ability, it is thus difficult to evaluate their contribution to the active mobilome. Here we applied a simple methodology based on the high throughput sequencing of extrachromosomal circular DNA (eccDNA) forms of active retrotransposons to characterize the repertoire of mobile retrotransposons in plants. This method successfully identified known active retrotransposons in both Arabidopsis and rice material where the epigenome is destabilized. When applying mobilome-seq to developmental stages in wild type rice, we identified PopRice as a highly active retrotransposon producing eccDNA forms in the wild type endosperm. The mobilome-seq strategy opens new routes for the characterization of a yet unexplored fraction of plant genomes. PMID:28212378

  16. Sequencing the extrachromosomal circular mobilome reveals retrotransposon activity in plants.

    PubMed

    Lanciano, Sophie; Carpentier, Marie-Christine; Llauro, Christel; Jobet, Edouard; Robakowska-Hyzorek, Dagmara; Lasserre, Eric; Ghesquière, Alain; Panaud, Olivier; Mirouze, Marie

    2017-02-01

    Retrotransposons are mobile genetic elements abundant in plant and animal genomes. While efficiently silenced by the epigenetic machinery, they can be reactivated upon stress or during development. Their level of transcription not reflecting their transposition ability, it is thus difficult to evaluate their contribution to the active mobilome. Here we applied a simple methodology based on the high throughput sequencing of extrachromosomal circular DNA (eccDNA) forms of active retrotransposons to characterize the repertoire of mobile retrotransposons in plants. This method successfully identified known active retrotransposons in both Arabidopsis and rice material where the epigenome is destabilized. When applying mobilome-seq to developmental stages in wild type rice, we identified PopRice as a highly active retrotransposon producing eccDNA forms in the wild type endosperm. The mobilome-seq strategy opens new routes for the characterization of a yet unexplored fraction of plant genomes.

  17. YM500: a small RNA sequencing (smRNA-seq) database for microRNA research

    PubMed Central

    Cheng, Wei-Chung; Chung, I-Fang; Huang, Tse-Shun; Chang, Shih-Ting; Sun, Hsing-Jen; Tsai, Cheng-Fong; Liang, Muh-Lii; Wong, Tai-Tong; Wang, Hsei-Wei

    2013-01-01

    MicroRNAs (miRNAs) are small RNAs ∼22 nt in length that are involved in the regulation of a variety of physiological and pathological processes. Advances in high-throughput small RNA sequencing (smRNA-seq), one of the next-generation sequencing applications, have reshaped the miRNA research landscape. In this study, we established an integrative database, the YM500 (http://ngs.ym.edu.tw/ym500/), containing analysis pipelines and analysis results for 609 human and mice smRNA-seq results, including public data from the Gene Expression Omnibus (GEO) and some private sources. YM500 collects analysis results for miRNA quantification, for isomiR identification (incl. RNA editing), for arm switching discovery, and, more importantly, for novel miRNA predictions. Wetlab validation on >100 miRNAs confirmed high correlation between miRNA profiling and RT-qPCR results (R = 0.84). This database allows researchers to search these four different types of analysis results via our interactive web interface. YM500 allows researchers to define the criteria of isomiRs, and also integrates the information of dbSNP to help researchers distinguish isomiRs from SNPs. A user-friendly interface is provided to integrate miRNA-related information and existing evidence from hundreds of sequencing datasets. The identified novel miRNAs and isomiRs hold the potential for both basic research and biotech applications. PMID:23203880

  18. Resources and Recommendations for Using Transcriptomics to Address Grand Challenges in Comparative Biology

    PubMed Central

    Mykles, Donald L.; Burnett, Karen G.; Durica, David S.; Joyce, Blake L.; McCarthy, Fiona M.; Schmidt, Carl J.; Stillman, Jonathon H.

    2016-01-01

    High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the “Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology” symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. PMID:27639274

  19. Rapid microsatellite identification from Illumina paired-end genomic sequencing in two birds and a snake

    USGS Publications Warehouse

    Castoe, Todd A.; Poole, Alexander W.; de Koning, A. P. Jason; Jones, Kenneth L.; Tomback, Diana F.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Lance, Stacey L.; Streicher, Jeffrey W.; Smith, Eric N.; Pollock, David D.

    2012-01-01

    Identification of microsatellites, or simple sequence repeats (SSRs), can be a time-consuming and costly investment requiring enrichment, cloning, and sequencing of candidate loci. Recently, however, high throughput sequencing (with or without prior enrichment for specific SSR loci) has been utilized to identify SSR loci. The direct "Seq-to-SSR" approach has an advantage over enrichment-based strategies in that it does not require a priori selection of particular motifs, or prior knowledge of genomic SSR content. It has been more expensive per SSR locus recovered, however, particularly for genomes with few SSR loci, such as bird genomes. The longer but relatively more expensive 454 reads have been preferred over less expensive Illumina reads. Here, we use Illumina paired-end sequence data to identify potentially amplifiable SSR loci (PALs) from a snake (the Burmese python, Python molurus bivittatus), and directly compare these results to those from 454 data. We also compare the python results to results from Illumina sequencing of two bird genomes (Gunnison Sage-grouse, Centrocercus minimus, and Clark's Nutcracker, Nucifraga columbiana), which have considerably fewer SSRs than the python. We show that direct Illumina Seq-to-SSR can identify and characterize thousands of potentially amplifiable SSR loci for as little as $10 per sample – a fraction of the cost of 454 sequencing. Given that Illumina Seq-to-SSR is effective, inexpensive, and reliable even for species such as birds that have few SSR loci, it seems that there are now few situations for which prior hybridization is justifiable.

  20. Rapid microsatellite identification from illumina paired-end genomic sequencing in two birds and a snake

    USGS Publications Warehouse

    Castoe, T.A.; Poole, A.W.; de Koning, A. P. J.; Jones, K.L.; Tomback, D.F.; Oyler-McCance, S.J.; Fike, J.A.; Lance, S.L.; Streicher, J.W.; Smith, E.N.; Pollock, D.D.

    2012-01-01

    Identification of microsatellites, or simple sequence repeats (SSRs), can be a time-consuming and costly investment requiring enrichment, cloning, and sequencing of candidate loci. Recently, however, high throughput sequencing (with or without prior enrichment for specific SSR loci) has been utilized to identify SSR loci. The direct "Seq-to-SSR" approach has an advantage over enrichment-based strategies in that it does not require a priori selection of particular motifs, or prior knowledge of genomic SSR content. It has been more expensive per SSR locus recovered, however, particularly for genomes with few SSR loci, such as bird genomes. The longer but relatively more expensive 454 reads have been preferred over less expensive Illumina reads. Here, we use Illumina paired-end sequence data to identify potentially amplifiable SSR loci (PALs) from a snake (the Burmese python, Python molurus bivittatus), and directly compare these results to those from 454 data. We also compare the python results to results from Illumina sequencing of two bird genomes (Gunnison Sage-grouse, Centrocercus minimus, and Clark's Nutcracker, Nucifraga columbiana), which have considerably fewer SSRs than the python. We show that direct Illumina Seq-to-SSR can identify and characterize thousands of potentially amplifiable SSR loci for as little as $10 per sample - a fraction of the cost of 454 sequencing. Given that Illumina Seq-to-SSR is effective, inexpensive, and reliable even for species such as birds that have few SSR loci, it seems that there are now few situations for which prior hybridization is justifiable. ?? 2012 Castoe et al.

  1. Rapid microsatellite identification from Illumina paired-end genomic sequencing in two birds and a snake.

    PubMed

    Castoe, Todd A; Poole, Alexander W; de Koning, A P Jason; Jones, Kenneth L; Tomback, Diana F; Oyler-McCance, Sara J; Fike, Jennifer A; Lance, Stacey L; Streicher, Jeffrey W; Smith, Eric N; Pollock, David D

    2012-01-01

    Identification of microsatellites, or simple sequence repeats (SSRs), can be a time-consuming and costly investment requiring enrichment, cloning, and sequencing of candidate loci. Recently, however, high throughput sequencing (with or without prior enrichment for specific SSR loci) has been utilized to identify SSR loci. The direct "Seq-to-SSR" approach has an advantage over enrichment-based strategies in that it does not require a priori selection of particular motifs, or prior knowledge of genomic SSR content. It has been more expensive per SSR locus recovered, however, particularly for genomes with few SSR loci, such as bird genomes. The longer but relatively more expensive 454 reads have been preferred over less expensive Illumina reads. Here, we use Illumina paired-end sequence data to identify potentially amplifiable SSR loci (PALs) from a snake (the Burmese python, Python molurus bivittatus), and directly compare these results to those from 454 data. We also compare the python results to results from Illumina sequencing of two bird genomes (Gunnison Sage-grouse, Centrocercus minimus, and Clark's Nutcracker, Nucifraga columbiana), which have considerably fewer SSRs than the python. We show that direct Illumina Seq-to-SSR can identify and characterize thousands of potentially amplifiable SSR loci for as little as $10 per sample--a fraction of the cost of 454 sequencing. Given that Illumina Seq-to-SSR is effective, inexpensive, and reliable even for species such as birds that have few SSR loci, it seems that there are now few situations for which prior hybridization is justifiable.

  2. Development of Thinopyrum ponticum-specific molecular markers and FISH probes based on SLAF-seq technology.

    PubMed

    Liu, Liqin; Luo, Qiaoling; Teng, Wan; Li, Bin; Li, Hongwei; Li, Yiwen; Li, Zhensheng; Zheng, Qi

    2018-05-01

    Based on SLAF-seq, 67 Thinopyrum ponticum-specific markers and eight Th. ponticum-specific FISH probes were developed, and these markers and probes could be used for detection of alien chromatin in a wheat background. Decaploid Thinopyrum ponticum (2n = 10x = 70) is a valuable gene reservoir for wheat improvement. Identification of Th. ponticum introgression would facilitate its transfer into diverse wheat genetic backgrounds and its practical utilization in wheat improvement. Based on specific-locus-amplified fragment sequencing (SLAF-seq) technology, 67 new Th. ponticum-specific molecular markers and eight Th. ponticum-specific fluorescence in situ hybridization (FISH) probes have been developed from a tiny wheat-Th. ponticum translocation line. These newly developed molecular markers allowed the detection of Th. ponticum DNA in a variety of materials specifically and steadily at high throughput. According to the hybridization signal pattern, the eight Th. ponticum-specific probes could be divided into two groups. The first group including five dispersed repetitive sequence probes could identify Th. ponticum chromatin more sensitively and accurately than genomic in situ hybridization (GISH). Whereas the second group having three tandem repetitive sequence probes enabled the discrimination of Th. ponticum chromosomes together with another clone pAs1 in wheat-Th. ponticum partial amphiploid Xiaoyan 68.

  3. Genome-wide base-resolution mapping of DNA methylation in single cells using single-cell bisulfite sequencing (scBS-seq).

    PubMed

    Clark, Stephen J; Smallwood, Sébastien A; Lee, Heather J; Krueger, Felix; Reik, Wolf; Kelsey, Gavin

    2017-03-01

    DNA methylation (DNAme) is an important epigenetic mark in diverse species. Our current understanding of DNAme is based on measurements from bulk cell samples, which obscures intercellular differences and prevents analyses of rare cell types. Thus, the ability to measure DNAme in single cells has the potential to make important contributions to the understanding of several key biological processes, such as embryonic development, disease progression and aging. We have recently reported a method for generating genome-wide DNAme maps from single cells, using single-cell bisulfite sequencing (scBS-seq), allowing the quantitative measurement of DNAme at up to 50% of CpG dinucleotides throughout the mouse genome. Here we present a detailed protocol for scBS-seq that includes our most recent developments to optimize recovery of CpGs, mapping efficiency and success rate; reduce hands-on time; and increase sample throughput with the option of using an automated liquid handler. We provide step-by-step instructions for each stage of the method, comprising cell lysis and bisulfite (BS) conversion, preamplification and adaptor tagging, library amplification, sequencing and, lastly, alignment and methylation calling. An individual with relevant molecular biology expertise can complete library preparation within 3 d. Subsequent computational steps require 1-3 d for someone with bioinformatics expertise.

  4. Transcriptome analysis by strand-specific sequencing of complementary DNA

    PubMed Central

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-01-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online. PMID:19620212

  5. Transcriptome analysis by strand-specific sequencing of complementary DNA.

    PubMed

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-10-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online.

  6. NG6: Integrated next generation sequencing storage and processing environment.

    PubMed

    Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas; Salin, Gérald; Noirot, Céline; Thomas, Sylvain; Klopp, Christophe

    2012-09-09

    Next generation sequencing platforms are now well implanted in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies,…) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.

  7. Ion channel drug discovery and research: the automated Nano-Patch-Clamp technology.

    PubMed

    Brueggemann, A; George, M; Klau, M; Beckler, M; Steindl, J; Behrends, J C; Fertig, N

    2004-01-01

    Unlike the genomics revolution, which was largely enabled by a single technological advance (high throughput sequencing), rapid advancement in proteomics will require a broader effort to increase the throughput of a number of key tools for functional analysis of different types of proteins. In the case of ion channels -a class of (membrane) proteins of great physiological importance and potential as drug targets- the lack of adequate assay technologies is felt particularly strongly. The available, indirect, high throughput screening methods for ion channels clearly generate insufficient information. The best technology to study ion channel function and screen for compound interaction is the patch clamp technique, but patch clamping suffers from low throughput, which is not acceptable for drug screening. A first step towards a solution is presented here. The nano patch clamp technology, which is based on a planar, microstructured glass chip, enables automatic whole cell patch clamp measurements. The Port-a-Patch is an automated electrophysiology workstation, which uses planar patch clamp chips. This approach enables high quality and high content ion channel and compound evaluation on a one-cell-at-a-time basis. The presented automation of the patch process and its scalability to an array format are the prerequisites for any higher throughput electrophysiology instruments.

  8. ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data.

    PubMed

    Yang, Jian-Hua; Li, Jun-Hao; Jiang, Shan; Zhou, Hui; Qu, Liang-Hu

    2013-01-01

    Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) represent two classes of important non-coding RNAs in eukaryotes. Although these non-coding RNAs have been implicated in organismal development and in various human diseases, surprisingly little is known about their transcriptional regulation. Recent advances in chromatin immunoprecipitation with next-generation DNA sequencing (ChIP-Seq) have provided methods of detecting transcription factor binding sites (TFBSs) with unprecedented sensitivity. In this study, we describe ChIPBase (http://deepbase.sysu.edu.cn/chipbase/), a novel database that we have developed to facilitate the comprehensive annotation and discovery of transcription factor binding maps and transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. The current release of ChIPBase includes high-throughput sequencing data that were generated by 543 ChIP-Seq experiments in diverse tissues and cell lines from six organisms. By analysing millions of TFBSs, we identified tens of thousands of TF-lncRNA and TF-miRNA regulatory relationships. Furthermore, two web-based servers were developed to annotate and discover transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. In addition, we developed two genome browsers, deepView and genomeView, to provide integrated views of multidimensional data. Moreover, our web implementation supports diverse query types and the exploration of TFs, lncRNAs, miRNAs, gene ontologies and pathways.

  9. Role of APOE Isoforms in the Pathogenesis of TBI Induced Alzheimer’s Disease

    DTIC Science & Technology

    2015-10-01

    global deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel ...ATP binding cassette transporter A1 (ABCA1) is a lipid transporter that controls the generation of HDL in plasma and ApoE-containing lipoproteins in... parallel sequencing, mRNA-seq, behavioral testing, mem- ory impairement, recovery. 3 Overall Project Summary During the reported period, we have been able

  10. Use of the melting curve assay as a means for high-throughput quantification of Illumina sequencing libraries.

    PubMed

    Shinozuka, Hiroshi; Forster, John W

    2016-01-01

    Background. Multiplexed sequencing is commonly performed on massively parallel short-read sequencing platforms such as Illumina, and the efficiency of library normalisation can affect the quality of the output dataset. Although several library normalisation approaches have been established, none are ideal for highly multiplexed sequencing due to issues of cost and/or processing time. Methods. An inexpensive and high-throughput library quantification method has been developed, based on an adaptation of the melting curve assay. Sequencing libraries were subjected to the assay using the Bio-Rad Laboratories CFX Connect(TM) Real-Time PCR Detection System. The library quantity was calculated through summation of reduction of relative fluorescence units between 86 and 95 °C. Results.PCR-enriched sequencing libraries are suitable for this quantification without pre-purification of DNA. Short DNA molecules, which ideally should be eliminated from the library for subsequent processing, were differentiated from the target DNA in a mixture on the basis of differences in melting temperature. Quantification results for long sequences targeted using the melting curve assay were correlated with those from existing methods (R (2) > 0.77), and that observed from MiSeq sequencing (R (2) = 0.82). Discussion.The results of multiplexed sequencing suggested that the normalisation performance of the described method is equivalent to that of another recently reported high-throughput bead-based method, BeNUS. However, costs for the melting curve assay are considerably lower and processing times shorter than those of other existing methods, suggesting greater suitability for highly multiplexed sequencing applications.

  11. QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

    PubMed

    Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

    2017-08-31

    As a newly emerged research area, RNA epigenetics has drawn increasing attention recently for the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high throughput sequencing techniques, such as, MeRIP-Seq, transcriptome-wide RNA methylation profile is now available in the form of count-based data, with which it is often of interests to study the dynamics at epitranscriptomic layer. However, the sample size of RNA methylation experiment is usually very small due to its costs; and additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as DRME model based on a statistical test covering the IP samples only with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in the way, the input control samples are also properly taken care of. In addition, different from DRME approach, which relies only the input control sample only for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. And the QNB model is also applicable to other datasets related RNA modifications, including but not limited to RNA bisulfite sequencing, m 1 A-Seq, Par-CLIP, RIP-Seq, etc.

  12. ChIP-PIT: Enhancing the Analysis of ChIP-Seq Data Using Convex-Relaxed Pair-Wise Interaction Tensor Decomposition.

    PubMed

    Zhu, Lin; Guo, Wei-Li; Deng, Su-Ping; Huang, De-Shuang

    2016-01-01

    In recent years, thanks to the efforts of individual scientists and research consortiums, a huge amount of chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experimental data have been accumulated. Instead of investigating them independently, several recent studies have convincingly demonstrated that a wealth of scientific insights can be gained by integrative analysis of these ChIP-seq data. However, when used for the purpose of integrative analysis, a serious drawback of current ChIP-seq technique is that it is still expensive and time-consuming to generate ChIP-seq datasets of high standard. Most researchers are therefore unable to obtain complete ChIP-seq data for several TFs in a wide variety of cell lines, which considerably limits the understanding of transcriptional regulation pattern. In this paper, we propose a novel method called ChIP-PIT to overcome the aforementioned limitation. In ChIP-PIT, ChIP-seq data corresponding to a diverse collection of cell types, TFs and genes are fused together using the three-mode pair-wise interaction tensor (PIT) model, and the prediction of unperformed ChIP-seq experimental results is formulated as a tensor completion problem. Computationally, we propose efficient first-order method based on extensions of coordinate descent method to learn the optimal solution of ChIP-PIT, which makes it particularly suitable for the analysis of massive scale ChIP-seq data. Experimental evaluation the ENCODE data illustrate the usefulness of the proposed model.

  13. KatharoSeq Enables High-Throughput Microbiome Analysis from Low-Biomass Samples

    PubMed Central

    Minich, Jeremiah J.; Zhu, Qiyun; Janssen, Stefan; Hendrickson, Ryan; Amir, Amnon; Vetter, Russ; Hyde, John; Doty, Megan M.; Stillwell, Kristina; Benardini, James; Kim, Jae H.; Allen, Eric E.

    2018-01-01

    ABSTRACT Microbiome analyses of low-biomass samples are challenging because of contamination and inefficiencies, leading many investigators to employ low-throughput methods with minimal controls. We developed a new automated protocol, KatharoSeq (from the Greek katharos [clean]), that outperforms single-tube extractions while processing at least five times as fast. KatharoSeq incorporates positive and negative controls to reveal the whole bacterial community from inputs of as few as 50 cells and correctly identifies 90.6% (standard error, 0.013%) of the reads from 500 cells. To demonstrate the broad utility of KatharoSeq, we performed 16S rRNA amplicon and shotgun metagenome analyses of the Jet Propulsion Laboratory spacecraft assembly facility (SAF; n = 192, 96), 52 rooms of a neonatal intensive care unit (NICU; n = 388, 337), and an endangered-abalone-rearing facility (n = 192, 123), obtaining spatially resolved, unique microbiomes reproducible across hundreds of samples. The SAF, our primary focus, contains 32 sOTUs (sub-OTUs, defined as exact sequence matches) and their inferred variants identified by the deblur algorithm, with four (Acinetobacter lwoffii, Paracoccus marcusii, Mycobacterium sp., and Novosphingobium) being present in >75% of the samples. According to microbial spatial topography, the most abundant cleanroom contaminant, A. lwoffii, is related to human foot traffic exposure. In the NICU, we have been able to discriminate environmental exposure related to patient infectious disease, and in the abalone facility, we show that microbial communities reflect the marine environment rather than human input. Consequently, we demonstrate the feasibility and utility of large-scale, low-biomass metagenomic analyses using the KatharoSeq protocol. IMPORTANCE Various indoor, outdoor, and host-associated environments contain small quantities of microbial biomass and represent a niche that is often understudied because of technical constraints. Many studies that attempt to evaluate these low-biomass microbiome samples are riddled with erroneous results that are typically false positive signals obtained during the sampling process. We have investigated various low-biomass kits and methods to determine the limit of detection of these pipelines. Here we present KatharoSeq, a high-throughput protocol combining laboratory and bioinformatic methods that can differentiate a true positive signal in samples with as few as 50 to 500 cells. We demonstrate the application of this method in three unique low-biomass environments, including a SAF, a hospital NICU, and an abalone-rearing facility. PMID:29577086

  14. Evaluation of microRNA alignment techniques

    PubMed Central

    Kaspi, Antony; El-Osta, Assam

    2016-01-01

    Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing. PMID:27284164

  15. DeconRNASeq: a statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-Seq data.

    PubMed

    Gong, Ting; Szustakowski, Joseph D

    2013-04-15

    For heterogeneous tissues, measurements of gene expression through mRNA-Seq data are confounded by relative proportions of cell types involved. In this note, we introduce an efficient pipeline: DeconRNASeq, an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data. It adopts a globally optimized non-negative decomposition algorithm through quadratic programming for estimating the mixing proportions of distinctive tissue types in next-generation sequencing data. We demonstrated the feasibility and validity of DeconRNASeq across a range of mixing levels and sources using mRNA-Seq data mixed in silico at known concentrations. We validated our computational approach for various benchmark data, with high correlation between our predicted cell proportions and the real fractions of tissues. Our study provides a rigorous, quantitative and high-resolution tool as a prerequisite to use mRNA-Seq data. The modularity of package design allows an easy deployment of custom analytical pipelines for data from other high-throughput platforms. DeconRNASeq is written in R, and is freely available at http://bioconductor.org/packages. Supplementary data are available at Bioinformatics online.

  16. RNA-Rocket: an RNA-Seq analysis resource for infectious disease research

    PubMed Central

    Warren, Andrew S.; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I.; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B.; Wattam, Alice R.; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno

    2015-01-01

    Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:25573919

  17. RNA-Rocket: an RNA-Seq analysis resource for infectious disease research.

    PubMed

    Warren, Andrew S; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B; Wattam, Alice R; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno

    2015-05-01

    RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. anwarren@vt.edu Supplementary materials are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  18. Investigating the Responses of Cronobacter sakazakii to Garlic-Drived Organosulfur Compounds: a Systematic Study of Pathogenic-Bacterium Injury by Use of High-Throughput Whole-Transcriptome Sequencing and Confocal Micro-Raman Spectroscopy

    PubMed Central

    Feng, Shaolong; Eucker, Tyson P.; Holly, Mayumi K.; Konkel, Michael E.

    2014-01-01

    We present the results of a study using high-throughput whole-transcriptome sequencing (RNA-seq) and vibrational spectroscopy to characterize and fingerprint pathogenic-bacterium injury under conditions of unfavorable stress. Two garlic-derived organosulfur compounds were found to be highly effective antimicrobial compounds against Cronobacter sakazakii, a leading pathogen associated with invasive infection of infants and causing meningitis, necrotizing entercolitis, and bacteremia. RNA-seq shows changes in gene expression patterns and transcriptomic response, while confocal micro-Raman spectroscopy characterizes macromolecular changes in the bacterial cell resulting from this chemical stress. RNA-seq analyses showed that the bacterial response to ajoene differed from the response to diallyl sulfide. Specifically, ajoene caused downregulation of motility-related genes, while diallyl sulfide treatment caused an increased expression of cell wall synthesis genes. Confocal micro-Raman spectroscopy revealed that the two compounds appear to have the same phase I antimicrobial mechanism of binding to thiol-containing proteins/enzymes in bacterial cells generating a disulfide stretching band but different phase II antimicrobial mechanisms, showing alterations in the secondary structures of proteins in two different ways. Diallyl sulfide primarily altered the α-helix and β-sheet, as reflected in changes in amide I, while ajoene altered the structures containing phenylalanine and tyrosine. Bayesian probability analysis validated the ability of principal component analysis to differentiate treated and control C. sakazakii cells. Scanning electron microscopy confirmed cell injury, showing significant morphological variations in cells following treatments by these two compounds. Findings from this study aid in the development of effective intervention strategies to reduce the risk of C. sakazakii contamination in the food production environment and on food contact surfaces, reducing the risks to susceptible consumers. PMID:24271174

  19. Sort-Seq Approach to Engineering a Formaldehyde-Inducible Promoter for Dynamically Regulated Escherichia coli Growth on Methanol

    PubMed Central

    2017-01-01

    Tight and tunable control of gene expression is a highly desirable goal in synthetic biology for constructing predictable gene circuits and achieving preferred phenotypes. Elucidating the sequence–function relationship of promoters is crucial for manipulating gene expression at the transcriptional level, particularly for inducible systems dependent on transcriptional regulators. Sort-seq methods employing fluorescence-activated cell sorting (FACS) and high-throughput sequencing allow for the quantitative analysis of sequence–function relationships in a robust and rapid way. Here we utilized a massively parallel sort-seq approach to analyze the formaldehyde-inducible Escherichia coli promoter (Pfrm) with single-nucleotide resolution. A library of mutated formaldehyde-inducible promoters was cloned upstream of gfp on a plasmid. The library was partitioned into bins via FACS on the basis of green fluorescent protein (GFP) expression level, and mutated promoters falling into each expression bin were identified with high-throughput sequencing. The resulting analysis identified two 19 base pair repressor binding sites, one upstream of the −35 RNA polymerase (RNAP) binding site and one overlapping with the −10 site, and assessed the relative importance of each position and base therein. Key mutations were identified for tuning expression levels and were used to engineer formaldehyde-inducible promoters with predictable activities. Engineered variants demonstrated up to 14-fold lower basal expression, 13-fold higher induced expression, and a 3.6-fold stronger response as indicated by relative dynamic range. Finally, an engineered formaldehyde-inducible promoter was employed to drive the expression of heterologous methanol assimilation genes and achieved increased biomass levels on methanol, a non-native substrate of E. coli. PMID:28463494

  20. High-throughput automated microfluidic sample preparation for accurate microbial genomics

    PubMed Central

    Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B.; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P.; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C.

    2017-01-01

    Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells to sequence library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications. PMID:28128213

  1. FibroChip, a Functional DNA Microarray to Monitor Cellulolytic and Hemicellulolytic Activities of Rumen Microbiota

    PubMed Central

    Comtet-Marre, Sophie; Chaucheyras-Durand, Frédérique; Bouzid, Ourdia; Mosoni, Pascale; Bayat, Ali R.; Peyret, Pierre; Forano, Evelyne

    2018-01-01

    Ruminants fulfill their energy needs for growth primarily through microbial breakdown of plant biomass in the rumen. Several biotic and abiotic factors influence the efficiency of fiber degradation, which can ultimately impact animal productivity and health. To provide more insight into mechanisms involved in the modulation of fibrolytic activity, a functional DNA microarray targeting genes encoding key enzymes involved in cellulose and hemicellulose degradation by rumen microbiota was designed. Eight carbohydrate-active enzyme (CAZyme) families (GH5, GH9, GH10, GH11, GH43, GH48, CE1, and CE6) were selected which represented 392 genes from bacteria, protozoa, and fungi. The DNA microarray, designated as FibroChip, was validated using targets of increasing complexity and demonstrated sensitivity and specificity. In addition, FibroChip was evaluated for its explorative and semi-quantitative potential. Differential expression of CAZyme genes was evidenced in the rumen bacterium Fibrobacter succinogenes S85 grown on wheat straw or cellobiose. FibroChip was used to identify the expressed CAZyme genes from the targeted families in the rumen of a cow fed a mixed diet based on grass silage. Among expressed genes, those encoding GH43, GH5, and GH10 families were the most represented. Most of the F. succinogenes genes detected by the FibroChip were also detected following RNA-seq analysis of RNA transcripts obtained from the rumen fluid sample. Use of the FibroChip also indicated that transcripts of fiber degrading enzymes derived from eukaryotes (protozoa and anaerobic fungi) represented a significant proportion of the total microbial mRNA pool. FibroChip represents a reliable and high-throughput tool that enables researchers to monitor active members of fiber degradation in the rumen. PMID:29487591

  2. FibroChip, a Functional DNA Microarray to Monitor Cellulolytic and Hemicellulolytic Activities of Rumen Microbiota.

    PubMed

    Comtet-Marre, Sophie; Chaucheyras-Durand, Frédérique; Bouzid, Ourdia; Mosoni, Pascale; Bayat, Ali R; Peyret, Pierre; Forano, Evelyne

    2018-01-01

    Ruminants fulfill their energy needs for growth primarily through microbial breakdown of plant biomass in the rumen. Several biotic and abiotic factors influence the efficiency of fiber degradation, which can ultimately impact animal productivity and health. To provide more insight into mechanisms involved in the modulation of fibrolytic activity, a functional DNA microarray targeting genes encoding key enzymes involved in cellulose and hemicellulose degradation by rumen microbiota was designed. Eight carbohydrate-active enzyme (CAZyme) families (GH5, GH9, GH10, GH11, GH43, GH48, CE1, and CE6) were selected which represented 392 genes from bacteria, protozoa, and fungi. The DNA microarray, designated as FibroChip, was validated using targets of increasing complexity and demonstrated sensitivity and specificity. In addition, FibroChip was evaluated for its explorative and semi-quantitative potential. Differential expression of CAZyme genes was evidenced in the rumen bacterium Fibrobacter succinogenes S85 grown on wheat straw or cellobiose. FibroChip was used to identify the expressed CAZyme genes from the targeted families in the rumen of a cow fed a mixed diet based on grass silage. Among expressed genes, those encoding GH43, GH5, and GH10 families were the most represented. Most of the F. succinogenes genes detected by the FibroChip were also detected following RNA-seq analysis of RNA transcripts obtained from the rumen fluid sample. Use of the FibroChip also indicated that transcripts of fiber degrading enzymes derived from eukaryotes (protozoa and anaerobic fungi) represented a significant proportion of the total microbial mRNA pool. FibroChip represents a reliable and high-throughput tool that enables researchers to monitor active members of fiber degradation in the rumen.

  3. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

    PubMed Central

    2012-01-01

    Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019

  4. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing.

    PubMed

    Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M

    2012-09-17

    RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.

  5. Integrated digital error suppression for improved detection of circulating tumor DNA

    PubMed Central

    Kurtz, David M.; Chabon, Jacob J.; Scherer, Florian; Stehr, Henning; Liu, Chih Long; Bratman, Scott V.; Say, Carmen; Zhou, Li; Carter, Justin N.; West, Robert B.; Sledge, George W.; Shrager, Joseph B.; Loo, Billy W.; Neal, Joel W.; Wakelee, Heather A.; Diehn, Maximilian; Alizadeh, Ash A.

    2016-01-01

    High-throughput sequencing of circulating tumor DNA (ctDNA) promises to facilitate personalized cancer therapy. However, low quantities of cell-free DNA (cfDNA) in the blood and sequencing artifacts currently limit analytical sensitivity. To overcome these limitations, we introduce an approach for integrated digital error suppression (iDES). Our method combines in silico elimination of highly stereotypical background artifacts with a molecular barcoding strategy for the efficient recovery of cfDNA molecules. Individually, these two methods each improve the sensitivity of cancer personalized profiling by deep sequencing (CAPP-Seq) by ~3 fold, and synergize when combined to yield ~15-fold improvements. As a result, iDES-enhanced CAPP-Seq facilitates noninvasive variant detection across hundreds of kilobases. Applied to clinical non-small cell lung cancer (NSCLC) samples, our method enabled biopsy-free profiling of EGFR kinase domain mutations with 92% sensitivity and 96% specificity and detection of ctDNA down to 4 in 105 cfDNA molecules. We anticipate that iDES will aid the noninvasive genotyping and detection of ctDNA in research and clinical settings. PMID:27018799

  6. Exome Pool-Seq in neurodevelopmental disorders.

    PubMed

    Popp, Bernt; Ekici, Arif B; Thiel, Christian T; Hoyer, Juliane; Wiesener, Antje; Kraus, Cornelia; Reis, André; Zweier, Christiane

    2017-12-01

    High throughput sequencing has greatly advanced disease gene identification, especially in heterogeneous entities. Despite falling costs this is still an expensive and laborious technique, particularly when studying large cohorts. To address this problem we applied Exome Pool-Seq as an economic and fast screening technology in neurodevelopmental disorders (NDDs). Sequencing of 96 individuals can be performed in eight pools of 12 samples on less than one Illumina sequencer lane. In a pilot study with 96 cases we identified 27 variants, likely or possibly affecting function. Twenty five of these were identified in 923 established NDD genes (based on SysID database, status November 2016) (ACTB, AHDC1, ANKRD11, ATP6V1B2, ATRX, CASK, CHD8, GNAS, IFIH1, KCNQ2, KMT2A, KRAS, MAOA, MED12, MED13L, RIT1, SETD5, SIN3A, TCF4, TRAPPC11, TUBA1A, WAC, ZBTB18, ZMYND11), two in 543 (SysID) candidate genes (ZNF292, BPTF), and additionally a de novo loss-of-function variant in LRRC7, not previously implicated in NDDs. Most of them were confirmed to be de novo, but we also identified X-linked or autosomal-dominantly or autosomal-recessively inherited variants. With a detection rate of 28%, Exome Pool-Seq achieves comparable results to individual exome analyses but reduces costs by >85%. Compared with other large scale approaches using Molecular Inversion Probes (MIP) or gene panels, it allows flexible re-analysis of data. Exome Pool-Seq is thus well suited for large-scale, cost-efficient and flexible screening in characterized but heterogeneous entities like NDDs.

  7. Mapping RNA Structure In Vitro with SHAPE Chemistry and Next-Generation Sequencing (SHAPE-Seq).

    PubMed

    Watters, Kyle E; Lucks, Julius B

    2016-01-01

    Mapping RNA structure with selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistry has proven to be a versatile method for characterizing RNA structure in a variety of contexts. SHAPE reagents covalently modify RNAs in a structure-dependent manner to create adducts at the 2'-OH group of the ribose backbone at nucleotides that are structurally flexible. The positions of these adducts are detected using reverse transcriptase (RT) primer extension, which stops one nucleotide before the modification, to create a pool of cDNAs whose lengths reflect the location of SHAPE modification. Quantification of the cDNA pools is used to estimate the "reactivity" of each nucleotide in an RNA molecule to the SHAPE reagent. High reactivities indicate nucleotides that are structurally flexible, while low reactivities indicate nucleotides that are inflexible. These SHAPE reactivities can then be used to infer RNA structures by restraining RNA structure prediction algorithms. Here, we provide a state-of-the-art protocol describing how to perform in vitro RNA structure probing with SHAPE chemistry using next-generation sequencing to quantify cDNA pools and estimate reactivities (SHAPE-Seq). The use of next-generation sequencing allows for higher throughput, more consistent data analysis, and multiplexing capabilities. The technique described herein, SHAPE-Seq v2.0, uses a universal reverse transcription priming site that is ligated to the RNA after SHAPE modification. The introduced priming site allows for the structural analysis of an RNA independent of its sequence.

  8. Analyses of Methylomes Derived from Meso-American Common Bean (Phaseolus vulgaris L.) Using MeDIP-Seq and Whole Genome Sodium Bisulfite-Sequencing.

    PubMed

    Crampton, Mollee; Sripathi, Venkateswara R; Hossain, Khwaja; Kalavacharla, Venu

    2016-01-01

    Common bean (Phaseolus vulgaris L.) is economically important for its high protein, fiber, and micronutrient contents, with a relatively small genome size of ∼587 Mb. Common bean is genetically diverse with two major gene pools, Meso-American and Andean. The phenotypic variability within common bean is partly attributed to the genetic diversity and epigenetic changes that are largely influenced by environmental factors. It is well established that an important epigenetic regulator of gene expression is DNA methylation. Here, we present results generated from two high-throughput sequencing technologies, methylated DNA immunoprecipitation-sequencing (MeDIP-seq) and whole genome bisulfite-sequencing (BS-Seq). Our analyses revealed that this Meso-American common bean displays similar methylation patterns as other previously published plant methylomes, with CG ∼50%, CHG ∼30%, and CHH ∼2.7% methylation, however, these differ from the common bean reference methylome of Andean origin. We identified higher CG methylation levels in both promoter and genic regions than CHG and CHH contexts. Moreover, we found relatively higher CG methylation levels in genes than in promoters. Conversely, the CHG and CHH methylation levels were highest in promoters than in genes. This is the first genome-wide DNA methylation profiling study in a Meso-American common bean cultivar ("Sierra") using NGS approaches. Our long-term goal is to generate genome-wide epigenomic maps in common bean focusing on chromatin accessibility, histone modifications, and DNA methylation.

  9. Analyses of Methylomes Derived from Meso-American Common Bean (Phaseolus vulgaris L.) Using MeDIP-Seq and Whole Genome Sodium Bisulfite-Sequencing

    PubMed Central

    Crampton, Mollee; Sripathi, Venkateswara R.; Hossain, Khwaja; Kalavacharla, Venu

    2016-01-01

    Common bean (Phaseolus vulgaris L.) is economically important for its high protein, fiber, and micronutrient contents, with a relatively small genome size of ∼587 Mb. Common bean is genetically diverse with two major gene pools, Meso-American and Andean. The phenotypic variability within common bean is partly attributed to the genetic diversity and epigenetic changes that are largely influenced by environmental factors. It is well established that an important epigenetic regulator of gene expression is DNA methylation. Here, we present results generated from two high-throughput sequencing technologies, methylated DNA immunoprecipitation-sequencing (MeDIP-seq) and whole genome bisulfite-sequencing (BS-Seq). Our analyses revealed that this Meso-American common bean displays similar methylation patterns as other previously published plant methylomes, with CG ∼50%, CHG ∼30%, and CHH ∼2.7% methylation, however, these differ from the common bean reference methylome of Andean origin. We identified higher CG methylation levels in both promoter and genic regions than CHG and CHH contexts. Moreover, we found relatively higher CG methylation levels in genes than in promoters. Conversely, the CHG and CHH methylation levels were highest in promoters than in genes. This is the first genome-wide DNA methylation profiling study in a Meso-American common bean cultivar (“Sierra”) using NGS approaches. Our long-term goal is to generate genome-wide epigenomic maps in common bean focusing on chromatin accessibility, histone modifications, and DNA methylation. PMID:27199997

  10. Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants.

    PubMed

    Taheri, Sima; Lee Abdullah, Thohirah; Yusop, Mohd Rafii; Hanafi, Mohamed Musa; Sahebi, Mahbod; Azizi, Parisa; Shamshiri, Redmond Ramin

    2018-02-13

    Microsatellites, or simple sequence repeats (SSRs), are one of the most informative and multi-purpose genetic markers exploited in plant functional genomics. However, the discovery of SSRs and development using traditional methods are laborious, time-consuming, and costly. Recently, the availability of high-throughput sequencing technologies has enabled researchers to identify a substantial number of microsatellites at less cost and effort than traditional approaches. Illumina is a noteworthy transcriptome sequencing technology that is currently used in SSR marker development. Although 454 pyrosequencing datasets can be used for SSR development, this type of sequencing is no longer supported. This review aims to present an overview of the next generation sequencing, with a focus on the efficient use of de novo transcriptome sequencing (RNA-Seq) and related tools for mining and development of microsatellites in plants.

  11. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA

    PubMed Central

    Wright, Imogen A.; Travers, Simon A.

    2014-01-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618

  12. The future scalability of pH-based genome sequencers: A theoretical perspective

    NASA Astrophysics Data System (ADS)

    Go, Jonghyun; Alam, Muhammad A.

    2013-10-01

    Sequencing of human genome is an essential prerequisite for personalized medicine and early prognosis of various genetic diseases. The state-of-art, high-throughput genome sequencing technologies provide improved sequencing; however, their reliance on relatively expensive optical detection schemes has prevented wide-spread adoption of the technology in routine care. In contrast, the recently announced pH-based electronic genome sequencers achieve fast sequencing at low cost because of the compatibility with the current microelectronics technology. While the progress in technology development has been rapid, the physics of the sequencing chips and the potential for future scaling (and therefore, cost reduction) remain unexplored. In this article, we develop a theoretical framework and a scaling theory to explain the principle of operation of the pH-based sequencing chips and use the framework to explore various perceived scaling limits of the technology related to signal to noise ratio, well-to-well crosstalk, and sequencing accuracy. We also address several limitations inherent to the key steps of pH-based genome sequencers, which are widely shared by many other sequencing platforms in the market but remained unexplained properly so far.

  13. RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing.

    PubMed

    Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E

    2015-01-01

    Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.

  14. Nanoscale zero-valent iron/persulfate enhanced upflow anaerobic sludge blanket reactor for dye removal: Insight into microbial metabolism and microbial community

    PubMed Central

    Pan, Fei; Zhong, Xiaohan; Xia, Dongsheng; Yin, Xianze; Li, Fan; Zhao, Dongye; Ji, Haodong; Liu, Wen

    2017-01-01

    This study investigated the efficiency of nanoscale zero-valent iron combined with persulfate (NZVI/PS) for enhanced degradation of brilliant red X-3B in an upflow anaerobic sludge blanket (UASB) reactor, and examined the effects of NZVI/PS on anaerobic microbial communities during the treatment process. The addition of NZVI (0.5 g/L) greatly enhanced the decolourization rate of X-3B from 63.8% to 98.4%. The Biolog EcoPlateTM technique was utilized to examine microbial metabolism in the reactor, and the Illumina MiSeq high-throughput sequencing revealed 22 phyla and 88 genera of the bacteria. The largest genera (Lactococcus) decreased from 33.03% to 7.94%, while the Akkermansia genera increased from 1.69% to 20.23% according to the abundance in the presence of 0.2 g/L NZVI during the biological treatment process. Meanwhile, three strains were isolated from the sludge in the UASB reactors and identified by 16 S rRNA analysis. The distribution of three strains was consistent with the results from the Illumina MiSeq high throughput sequencing. The X-ray photoelectron spectroscopy results indicated that Fe(0) was transformed into Fe(II)/Fe(III) during the treatment process, which are beneficial for the microorganism growth, and thus promoting their metabolic processes and microbial community. PMID:28300176

  15. Nanoscale zero-valent iron/persulfate enhanced upflow anaerobic sludge blanket reactor for dye removal: Insight into microbial metabolism and microbial community

    NASA Astrophysics Data System (ADS)

    Pan, Fei; Zhong, Xiaohan; Xia, Dongsheng; Yin, Xianze; Li, Fan; Zhao, Dongye; Ji, Haodong; Liu, Wen

    2017-03-01

    This study investigated the efficiency of nanoscale zero-valent iron combined with persulfate (NZVI/PS) for enhanced degradation of brilliant red X-3B in an upflow anaerobic sludge blanket (UASB) reactor, and examined the effects of NZVI/PS on anaerobic microbial communities during the treatment process. The addition of NZVI (0.5 g/L) greatly enhanced the decolourization rate of X-3B from 63.8% to 98.4%. The Biolog EcoPlateTM technique was utilized to examine microbial metabolism in the reactor, and the Illumina MiSeq high-throughput sequencing revealed 22 phyla and 88 genera of the bacteria. The largest genera (Lactococcus) decreased from 33.03% to 7.94%, while the Akkermansia genera increased from 1.69% to 20.23% according to the abundance in the presence of 0.2 g/L NZVI during the biological treatment process. Meanwhile, three strains were isolated from the sludge in the UASB reactors and identified by 16 S rRNA analysis. The distribution of three strains was consistent with the results from the Illumina MiSeq high throughput sequencing. The X-ray photoelectron spectroscopy results indicated that Fe(0) was transformed into Fe(II)/Fe(III) during the treatment process, which are beneficial for the microorganism growth, and thus promoting their metabolic processes and microbial community.

  16. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.

  17. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981

  18. Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl−/− retinal transcriptomes

    PubMed Central

    Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.

    2011-01-01

    Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623

  19. High-Throughput Next-Generation Sequencing of Polioviruses

    PubMed Central

    Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

    2016-01-01

    ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929

  20. Characterization of a new apple luteovirus identified by high-throughput sequencing.

    PubMed

    Liu, Huawei; Wu, Liping; Nikolaeva, Ekaterina; Peter, Kari; Liu, Zongrang; Mollov, Dimitre; Cao, Mengji; Li, Ruhui

    2018-05-15

    'Rapid Apple Decline' (RAD) is a newly emerging problem of young, dwarf apple trees in the Northeastern USA. The affected trees show trunk necrosis, cracking and canker before collapse in summer. In this study, we discovered and characterized a new luteovirus from apple trees in RAD-affected orchards using high-throughput sequencing (HTS) technology and subsequent Sanger sequencing. Illumina NextSeq sequencing was applied to total RNAs prepared from three diseased apple trees. Sequence reads were de novo assembled, and contigs were annotated by BLASTx. RT-PCR and 5'/3' RACE sequencing were used to obtain the complete genome of a new virus. RT-PCR was used to detect the virus. Three common apple viruses and a new luteovirus were identified from the diseased trees by HTS and RT-PCR. Sequence analyses of the complete genome of the new virus show that it is a new species of the genus Luteovirus in the family Luteoviridae. The virus is graft transmissible and detected by RT-PCR in apple trees in a couple of orchards. A new luteovirus and/or three known viruses were found to be associated with RAD. Molecular characterization of the new luteovirus provides important information for further investigation of its distribution and etiological role.

  1. Phenotype classification of single cells using SRS microscopy, RNA sequencing, and microfluidics (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi

    2016-03-01

    Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.

  2. Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

    PubMed

    Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

    2018-03-20

    The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there was no significant differences between the compositional results generated by the different sequencers (p = 0.693, R 2  = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R 2  = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample. Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.

  3. High throughput 16SrRNA gene sequencing reveals the correlation between Propionibacterium acnes and sarcoidosis.

    PubMed

    Zhao, Meng-Meng; Du, Shan-Shan; Li, Qiu-Hong; Chen, Tao; Qiu, Hui; Wu, Qin; Chen, Shan-Shan; Zhou, Ying; Zhang, Yuan; Hu, Yang; Su, Yi-Liang; Shen, Li; Zhang, Fen; Weng, Dong; Li, Hui-Ping

    2017-02-01

    This study aims to use high throughput 16SrRNA gene sequencing to examine the bacterial profile of lymph node biopsy samples of patients with sarcoidosis and to further verify the association between Propionibacterium acnes (P. acnes) and sarcoidosis. A total of 36 mediastinal lymph node biopsy specimens were collected from 17 cases of sarcoidosis, 8 tuberculosis (TB group), and 11 non-infectious lung diseases (control group). The V4 region of the bacterial 16SrRNA gene in the specimens was amplified and sequenced using the high throughput sequencing platform MiSeq, and bacterial profile was established. The data analysis software QIIME and Metastats were used to compare bacterial relative abundance in the three patient groups. Overall, 545 genera were identified; 38 showed significantly lower and 29 had significantly higher relative abundance in the sarcoidosis group than in the TB and control groups (P < 0.01). P. acnes 16SrRNA was exclusively found in all the 17 samples of the sarcoidosis group, whereas was not detected in the TB and control groups. The relative abundance of P. acnes in the sarcoidosis group (0.16% ± 0. 11%) was significantly higher than that in the TB (Metastats analysis: P = 0.0010, q = 0.0044) and control groups (Metastats analysis: P = 0.0010, q = 0.0038). The relative abundance of P. granulosum was only 0.0022% ± 0. 0044% in the sarcoidosis group. P. granulosum 16SrRNA was not detected in the other two groups. High throughput 16SrRNA gene sequencing appears to be a useful tool to investigate the bacterial profile of sarcoidosis specimens. The results suggest that P. acnes may be involved in sarcoidosis development.

  4. Personalized Oncology Through Integrative High-Throughput Sequencing: A Pilot Study

    PubMed Central

    Roychowdhury, Sameek; Iyer, Matthew K.; Robinson, Dan R.; Lonigro, Robert J.; Wu, Yi-Mi; Cao, Xuhong; Kalyana-Sundaram, Shanker; Sam, Lee; Balbin, O. Alejandro; Quist, Michael J.; Barrette, Terrence; Everett, Jessica; Siddiqui, Javed; Kunju, Lakshmi P.; Navone, Nora; Araujo, John C.; Troncoso, Patricia; Logothetis, Christopher J.; Innis, Jeffrey W.; Smith, David C.; Lao, Christopher D.; Kim, Scott Y.; Roberts, J. Scott; Gruber, Stephen B.; Pienta, Kenneth J.; Talpaz, Moshe; Chinnaiyan, Arul M.

    2012-01-01

    Individual cancers harbor a set of genetic aberrations that can be informative for identifying rational therapies currently available or in clinical trials. We implemented a pilot study to explore the practical challenges of applying high-throughput sequencing in clinical oncology. We enrolled patients with advanced or refractory cancer who were eligible for clinical trials. For each patient, we performed whole-genome sequencing of the tumor, targeted whole-exome sequencing of tumor and normal DNA, and transcriptome sequencing (RNA-Seq) of the tumor to identify potentially informative mutations in a clinically relevant time frame of 3 to 4 weeks. With this approach, we detected several classes of cancer mutations including structural rearrangements, copy number alterations, point mutations, and gene expression alterations. A multidisciplinary Sequencing Tumor Board (STB) deliberated on the clinical interpretation of the sequencing results obtained. We tested our sequencing strategy on human prostate cancer xenografts. Next, we enrolled two patients into the clinical protocol and were able to review the results at our STB within 24 days of biopsy. The first patient had metastatic colorectal cancer in which we identified somatic point mutations in NRAS, TP53, AURKA, FAS, and MYH11, plus amplification and overexpression of cyclin-dependent kinase 8 (CDK8). The second patient had malignant melanoma, in which we identified a somatic point mutation in HRAS and a structural rearrangement affecting CDKN2C. The STB identified the CDK8 amplification and Ras mutation as providing a rationale for clinical trials with CDK inhibitors or MEK (mitogenactivated or extracellular signal–regulated protein kinase kinase) and PI3K (phosphatidylinositol 3-kinase) inhibitors, respectively. Integrative high-throughput sequencing of patients with advanced cancer generates a comprehensive, individual mutational landscape to facilitate biomarker-driven clinical trials in oncology. PMID:22133722

  5. Applying Genomic and Genetic Tools to Understand and Mitigate Damage from Exposure to Toxins

    DTIC Science & Technology

    2013-10-01

    sequences to the human genome . Genome Biol 10, R25 (2009). 26 Award number: W81XWH-09-1-0715 Title: Applying Genomic and Genetic Tools to Understand...utilizing the high-throughput technology of mRNA-seq. BODY The goal of our research program (W81XWH-09-1-0715) was to utilize genetic and genomic ...also acquired the achetf222a * * * * * 5 Award number: W81XWH-09-1-0715 Title: Applying Genomic and Genetic Tools to Understand and Mitigate

  6. A hierarchical model for clustering m(6)A methylation peaks in MeRIP-seq data.

    PubMed

    Cui, Xiaodong; Meng, Jia; Zhang, Shaowu; Rao, Manjeet K; Chen, Yidong; Huang, Yufei

    2016-08-22

    The recent advent of the state-of-art high throughput sequencing technology, known as Methylated RNA Immunoprecipitation combined with RNA sequencing (MeRIP-seq) revolutionizes the area of mRNA epigenetics and enables the biologists and biomedical researchers to have a global view of N (6)-Methyladenosine (m(6)A) on transcriptome. Yet there is a significant need for new computation tools for processing and analysing MeRIP-Seq data to gain a further insight into the function and m(6)A mRNA methylation. We developed a novel algorithm and an open source R package ( http://compgenomics.utsa.edu/metcluster ) for uncovering the potential types of m(6)A methylation by clustering the degree of m(6)A methylation peaks in MeRIP-Seq data. This algorithm utilizes a hierarchical graphical model to model the reads account variance and the underlying clusters of the methylation peaks. Rigorous statistical inference is performed to estimate the model parameter and detect the number of clusters. MeTCluster is evaluated on both simulated and real MeRIP-seq datasets and the results demonstrate its high accuracy in characterizing the clusters of methylation peaks. Our algorithm was applied to two different sets of real MeRIP-seq datasets and reveals a novel pattern that methylation peaks with less peak enrichment tend to clustered in the 5' end of both in both mRNAs and lncRNAs, whereas those with higher peak enrichment are more likely to be distributed in CDS and towards the 3'end of mRNAs and lncRNAs. This result might suggest that m(6)A's functions could be location specific. In this paper, a novel hierarchical graphical model based algorithm was developed for clustering the enrichment of methylation peaks in MeRIP-seq data. MeTCluster is written in R and is publicly available.

  7. Next-generation sequencing using a pre-designed gene panel for the molecular diagnosis of congenital disorders in pediatric patients.

    PubMed

    Lim, Eileen C P; Brett, Maggie; Lai, Angeline H M; Lee, Siew-Peng; Tan, Ee-Shien; Jamuar, Saumya S; Ng, Ivy S L; Tan, Ene-Choo

    2015-12-14

    Next-generation sequencing (NGS) has revolutionized genetic research and offers enormous potential for clinical application. Sequencing the exome has the advantage of casting the net wide for all known coding regions while targeted gene panel sequencing provides enhanced sequencing depths and can be designed to avoid incidental findings in adult-onset conditions. A HaloPlex panel consisting of 180 genes within commonly altered chromosomal regions is available for use on both the Ion Personal Genome Machine (PGM) and MiSeq platforms to screen for causative mutations in these genes. We used this Haloplex ICCG panel for targeted sequencing of 15 patients with clinical presentations indicative of an abnormality in one of the 180 genes. Sequencing runs were done using the Ion 318 Chips on the Ion Torrent PGM. Variants were filtered for known polymorphisms and analysis was done to identify possible disease-causing variants before validation by Sanger sequencing. When possible, segregation of variants with phenotype in family members was performed to ascertain the pathogenicity of the variant. More than 97% of the target bases were covered at >20×. There was an average of 9.6 novel variants per patient. Pathogenic mutations were identified in five genes for six patients, with two novel variants. There were another five likely pathogenic variants, some of which were unreported novel variants. In a cohort of 15 patients, we were able to identify a likely genetic etiology in six patients (40%). Another five patients had candidate variants for which further evaluation and segregation analysis are ongoing. Our results indicate that the HaloPlex ICCG panel is useful as a rapid, high-throughput and cost-effective screening tool for 170 of the 180 genes. There is low coverage for some regions in several genes which might have to be supplemented by Sanger sequencing. However, comparing the cost, ease of analysis, and shorter turnaround time, it is a good alternative to exome sequencing for patients whose features are suggestive of a genetic etiology involving one of the genes in the panel.

  8. Micro-chromatin Immunoprecipation (μChIP) Protocol for Real-time PCR Analysis of a Limited Amount of Cells.

    PubMed

    Gillotin, Sébastien; Guillemot, François

    2016-06-20

    Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) is an important strategy to study gene regulation. When availability of cells is limited, however, it can be useful to focus on specific genes to investigate in depth the role of transcription factors or histone marks. Unfortunately, performing ChIP experiments to study transcription factors' binding to DNA can be difficult when biological material is restricted. This protocol describes a robust method to perform μChIP for over-expressed or endogenous transcription factors using ~100,000 cells per ChIP experiment (Masserdotti et al ., 2015). We also describe optimization steps, which we think are critical for this protocol to work and which can be used to further reduce the number of cells.

  9. Accuracy of the high-throughput amplicon sequencing to identify species within the genus Aspergillus.

    PubMed

    Lee, Seungeun; Yamamoto, Naomichi

    2015-12-01

    This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.

  10. High-throughput sequencing of microbial community diversity in soil, grapes, leaves, grape juice and wine of grapevine from China.

    PubMed

    Wei, Yu-Jie; Wu, Yun; Yan, Yin-Zhuo; Zou, Wan; Xue, Jie; Ma, Wen-Rui; Wang, Wei; Tian, Ge; Wang, Li-Ye

    2018-01-01

    In this study Illumina MiSeq was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealed five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grapes leaves, Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grapes leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to understand microbiome population in soil, grape, grapes leaves, grape juice and wine in Xinjiang through High-throughput Sequencing and identify microorganisms like Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine.

  11. High-throughput sequencing of microbial community diversity in soil, grapes, leaves, grape juice and wine of grapevine from China

    PubMed Central

    Yan, Yin-zhuo; Zou, Wan; Ma, Wen-rui; Wang, Wei; Tian, Ge; Wang, Li-ye

    2018-01-01

    In this study Illumina MiSeq was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealed five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grapes leaves, Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grapes leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to understand microbiome population in soil, grape, grapes leaves, grape juice and wine in Xinjiang through High-throughput Sequencing and identify microorganisms like Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine. PMID:29565999

  12. Resources and Recommendations for Using Transcriptomics to Address Grand Challenges in Comparative Biology.

    PubMed

    Mykles, Donald L; Burnett, Karen G; Durica, David S; Joyce, Blake L; McCarthy, Fiona M; Schmidt, Carl J; Stillman, Jonathon H

    2016-12-01

    High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the "Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology" symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.

  13. Highly sensitive and unbiased approach for elucidating antibody repertoires

    PubMed Central

    Lin, Sherry G.; Ba, Zhaoqing; Du, Zhou; Zhang, Yu; Hu, Jiazhi; Alt, Frederick W.

    2016-01-01

    Developing B lymphocytes undergo V(D)J recombination to assemble germ-line V, D, and J gene segments into exons that encode the antigen-binding variable region of Ig heavy (H) and light (L) chains. IgH and IgL chains associate to form the B-cell receptor (BCR), which, upon antigen binding, activates B cells to secrete BCR as an antibody. Each of the huge number of clonally independent B cells expresses a unique set of IgH and IgL variable regions. The ability of V(D)J recombination to generate vast primary B-cell repertoires results from a combinatorial assortment of large numbers of different V, D, and J segments, coupled with diversification of the junctions between them to generate the complementary determining region 3 (CDR3) for antigen contact. Approaches to evaluate in depth the content of primary antibody repertoires and, ultimately, to study how they are further molded by secondary mutation and affinity maturation processes are of great importance to the B-cell development, vaccine, and antibody fields. We now describe an unbiased, sensitive, and readily accessible assay, referred to as high-throughput genome-wide translocation sequencing-adapted repertoire sequencing (HTGTS-Rep-seq), to quantify antibody repertoires. HTGTS-Rep-seq quantitatively identifies the vast majority of IgH and IgL V(D)J exons, including their unique CDR3 sequences, from progenitor and mature mouse B lineage cells via the use of specific J primers. HTGTS-Rep-seq also accurately quantifies DJH intermediates and V(D)J exons in either productive or nonproductive configurations. HTGTS-Rep-seq should be useful for studies of human samples, including clonal B-cell expansions, and also for following antibody affinity maturation processes. PMID:27354528

  14. Is this the right normalization? A diagnostic tool for ChIP-seq normalization.

    PubMed

    Angelini, Claudia; Heller, Ruth; Volkinshtein, Rita; Yekutieli, Daniel

    2015-05-09

    Chip-seq experiments are becoming a standard approach for genome-wide profiling protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample versus a control sample, such as Input DNA, normalization procedures have to be applied in order to remove experimental source of biases. Despite the substantial impact that the choice of the normalization method can have on the results of a ChIP-seq data analysis, their assessment is not fully explored in the literature. In particular, there are no diagnostic tools that show whether the applied normalization is indeed appropriate for the data being analyzed. In this work we propose a novel diagnostic tool to examine the appropriateness of the estimated normalization procedure. By plotting the empirical densities of log relative risks in bins of equal read count, along with the estimated normalization constant, after logarithmic transformation, the researcher is able to assess the appropriateness of the estimated normalization constant. We use the diagnostic plot to evaluate the appropriateness of the estimates obtained by CisGenome, NCIS and CCAT on several real data examples. Moreover, we show the impact that the choice of the normalization constant can have on standard tools for peak calling such as MACS or SICER. Finally, we propose a novel procedure for controlling the FDR using sample swapping. This procedure makes use of the estimated normalization constant in order to gain power over the naive choice of constant (used in MACS and SICER), which is the ratio of the total number of reads in the ChIP and Input samples. Linear normalization approaches aim to estimate a scale factor, r, to adjust for different sequencing depths when comparing ChIP versus Input samples. The estimated scaling factor can easily be incorporated in many peak caller algorithms to improve the accuracy of the peak identification. The diagnostic plot proposed in this paper can be used to assess how adequate ChIP/Input normalization constants are, and thus it allows the user to choose the most adequate estimate for the analysis.

  15. Global Mapping of Cell Type–Specific Open Chromatin by FAIRE-seq Reveals the Regulatory Role of the NFI Family in Adipocyte Differentiation

    PubMed Central

    Yu, Jing; Hirose-Yotsuya, Lisa; Take, Kazumi; Sun, Wei; Iwabu, Masato; Okada-Iwabu, Miki; Fujita, Takanori; Aoyama, Tomohisa; Tsutsumi, Shuichi; Ueki, Kohjiro; Kodama, Tatsuhiko; Sakai, Juro; Aburatani, Hiroyuki; Kadowaki, Takashi

    2011-01-01

    Identification of regulatory elements within the genome is crucial for understanding the mechanisms that govern cell type–specific gene expression. We generated genome-wide maps of open chromatin sites in 3T3-L1 adipocytes (on day 0 and day 8 of differentiation) and NIH-3T3 fibroblasts using formaldehyde-assisted isolation of regulatory elements coupled with high-throughput sequencing (FAIRE-seq). FAIRE peaks at the promoter were associated with active transcription and histone modifications of H3K4me3 and H3K27ac. Non-promoter FAIRE peaks were characterized by H3K4me1+/me3-, the signature of enhancers, and were largely located in distal regions. The non-promoter FAIRE peaks showed dynamic change during differentiation, while the promoter FAIRE peaks were relatively constant. Functionally, the adipocyte- and preadipocyte-specific non-promoter FAIRE peaks were, respectively, associated with genes up-regulated and down-regulated by differentiation. Genes highly up-regulated during differentiation were associated with multiple clustered adipocyte-specific FAIRE peaks. Among the adipocyte-specific FAIRE peaks, 45.3% and 11.7% overlapped binding sites for, respectively, PPARγ and C/EBPα, the master regulators of adipocyte differentiation. Computational motif analyses of the adipocyte-specific FAIRE peaks revealed enrichment of a binding motif for nuclear family I (NFI) transcription factors. Indeed, ChIP assay showed that NFI occupy the adipocyte-specific FAIRE peaks and/or the PPARγ binding sites near PPARγ, C/EBPα, and aP2 genes. Overexpression of NFIA in 3T3-L1 cells resulted in robust induction of these genes and lipid droplet formation without differentiation stimulus. Overexpression of dominant-negative NFIA or siRNA–mediated knockdown of NFIA or NFIB significantly suppressed both induction of genes and lipid accumulation during differentiation, suggesting a physiological function of these factors in the adipogenic program. Together, our study demonstrates the utility of FAIRE-seq in providing a global view of cell type–specific regulatory elements in the genome and in identifying transcriptional regulators of adipocyte differentiation. PMID:22028663

  16. High-throughput sequencing reveals unprecedented diversities of Aspergillus species in outdoor air.

    PubMed

    Lee, S; An, C; Xu, S; Lee, S; Yamamoto, N

    2016-09-01

    This study used the Illumina MiSeq to analyse compositions and diversities of Aspergillus species in outdoor air. The seasonal air samplings were performed at two locations in Seoul, South Korea. The results showed the relative abundances of all Aspergillus species combined ranging from 0·20 to 18% and from 0·19 to 21% based on the number of the internal transcribed spacer 1 (ITS1) and β-tubulin (BenA) gene sequences respectively. Aspergillus fumigatus was the most dominant species with the mean relative abundances of 1·2 and 5·5% based on the number of the ITS1 and BenA sequences respectively. A total of 29 Aspergillus species were detected and identified down to the species rank, among which nine species were known opportunistic pathogens. Remarkably, eight of the nine pathogenic species were detected by either one of the two markers, suggesting the need of using multiple markers and/or primer pairs when the assessments are made based on the high-throughput sequencing. Due to diversity of species within the genus Aspergillus, the high-throughput sequencing was useful to characterize their compositions and diversities in outdoor air, which are thought to be difficult to be accurately characterized by conventional culture and/or Sanger sequencing-based techniques. Aspergillus is a diverse genus of fungi with more than 300 species reported in literature. Aspergillus is important since some species are known allergens and opportunistic human pathogens. Traditionally, growth-dependent methods have been used to detect Aspergillus species in air. However, these methods are limited in the number of isolates that can be analysed for their identities, resulting in inaccurate characterizations of Aspergillus diversities. This study used the high-throughput sequencing to explore Aspergillus diversities in outdoor, which are thought to be difficult to be accurately characterized by traditional growth-dependent techniques. © 2016 The Society for Applied Microbiology.

  17. High-Throughput Parallel Sequencing to Measure Fitness of Leptospira interrogans Transposon Insertion Mutants during Acute Infection

    PubMed Central

    Matsunaga, James; Haake, David A.

    2016-01-01

    Pathogenic species of Leptospira are the causative agents of leptospirosis, a zoonotic disease that causes mortality and morbidity worldwide. The understanding of the virulence mechanisms of Leptospira spp is still at an early stage due to the limited number of genetic tools available for this microorganism. The development of random transposon mutagenesis in pathogenic strains a decade ago has contributed to the identification of several virulence factors. In this study, we used the transposon sequencing (Tn-Seq) technique, which combines transposon mutagenesis with massive parallel sequencing, to study the in vivo fitness of a pool of Leptospira interrogans mutants. We infected hamsters with a pool of 42 mutants (input pool), which included control mutants with insertions in four genes previously analyzed by virulence testing (loa22, ligB, flaA1, and lic20111) and 23 mutants with disrupted signal transduction genes. We quantified the mutants in different tissues (blood, kidney and liver) at 4 days post-challenge by high-throughput sequencing and compared the frequencies of mutants recovered from tissues to their frequencies in the input pool. Control mutants that were less fit in the Tn-Seq experiment were attenuated for virulence when tested separately in the hamster model of lethal leptospirosis. Control mutants with unaltered fitness were as virulent as the wild-type strain. We identified two mutants with the transposon inserted in the same putative adenylate/guanylate cyclase gene (lic12327) that had reduced in vivo fitness in blood, kidney and liver. Both lic12327 mutants were attenuated for virulence when tested individually in hamsters. Growth of the control mutants and lic12327 mutants in culture medium were similar to that of the wild-type strain. These results demonstrate the feasibility of screening large pools of L. interrogans transposon mutants for those with altered fitness, and potentially attenuated virulence, by transposon sequencing. PMID:27824878

  18. RAMICS: trainable, high-speed and biologically relevant alignment of high-throughput sequencing reads to coding DNA.

    PubMed

    Wright, Imogen A; Travers, Simon A

    2014-07-01

    The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms.

    PubMed

    Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M

    2017-12-06

    While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.

  20. Single-cell genomic profiling of acute myeloid leukemia for clinical use: A pilot study

    PubMed Central

    Yan, Benedict; Hu, Yongli; Ban, Kenneth H.K.; Tiang, Zenia; Ng, Christopher; Lee, Joanne; Tan, Wilson; Chiu, Lily; Tan, Tin Wee; Seah, Elaine; Ng, Chin Hin; Chng, Wee-Joo; Foo, Roger

    2017-01-01

    Although bulk high-throughput genomic profiling studies have led to a significant increase in the understanding of cancer biology, there is increasing awareness that bulk profiling approaches do not completely elucidate tumor heterogeneity. Single-cell genomic profiling enables the distinction of tumor heterogeneity, and may improve clinical diagnosis through the identification and characterization of putative subclonal populations. In the present study, the challenges associated with a single-cell genomics profiling workflow for clinical diagnostics were investigated. Single-cell RNA-sequencing (RNA-seq) was performed on 20 cells from an acute myeloid leukemia bone marrow sample. Putative blasts were identified based on their gene expression profiles and principal component analysis was performed to identify outlier cells. Variant calling was performed on the single-cell RNA-seq data. The present pilot study demonstrates a proof of concept for clinical single-cell genomic profiling. The recognized limitations include significant stochastic RNA loss and the relatively low throughput of the current proposed platform. Although the results of the present study are promising, further technological advances and protocol optimization are necessary for single-cell genomic profiling to be clinically viable. PMID:28454300

  1. Investigation of bacterial and archaeal communities: novel protocols using modern sequencing by Illumina MiSeq and traditional DGGE-cloning.

    PubMed

    Kraková, Lucia; Šoltys, Katarína; Budiš, Jaroslav; Grivalský, Tomáš; Ďuriš, František; Pangallo, Domenico; Szemes, Tomáš

    2016-09-01

    Different protocols based on Illumina high-throughput DNA sequencing and denaturing gradient gel electrophoresis (DGGE)-cloning were developed and applied for investigating hot spring related samples. The study was focused on three target genes: archaeal and bacterial 16S rRNA and mcrA of methanogenic microflora. Shorter read lengths of the currently most popular technology of sequencing by Illumina do not allow analysis of the complete 16S rRNA region, or of longer gene fragments, as was the case of Sanger sequencing. Here, we demonstrate that there is no need for special indexed or tailed primer sets dedicated to short variable regions of 16S rRNA since the presented approach allows the analysis of complete bacterial 16S rRNA amplicons (V1-V9) and longer archaeal 16S rRNA and mcrA sequences. Sample augmented with transposon is represented by a set of approximately 300 bp long fragments that can be easily sequenced by Illumina MiSeq. Furthermore, a low proportion of chimeric sequences was observed. DGGE-cloning based strategies were performed combining semi-nested PCR, DGGE and clone library construction. Comparing both investigation methods, a certain degree of complementarity was observed confirming that the DGGE-cloning approach is not obsolete. Novel protocols were created for several types of laboratories, utilizing the traditional DGGE technique or using the most modern Illumina sequencing.

  2. RNA-Seq reveals complex genetic response to Deepwater Horizon oil release in Fundulus grandis.

    PubMed

    Garcia, Tzintzuni I; Shen, Yingjia; Crawford, Douglas; Oleksiak, Marjorie F; Whitehead, Andrew; Walter, Ronald B

    2012-09-12

    The release of oil resulting from the blowout of the Deepwater Horizon (DH) drilling platform was one of the largest in history discharging more than 189 million gallons of oil and subject to widespread application of oil dispersants. This event impacted a wide range of ecological habitats with a complex mix of pollutants whose biological impact is still not yet fully understood. To better understand the effects on a vertebrate genome, we studied gene expression in the salt marsh minnow Fundulus grandis, which is local to the northern coast of the Gulf of Mexico and is a sister species of the ecotoxicological model Fundulus heteroclitus. To assess genomic changes, we quantified mRNA expression using high throughput sequencing technologies (RNA-Seq) in F. grandis populations in the marshes and estuaries impacted by DH oil release. This application of RNA-Seq to a non-model, wild, and ecologically significant organism is an important evaluation of the technology to quickly assess similar events in the future. Our de novo assembly of RNA-Seq data produced a large set of sequences which included many duplicates and fragments. In many cases several of these could be associated with a common reference sequence using blast to query a reference database. This reduced the set of significant genes to 1,070 down-regulated and 1,251 up-regulated genes. These genes indicate a broad and complex genomic response to DH oil exposure including the expected AHR-mediated response and CYP genes. In addition a response to hypoxic conditions and an immune response are also indicated. Several genes in the choriogenin family were down-regulated in the exposed group; a response that is consistent with AH exposure. These analyses are in agreement with oligonucleotide-based microarray analyses, and describe only a subset of significant genes with aberrant regulation in the exposed set. RNA-Seq may be successfully applied to feral and extremely polymorphic organisms that do not have an underlying genome sequence assembly to address timely environmental problems. Additionally, the observed changes in a large set of transcript expression levels are indicative of a complex response to the varied petroleum components to which the fish were exposed.

  3. Linkage maps of the Atlantic salmon (Salmo salar) genome derived from RAD sequencing

    PubMed Central

    2014-01-01

    Background Genetic linkage maps are useful tools for mapping quantitative trait loci (QTL) influencing variation in traits of interest in a population. Genotyping-by-sequencing approaches such as Restriction-site Associated DNA sequencing (RAD-Seq) now enable the rapid discovery and genotyping of genome-wide SNP markers suitable for the development of dense SNP linkage maps, including in non-model organisms such as Atlantic salmon (Salmo salar). This paper describes the development and characterisation of a high density SNP linkage map based on SbfI RAD-Seq SNP markers from two Atlantic salmon reference families. Results Approximately 6,000 SNPs were assigned to 29 linkage groups, utilising markers from known genomic locations as anchors. Linkage maps were then constructed for the four mapping parents separately. Overall map lengths were comparable between male and female parents, but the distribution of the SNPs showed sex-specific patterns with a greater degree of clustering of sire-segregating SNPs to single chromosome regions. The maps were integrated with the Atlantic salmon draft reference genome contigs, allowing the unique assignment of ~4,000 contigs to a linkage group. 112 genome contigs mapped to two or more linkage groups, highlighting regions of putative homeology within the salmon genome. A comparative genomics analysis with the stickleback reference genome identified putative genes closely linked to approximately half of the ordered SNPs and demonstrated blocks of orthology between the Atlantic salmon and stickleback genomes. A subset of 47 RAD-Seq SNPs were successfully validated using a high-throughput genotyping assay, with a correspondence of 97% between the two assays. Conclusions This Atlantic salmon RAD-Seq linkage map is a resource for salmonid genomics research as genotyping-by-sequencing becomes increasingly common. This is aided by the integration of the SbfI RAD-Seq SNPs with existing reference maps and the draft reference genome, as well as the identification of putative genes proximal to the SNPs. Differences in the distribution of recombination events between the sexes is evident, and regions of homeology have been identified which are reflective of the recent salmonid whole genome duplication. PMID:24571138

  4. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing

    PubMed Central

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-01-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome. PMID:20392818

  5. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing.

    PubMed

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-08-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.

  6. Multiplex single-molecule interaction profiling of DNA-barcoded proteins.

    PubMed

    Gu, Liangcai; Li, Chao; Aach, John; Hill, David E; Vidal, Marc; Church, George M

    2014-11-27

    In contrast with advances in massively parallel DNA sequencing, high-throughput protein analyses are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule protein detection using optical methods is limited by the number of spectrally non-overlapping chromophores. Here we introduce a single-molecular-interaction sequencing (SMI-seq) technology for parallel protein interaction profiling leveraging single-molecule advantages. DNA barcodes are attached to proteins collectively via ribosome display or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide thin film to construct a random single-molecule array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) and analysed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimetre. Furthermore, protein interactions can be measured on the basis of the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor and antibody-binding profiling, are demonstrated. SMI-seq enables 'library versus library' screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity.

  7. Multiplex single-molecule interaction profiling of DNA barcoded proteins

    PubMed Central

    Gu, Liangcai; Li, Chao; Aach, John; Hill, David E.; Vidal, Marc; Church, George M.

    2014-01-01

    In contrast with advances in massively parallel DNA sequencing1, high-throughput protein analyses2-4 are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule (SM) protein detection achieved using optical methods5 is limited by the number of spectrally nonoverlapping chromophores. Here, we introduce a single molecular interaction-sequencing (SMI-Seq) technology for parallel protein interaction profiling leveraging SM advantages. DNA barcodes are attached to proteins collectively via ribosome display6 or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide (PAA) thin film to construct a random SM array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies)7 and analyzed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimeter. Furthermore, protein interactions can be measured based on the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor (GPCR) and antibody binding profiling, were demonstrated. SMI-Seq enables “library vs. library” screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity. PMID:25252978

  8. Microbial Community in a Biofilter for Removal of Low Load Nitrobenzene Waste Gas

    PubMed Central

    Zhai, Jian; Wang, Zhu; Shi, Peng; Long, Chao

    2017-01-01

    To improve biofilter performance, the microbial community of a biofilter must be clearly defined. In this study, the performance of a lab-scale polyurethane biofilter for treating waste gas with low loads of nitrobenzene (NB) (< 20 g m-3 h-1) was investigated when using different empty bed residence times (EBRT) (64, 55.4 and 34 s, respectively). In addition, the variations of the bacterial community in the biofilm on the longitudinal distribution of the biofilters were analysed by using Illumina MiSeq high-throughput sequencing. The results showed that NB waste gas was successfully degraded in the biofilter. High-throughput sequencing data suggested that the phylum Actinobacteria and genus Rhodococcus played important roles in the degradation of NB. The variations of the microbial community were attributed to the different intermediate degradation products of NB in each layer. The strains identified in this study were potential candidates for purifying waste gas effluents containing NB. PMID:28114416

  9. A combined electron beam/optical lithography process step for the fabrication of sub-half-micron-gate-length MMIC chips

    NASA Technical Reports Server (NTRS)

    Sewell, James S.; Bozada, Christopher A.

    1994-01-01

    Advanced radar and communication systems rely heavily on state-of-the-art microelectronics. Systems such as the phased-array radar require many transmit/receive (T/R) modules which are made up of many millimeter wave - microwave integrated circuits (MMIC's). The heart of a MMIC chip is the Gallium Arsenide (GaAs) field-effect transistor (FET). The transistor gate length is the critical feature that determines the operating frequency of the radar system. A smaller gate length will typically result in a higher frequency. In order to make a phased array radar system economically feasible, manufacturers must be capable of producing very large quantities of small-gate-length MMIC chips at a relatively low cost per chip. This requires the processing of a large number of wafers with a large number of chips per wafer, minimum processing time, and a very high chip yield. One of the bottlenecks in the fabrication of MIMIC chips is the transistor gate definition. The definition of sub-half-micron gates for GaAs-based field-effect transistors is generally performed by direct-write electron beam lithography (EBL). Because of the throughput limitations of EBL, the gate-layer fabrication is conventionally divided into two lithographic processes where EBL is used to generate the gate fingers and optical lithography is used to generate the large-area gate pads and interconnects. As a result, two complete sequences of resist application, exposure, development, metallization and lift-off are required for the entire gate structure. We have baselined a hybrid process, referred to as EBOL (electron beam/optical lithography), in which a single application of a multi-level resist is used for both exposures. The entire gate structure, (gate fingers, interconnects and pads), is then formed with a single metallization and lift-off process. The EBOL process thus retains the advantages of the high-resolution E-beam lithography and the high throughput of optical lithography while essentially eliminating an entire lithography/metallization/lift-off process sequence. This technique has been proven to be reliable for both trapezoidal and mushroom gates and has been successfully applied to metal-semiconductor and high-electron-mobility field-effect transistor (MESFET and HEMT) wafers containing devices with gate lengths down to 0.10 micron and 75 x 75 micron gate pads. The yields and throughput of these wafers have been very high with no loss in device performance. We will discuss the entire EBOL process technology including the multilayer resist structure, exposure conditions, process sensitivities, metal edge definition, device results, comparison to the standard gate-layer process, and its suitability for manufacturing.

  10. A combined electron beam/optical lithography process step for the fabrication of sub-half-micron-gate-length MMIC chips

    NASA Astrophysics Data System (ADS)

    Sewell, James S.; Bozada, Christopher A.

    1994-02-01

    Advanced radar and communication systems rely heavily on state-of-the-art microelectronics. Systems such as the phased-array radar require many transmit/receive (T/R) modules which are made up of many millimeter wave - microwave integrated circuits (MMIC's). The heart of a MMIC chip is the Gallium Arsenide (GaAs) field-effect transistor (FET). The transistor gate length is the critical feature that determines the operating frequency of the radar system. A smaller gate length will typically result in a higher frequency. In order to make a phased array radar system economically feasible, manufacturers must be capable of producing very large quantities of small-gate-length MMIC chips at a relatively low cost per chip. This requires the processing of a large number of wafers with a large number of chips per wafer, minimum processing time, and a very high chip yield. One of the bottlenecks in the fabrication of MIMIC chips is the transistor gate definition. The definition of sub-half-micron gates for GaAs-based field-effect transistors is generally performed by direct-write electron beam lithography (EBL). Because of the throughput limitations of EBL, the gate-layer fabrication is conventionally divided into two lithographic processes where EBL is used to generate the gate fingers and optical lithography is used to generate the large-area gate pads and interconnects. As a result, two complete sequences of resist application, exposure, development, metallization and lift-off are required for the entire gate structure. We have baselined a hybrid process, referred to as EBOL (electron beam/optical lithography), in which a single application of a multi-level resist is used for both exposures. The entire gate structure, (gate fingers, interconnects and pads), is then formed with a single metallization and lift-off process. The EBOL process thus retains the advantages of the high-resolution E-beam lithography and the high throughput of optical lithography while essentially eliminating an entire lithography/metallization/lift-off process sequence. This technique has been proven to be reliable for both trapezoidal and mushroom gates and has been successfully applied to metal-semiconductor and high-electron-mobility field-effect transistor (MESFET and HEMT) wafers containing devices with gate lengths down to 0.10 micron and 75 x 75 micron gate pads. The yields and throughput of these wafers have been very high with no loss in device performance. We will discuss the entire EBOL process technology including the multilayer resist structure, exposure conditions, process sensitivities, metal edge definition, device results, comparison to the standard gate-layer process, and its suitability for manufacturing.

  11. SnowyOwl: accurate prediction of fungal genes by using RNA-Seq and homology information to select among ab initio models

    PubMed Central

    2014-01-01

    Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894

  12. Phylogenetic analysis of higher-level relationships within Hydroidolina (Cnidaria: Hydrozoa) using mitochondrial genome data and insight into their mitochondrial transcription.

    PubMed

    Kayal, Ehsan; Bentlage, Bastian; Cartwright, Paulyn; Yanagihara, Angel A; Lindsay, Dhugal J; Hopcroft, Russell R; Collins, Allen G

    2015-01-01

    Hydrozoans display the most morphological diversity within the phylum Cnidaria. While recent molecular studies have provided some insights into their evolutionary history, sister group relationships remain mostly unresolved, particularly at mid-taxonomic levels. Specifically, within Hydroidolina, the most speciose hydrozoan subclass, the relationships and sometimes integrity of orders are highly unsettled. Here we obtained the near complete mitochondrial sequence of twenty-six hydroidolinan hydrozoan species from a range of sources (DNA and RNA-seq data, long-range PCR). Our analyses confirm previous inference of the evolution of mtDNA in Hydrozoa while introducing a novel genome organization. Using RNA-seq data, we propose a mechanism for the expression of mitochondrial mRNA in Hydroidolina that can be extrapolated to the other medusozoan taxa. Phylogenetic analyses using the full set of mitochondrial gene sequences provide some insights into the order-level relationships within Hydroidolina, including siphonophores as the first diverging clade, a well-supported clade comprised of Leptothecata-Filifera III-IV, and a second clade comprised of Aplanulata-Capitata s.s.-Filifera I-II. Finally, we describe our relatively inexpensive and accessible multiplexing strategy to sequence long-range PCR amplicons that can be adapted to most high-throughput sequencing platforms.

  13. Phylogenetic analysis of higher-level relationships within Hydroidolina (Cnidaria: Hydrozoa) using mitochondrial genome data and insight into their mitochondrial transcription

    PubMed Central

    Bentlage, Bastian; Cartwright, Paulyn; Yanagihara, Angel A.; Lindsay, Dhugal J.; Hopcroft, Russell R.; Collins, Allen G.

    2015-01-01

    Hydrozoans display the most morphological diversity within the phylum Cnidaria. While recent molecular studies have provided some insights into their evolutionary history, sister group relationships remain mostly unresolved, particularly at mid-taxonomic levels. Specifically, within Hydroidolina, the most speciose hydrozoan subclass, the relationships and sometimes integrity of orders are highly unsettled. Here we obtained the near complete mitochondrial sequence of twenty-six hydroidolinan hydrozoan species from a range of sources (DNA and RNA-seq data, long-range PCR). Our analyses confirm previous inference of the evolution of mtDNA in Hydrozoa while introducing a novel genome organization. Using RNA-seq data, we propose a mechanism for the expression of mitochondrial mRNA in Hydroidolina that can be extrapolated to the other medusozoan taxa. Phylogenetic analyses using the full set of mitochondrial gene sequences provide some insights into the order-level relationships within Hydroidolina, including siphonophores as the first diverging clade, a well-supported clade comprised of Leptothecata-Filifera III–IV, and a second clade comprised of Aplanulata-Capitata s.s.-Filifera I–II. Finally, we describe our relatively inexpensive and accessible multiplexing strategy to sequence long-range PCR amplicons that can be adapted to most high-throughput sequencing platforms. PMID:26618080

  14. RNA-Seq analysis of yak ovary: improving yak gene structure information and mining reproduction-related genes.

    PubMed

    Lan, DaoLiang; Xiong, XianRong; Wei, YanLi; Xu, Tong; Zhong, JinCheng; Zhi, XiangDong; Wang, Yong; Li, Jian

    2014-09-01

    RNA-Seq, a high-throughput (HT) sequencing technique, has been used effectively in large-scale transcriptomic studies, and is particularly useful for improving gene structure information and mining of new genes. In this study, RNA-Seq HT technology was employed to analyze the transcriptome of yak ovary. After Illumina-Solexa deep sequencing, 26826516 clean reads with a total of 4828772880 bp were obtained from the ovary library. Alignment analysis showed that 16992 yak genes mapped to the yak genome and 3734 of these genes were involved in alternative splicing. Gene structure refinement analysis showed that 7340 genes that were annotated in the yak genome could be extended at the 5' or 3' ends based on the alignments been the transcripts and the genome sequence. Novel transcript prediction analysis identified 6321 new transcripts with lengths ranging from 180 to 14884 bp, and 2267 of them were predicted to code proteins. BLAST analysis of the new transcripts showed that 1200?4933 mapped to the non-redundant (nr), nucleotide (nt) and/or SwissProt sequence databases. Comparative statistical analysis of the new mapped transcripts showed that the majority of them were similar to genes in Bos taurus (41.4%), Bos grunniens mutus (33.0%), Ovis aries (6.3%), Homo sapiens (2.8%), Mus musculus (1.6%) and other species. Functional analysis showed that these expressed genes were involved in various Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes pathways. GO analysis of the new transcripts found that the largest proportion of them was associated with reproduction. The results of this study will provide a basis for describing the normal transcriptome map of yak ovary and for future studies on yak breeding performance. Moreover, the results confirmed that RNA-Seq HT technology is highly advantageous in improving gene structure information and mining of new genes, as well as in providing valuable data to expand the yak genome information.

  15. The UCSC genome browser and associated tools

    PubMed Central

    Haussler, David; Kent, W. James

    2013-01-01

    The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser presents visualization of annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface, which allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format and the Personal Genome Single Nucleotide Polymorphisms (SNPs) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting. PMID:22908213

  16. The UCSC genome browser and associated tools.

    PubMed

    Kuhn, Robert M; Haussler, David; Kent, W James

    2013-03-01

    The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser presents visualization of annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface, which allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format and the Personal Genome Single Nucleotide Polymorphisms (SNPs) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting.

  17. Strand-Specific Analysis of DNA Synthesis and Proteins Association with DNA Replication Forks in Budding Yeast.

    PubMed

    Yu, Chuanhe; Gan, Haiyun; Zhang, Zhiguo

    2018-01-01

    DNA replication initiates at DNA replication origins after unwinding of double-strand DNA(dsDNA) by replicative helicase to generate single-stranded DNA (ssDNA) templates for the continuous synthesis of leading-strand and the discontinuous synthesis of lagging-strand. Therefore, methods capable of detecting strand-specific information will likely yield insight into the association of proteins at leading and lagging strand of DNA replication forks and the regulation of leading and lagging strand synthesis during DNA replication. The enrichment and Sequencing of Protein-Associated Nascent DNA (eSPAN), which measure the relative amounts of proteins at nascent leading and lagging strands of DNA replication forks, is a step-wise procedure involving the chromatin immunoprecipitation (ChIP) of a protein of interest followed by the enrichment of protein-associated nascent DNA through BrdU immunoprecipitation. The isolated ssDNA is then subjected to strand-specific sequencing. This method can detect whether a protein is enriched at leading or lagging strand of DNA replication forks. In addition to eSPAN, two other strand-specific methods, (ChIP-ssSeq), which detects potential protein-ssDNA binding and BrdU-IP-ssSeq, which can measure synthesis of both leading and lagging strand, were developed along the way. These methods can provide strand-specific and complementary information about the association of the target protein with DNA replication forks as well as synthesis of leading and lagging strands genome wide. Below, we describe the detailed eSPAN, ChIP-ssSeq, and BrdU-IP-ssSeq protocols.

  18. Quantitative assessment of RNA-protein interactions with high-throughput sequencing-RNA affinity profiling.

    PubMed

    Ozer, Abdullah; Tome, Jacob M; Friedman, Robin C; Gheba, Dan; Schroth, Gary P; Lis, John T

    2015-08-01

    Because RNA-protein interactions have a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay that couples sequencing on an Illumina GAIIx genome analyzer with the quantitative assessment of protein-RNA interactions. This assay is able to analyze interactions between one or possibly several proteins with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of the EGFP and negative elongation factor subunit E (NELF-E) proteins with their corresponding canonical and mutant RNA aptamers. Here we provide a detailed protocol for HiTS-RAP that can be completed in about a month (8 d hands-on time). This includes the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, HiTS and protein binding with a GAIIx instrument, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, quantitative analysis of RNA on a massively parallel array (RNA-MaP) and RNA Bind-n-Seq (RBNS), for quantitative analysis of RNA-protein interactions.

  19. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.

    PubMed

    Kang, Hyun Min; Subramaniam, Meena; Targ, Sasha; Nguyen, Michelle; Maliskova, Lenka; McCarthy, Elizabeth; Wan, Eunice; Wong, Simon; Byrnes, Lauren; Lanata, Cristina M; Gate, Rachel E; Mostafavi, Sara; Marson, Alexander; Zaitlen, Noah; Criswell, Lindsey A; Ye, Chun Jimmie

    2018-01-01

    Droplet single-cell RNA-sequencing (dscRNA-seq) has enabled rapid, massively parallel profiling of transcriptomes. However, assessing differential expression across multiple individuals has been hampered by inefficient sample processing and technical batch effects. Here we describe a computational tool, demuxlet, that harnesses natural genetic variation to determine the sample identity of each droplet containing a single cell (singlet) and detect droplets containing two cells (doublets). These capabilities enable multiplexed dscRNA-seq experiments in which cells from unrelated individuals are pooled and captured at higher throughput than in standard workflows. Using simulated data, we show that 50 single-nucleotide polymorphisms (SNPs) per cell are sufficient to assign 97% of singlets and identify 92% of doublets in pools of up to 64 individuals. Given genotyping data for each of eight pooled samples, demuxlet correctly recovers the sample identity of >99% of singlets and identifies doublets at rates consistent with previous estimates. We apply demuxlet to assess cell-type-specific changes in gene expression in 8 pooled lupus patient samples treated with interferon (IFN)-β and perform eQTL analysis on 23 pooled samples.

  20. Integrated genome browser: visual analytics platform for genomics.

    PubMed

    Freese, Nowlan H; Norris, David C; Loraine, Ann E

    2016-07-15

    Genome browsers that support fast navigation through vast datasets and provide interactive visual analytics functions can help scientists achieve deeper insight into biological systems. Toward this end, we developed Integrated Genome Browser (IGB), a highly configurable, interactive and fast open source desktop genome browser. Here we describe multiple updates to IGB, including all-new capabilities to display and interact with data from high-throughput sequencing experiments. To demonstrate, we describe example visualizations and analyses of datasets from RNA-Seq, ChIP-Seq and bisulfite sequencing experiments. Understanding results from genome-scale experiments requires viewing the data in the context of reference genome annotations and other related datasets. To facilitate this, we enhanced IGB's ability to consume data from diverse sources, including Galaxy, Distributed Annotation and IGB-specific Quickload servers. To support future visualization needs as new genome-scale assays enter wide use, we transformed the IGB codebase into a modular, extensible platform for developers to create and deploy all-new visualizations of genomic data. IGB is open source and is freely available from http://bioviz.org/igb aloraine@uncc.edu. © The Author 2016. Published by Oxford University Press.

  1. Variant discovery in the sheepmeat odour and flavour in javanese fat tailed sheep using RNA sequencing

    NASA Astrophysics Data System (ADS)

    Abuzahra, M. A. M.; Jakaria; Listyarini, K.; Furqon, A.; Sumantri, C.; Uddin, M. J.; Gunawan, A.

    2018-05-01

    High-throughput RNA sequencing (RNA-Seq) reveals new challenges for the detection of transcriptome variants (SNPs) in different tissues and species. The aims of this study was to characterize a SNP discovery analysis in the sheep meat odour and flavour transcriptome using RNA-Seq. Six liver samples from divergent sheep meat odour and flavour were analyzed using the Illumina Genome Hiseq 2500 Analyzer. The SNP detection analysis revealed 142 SNPs in sheep meat samples, and a large number of those corresponded to differences between high and low sheep meat odour and flavour ovis genome assembly OAR v4.0. Among them, about 90.4% of genes had multiple polymorphisms within 12 genes (JAML, ANGPTL8, LOC101103463, SEPW1, SCN5A, LOC101113036, DOCK6, GTSE1, KIF12, KCTD17, KANK2, CYP2A6). Several of the SNPs (JAML, CYP2A6, SEPW1, and KIF12) found in this study could be included as suitable markers in genotyping platforms to perform association analyses in commercial populations and apply genomic selection protocols in the sheep meat production.

  2. Genome-Wide Profiling of Histone Modifications (H3K9me2 and H4K12ac) and Gene Expression in Rust (Uromyces appendiculatus) Inoculated Common Bean (Phaseolus vulgaris L.).

    PubMed

    Ayyappan, Vasudevan; Kalavacharla, Venu; Thimmapuram, Jyothi; Bhide, Ketaki P; Sripathi, Venkateswara R; Smolinski, Tomasz G; Manoharan, Muthusamy; Thurston, Yaqoob; Todd, Antonette; Kingham, Bruce

    2015-01-01

    Histone modifications such as methylation and acetylation play a significant role in controlling gene expression in unstressed and stressed plants. Genome-wide analysis of such stress-responsive modifications and genes in non-model crops is limited. We report the genome-wide profiling of histone methylation (H3K9me2) and acetylation (H4K12ac) in common bean (Phaseolus vulgaris L.) under rust (Uromyces appendiculatus) stress using two high-throughput approaches, chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq). ChIP-Seq analysis revealed 1,235 and 556 histone methylation and acetylation responsive genes from common bean leaves treated with the rust pathogen at 0, 12 and 84 hour-after-inoculation (hai), while RNA-Seq analysis identified 145 and 1,763 genes differentially expressed between mock-inoculated and inoculated plants. The combined ChIP-Seq and RNA-Seq analyses identified some key defense responsive genes (calmodulin, cytochrome p450, chitinase, DNA Pol II, and LRR) and transcription factors (WRKY, bZIP, MYB, HSFB3, GRAS, NAC, and NMRA) in bean-rust interaction. Differential methylation and acetylation affected a large proportion of stress-responsive genes including resistant (R) proteins, detoxifying enzymes, and genes involved in ion flux and cell death. The genes identified were functionally classified using Gene Ontology (GO) and EuKaryotic Orthologous Groups (KOGs). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified a putative pathway with ten key genes involved in plant-pathogen interactions. This first report of an integrated analysis of histone modifications and gene expression involved in the bean-rust interaction as reported here provides a comprehensive resource for other epigenomic regulation studies in non-model species under stress.

  3. Genome-Wide Profiling of Histone Modifications (H3K9me2 and H4K12ac) and Gene Expression in Rust (Uromyces appendiculatus) Inoculated Common Bean (Phaseolus vulgaris L.)

    PubMed Central

    Thimmapuram, Jyothi; Bhide, Ketaki P.; Sripathi, Venkateswara R.; Smolinski, Tomasz G.; Manoharan, Muthusamy; Thurston, Yaqoob; Todd, Antonette; Kingham, Bruce

    2015-01-01

    Histone modifications such as methylation and acetylation play a significant role in controlling gene expression in unstressed and stressed plants. Genome-wide analysis of such stress-responsive modifications and genes in non-model crops is limited. We report the genome-wide profiling of histone methylation (H3K9me2) and acetylation (H4K12ac) in common bean (Phaseolus vulgaris L.) under rust (Uromyces appendiculatus) stress using two high-throughput approaches, chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq). ChIP-Seq analysis revealed 1,235 and 556 histone methylation and acetylation responsive genes from common bean leaves treated with the rust pathogen at 0, 12 and 84 hour-after-inoculation (hai), while RNA-Seq analysis identified 145 and 1,763 genes differentially expressed between mock-inoculated and inoculated plants. The combined ChIP-Seq and RNA-Seq analyses identified some key defense responsive genes (calmodulin, cytochrome p450, chitinase, DNA Pol II, and LRR) and transcription factors (WRKY, bZIP, MYB, HSFB3, GRAS, NAC, and NMRA) in bean-rust interaction. Differential methylation and acetylation affected a large proportion of stress-responsive genes including resistant (R) proteins, detoxifying enzymes, and genes involved in ion flux and cell death. The genes identified were functionally classified using Gene Ontology (GO) and EuKaryotic Orthologous Groups (KOGs). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified a putative pathway with ten key genes involved in plant-pathogen interactions. This first report of an integrated analysis of histone modifications and gene expression involved in the bean-rust interaction as reported here provides a comprehensive resource for other epigenomic regulation studies in non-model species under stress. PMID:26167691

  4. A metagenomic viral discovery approach identifies potential zoonotic and novel mammalian viruses in Neoromicia bats within South Africa.

    PubMed

    Geldenhuys, Marike; Mortlock, Marinda; Weyer, Jacqueline; Bezuidt, Oliver; Seamark, Ernest C J; Kearney, Teresa; Gleasner, Cheryl; Erkkila, Tracy H; Cui, Helen; Markotter, Wanda

    2018-01-01

    Species within the Neoromicia bat genus are abundant and widely distributed in Africa. It is common for these insectivorous bats to roost in anthropogenic structures in urban regions. Additionally, Neoromicia capensis have previously been identified as potential hosts for Middle East respiratory syndrome (MERS)-related coronaviruses. This study aimed to ascertain the gastrointestinal virome of these bats, as viruses excreted in fecal material or which may be replicating in rectal or intestinal tissues have the greatest opportunities of coming into contact with other hosts. Samples were collected in five regions of South Africa over eight years. Initial virome composition was determined by viral metagenomic sequencing by pooling samples and enriching for viral particles. Libraries were sequenced on the Illumina MiSeq and NextSeq500 platforms, producing a combined 37 million reads. Bioinformatics analysis of the high throughput sequencing data detected the full genome of a novel species of the Circoviridae family, and also identified sequence data from the Adenoviridae, Coronaviridae, Herpesviridae, Parvoviridae, Papillomaviridae, Phenuiviridae, and Picornaviridae families. Metagenomic sequencing data was insufficient to determine the viral diversity of certain families due to the fragmented coverage of genomes and lack of suitable sequencing depth, as some viruses were detected from the analysis of reads-data only. Follow up conventional PCR assays targeting conserved gene regions for the Adenoviridae, Coronaviridae, and Herpesviridae families were used to confirm metagenomic data and generate additional sequences to determine genetic diversity. The complete coding genome of a MERS-related coronavirus was recovered with additional amplicon sequencing on the MiSeq platform. The new genome shared 97.2% overall nucleotide identity to a previous Neoromicia-associated MERS-related virus, also from South Africa. Conventional PCR analysis detected diverse adenovirus and herpesvirus sequences that were widespread throughout Neoromicia populations in South Africa. Furthermore, similar adenovirus sequences were detected within these populations throughout several years. With the exception of the coronaviruses, the study represents the first report of sequence data from several viral families within a Southern African insectivorous bat genus; highlighting the need for continued investigations in this regard.

  5. A metagenomic viral discovery approach identifies potential zoonotic and novel mammalian viruses in Neoromicia bats within South Africa

    PubMed Central

    Geldenhuys, Marike; Mortlock, Marinda; Weyer, Jacqueline; Bezuidt, Oliver; Seamark, Ernest C. J.; Kearney, Teresa; Gleasner, Cheryl; Erkkila, Tracy H.; Cui, Helen; Markotter, Wanda

    2018-01-01

    Species within the Neoromicia bat genus are abundant and widely distributed in Africa. It is common for these insectivorous bats to roost in anthropogenic structures in urban regions. Additionally, Neoromicia capensis have previously been identified as potential hosts for Middle East respiratory syndrome (MERS)-related coronaviruses. This study aimed to ascertain the gastrointestinal virome of these bats, as viruses excreted in fecal material or which may be replicating in rectal or intestinal tissues have the greatest opportunities of coming into contact with other hosts. Samples were collected in five regions of South Africa over eight years. Initial virome composition was determined by viral metagenomic sequencing by pooling samples and enriching for viral particles. Libraries were sequenced on the Illumina MiSeq and NextSeq500 platforms, producing a combined 37 million reads. Bioinformatics analysis of the high throughput sequencing data detected the full genome of a novel species of the Circoviridae family, and also identified sequence data from the Adenoviridae, Coronaviridae, Herpesviridae, Parvoviridae, Papillomaviridae, Phenuiviridae, and Picornaviridae families. Metagenomic sequencing data was insufficient to determine the viral diversity of certain families due to the fragmented coverage of genomes and lack of suitable sequencing depth, as some viruses were detected from the analysis of reads-data only. Follow up conventional PCR assays targeting conserved gene regions for the Adenoviridae, Coronaviridae, and Herpesviridae families were used to confirm metagenomic data and generate additional sequences to determine genetic diversity. The complete coding genome of a MERS-related coronavirus was recovered with additional amplicon sequencing on the MiSeq platform. The new genome shared 97.2% overall nucleotide identity to a previous Neoromicia-associated MERS-related virus, also from South Africa. Conventional PCR analysis detected diverse adenovirus and herpesvirus sequences that were widespread throughout Neoromicia populations in South Africa. Furthermore, similar adenovirus sequences were detected within these populations throughout several years. With the exception of the coronaviruses, the study represents the first report of sequence data from several viral families within a Southern African insectivorous bat genus; highlighting the need for continued investigations in this regard. PMID:29579103

  6. Liver Transcriptome and miRNA Analysis of Silver Carp (Hypophthalmichthys molitrix) Intraperitoneally Injected With Microcystin-LR

    PubMed Central

    Qu, Xiancheng; Hu, Menghong; Shang, Yueyong; Pan, Lisha; Jia, Peixuan; Fu, Chunxue; Liu, Qigen; Wang, Youji

    2018-01-01

    Next-generation sequencing was used to analyze the effects of toxic microcystin-LR (MC-LR) on silver carp (Hypophthalmichthys molitrix). Silver carps were intraperitoneally injected with MC-LR, and RNA-seq and miRNA-seq in the liver were analyzed at 0.25, 0.5, and 1 h. The expression of glutathione S-transferase (GST), which acts as a marker gene for MC-LR, was tested to determine the earliest time point at which GST transcription was initiated in the liver tissues of the MC-LR-treated silver carps. Hepatic RNA-seq/miRNA-seq analysis and data integration analysis were conducted with reference to the identified time point. Quantitative PCR (qPCR) was performed to detect the expression of the following genes at the three time points: heme oxygenase 1 (HO-1), interleukin-10 receptor 1 (IL-10R1), apolipoprotein A-I (apoA-I), and heme binding protein 2 (HBP2). Results showed that the liver GST expression was remarkably decreased at 0.25 h (P < 0.05). RNA-seq at this time point revealed that the liver tissue contained 97,505 unigenes, including 184 significantly different unigenes and 75 unknown genes. Gene Ontology (GO) term enrichment analysis suggested that 35 of the 145 enriched GO terms were significantly enriched and mainly related to the immune system regulation network. KEGG pathway enrichment analysis showed that 18 of the 189 pathways were significantly enriched, and the most significant was a ribosome pathway containing 77 differentially expressed genes. miRNA-seq analysis indicated that the longest miRNA had 22 nucleotides (nt), followed by 21 and 23 nt. A total of 286 known miRNAs, 332 known miRNA precursor sequences, and 438 new miRNAs were predicted. A total of 1,048,575 mRNA–miRNA interaction sites were obtained, and 21,252 and 21,241 target genes were respectively predicted in known and new miRNAs. qPCR revealed that HO-1, IL-10R1, apoA-I, and HBP2 were significantly differentially expressed and might play important roles in the toxicity and liver detoxification of MC-LR in fish. These results were consistent with those of high-throughput sequencing, thereby verifying the accuracy of our sequencing data. RNA-seq and miRNA-seq analyses of silver carp liver injected with MC-LR provided valuable and new insights into the toxic effects of MC-LR and the antitoxic mechanisms of MC-LR in fish. The RNA/miRNA data are available from the NCBI database Registration No. : SRP075165. PMID:29692738

  7. 19 CFR 12.39 - Imported articles involving unfair methods of competition or practices.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... the authorization or consent of the Government. (e) Importations of semiconductor chip products. (1) In accordance with the Semiconductor Chip Protection Act of 1984 (17 U.S.C. 901 et seq.), if the... imported semiconductor chip products which infringe his rights in such mask work, the owner must obtain a...

  8. GC-Content Normalization for RNA-Seq Data

    PubMed Central

    2011-01-01

    Background Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264

  9. High-confidence coding and noncoding transcriptome maps

    PubMed Central

    2017-01-01

    The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519

  10. Unified View of Backward Backtracking in Short Read Mapping

    NASA Astrophysics Data System (ADS)

    Mäkinen, Veli; Välimäki, Niko; Laaksonen, Antti; Katainen, Riku

    Mapping short DNA reads to the reference genome is the core task in the recent high-throughput technologies to study e.g. protein-DNA interactions (ChIP-seq) and alternative splicing (RNA-seq). Several tools for the task (bowtie, bwa, SOAP2, TopHat) have been developed that exploit Burrows-Wheeler transform and the backward backtracking technique on it, to map the reads to their best approximate occurrences in the genome. These tools use different tailored mechanisms for small error-levels to prune the search phase significantly. We propose a new pruning mechanism that can be seen a generalization of the tailored mechanisms used so far. It uses a novel idea of storing all cyclic rotations of fixed length substrings of the reference sequence with a compressed index that is able to exploit the repetitions created to level out the growth of the input set. For RNA-seq we propose a new method that combines dynamic programming with backtracking to map efficiently and correctly all reads that span two exons. Same mechanism can also be used for mapping mate-pair reads.

  11. Transcription profiling using RNA-Seq demonstrates expression differences in the body walls of juvenile albino and normal sea cucumbers Apostichopus japonicus

    NASA Astrophysics Data System (ADS)

    Ma, Deyou; Yang, Hongsheng; Sun, Lina; Chen, Muyan

    2014-01-01

    Sea cucumbers Apostichopus japonicus are one of the most important aquaculture species in China. Their normal body color is black to fit their surroundings. Wild albinos are rare and hard to breed. To understand the differences between albino and normal (control) sea cucumbers at the transcriptional level, we sequenced the transcriptomes in their body-wall tissues using RNA-Seq high-throughput sequencing. Approximately 4.876 million (M) and 4.884 M 200-nucleotide-long cDNA reads were produced in the cDNA libraries derived from the body walls of albino and control samples, respectively. A total of 9 561 (46.89%) putative genes were identified from among the RNA-Seq reads in both libraries. After filtering, 837 significantly differentially regulated genes were identified in the albino library compared with in the control library, and 3.6% of the differentially expressed genes (DEGs) were found to have changed those more than five-fold. The expression levels of 10 DEGs were checked by real-time PCR and the results were in full accord with the RNA-Seq expression trends, although the amplitude of the differences in expression levels was lower in all cases. A series of pathways were significantly enriched for the DEGs. These pathways were closely related to phagocytosis, the complement and coagulation cascades, apoptosis-related diseases, cytokine-cytokine receptor interaction, and cell adhesion. The differences in gene expression and enriched pathways between the albino and control sea cucumbers offer control targets for cultivating excellent albino A. japonicus strains in the future.

  12. Strategy to Identify and Test Putative Light-Sensitive Non-Opsin G-Protein-Coupled Receptors: A Case Study.

    PubMed

    Faggionato, Davide; Serb, Jeanne M

    2017-08-01

    The rise of high-throughput RNA sequencing (RNA-seq) and de novo transcriptome assembly has had a transformative impact on how we identify and study genes in the phototransduction cascade of non-model organisms. But the advantage provided by the nearly automated annotation of RNA-seq transcriptomes may at the same time hinder the possibility for gene discovery and the discovery of new gene functions. For example, standard functional annotation based on domain homology to known protein families can only confirm group membership, not identify the emergence of new biochemical function. In this study, we show the importance of developing a strategy that circumvents the limitations of semiautomated annotation and apply this workflow to photosensitivity as a means to discover non-opsin photoreceptors. We hypothesize that non-opsin G-protein-coupled receptor (GPCR) proteins may have chromophore-binding lysines in locations that differ from opsin. Here, we provide the first case study describing non-opsin light-sensitive GPCRs based on tissue-specific RNA-seq data of the common bay scallop Argopecten irradians (Lamarck, 1819). Using a combination of sequence analysis and three-dimensional protein modeling, we identified two candidate proteins. We tested their photochemical properties and provide evidence showing that these two proteins incorporate 11-cis and/or all-trans retinal and react to light photochemically. Based on this case study, we demonstrate that there is potential for the discovery of new light-sensitive GPCRs, and we have developed a workflow that starts from RNA-seq assemblies to the discovery of new non-opsin, GPCR-based photopigments.

  13. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.

    PubMed

    Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin

    2013-09-22

    High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.

  14. Squeezing water from a stone: high-throughput sequencing from a 145-year old holotype resolves (barely) a cryptic species problem in flying lizards.

    PubMed

    McGuire, Jimmy A; Cotoras, Darko D; O'Connell, Brendan; Lawalata, Shobi Z S; Wang-Claypool, Cynthia Y; Stubbs, Alexander; Huang, Xiaoting; Wogan, Guinevere O U; Hykin, Sarah M; Reilly, Sean B; Bi, Ke; Riyanto, Awal; Arida, Evy; Smith, Lydia L; Milne, Heather; Streicher, Jeffrey W; Iskandar, Djoko T

    2018-01-01

    We used Massively Parallel High-Throughput Sequencing to obtain genetic data from a 145-year old holotype specimen of the flying lizard, Draco cristatellus . Obtaining genetic data from this holotype was necessary to resolve an otherwise intractable taxonomic problem involving the status of this species relative to closely related sympatric Draco species that cannot otherwise be distinguished from one another on the basis of museum specimens. Initial analyses suggested that the DNA present in the holotype sample was so degraded as to be unusable for sequencing. However, we used a specialized extraction procedure developed for highly degraded ancient DNA samples and MiSeq shotgun sequencing to obtain just enough low-coverage mitochondrial DNA (721 base pairs) to conclusively resolve the species status of the holotype as well as a second known specimen of this species. The holotype was prepared before the advent of formalin-fixation and therefore was most likely originally fixed with ethanol and never exposed to formalin. Whereas conventional wisdom suggests that formalin-fixed samples should be the most challenging for DNA sequencing, we propose that evaporation during long-term alcohol storage and consequent water-exposure may subject older ethanol-fixed museum specimens to hydrolytic damage. If so, this may pose an even greater challenge for sequencing efforts involving historical samples.

  15. Genome-wide mapping of alternative splicing in Arabidopsis thaliana

    PubMed Central

    Filichkin, Sergei A.; Priest, Henry D.; Givan, Scott A.; Shen, Rongkun; Bryant, Douglas W.; Fox, Samuel E.; Wong, Weng-Keen; Mockler, Todd C.

    2010-01-01

    Alternative splicing can enhance transcriptome plasticity and proteome diversity. In plants, alternative splicing can be manifested at different developmental stages, and is frequently associated with specific tissue types or environmental conditions such as abiotic stress. We mapped the Arabidopsis transcriptome at single-base resolution using the Illumina platform for ultrahigh-throughput RNA sequencing (RNA-seq). Deep transcriptome sequencing confirmed a majority of annotated introns and identified thousands of novel alternatively spliced mRNA isoforms. Our analysis suggests that at least ∼42% of intron-containing genes in Arabidopsis are alternatively spliced; this is significantly higher than previous estimates based on cDNA/expressed sequence tag sequencing. Random validation confirmed that novel splice isoforms empirically predicted by RNA-seq can be detected in vivo. Novel introns detected by RNA-seq were substantially enriched in nonconsensus terminal dinucleotide splice signals. Alternative isoforms with premature termination codons (PTCs) comprised the majority of alternatively spliced transcripts. Using an example of an essential circadian clock gene, we show that intron retention can generate relatively abundant PTC+ isoforms and that this specific event is highly conserved among diverse plant species. Alternatively spliced PTC+ isoforms can be potentially targeted for degradation by the nonsense mediated mRNA decay (NMD) surveillance machinery or regulate the level of functional transcripts by the mechanism of regulated unproductive splicing and translation (RUST). We demonstrate that the relative ratios of the PTC+ and reference isoforms for several key regulatory genes can be considerably shifted under abiotic stress treatments. Taken together, our results suggest that like in animals, NMD and RUST may be widespread in plants and may play important roles in regulating gene expression. PMID:19858364

  16. Dynamic Changes of Photosynthetic Picoeukaryotes Composition in the Northwestern Pacific Ocean Revealed by High-Throughput Tag Sequencing of Plastid 16S rRNA Gene

    NASA Astrophysics Data System (ADS)

    Choi, D. H.; Noh, J. H.; Selph, K. E.; Lee, C. M.

    2016-02-01

    Photosynthetic picoeukaryotes (PPEs) are major oceanic primary producers. However, the diversity of such communities remains poorly understood, especially in the northwestern (NW) Pacific. We investigated the abundance and diversity of PPEs, and recorded environmental variables, along a transect from the coast to the open Pacific Ocean. High-throughput tag sequencing (using the MiSeq system) revealed the diversity of plastid 16S rRNA genes. The dominant PPEs changed at the class level along the transect. Prymnesiophyceae were the only dominant PPEs in the warm pool of the NW Pacific, but Mamiellophyceae dominated in coastal waters of the East China Sea. Phylogenetically, most Prymnesiophyceae sequences could not be resolved at lower taxonomic levels because no close relatives have been cultured. Within the Mamiellophyceae, the genera Micromonas and Ostreococcus dominated in marginal coastal areas affected by open water, whereas Bathycoccus dominated in the lower euphotic depths of open oligotrophic waters. Cryptophyceae and Phaeocystis (of the Prymnesiophyceae) dominated in areas affected principally by coastal water. We also defined the biogeographical distributions of Chrysophyceae, Prasinophyceae, Bacillariophyceaea, and Pelagophyceae. These distributions were influenced by temperature, salinity, and chlorophyll a and nutrient concentrations.

  17. Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq)

    PubMed Central

    Langley, Alexander R.; Gräf, Stefan; Smith, James C.; Krude, Torsten

    2016-01-01

    Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. PMID:27587586

  18. Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq).

    PubMed

    Langley, Alexander R; Gräf, Stefan; Smith, James C; Krude, Torsten

    2016-12-01

    Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. The ISMARA client

    PubMed Central

    Ioannidis, Vassilios; van Nimwegen, Erik; Stockinger, Heinz

    2016-01-01

    ISMARA ( ismara.unibas.ch) automatically infers the key regulators and regulatory interactions from high-throughput gene expression or chromatin state data. However, given the large sizes of current next generation sequencing (NGS) datasets, data uploading times are a major bottleneck. Additionally, for proprietary data, users may be uncomfortable with uploading entire raw datasets to an external server. Both these problems could be alleviated by providing a means by which users could pre-process their raw data locally, transferring only a small summary file to the ISMARA server. We developed a stand-alone client application that pre-processes large input files (RNA-seq or ChIP-seq data) on the user's computer for performing ISMARA analysis in a completely automated manner, including uploading of small processed summary files to the ISMARA server. This reduces file sizes by up to a factor of 1000, and upload times from many hours to mere seconds. The client application is available from ismara.unibas.ch/ISMARA/client. PMID:28232860

  20. Broad-Enrich: functional interpretation of large sets of broad genomic regions.

    PubMed

    Cavalcante, Raymond G; Lee, Chee; Welch, Ryan P; Patil, Snehal; Weymouth, Terry; Scott, Laura J; Sartor, Maureen A

    2014-09-01

    Functional enrichment testing facilitates the interpretation of Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data in terms of pathways and other biological contexts. Previous methods developed and used to test for key gene sets affected in ChIP-seq experiments treat peaks as points, and are based on the number of peaks associated with a gene or a binary score for each gene. These approaches work well for transcription factors, but histone modifications often occur over broad domains, and across multiple genes. To incorporate the unique properties of broad domains into functional enrichment testing, we developed Broad-Enrich, a method that uses the proportion of each gene's locus covered by a peak. We show that our method has a well-calibrated false-positive rate, performing well with ChIP-seq data having broad domains compared with alternative approaches. We illustrate Broad-Enrich with 55 ENCODE ChIP-seq datasets using different methods to define gene loci. Broad-Enrich can also be applied to other datasets consisting of broad genomic domains such as copy number variations. http://broad-enrich.med.umich.edu for Web version and R package. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.

  1. Comprehensive evaluation of genome-wide 5-hydroxymethylcytosine profiling approaches in human DNA.

    PubMed

    Skvortsova, Ksenia; Zotenko, Elena; Luu, Phuc-Loi; Gould, Cathryn M; Nair, Shalima S; Clark, Susan J; Stirzaker, Clare

    2017-01-01

    The discovery that 5-methylcytosine (5mC) can be oxidized to 5-hydroxymethylcytosine (5hmC) by the ten-eleven translocation (TET) proteins has prompted wide interest in the potential role of 5hmC in reshaping the mammalian DNA methylation landscape. The gold-standard bisulphite conversion technologies to study DNA methylation do not distinguish between 5mC and 5hmC. However, new approaches to mapping 5hmC genome-wide have advanced rapidly, although it is unclear how the different methods compare in accurately calling 5hmC. In this study, we provide a comparative analysis on brain DNA using three 5hmC genome-wide approaches, namely whole-genome bisulphite/oxidative bisulphite sequencing (WG Bis/OxBis-seq), Infinium HumanMethylation450 BeadChip arrays coupled with oxidative bisulphite (HM450K Bis/OxBis) and antibody-based immunoprecipitation and sequencing of hydroxymethylated DNA (hMeDIP-seq). We also perform loci-specific TET-assisted bisulphite sequencing (TAB-seq) for validation of candidate regions. We show that whole-genome single-base resolution approaches are advantaged in providing precise 5hmC values but require high sequencing depth to accurately measure 5hmC, as this modification is commonly in low abundance in mammalian cells. HM450K arrays coupled with oxidative bisulphite provide a cost-effective representation of 5hmC distribution, at CpG sites with 5hmC levels >~10%. However, 5hmC analysis is restricted to the genomic location of the probes, which is an important consideration as 5hmC modification is commonly enriched at enhancer elements. Finally, we show that the widely used hMeDIP-seq method provides an efficient genome-wide profile of 5hmC and shows high correlation with WG Bis/OxBis-seq 5hmC distribution in brain DNA. However, in cell line DNA with low levels of 5hmC, hMeDIP-seq-enriched regions are not detected by WG Bis/OxBis or HM450K, either suggesting misinterpretation of 5hmC calls by hMeDIP or lack of sensitivity of the latter methods. We highlight both the advantages and caveats of three commonly used genome-wide 5hmC profiling technologies and show that interpretation of 5hmC data can be significantly influenced by the sensitivity of methods used, especially as the levels of 5hmC are low and vary in different cell types and different genomic locations.

  2. Generalized empirical Bayesian methods for discovery of differential data in high-throughput biology.

    PubMed

    Hardcastle, Thomas J

    2016-01-15

    High-throughput data are now commonplace in biological research. Rapidly changing technologies and application mean that novel methods for detecting differential behaviour that account for a 'large P, small n' setting are required at an increasing rate. The development of such methods is, in general, being done on an ad hoc basis, requiring further development cycles and a lack of standardization between analyses. We present here a generalized method for identifying differential behaviour within high-throughput biological data through empirical Bayesian methods. This approach is based on our baySeq algorithm for identification of differential expression in RNA-seq data based on a negative binomial distribution, and in paired data based on a beta-binomial distribution. Here we show how the same empirical Bayesian approach can be applied to any parametric distribution, removing the need for lengthy development of novel methods for differently distributed data. Comparisons with existing methods developed to address specific problems in high-throughput biological data show that these generic methods can achieve equivalent or better performance. A number of enhancements to the basic algorithm are also presented to increase flexibility and reduce computational costs. The methods are implemented in the R baySeq (v2) package, available on Bioconductor http://www.bioconductor.org/packages/release/bioc/html/baySeq.html. tjh48@cam.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. SeqAPASS: Sequence alignment to predict across-species ...

    EPA Pesticide Factsheets

    Efforts to shift the toxicity testing paradigm from whole organism studies to those focused on the initiation of toxicity and relevant pathways have led to increased utilization of in vitro and in silico methods. Hence the emergence of high through-put screening (HTS) programs, such as U.S. EPA ToxCast, and application of the adverse outcome pathway (AOP) framework for identifying and defining biological key events triggered upon perturbation of molecular initiating events and leading to adverse outcomes occuring at a level of organization relevant for risk assessment [1]. With these recent initiatives to harness the power of “the pathway” in describing and evaluating toxicity comes the need to extrapolate data beyond the model species. Sequence alignment to predict across-species susceptibilty (SeqAPASS) is a web-based tool that allows the user to begin to understand how broadly HTS data or AOP constructs may plausibly be extrapolated across species, while describing the relative intrinsic susceptibiltiy of different taxa to chemicals with known modes of action (e.g., pharmaceuticals and pesticides). The tool rapidly and strategically assesses available molecular target information to describe protein sequence similarity at the primary amino acid sequence, conserved domain, and individual amino acid residue levels. This in silico approach to species extrapolation was designed to automate and streamline the relatively complex and time-consuming process of co

  4. DIVERSITY in binding, regulation, and evolution revealed from high-throughput ChIP.

    PubMed

    Mitra, Sneha; Biswas, Anushua; Narlikar, Leelavati

    2018-04-01

    Genome-wide in vivo protein-DNA interactions are routinely mapped using high-throughput chromatin immunoprecipitation (ChIP). ChIP-reported regions are typically investigated for enriched sequence-motifs, which are likely to model the DNA-binding specificity of the profiled protein and/or of co-occurring proteins. However, simple enrichment analyses can miss insights into the binding-activity of the protein. Note that ChIP reports regions making direct contact with the protein as well as those binding through intermediaries. For example, consider a ChIP experiment targeting protein X, which binds DNA at its cognate sites, but simultaneously interacts with four other proteins. Each of these proteins also binds to its own specific cognate sites along distant parts of the genome, a scenario consistent with the current view of transcriptional hubs and chromatin loops. Since ChIP will pull down all X-associated regions, the final reported data will be a union of five distinct sets of regions, each containing binding sites of one of the five proteins, respectively. Characterizing all five different motifs and the corresponding sets is important to interpret the ChIP experiment and ultimately, the role of X in regulation. We present diversity which attempts exactly this: it partitions the data so that each partition can be characterized with its own de novo motif. Diversity uses a Bayesian approach to identify the optimal number of motifs and the associated partitions, which together explain the entire dataset. This is in contrast to standard motif finders, which report motifs individually enriched in the data, but do not necessarily explain all reported regions. We show that the different motifs and associated regions identified by diversity give insights into the various complexes that may be forming along the chromatin, something that has so far not been attempted from ChIP data. Webserver at http://diversity.ncl.res.in/; standalone (Mac OS X/Linux) from https://github.com/NarlikarLab/DIVERSITY/releases/tag/v1.0.0.

  5. Present Scenario of Long Non-Coding RNAs in Plants

    PubMed Central

    Bhatia, Garima; Goyal, Neetu; Sharma, Shailesh; Upadhyay, Santosh Kumar; Singh, Kashmir

    2017-01-01

    Small non-coding RNAs have been extensively studied in plants over the last decade. In contrast, genome-wide identification of plant long non-coding RNAs (lncRNAs) has recently gained momentum. LncRNAs are now being recognized as important players in gene regulation, and their potent regulatory roles are being studied comprehensively in eukaryotes. LncRNAs were first reported in humans in 1992. Since then, research in animals, particularly in humans, has rapidly progressed, and a vast amount of data has been generated, collected, and organized using computational approaches. Additionally, numerous studies have been conducted to understand the roles of these long RNA species in several diseases. However, the status of lncRNA investigation in plants lags behind that in animals (especially humans). Efforts are being made in this direction using computational tools and high-throughput sequencing technologies, such as the lncRNA microarray technique, RNA-sequencing (RNA-seq), RNA capture sequencing, (RNA CaptureSeq), etc. Given the current scenario, significant amounts of data have been produced regarding plant lncRNAs, and this amount is likely to increase in the subsequent years. In this review we have documented brief information about lncRNAs and their status of research in plants, along with the plant-specific resources/databases for information retrieval on lncRNAs. PMID:29657289

  6. Multimodal RNA-seq using single-strand, double-strand, and CircLigase-based capture yields a refined and extended description of the C. elegans transcriptome.

    PubMed

    Lamm, Ayelet T; Stadler, Michael R; Zhang, Huibin; Gent, Jonathan I; Fire, Andrew Z

    2011-02-01

    We have used a combination of three high-throughput RNA capture and sequencing methods to refine and augment the transcriptome map of a well-studied genetic model, Caenorhabditis elegans. The three methods include a standard (non-directional) library preparation protocol relying on cDNA priming and foldback that has been used in several previous studies for transcriptome characterization in this species, and two directional protocols, one involving direct capture of single-stranded RNA fragments and one involving circular-template PCR (CircLigase). We find that each RNA-seq approach shows specific limitations and biases, with the application of multiple methods providing a more complete map than was obtained from any single method. Of particular note in the analysis were substantial advantages of CircLigase-based and ssRNA-based capture for defining sequences and structures of the precise 5' ends (which were lost using the double-strand cDNA capture method). Of the three methods, ssRNA capture was most effective in defining sequences to the poly(A) junction. Using data sets from a spectrum of C. elegans strains and stages and the UCSC Genome Browser, we provide a series of tools, which facilitate rapid visualization and assignment of gene structures.

  7. A high-density genetic map and QTL analysis of agronomic traits in foxtail millet [Setaria italica (L.) P. Beauv.] using RAD-seq.

    PubMed

    Wang, Jun; Wang, Zhilan; Du, Xiaofen; Yang, Huiqing; Han, Fang; Han, Yuanhuai; Yuan, Feng; Zhang, Linyi; Peng, Shuzhong; Guo, Erhu

    2017-01-01

    Foxtail millet (Setaria italica), a very important grain crop in China, has become a new model plant for cereal crops and biofuel grasses. Although its reference genome sequence was released recently, quantitative trait loci (QTLs) controlling complex agronomic traits remains limited. The development of massively parallel genotyping methods and next-generation sequencing technologies provides an excellent opportunity for developing single-nucleotide polymorphisms (SNPs) for linkage map construction and QTL analysis of complex quantitative traits. In this study, a high-throughput and cost-effective RAD-seq approach was employed to generate a high-density genetic map for foxtail millet. A total of 2,668,587 SNP loci were detected according to the reference genome sequence; meanwhile, 9,968 SNP markers were used to genotype 124 F2 progenies derived from the cross between Hongmiaozhangu and Changnong35; a high-density genetic map spanning 1648.8 cM, with an average distance of 0.17 cM between adjacent markers was constructed; 11 major QTLs for eight agronomic traits were identified; five co-dominant DNA markers were developed. These findings will be of value for the identification of candidate genes and marker-assisted selection in foxtail millet.

  8. A high-density genetic map and QTL analysis of agronomic traits in foxtail millet [Setaria italica (L.) P. Beauv.] using RAD-seq

    PubMed Central

    Wang, Zhilan; Du, Xiaofen; Yang, Huiqing; Han, Fang; Han, Yuanhuai; Yuan, Feng; Zhang, Linyi; Peng, Shuzhong; Guo, Erhu

    2017-01-01

    Foxtail millet (Setaria italica), a very important grain crop in China, has become a new model plant for cereal crops and biofuel grasses. Although its reference genome sequence was released recently, quantitative trait loci (QTLs) controlling complex agronomic traits remains limited. The development of massively parallel genotyping methods and next-generation sequencing technologies provides an excellent opportunity for developing single-nucleotide polymorphisms (SNPs) for linkage map construction and QTL analysis of complex quantitative traits. In this study, a high-throughput and cost-effective RAD-seq approach was employed to generate a high-density genetic map for foxtail millet. A total of 2,668,587 SNP loci were detected according to the reference genome sequence; meanwhile, 9,968 SNP markers were used to genotype 124 F2 progenies derived from the cross between Hongmiaozhangu and Changnong35; a high-density genetic map spanning 1648.8 cM, with an average distance of 0.17 cM between adjacent markers was constructed; 11 major QTLs for eight agronomic traits were identified; five co-dominant DNA markers were developed. These findings will be of value for the identification of candidate genes and marker-assisted selection in foxtail millet. PMID:28644843

  9. msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding

    PubMed Central

    Gilad, Yoav; Pritchard, Jonathan K.; Stephens, Matthew

    2015-01-01

    Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the Chip-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede. PMID:26406244

  10. msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding.

    PubMed

    Raj, Anil; Shim, Heejung; Gilad, Yoav; Pritchard, Jonathan K; Stephens, Matthew

    2015-01-01

    Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the Chip-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.

  11. Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications.

    PubMed

    Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H; Lee, Doheon

    2007-01-01

    With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene-patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene-patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at http://www.patome.org/; the information is updated bimonthly.

  12. Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications

    PubMed Central

    Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H.; Lee, Doheon

    2007-01-01

    With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene–patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene–patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at ; the information is updated bimonthly. PMID:17085479

  13. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph.

    PubMed

    Benoit, Gaëtan; Lemaitre, Claire; Lavenier, Dominique; Drezen, Erwan; Dayris, Thibault; Uricaru, Raluca; Rizk, Guillaume

    2015-09-14

    Data volumes generated by next-generation sequencing (NGS) technologies is now a major concern for both data storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as the widely used gzip method. We present a novel reference-free method meant to compress data issued from high throughput sequencing technologies. Our approach, implemented in the software LEON, employs techniques derived from existing assembly principles. The method is based on a reference probabilistic de Bruijn Graph, built de novo from the set of reads and stored in a Bloom filter. Each read is encoded as a path in this graph, by memorizing an anchoring kmer and a list of bifurcations. The same probabilistic de Bruijn Graph is used to perform a lossy transformation of the quality scores, which allows to obtain higher compression rates without losing pertinent information for downstream analyses. LEON was run on various real sequencing datasets (whole genome, exome, RNA-seq or metagenomics). In all cases, LEON showed higher overall compression ratios than state-of-the-art compression software. On a C. elegans whole genome sequencing dataset, LEON divided the original file size by more than 20. LEON is an open source software, distributed under GNU affero GPL License, available for download at http://gatb.inria.fr/software/leon/.

  14. Robust Sub-nanomolar Library Preparation for High Throughput Next Generation Sequencing.

    PubMed

    Wu, Wells W; Phue, Je-Nie; Lee, Chun-Ting; Lin, Changyi; Xu, Lai; Wang, Rong; Zhang, Yaqin; Shen, Rong-Fong

    2018-05-04

    Current library preparation protocols for Illumina HiSeq and MiSeq DNA sequencers require ≥2 nM initial library for subsequent loading of denatured cDNA onto flow cells. Such amounts are not always attainable from samples having a relatively low DNA or RNA input; or those for which a limited number of PCR amplification cycles is preferred (less PCR bias and/or more even coverage). A well-tested sub-nanomolar library preparation protocol for Illumina sequencers has however not been reported. The aim of this study is to provide a much needed working protocol for sub-nanomolar libraries to achieve outcomes as informative as those obtained with the higher library input (≥ 2 nM) recommended by Illumina's protocols. Extensive studies were conducted to validate a robust sub-nanomolar (initial library of 100 pM) protocol using PhiX DNA (as a control), genomic DNA (Bordetella bronchiseptica and microbial mock community B for 16S rRNA gene sequencing), messenger RNA, microRNA, and other small noncoding RNA samples. The utility of our protocol was further explored for PhiX library concentrations as low as 25 pM, which generated only slightly fewer than 50% of the reads achieved under the standard Illumina protocol starting with > 2 nM. A sub-nanomolar library preparation protocol (100 pM) could generate next generation sequencing (NGS) results as robust as the standard Illumina protocol. Following the sub-nanomolar protocol, libraries with initial concentrations as low as 25 pM could also be sequenced to yield satisfactory and reproducible sequencing results.

  15. Fast and accurate enzyme activity measurements using a chip-based microfluidic calorimeter.

    PubMed

    van Schie, Morten M C H; Ebrahimi, Kourosh Honarmand; Hagen, Wilfred R; Hagedoorn, Peter-Leon

    2018-03-01

    Recent developments in microfluidic and nanofluidic technologies have resulted in development of new chip-based microfluidic calorimeters with potential use in different fields. One application would be the accurate high-throughput measurement of enzyme activity. Calorimetry is a generic way to measure activity of enzymes, but unlike conventional calorimeters, chip-based calorimeters can be easily automated and implemented in high-throughput screening platforms. However, application of chip-based microfluidic calorimeters to measure enzyme activity has been limited due to problems associated with miniaturization such as incomplete mixing and a decrease in volumetric heat generated. To address these problems we introduced a calibration method and devised a convenient protocol for using a chip-based microfluidic calorimeter. Using the new calibration method, the progress curve of alkaline phosphatase, which has product inhibition for phosphate, measured by the calorimeter was the same as that recorded by UV-visible spectroscopy. Our results may enable use of current chip-based microfluidic calorimeters in a simple manner as a tool for high-throughput screening of enzyme activity with potential applications in drug discovery and enzyme engineering. Copyright © 2017. Published by Elsevier Inc.

  16. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

    PubMed

    Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C

    2012-01-01

    The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  17. Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.

    PubMed

    Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing

    2015-08-05

    To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-Hiseq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by two sequencing platforms were from 68.0 to 75.3% in four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than the SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0%) and SureSelect-HiSeq-specific (89.6%) were higher than those of TargetSeq-Proton-specific (15.8%). In the sequencing of exonic regions, a combination of using of two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for the sequencing of platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, specifically for the InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, specifically, InDels in Ion Proton exome sequencing.

  18. PCR Primers to Study the Diversity of Expressed Fungal Genes Encoding Lignocellulolytic Enzymes in Soils Using High-Throughput Sequencing

    PubMed Central

    Barbi, Florian; Bragalini, Claudia; Vallon, Laurent; Prudent, Elsa; Dubost, Audrey; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia

    2014-01-01

    Plant biomass degradation in soil is one of the key steps of carbon cycling in terrestrial ecosystems. Fungal saprotrophic communities play an essential role in this process by producing hydrolytic enzymes active on the main components of plant organic matter. Open questions in this field regard the diversity of the species involved, the major biochemical pathways implicated and how these are affected by external factors such as litter quality or climate changes. This can be tackled by environmental genomic approaches involving the systematic sequencing of key enzyme-coding gene families using soil-extracted RNA as material. Such an approach necessitates the design and evaluation of gene family-specific PCR primers producing sequence fragments compatible with high-throughput sequencing approaches. In the present study, we developed and evaluated PCR primers for the specific amplification of fungal CAZy Glycoside Hydrolase gene families GH5 (subfamily 5) and GH11 encoding endo-β-1,4-glucanases and endo-β-1,4-xylanases respectively as well as Basidiomycota class II peroxidases, corresponding to the CAZy Auxiliary Activity family 2 (AA2), active on lignin. These primers were experimentally validated using DNA extracted from a wide range of Ascomycota and Basidiomycota species including 27 with sequenced genomes. Along with the published primers for Glycoside Hydrolase GH7 encoding enzymes active on cellulose, the newly design primers were shown to be compatible with the Illumina MiSeq sequencing technology. Sequences obtained from RNA extracted from beech or spruce forest soils showed a high diversity and were uniformly distributed in gene trees featuring the global diversity of these gene families. This high-throughput sequencing approach using several degenerate primers constitutes a robust method, which allows the simultaneous characterization of the diversity of different fungal transcripts involved in plant organic matter degradation and may lead to the discovery of complex patterns in gene expression of soil fungal communities. PMID:25545363

  19. High-Throughput Sequencing and De Novo Assembly of the Isatis indigotica Transcriptome

    PubMed Central

    Tang, Xiaoqing; Xiao, Yunhua; Lv, Tingting; Wang, Fangquan; Zhu, QianHao; Zheng, Tianqing; Yang, Jie

    2014-01-01

    Background Isatis indigotica, the source of the traditional Chinese medicine Radix isatidis (Ban-Lan-Gen), is an extremely important economical crop in China. To facilitate biological, biochemical and molecular research on the medicinal chemicals in I. indigotica, here we report the first I. indigotica transcriptome generated by RNA sequencing (RNA-seq). Results RNA-seq library was created using RNA extracted from a mixed sample including leaf and root. A total of 33,238 unigenes were assembled from more than 28 million of high quality short reads. The quality of the assembly was experimentally examined by cDNA sequencing of seven randomly selected unigenes. Based on blast search 28,184 unigenes had a hit in at least one of the protein and nucleotide databases used in this study, and 8 unigenes were found to be associated with biosynthesis of indole and its derivatives. According to Gene Ontology classification, 22,365 unigenes were categorized into 48 functional groups. Furthermore, Clusters of Orthologous Group and Swiss-Port annotation were assigned for 7,707 and 18,679 unigenes, respectively. Analysis of repeat motifs identified 6,400 simple sequence repeat markers in 4,509 unigenes. Conclusion Our data provide a comprehensive sequence resource for molecular study of I. indigotica. Our results will facilitate studies on the functions of genes involved in the indole alkaloid biosynthesis pathway and on metabolism of nitrogen and indole alkaloids in I. indigotica and its related species. PMID:25259890

  20. Opportunities and challenges for the integration of massively parallel genomic sequencing into clinical practice: lessons from the ClinSeq project.

    PubMed

    Biesecker, Leslie G

    2012-04-01

    The debate surrounding the return of results from high-throughput genomic interrogation encompasses many important issues including ethics, law, economics, and social policy. As well, the debate is also informed by the molecular, genetic, and clinical foundations of the emerging field of clinical genomics, which is based on this new technology. This article outlines the main biomedical considerations of sequencing technologies and demonstrates some of the early clinical experiences with the technology to enable the debate to stay focused on real-world practicalities. These experiences are based on early data from the ClinSeq project, which is a project to pilot the use of massively parallel sequencing in a clinical research context with a major aim to develop modes of returning results to individual subjects. The study has enrolled >900 subjects and generated exome sequence data on 572 subjects. These data are beginning to be interpreted and returned to the subjects, which provides examples of the potential usefulness and pitfalls of clinical genomics. There are numerous genetic results that can be readily derived from a genome including rare, high-penetrance traits, and carrier states. However, much work needs to be done to develop the tools and resources for genomic interpretation. The main lesson learned is that a genome sequence may be better considered as a health-care resource, rather than a test, one that can be interpreted and used over the lifetime of the patient.

  1. GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data

    PubMed Central

    Dorff, Kevin C.; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien

    2013-01-01

    We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. PMID:23936070

  2. Identification of key factors regulating self-renewal and differentiation in EML hematopoietic precursor cells by RNA-sequencing analysis.

    PubMed

    Zong, Shan; Deng, Shuyun; Chen, Kenian; Wu, Jia Qian

    2014-11-11

    Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSCs self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantity of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study. RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment. In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo.

  3. RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.

    PubMed

    Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G

    2017-01-01

    Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose 2 , which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli , and show how moose 2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/. The proposed RNA-seq normalization method, moose 2 , is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.

  4. Identification of Key Factors Regulating Self-renewal and Differentiation in EML Hematopoietic Precursor Cells by RNA-sequencing Analysis

    PubMed Central

    Chen, Kenian; Wu, Jia Qian

    2014-01-01

    Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSCs self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantity of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study. RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment. In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo. PMID:25407807

  5. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128

  6. SeqLib: a C ++ API for rapid BAM manipulation, sequence alignment and sequence assembly

    PubMed Central

    Wala, Jeremiah; Beroukhim, Rameen

    2017-01-01

    Abstract We present SeqLib, a C ++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C ++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. Availability and Implementation: SeqLib is available on Linux and OSX for the C ++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. Contact: jwala@broadinstitue.org; rameen@broadinstitute.org PMID:28011768

  7. SeqLib: a C ++ API for rapid BAM manipulation, sequence alignment and sequence assembly.

    PubMed

    Wala, Jeremiah; Beroukhim, Rameen

    2017-03-01

    We present SeqLib, a C ++ API and command line tool that provides a rapid and user-friendly interface to BAM/SAM/CRAM files, global sequence alignment operations and sequence assembly. Four C libraries perform core operations in SeqLib: HTSlib for BAM access, BWA-MEM and BLAT for sequence alignment and Fermi for error correction and sequence assembly. Benchmarking indicates that SeqLib has lower CPU and memory requirements than leading C ++ sequence analysis APIs. We demonstrate an example of how minimal SeqLib code can extract, error-correct and assemble reads from a CRAM file and then align with BWA-MEM. SeqLib also provides additional capabilities, including chromosome-aware interval queries and read plotting. Command line tools are available for performing integrated error correction, micro-assemblies and alignment. SeqLib is available on Linux and OSX for the C ++98 standard and later at github.com/walaj/SeqLib. SeqLib is released under the Apache2 license. Additional capabilities for BLAT alignment are available under the BLAT license. jwala@broadinstitue.org ; rameen@broadinstitute.org. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  8. A Multicenter Study To Evaluate the Performance of High-Throughput Sequencing for Virus Detection

    PubMed Central

    Ng, Siemon H. S.; Vandeputte, Olivier; Aljanahi, Aisha; Deyati, Avisek; Cassart, Jean-Pol; Charlebois, Robert L.; Taliaferro, Lanyn P.

    2017-01-01

    ABSTRACT The capability of high-throughput sequencing (HTS) for detection of known and unknown viruses makes it a powerful tool for broad microbial investigations, such as evaluation of novel cell substrates that may be used for the development of new biological products. However, like any new assay, regulatory applications of HTS need method standardization. Therefore, our three laboratories initiated a study to evaluate performance of HTS for potential detection of viral adventitious agents by spiking model viruses in different cellular matrices to mimic putative materials for manufacturing of biologics. Four model viruses were selected based upon different physical and biochemical properties and commercial availability: human respiratory syncytial virus (RSV), Epstein-Barr virus (EBV), feline leukemia virus (FeLV), and human reovirus (REO). Additionally, porcine circovirus (PCV) was tested by one laboratory. Independent samples were prepared for HTS by spiking intact viruses or extracted viral nucleic acids, singly or mixed, into different HeLa cell matrices (resuspended whole cells, cell lysate, or total cellular RNA). Data were obtained using different sequencing platforms (Roche 454, Illumina HiSeq1500 or HiSeq2500). Bioinformatic analyses were performed independently by each laboratory using available tools, pipelines, and databases. The results showed that comparable virus detection was obtained in the three laboratories regardless of sample processing, library preparation, sequencing platform, and bioinformatic analysis: between 0.1 and 3 viral genome copies per cell were detected for all of the model viruses used. This study highlights the potential for using HTS for sensitive detection of adventitious viruses in complex biological samples containing cellular background. IMPORTANCE Recent high-throughput sequencing (HTS) investigations have resulted in unexpected discoveries of known and novel viruses in a variety of sample types, including research materials, clinical materials, and biological products. Therefore, HTS can be a powerful tool for supplementing current methods for demonstrating the absence of adventitious or unwanted viruses in biological products, particularly when using a new cell line. However, HTS is a complex technology with different platforms, which needs standardization for evaluation of biologics. This collaborative study was undertaken to investigate detection of different virus types using two different HTS platforms. The results of the independently performed studies demonstrated a similar sensitivity of virus detection, regardless of the different sample preparation and processing procedures and bioinformatic analyses done in the three laboratories. Comparable HTS detection of different virus types supports future development of reference virus materials for standardization and validation of different HTS platforms. PMID:28932815

  9. Illumina Unamplified Indexed Library Construction: An Automated Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hack, Christopher A.; Sczyrba, Alexander; Cheng, Jan-Fang

    Manual library construction is a limiting factor in Illumina sequencing. Constructing libraries by hand is costly, time-consuming, low-throughput, and ergonomically hazardous, and constructing multiple libraries introduces risk of library failure due to pipetting errors. The ability to construct multiple libraries simultaneously in automated fashion represents significant cost and time savings. Here we present a strategy to construct up to 96 unamplified indexed libraries using Illumina TruSeq reagents and a Biomek FX robotic platform. We also present data to indicate that this library construction method has little or no risk of cross-contamination between samples.

  10. Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.

    PubMed

    Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D

    2015-07-30

    RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove challenging.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Andersen, G.L.; He, Z.; DeSantis, T.Z.

    Microarrays have proven to be a useful and high-throughput method to provide targeted DNA sequence information for up to many thousands of specific genetic regions in a single test. A microarray consists of multiple DNA oligonucleotide probes that, under high stringency conditions, hybridize only to specific complementary nucleic acid sequences (targets). A fluorescent signal indicates the presence and, in many cases, the abundance of genetic regions of interest. In this chapter we will look at how microarrays are used in microbial ecology, especially with the recent increase in microbial community DNA sequence data. Of particular interest to microbial ecologists, phylogeneticmore » microarrays are used for the analysis of phylotypes in a community and functional gene arrays are used for the analysis of functional genes, and, by inference, phylotypes in environmental samples. A phylogenetic microarray that has been developed by the Andersen laboratory, the PhyloChip, will be discussed as an example of a microarray that targets the known diversity within the 16S rRNA gene to determine microbial community composition. Using multiple, confirmatory probes to increase the confidence of detection and a mismatch probe for every perfect match probe to minimize the effect of cross-hybridization by non-target regions, the PhyloChip is able to simultaneously identify any of thousands of taxa present in an environmental sample. The PhyloChip is shown to reveal greater diversity within a community than rRNA gene sequencing due to the placement of the entire gene product on the microarray compared with the analysis of up to thousands of individual molecules by traditional sequencing methods. A functional gene array that has been developed by the Zhou laboratory, the GeoChip, will be discussed as an example of a microarray that dynamically identifies functional activities of multiple members within a community. The recent version of GeoChip contains more than 24,000 50mer oligonucleotide probes and covers more than 10,000 gene sequences in 150 gene categories involved in carbon, nitrogen, sulfur, and phosphorus cycling, metal resistance and reduction, and organic contaminant degradation. GeoChip can be used as a generic tool for microbial community analysis, and also link microbial community structure to ecosystem functioning. Examples of the application of both arrays in different environmental samples will be described in the two subsequent sections.« less

  12. Comparative Analysis of Single-Cell RNA Sequencing Methods.

    PubMed

    Ziegenhain, Christoph; Vieth, Beate; Parekh, Swati; Reinius, Björn; Guillaumet-Adkins, Amy; Smets, Martha; Leonhardt, Heinrich; Heyn, Holger; Hellmann, Ines; Enard, Wolfgang

    2017-02-16

    Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Cell culture compositions

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yiao, Jian

    2014-03-18

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl6 (SEQ ID NO:1 encodes the full length endoglucanase; SEQ ID NO:4 encodes the mature form), and the corresponding endoglucanase VI amino acid sequence ("EGVI"; SEQ ID NO:3 is the signal sequence; SEQ ID NO:2 is the mature sequence). The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVI, recombinant EGVI proteins and methods for producing the same.

  14. Considerations on Experimental Design and Data Analysis of Chromatin Immunoprecipitation Experiments.

    PubMed

    Jordán-Pla, Antonio; Visa, Neus

    2018-01-01

    Arguably one of the most valuable techniques to study chromatin organization, ChIP is the method of choice to map the contacts established between proteins and genomic DNA. Ever since its inception, more than 30 years ago, ChIP has been constantly evolving, improving, and expanding its capabilities and reach. Despite its widespread use by many laboratories across a wide variety of disciplines, ChIP assays can be sometimes challenging to design, and are often sensitive to variations in practical implementation.In this chapter, we provide a general overview of the ChIP method and its most common variations, with a special focus on ChIP-seq. We try to address some of the most important aspects that need to be taken into account in order to design and perform experiments that generate the most reproducible, high-quality data. Some of the main topics covered include the use of properly characterized antibodies, alternatives to chromatin preparation, the need for proper controls, and some recommendations about ChIP-seq data analysis.

  15. A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

    PubMed Central

    Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

    2016-01-01

    Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumoniae, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, in which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591

  16. High throughput MLVA-16 typing for Brucella based on the microfluidics technology

    PubMed Central

    2011-01-01

    Background Brucellosis, a zoonosis caused by the genus Brucella, has been eradicated in Northern Europe, Australia, the USA and Canada, but remains endemic in most areas of the world. The strain and biovar typing of Brucella field samples isolated in outbreaks is useful for tracing back source of infection and may be crucial for discriminating naturally occurring outbreaks versus bioterrorist events, being Brucella a potential biological warfare agent. In the last years MLVA-16 has been described for Brucella spp. genotyping. The MLVA band profiles may be resolved by different techniques i.e. the manual agarose gels, the capillary electrophoresis sequencing systems or the microfluidic Lab-on-Chip electrophoresis. In this paper we described a high throughput system of MLVA-16 typing for Brucella spp. by using of the microfluidics technology. Results The Caliper LabChip 90 equipment was evaluated for MLVA-16 typing of sixty-three Brucella samples. Furthermore, in order to validate the system, DNA samples previously resolved by sequencing system and Agilent technology, were de novo genotyped. The comparison of the MLVA typing data obtained by the Caliper equipment and those previously obtained by the other analysis methods showed a good correlation. However the outputs were not accurate as the Caliper DNA fragment sizes showed discrepancies compared with real data and a conversion table from observed to expected data was created. Conclusion In this paper we described the MLVA-16 using a rapid, sophisticated microfluidics technology for detection of amplification product sizes. The comparison of the MLVA typing data produced by Caliper LabChip 90 system with the data obtained by different techniques showed a general concordance of the results. Furthermore this platform represents a significant improvement in terms of handling, data acquiring, computational efficiency and rapidity, allowing to perform the strain genotyping in a time equal to one sixth respect to other microfluidics systems as e.g. the Agilent 2100 bioanalyzer. Finally, this platform can be considered a valid alternative to standard genotyping techniques, particularly useful dealing with a large number of samples in short time. These data confirmed that this technology represents a significative advancement in high-throughput accurate Brucella genotyping. PMID:21435217

  17. Carbon nanotubes for voltage reduction and throughput enhancement of electrical cell lysis on a lab-on-a-chip.

    PubMed

    Shahini, Mehdi; Yeow, John T W

    2011-08-12

    We report on the enhancement of electrical cell lysis using carbon nanotubes (CNTs). Electrical cell lysis systems are widely utilized in microchips as they are well suited to integration into lab-on-a-chip devices. However, cell lysis based on electrical mechanisms has high voltage requirements. Here, we demonstrate that by incorporating CNTs into microfluidic electrolysis systems, the required voltage for lysis is reduced by half and the lysis throughput at low voltages is improved by ten times, compared to non-CNT microchips. In our experiment, E. coli cells are lysed while passing through an electric field in a microchannel. Based on the lightning rod effect, the electric field strengthened at the tip of the CNTs enhances cell lysis at lower voltage and higher throughput. This approach enables easy integration of cell lysis with other on-chip high-throughput sample-preparation processes.

  18. IAOseq: inferring abundance of overlapping genes using RNA-seq data.

    PubMed

    Sun, Hong; Yang, Shuang; Tun, Liangliang; Li, Yixue

    2015-01-01

    Overlapping transcription constitutes a common mechanism for regulating gene expression. A major limitation of the overlapping transcription assays is the lack of high throughput expression data. We developed a new tool (IAOseq) that is based on reads distributions along the transcribed regions to identify the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq showed better performance in the estimation accuracy of overlapping transcription levels. For the same strand overlapping transcription, currently existing high-throughput methods are rarely available to distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods gave an average of 1.6 fold overestimation of the expression levels of same strand overlapping genes. This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries. IAOseq could be used to help us understand the complex regulatory mechanism mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.

  19. Transcriptomic Analysis of Paulownia Infected by Paulownia Witches'-Broom Phytoplasma

    PubMed Central

    Zhu, Shui-Fang; Lin, Cai-Li; Tian, Guo-Zhong; Xu, Xia; Zhao, Wen-Jun

    2013-01-01

    Phytoplasmas are plant pathogenic bacteria that have no cell wall and are responsible for major crop losses throughout the world. Phytoplasma-infected plants show a variety of symptoms and the mechanisms they use to physiologically alter the host plants are of considerable interest, but poorly understood. In this study we undertook a detailed analysis of Paulownia infected by Paulownia witches’-broom (PaWB) Phytoplasma using high-throughput mRNA sequencing (RNA-Seq) and digital gene expression (DGE). RNA-Seq analysis identified 74,831 unigenes, which were subsequently used as reference sequences for DGE analysis of diseased and healthy Paulownia in field grown and tissue cultured plants. Our study revealed that dramatic changes occurred in the gene expression profile of Paulownia after PaWB Phytoplasma infection. Genes encoding key enzymes in cytokinin biosynthesis, such as isopentenyl diphosphate isomerase and isopentenyltransferase, were significantly induced in the infected Paulownia. Genes involved in cell wall biosynthesis and degradation were largely up-regulated and genes related to photosynthesis were down-regulated after PaWB Phytoplasma infection. Our systematic analysis provides comprehensive transcriptomic data about plants infected by Phytoplasma. This information will help further our understanding of the detailed interaction mechanisms between plants and Phytoplasma. PMID:24130859

  20. X-ray transparent microfluidic chips for high-throughput screening and optimization of in meso membrane protein crystallization

    PubMed Central

    Schieferstein, Jeremy M.; Pawate, Ashtamurthy S.; Wan, Frank; Sheraden, Paige N.; Broecker, Jana; Ernst, Oliver P.; Gennis, Robert B.

    2017-01-01

    Elucidating and clarifying the function of membrane proteins ultimately requires atomic resolution structures as determined most commonly by X-ray crystallography. Many high impact membrane protein structures have resulted from advanced techniques such as in meso crystallization that present technical difficulties for the set-up and scale-out of high-throughput crystallization experiments. In prior work, we designed a novel, low-throughput X-ray transparent microfluidic device that automated the mixing of protein and lipid by diffusion for in meso crystallization trials. Here, we report X-ray transparent microfluidic devices for high-throughput crystallization screening and optimization that overcome the limitations of scale and demonstrate their application to the crystallization of several membrane proteins. Two complementary chips are presented: (1) a high-throughput screening chip to test 192 crystallization conditions in parallel using as little as 8 nl of membrane protein per well and (2) a crystallization optimization chip to rapidly optimize preliminary crystallization hits through fine-gradient re-screening. We screened three membrane proteins for new in meso crystallization conditions, identifying several preliminary hits that we tested for X-ray diffraction quality. Further, we identified and optimized the crystallization condition for a photosynthetic reaction center mutant and solved its structure to a resolution of 3.5 Å. PMID:28469762

  1. Towards High-throughput Immunomics for Infectious Diseases: Use of Next-generation Peptide Microarrays for Rapid Discovery and Mapping of Antigenic Determinants*

    PubMed Central

    Carmona, Santiago J.; Nielsen, Morten; Schafer-Nielsen, Claus; Mucci, Juan; Altcheh, Jaime; Balouz, Virginia; Tekiel, Valeria; Frasch, Alberto C.; Campetella, Oscar; Buscaglia, Carlos A.; Agüero, Fernán

    2015-01-01

    Complete characterization of antibody specificities associated to natural infections is expected to provide a rich source of serologic biomarkers with potential applications in molecular diagnosis, follow-up of chemotherapeutic treatments, and prioritization of targets for vaccine development. Here, we developed a highly-multiplexed platform based on next-generation high-density peptide microarrays to map these specificities in Chagas Disease, an exemplar of a human infectious disease caused by the protozoan Trypanosoma cruzi. We designed a high-density peptide microarray containing more than 175,000 overlapping 15mer peptides derived from T. cruzi proteins. Peptides were synthesized in situ on microarray slides, spanning the complete length of 457 parasite proteins with fully overlapped 15mers (1 residue shift). Screening of these slides with antibodies purified from infected patients and healthy donors demonstrated both a high technical reproducibility as well as epitope mapping consistency when compared with earlier low-throughput technologies. Using a conservative signal threshold to classify positive (reactive) peptides we identified 2,031 disease-specific peptides and 97 novel parasite antigens, effectively doubling the number of known antigens and providing a 10-fold increase in the number of fine mapped antigenic determinants for this disease. Finally, further analysis of the chip data showed that optimizing the amount of sequence overlap of displayed peptides can increase the protein space covered in a single chip by at least ∼threefold without sacrificing sensitivity. In conclusion, we show the power of high-density peptide chips for the discovery of pathogen-specific linear B-cell epitopes from clinical samples, thus setting the stage for high-throughput biomarker discovery screenings and proteome-wide studies of immune responses against pathogens. PMID:25922409

  2. Towards High-throughput Immunomics for Infectious Diseases: Use of Next-generation Peptide Microarrays for Rapid Discovery and Mapping of Antigenic Determinants.

    PubMed

    Carmona, Santiago J; Nielsen, Morten; Schafer-Nielsen, Claus; Mucci, Juan; Altcheh, Jaime; Balouz, Virginia; Tekiel, Valeria; Frasch, Alberto C; Campetella, Oscar; Buscaglia, Carlos A; Agüero, Fernán

    2015-07-01

    Complete characterization of antibody specificities associated to natural infections is expected to provide a rich source of serologic biomarkers with potential applications in molecular diagnosis, follow-up of chemotherapeutic treatments, and prioritization of targets for vaccine development. Here, we developed a highly-multiplexed platform based on next-generation high-density peptide microarrays to map these specificities in Chagas Disease, an exemplar of a human infectious disease caused by the protozoan Trypanosoma cruzi. We designed a high-density peptide microarray containing more than 175,000 overlapping 15 mer peptides derived from T. cruzi proteins. Peptides were synthesized in situ on microarray slides, spanning the complete length of 457 parasite proteins with fully overlapped 15 mers (1 residue shift). Screening of these slides with antibodies purified from infected patients and healthy donors demonstrated both a high technical reproducibility as well as epitope mapping consistency when compared with earlier low-throughput technologies. Using a conservative signal threshold to classify positive (reactive) peptides we identified 2,031 disease-specific peptides and 97 novel parasite antigens, effectively doubling the number of known antigens and providing a 10-fold increase in the number of fine mapped antigenic determinants for this disease. Finally, further analysis of the chip data showed that optimizing the amount of sequence overlap of displayed peptides can increase the protein space covered in a single chip by at least ∼ threefold without sacrificing sensitivity. In conclusion, we show the power of high-density peptide chips for the discovery of pathogen-specific linear B-cell epitopes from clinical samples, thus setting the stage for high-throughput biomarker discovery screenings and proteome-wide studies of immune responses against pathogens. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  3. Novel Phenotype-Genotype Correlations of Restrictive Cardiomyopathy With Myosin-Binding Protein C (MYBPC3) Gene Mutations Tested by Next-Generation Sequencing.

    PubMed

    Wu, Wei; Lu, Chao-Xia; Wang, Yi-Ning; Liu, Fang; Chen, Wei; Liu, Yong-Tai; Han, Ye-Chen; Cao, Jian; Zhang, Shu-Yang; Zhang, Xue

    2015-07-10

    MYBPC3 dysfunctions have been proven to induce dilated cardiomyopathy, hypertrophic cardiomyopathy, and/or left ventricular noncompaction; however, the genotype-phenotype correlation between MYBPC3 and restrictive cardiomyopathy (RCM) has not been established. The newly developed next-generation sequencing method is capable of broad genomic DNA sequencing with high throughput and can help explore novel correlations between genetic variants and cardiomyopathies. A proband from a multigenerational family with 3 live patients and 1 unrelated patient with clinical diagnoses of RCM underwent a next-generation sequencing workflow based on a custom AmpliSeq panel, including 64 candidate pathogenic genes for cardiomyopathies, on the Ion Personal Genome Machine high-throughput sequencing benchtop instrument. The selected panel contained a total of 64 genes that were reportedly associated with inherited cardiomyopathies. All patients fulfilled strict criteria for RCM with clinical characteristics, echocardiography, and/or cardiac magnetic resonance findings. The multigenerational family with 3 adult RCM patients carried an identical nonsense MYBPC3 mutation, and the unrelated patient carried a missense mutation in the MYBPC3 gene. All of these results were confirmed by the Sanger sequencing method. This study demonstrated that MYBPC3 gene mutations, revealed by next-generation sequencing, were associated with familial and sporadic RCM patients. It is suggested that the next-generation sequencing platform with a selected panel provides a highly efficient approach for molecular diagnosis of hereditary and idiopathic RCM and helps build new genotype-phenotype correlations. © 2015 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.

  4. multiDE: a dimension reduced model based statistical method for differential expression analysis using RNA-sequencing data with multiple treatment conditions.

    PubMed

    Kang, Guangliang; Du, Li; Zhang, Hong

    2016-06-22

    The growing complexity of biological experiment design based on high-throughput RNA sequencing (RNA-seq) is calling for more accommodative statistical tools. We focus on differential expression (DE) analysis using RNA-seq data in the presence of multiple treatment conditions. We propose a novel method, multiDE, for facilitating DE analysis using RNA-seq read count data with multiple treatment conditions. The read count is assumed to follow a log-linear model incorporating two factors (i.e., condition and gene), where an interaction term is used to quantify the association between gene and condition. The number of the degrees of freedom is reduced to one through the first order decomposition of the interaction, leading to a dramatically power improvement in testing DE genes when the number of conditions is greater than two. In our simulation situations, multiDE outperformed the benchmark methods (i.e. edgeR and DESeq2) even if the underlying model was severely misspecified, and the power gain was increasing in the number of conditions. In the application to two real datasets, multiDE identified more biologically meaningful DE genes than the benchmark methods. An R package implementing multiDE is available publicly at http://homepage.fudan.edu.cn/zhangh/softwares/multiDE . When the number of conditions is two, multiDE performs comparably with the benchmark methods. When the number of conditions is greater than two, multiDE outperforms the benchmark methods.

  5. Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing.

    PubMed

    Liu, Yu; Koyutürk, Mehmet; Maxwell, Sean; Xiang, Min; Veigl, Martina; Cooper, Richard S; Tayo, Bamidele O; Li, Li; LaFramboise, Thomas; Wang, Zhenghe; Zhu, Xiaofeng; Chance, Mark R

    2014-08-16

    Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent in reference genome. However, there is no study to investigate the distribution, evolution and functionality of those sequences in human populations. To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements. In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human diversity. 76% of micSeqs were confirmed by a comparative genomics approach. Fourteen micSeqs are expressed in human brain or contain TF binding regions. Some micSeqs are primate-specific, conserved and may play a role in the evolution of primates.

  6. Alternative Oxidase Transcription Factors AOD2 and AOD5 of Neurospora crassa Control the Expression of Genes Involved in Energy Production and Metabolism.

    PubMed

    Qi, Zhigang; Smith, Kristina M; Bredeweg, Erin L; Bosnjak, Natasa; Freitag, Michael; Nargang, Frank E

    2017-02-09

    In Neurospora crassa , blocking the function of the standard mitochondrial electron transport chain results in the induction of an alternative oxidase (AOX). AOX transfers electrons directly from ubiquinol to molecular oxygen. AOX serves as a model of retrograde regulation since it is encoded by a nuclear gene that is regulated in response to signals from mitochondria. The N. crassa transcription factors AOD2 and AOD5 are necessary for the expression of the AOX gene. To gain insight into the mechanism by which these factors function, and to determine if they have roles in the expression of additional genes in N. crassa , we constructed strains expressing only tagged versions of the proteins. Cell fractionation experiments showed that both proteins are localized to the nucleus under both AOX inducing and noninducing conditions. Furthermore, chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) analysis revealed that the proteins are bound to the promoter region of the AOX gene under both conditions. ChIP-seq also showed that the transcription factors bind to the upstream regions of a number of genes that are involved in energy production and metabolism. Dependence on AOD2 and AOD5 for the expression of several of these genes was verified by quantitative PCR. The majority of ChIP-seq peaks observed were enriched for both AOD2 and AOD5. However, we also observed occasional sites where one factor appeared to bind preferentially. The most striking of these was a conserved sequence that bound large amounts of AOD2 but little AOD5. This sequence was found within a 310 bp repeat unit that occurs at several locations in the genome. Copyright © 2017 Qi et al.

  7. The Nano-Patch-Clamp Array: Microfabricated Glass Chips for High-Throughput Electrophysiology

    NASA Astrophysics Data System (ADS)

    Fertig, Niels

    2003-03-01

    Electrophysiology (i.e. patch clamping) remains the gold standard for pharmacological testing of putative ion channel active drugs (ICADs), but suffers from low throughput. A new ion channel screening technology based on microfabricated glass chip devices will be presented. The glass chips contain very fine apertures, which are used for whole-cell voltage clamp recordings as well as single channel recordings from mammalian cell lines. Chips containing multiple patch clamp wells will be used in a first bench-top device, which will allow perfusion and electrical readout of each well. This scalable technology will allow for automated, rapid and parallel screening on ion channel drug targets.

  8. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping.

    PubMed

    Brumme, Chanson J; Poon, Art F Y

    2017-07-15

    Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve to clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold that rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Viromic Analysis of Wastewater Input to a River Catchment Reveals a Diverse Assemblage of RNA Viruses

    PubMed Central

    Farkas, Kata; Harrison, Christian; Jones, David L.; McCarthy, Alan J.

    2018-01-01

    ABSTRACT Detection of viruses in the environment is heavily dependent on PCR-based approaches that require reference sequences for primer design. While this strategy can accurately detect known viruses, it will not find novel genotypes or emerging and invasive viral species. In this study, we investigated the use of viromics, i.e., high-throughput sequencing of the biosphere’s viral fraction, to detect human-/animal-pathogenic RNA viruses in the Conwy river catchment area in Wales, United Kingdom. Using a combination of filtering and nuclease treatment, we extracted the viral fraction from wastewater and estuarine river water and sediment, followed by high-throughput RNA sequencing (RNA-Seq) analysis on the Illumina HiSeq platform, for the discovery of RNA virus genomes. We found a higher richness of RNA viruses in wastewater samples than in river water and sediment, and we assembled a complete norovirus genotype GI.2 genome from wastewater effluent, which was not contemporaneously detected by conventional reverse transcription-quantitative PCR (qRT-PCR). The simultaneous presence of diverse rotavirus signatures in wastewater indicated the potential for zoonotic infections in the area and suggested runoff from pig farms as a possible origin of these viruses. Our results show that viromics can be an important tool in the discovery of pathogenic viruses in the environment and can be used to inform and optimize reference-based detection methods provided appropriate and rigorous controls are included. IMPORTANCE Enteric viruses cause gastrointestinal illness and are commonly transmitted through the fecal-oral route. When wastewater is released into river systems, these viruses can contaminate the environment. Our results show that we can use viromics to find the range of potentially pathogenic viruses that are present in the environment and identify prevalent genotypes. The ultimate goal is to trace the fate of these pathogenic viruses from origin to the point where they are a threat to human health, informing reference-based detection methods and water quality management. PMID:29795788

  10. Viromic Analysis of Wastewater Input to a River Catchment Reveals a Diverse Assemblage of RNA Viruses.

    PubMed

    Adriaenssens, Evelien M; Farkas, Kata; Harrison, Christian; Jones, David L; Allison, Heather E; McCarthy, Alan J

    2018-01-01

    Detection of viruses in the environment is heavily dependent on PCR-based approaches that require reference sequences for primer design. While this strategy can accurately detect known viruses, it will not find novel genotypes or emerging and invasive viral species. In this study, we investigated the use of viromics, i.e., high-throughput sequencing of the biosphere's viral fraction, to detect human-/animal-pathogenic RNA viruses in the Conwy river catchment area in Wales, United Kingdom. Using a combination of filtering and nuclease treatment, we extracted the viral fraction from wastewater and estuarine river water and sediment, followed by high-throughput RNA sequencing (RNA-Seq) analysis on the Illumina HiSeq platform, for the discovery of RNA virus genomes. We found a higher richness of RNA viruses in wastewater samples than in river water and sediment, and we assembled a complete norovirus genotype GI.2 genome from wastewater effluent, which was not contemporaneously detected by conventional reverse transcription-quantitative PCR (qRT-PCR). The simultaneous presence of diverse rotavirus signatures in wastewater indicated the potential for zoonotic infections in the area and suggested runoff from pig farms as a possible origin of these viruses. Our results show that viromics can be an important tool in the discovery of pathogenic viruses in the environment and can be used to inform and optimize reference-based detection methods provided appropriate and rigorous controls are included. IMPORTANCE Enteric viruses cause gastrointestinal illness and are commonly transmitted through the fecal-oral route. When wastewater is released into river systems, these viruses can contaminate the environment. Our results show that we can use viromics to find the range of potentially pathogenic viruses that are present in the environment and identify prevalent genotypes. The ultimate goal is to trace the fate of these pathogenic viruses from origin to the point where they are a threat to human health, informing reference-based detection methods and water quality management.

  11. High-throughput analysis of yeast replicative aging using a microfluidic system

    PubMed Central

    Jo, Myeong Chan; Liu, Wei; Gu, Liang; Dang, Weiwei; Qin, Lidong

    2015-01-01

    Saccharomyces cerevisiae has been an important model for studying the molecular mechanisms of aging in eukaryotic cells. However, the laborious and low-throughput methods of current yeast replicative lifespan assays limit their usefulness as a broad genetic screening platform for research on aging. We address this limitation by developing an efficient, high-throughput microfluidic single-cell analysis chip in combination with high-resolution time-lapse microscopy. This innovative design enables, to our knowledge for the first time, the determination of the yeast replicative lifespan in a high-throughput manner. Morphological and phenotypical changes during aging can also be monitored automatically with a much higher throughput than previous microfluidic designs. We demonstrate highly efficient trapping and retention of mother cells, determination of the replicative lifespan, and tracking of yeast cells throughout their entire lifespan. Using the high-resolution and large-scale data generated from the high-throughput yeast aging analysis (HYAA) chips, we investigated particular longevity-related changes in cell morphology and characteristics, including critical cell size, terminal morphology, and protein subcellular localization. In addition, because of the significantly improved retention rate of yeast mother cell, the HYAA-Chip was capable of demonstrating replicative lifespan extension by calorie restriction. PMID:26170317

  12. Dynamic changes in the composition of photosynthetic picoeukaryotes in the northwestern Pacific Ocean revealed by high-throughput tag sequencing of plastid 16S rRNA genes.

    PubMed

    Choi, Dong H; An, Sung M; Chun, Sungjun; Yang, Eun C; Selph, Karen E; Lee, Charity M; Noh, Jae H

    2016-02-01

    Photosynthetic picoeukaryotes (PPEs) are major oceanic primary producers. However, the diversity of such communities remains poorly understood, especially in the northwestern (NW) Pacific. We investigated the abundance and diversity of PPEs, and recorded environmental variables, along a transect from the coast to the open Pacific Ocean. High-throughput tag sequencing (using the MiSeq system) revealed the diversity of plastid 16S rRNA genes. The dominant PPEs changed at the class level along the transect. Prymnesiophyceae were the only dominant PPEs in the warm pool of the NW Pacific, but Mamiellophyceae dominated in coastal waters of the East China Sea. Phylogenetically, most Prymnesiophyceae sequences could not be resolved at lower taxonomic levels because no close relatives have been cultured. Within the Mamiellophyceae, the genera Micromonas and Ostreococcus dominated in marginal coastal areas affected by open water, whereas Bathycoccus dominated in the lower euphotic depths of oligotrophic open waters. Cryptophyceae and Phaeocystis (of the Prymnesiophyceae) dominated in areas affected principally by coastal water. We also defined the biogeographical distributions of Chrysophyceae, prasinophytes, Bacillariophyceaea and Pelagophyceae. These distributions were influenced by temperature, salinity and chlorophyll a and nutrient concentrations. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  13. A compact imaging spectroscopic system for biomolecular detections on plasmonic chips.

    PubMed

    Lo, Shu-Cheng; Lin, En-Hung; Wei, Pei-Kuen; Tsai, Wan-Shao

    2016-10-17

    In this study, we demonstrate a compact imaging spectroscopic system for high-throughput detection of biomolecular interactions on plasmonic chips, based on a curved grating as the key element of light diffraction and light focusing. Both the curved grating and the plasmonic chips are fabricated on flexible plastic substrates using a gas-assisted thermal-embossing method. A fiber-coupled broadband light source and a camera are included in the system. Spectral resolution within 1 nm is achieved in sensing environmental index solutions and protein bindings. The detected sensitivities of the plasmonic chip are comparable with a commercial spectrometer. An extra one-dimensional scanning stage enables high-throughput detection of protein binding on a designed plasmonic chip consisting of several nanoslit arrays with different periods. The detected resonance wavelengths match well with the grating equation under an air environment. Wavelength shifts between 1 and 9 nm are detected for antigens of various concentrations binding with antibodies. A simple, mass-productive and cost-effective method has been demonstrated on the imaging spectroscopic system for real-time, label-free, highly sensitive and high-throughput screening of biomolecular interactions.

  14. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    PubMed

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  15. ChIP-seq analysis of the σ E regulon of Salmonella enterica serovar typhimurium reveals new genes implicated in heat shock and oxidative stress response

    DOE PAGES

    Li, Jie; Overall, Christopher C.; Johnson, Rudd C.; ...

    2015-09-21

    The alternative sigma factor σ E functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σ E in Salmonella and E. coli has been examined using microarray, however a genome-wide analysis of σ E–binding sites inSalmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σ E–binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cellmore » division factor, and a signal transduction modulator. The consensus sequence identified for σ E in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σ E–binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σ E modulates transcription. By dissecting direct and indirect modes of σ E-mediated regulation, we found that σ E activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σ E regulated genes ( greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.« less

  16. ChIP-seq analysis of the σ E regulon of Salmonella enterica serovar typhimurium reveals new genes implicated in heat shock and oxidative stress response

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Jie; Overall, Christopher C.; Johnson, Rudd C.

    The alternative sigma factor σ E functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σ E in Salmonella and E. coli has been examined using microarray, however a genome-wide analysis of σ E–binding sites inSalmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σ E–binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cellmore » division factor, and a signal transduction modulator. The consensus sequence identified for σ E in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σ E–binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σ E modulates transcription. By dissecting direct and indirect modes of σ E-mediated regulation, we found that σ E activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σ E regulated genes ( greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.« less

  17. Optimization Of A High-Throughput Transcriptomic (HTTr) Bioactivity Screen In MCF7 Cells Using Targeted RNA-Seq (SOT)

    EPA Science Inventory

    Recent advances in targeted RNA-Seq technology allow researchers to efficiently and cost-effectively obtain whole transcriptome profiles using picograms of mRNA from human cell lysates. Low mRNA input requirements and sample multiplexing capabilities has made time- and concentrat...

  18. From Genes to Protein Mechanics on a Chip

    PubMed Central

    Milles, Lukas F.; Verdorfer, Tobias; Pippig, Diana A.; Nash, Michael A.; Gaub, Hermann E.

    2014-01-01

    Single-molecule force spectroscopy enables mechanical testing of individual proteins, however low experimental throughput limits the ability to screen constructs in parallel. We describe a microfluidic platform for on-chip protein expression and measurement of single-molecule mechanical properties. We constructed microarrays of proteins covalently attached to a chip surface, and found that a single cohesin-modified cantilever that bound to the terminal dockerin-tag of each protein remained stable over thousands of pulling cycles. The ability to synthesize and mechanically probe protein libraries presents new opportunities for high-throughput mechanical phenotyping. PMID:25194847

  19. Microfluidics-Based Lab-on-Chip Systems in DNA-Based Biosensing: An Overview

    PubMed Central

    Dutse, Sabo Wada; Yusof, Nor Azah

    2011-01-01

    Microfluidics-based lab-on-chip (LOC) systems are an active research area that is revolutionising high-throughput sequencing for the fast, sensitive and accurate detection of a variety of pathogens. LOCs also serve as portable diagnostic tools. The devices provide optimum control of nanolitre volumes of fluids and integrate various bioassay operations that allow the devices to rapidly sense pathogenic threat agents for environmental monitoring. LOC systems, such as microfluidic biochips, offer advantages compared to conventional identification procedures that are tedious, expensive and time consuming. This paper aims to provide a broad overview of the need for devices that are easy to operate, sensitive, fast, portable and sufficiently reliable to be used as complementary tools for the control of pathogenic agents that damage the environment. PMID:22163925

  20. A fully sealed plastic chip for multiplex PCR and its application in bacteria identification.

    PubMed

    Xu, Youchun; Yan, He; Zhang, Yan; Jiang, Kewei; Lu, Ying; Ren, Yonghong; Wang, Hui; Wang, Shan; Xing, Wanli

    2015-07-07

    Multiplex PCR is an effective tool for simultaneous multiple target detection but is limited by the intrinsic interference and competition among primer pairs when it is performed in one reaction tube. Dividing a multiplex PCR into many single PCRs is a simple strategy to overcome this issue. Here, we constructed a plastic, easy-to-use, fully sealed multiplex PCR chip based on reversible centrifugation for the simultaneous detection of 63 target DNA sequences. The structure of the chip is quite simple, which contains sine-shaped infusing channels and a number of reaction chambers connecting to one side of these channels. Primer pairs for multiplex PCR were sequentially preloaded in the different reaction chambers, and the chip was enclosed with PCR-compatible adhesive tape. For usage, the PCR master mix containing a DNA template is pipetted into the infusing channels and centrifuged into the reaction chambers, leaving the infusing channels filled with air to avoid cross-contamination of the different chambers. Then, the chip is sealed and placed on a flat thermal cycler for PCR. Finally, amplification products can be detected in situ using a fluorescence scanner or recovered by reverse centrifugation for further analyses. Therefore, our chip possesses two functions: 1) it can be used for multi-target detection based on end-point in situ fluorescence detection; and 2) it can work as a sample preparation unit for analyses that need multiplex PCR such as hybridization and target sequencing. The performance of this chip was carefully examined and further illustrated in the identification of 8 pathogenic bacterial genomic DNA samples and 13 drug-resistance genes. Due to simplicity of its structure and operation, accuracy and generality, high-throughput capacity, and versatile functions (i.e., for in situ detection and sample preparation), our multiplex PCR chip has great potential in clinical diagnostics and nucleic acid-based point-of-care testing.

  1. Fragment assignment in the cloud with eXpress-D

    PubMed Central

    2013-01-01

    Background Probabilistic assignment of ambiguously mapped fragments produced by high-throughput sequencing experiments has been demonstrated to greatly improve accuracy in the analysis of RNA-Seq and ChIP-Seq, and is an essential step in many other sequence census experiments. A maximum likelihood method using the expectation-maximization (EM) algorithm for optimization is commonly used to solve this problem. However, batch EM-based approaches do not scale well with the size of sequencing datasets, which have been increasing dramatically over the past few years. Thus, current approaches to fragment assignment rely on heuristics or approximations for tractability. Results We present an implementation of a distributed EM solution to the fragment assignment problem using Spark, a data analytics framework that can scale by leveraging compute clusters within datacenters–“the cloud”. We demonstrate that our implementation easily scales to billions of sequenced fragments, while providing the exact maximum likelihood assignment of ambiguous fragments. The accuracy of the method is shown to be an improvement over the most widely used tools available and can be run in a constant amount of time when cluster resources are scaled linearly with the amount of input data. Conclusions The cloud offers one solution for the difficulties faced in the analysis of massive high-thoughput sequencing data, which continue to grow rapidly. Researchers in bioinformatics must follow developments in distributed systems–such as new frameworks like Spark–for ways to port existing methods to the cloud and help them scale to the datasets of the future. Our software, eXpress-D, is freely available at: http://github.com/adarob/express-d. PMID:24314033

  2. Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

    PubMed Central

    2010-01-01

    Background Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals. PMID:20707909

  3. Transcriptome analysis using next generation sequencing reveals molecular signatures of diabetic retinopathy and efficacy of candidate drugs.

    PubMed

    Kandpal, Raj P; Rajasimha, Harsha K; Brooks, Matthew J; Nellissery, Jacob; Wan, Jun; Qian, Jiang; Kern, Timothy S; Swaroop, Anand

    2012-01-01

    To define gene expression changes associated with diabetic retinopathy in a mouse model using next generation sequencing, and to utilize transcriptome signatures to assess molecular pathways by which pharmacological agents inhibit diabetic retinopathy. We applied a high throughput RNA sequencing (RNA-seq) strategy using Illumina GAIIx to characterize the entire retinal transcriptome from nondiabetic and from streptozotocin-treated mice 32 weeks after induction of diabetes. Some of the diabetic mice were treated with inhibitors of receptor for advanced glycation endproducts (RAGE) and p38 mitogen activated protein (MAP) kinase, which have previously been shown to inhibit diabetic retinopathy in rodent models. The transcripts and alternatively spliced variants were determined in all experimental groups. Next generation sequencing-based RNA-seq profiles provided comprehensive signatures of transcripts that are altered in early stages of diabetic retinopathy. These transcripts encoded proteins involved in distinct yet physiologically relevant disease-associated pathways such as inflammation, microvasculature formation, apoptosis, glucose metabolism, Wnt signaling, xenobiotic metabolism, and photoreceptor biology. Significant upregulation of crystallin transcripts was observed in diabetic animals, and the diabetes-induced upregulation of these transcripts was inhibited in diabetic animals treated with inhibitors of either RAGE or p38 MAP kinase. These two therapies also showed dissimilar regulation of some subsets of transcripts that included alternatively spliced versions of arrestin, neutral sphingomyelinase activation associated factor (Nsmaf), SH3-domain GRB2-like interacting protein 1 (Sgip1), and axin. Diabetes alters many transcripts in the retina, and two therapies that inhibit the vascular pathology similarly inhibit a portion of these changes, pointing to possible molecular mechanisms for their beneficial effects. These therapies also changed the abundance of various alternatively spliced versions of signaling transcripts, suggesting a possible role of alternative splicing in disease etiology. Our studies clearly demonstrate RNA-seq as a comprehensive strategy for identifying disease-specific transcripts, and for determining comparative profiles of molecular changes mediated by candidate drugs.

  4. High-throughput and site-specific identification of 2'-O-methylation sites using ribose oxidation sequencing (RibOxi-seq).

    PubMed

    Zhu, Yinzhou; Pirnie, Stephan P; Carmichael, Gordon G

    2017-08-01

    Ribose methylation (2'- O -methylation, 2'- O Me) occurs at high frequencies in rRNAs and other small RNAs and is carried out using a shared mechanism across eukaryotes and archaea. As RNA modifications are important for ribosome maturation, and alterations in these modifications are associated with cellular defects and diseases, it is important to characterize the landscape of 2'- O -methylation. Here we report the development of a highly sensitive and accurate method for ribose methylation detection using next-generation sequencing. A key feature of this method is the generation of RNA fragments with random 3'-ends, followed by periodate oxidation of all molecules terminating in 2',3'-OH groups. This allows only RNAs harboring 2'-OMe groups at their 3'-ends to be sequenced. Although currently requiring microgram amounts of starting material, this method is robust for the analysis of rRNAs even at low sequencing depth. © 2017 Zhu et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  5. Exploring single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes in the jellyfish (Rhopilema esculentum) by transcriptome sequencing.

    PubMed

    Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo

    2017-08-01

    In this study, single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq technology. A total of 76.4 million raw reads and 72.1 million clean reads were generated from deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. For microsatellite mining, 5088 SSRs were identified from the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93%. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the greatest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. DBATE: database of alternative transcripts expression.

    PubMed

    Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2013-01-01

    The use of high-throughput RNA sequencing technology (RNA-seq) allows whole transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing variant-specific expression with its functional inference is still an open and difficult issue for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotation of alternative splicing variants. We processed 13 large RNA-seq panels from human healthy tissues and in disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. The possibility to perform complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values that can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, the transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.

  7. Comparative systems biology between human and animal models based on next-generation sequencing methods.

    PubMed

    Zhao, Yu-Qi; Li, Gong-Hua; Huang, Jing-Fei

    2013-04-01

    Animal models provide myriad benefits to both experimental and clinical research. Unfortunately, in many situations, they fall short of expected results or provide contradictory results. In part, this can be the result of traditional molecular biological approaches that are relatively inefficient in elucidating underlying molecular mechanism. To improve the efficacy of animal models, a technological breakthrough is required. The growing availability and application of the high-throughput methods make systematic comparisons between human and animal models easier to perform. In the present study, we introduce the concept of the comparative systems biology, which we define as "comparisons of biological systems in different states or species used to achieve an integrated understanding of life forms with all their characteristic complexity of interactions at multiple levels". Furthermore, we discuss the applications of RNA-seq and ChIP-seq technologies to comparative systems biology between human and animal models and assess the potential applications for this approach in the future studies.

  8. Carbohydrate degrading polypeptide and uses thereof

    DOEpatents

    Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

    2015-10-20

    The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  9. Genome wide transcriptional profiling of Herbaspirillum seropedicae SmR1 grown in the presence of naringenin.

    PubMed

    Tadra-Sfeir, Michelle Z; Faoro, Helisson; Camilios-Neto, Doumit; Brusamarello-Santos, Liziane; Balsanelli, Eduardo; Weiss, Vinicius; Baura, Valter A; Wassem, Roseli; Cruz, Leonardo M; De Oliveira Pedrosa, Fábio; Souza, Emanuel M; Monteiro, Rose A

    2015-01-01

    Herbaspirillum seropedicae is a diazotrophic bacterium which associates endophytically with economically important gramineae. Flavonoids such as naringenin have been shown to have an effect on the interaction between H. seropedicae and its host plants. We used a high-throughput sequencing based method (RNA-Seq) to access the influence of naringenin on the whole transcriptome profile of H. seropedicae. Three hundred and four genes were downregulated and seventy seven were upregulated by naringenin. Data analysis revealed that genes related to bacterial flagella biosynthesis, chemotaxis and biosynthesis of peptidoglycan were repressed by naringenin. Moreover, genes involved in aromatic metabolism and multidrug transport efllux were actived.

  10. The complete mitochondrial genome of Haliotis laevigata (Gastropoda: Haliotidae) using MiSeq and HiSeq sequencing.

    PubMed

    Robinson, Nick A; Hall, Nathan E; Ross, Elizabeth M; Cooke, Ira R; Shiel, Brett P; Robinson, Andrew J; Strugnell, Jan M

    2016-01-01

    The mitochondrial genome of greenlip abalone, Haliotis laevigata, is reported. MiSeq and HiSeq sequencing of one individual was assembled to yield a single 16,545 bp contig. The sequence shares 92% identity to the H. rubra mitochondrial genome (a closely related species that hybridize with H. laevigata in the wild). The sequence will be useful for determining the maternal contribution to hybrid populations, for investigating population structure and stock-enhancement effectiveness.

  11. 19 CFR 12.39 - Imported articles involving unfair methods of competition or practices.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... economically operated United States industry, or to restrain or monopolize trade and commerce in the United... the authorization or consent of the Government. (e) Importations of semiconductor chip products. (1) In accordance with the Semiconductor Chip Protection Act of 1984 (17 U.S.C. 901 et seq.), if the...

  12. [Analysis of SOX10 gene mutation in a family affected with Waardenburg syndrome type II].

    PubMed

    Zheng, Lei; Yan, Yousheng; Chen, Xue; Zhang, Chuan; Zhang, Qinghua; Feng, Xuan; Hao, Shen

    2018-02-10

    OBJECTIVE To detect potential mutation of SOX10 gene in a pedigree affected with Warrdenburg syndrome type II. METHODS Genomic DNA was extracted from peripheral blood samples of the proband and his family members. Exons and flanking sequences of MITF, PAX3, SOX10, SNAI2, END3 and ENDRB genes were analyzed by chip capturing and high throughput sequencing. Suspected mutations were verified with Sanger sequencing. RESULTS A c.127C>T (p.R43X) mutation of the SOX10 gene was detected in the proband, for which both parents showed a wild-type genotype. CONCLUSION The c.127C>T (p.R43X) mutation of SOX10 gene probably underlies the ocular symptoms and hearing loss of the proband.

  13. Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C)

    PubMed Central

    DeMaere, Matthew Z.

    2016-01-01

    Background Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occuring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios, however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development. PMID:27843713

  14. Mapping human pluripotent stem cell differentiation pathways using high throughput single-cell RNA-sequencing.

    PubMed

    Han, Xiaoping; Chen, Haide; Huang, Daosheng; Chen, Huidong; Fei, Lijiang; Cheng, Chen; Huang, He; Yuan, Guo-Cheng; Guo, Guoji

    2018-04-05

    Human pluripotent stem cells (hPSCs) provide powerful models for studying cellular differentiations and unlimited sources of cells for regenerative medicine. However, a comprehensive single-cell level differentiation roadmap for hPSCs has not been achieved. We use high throughput single-cell RNA-sequencing (scRNA-seq), based on optimized microfluidic circuits, to profile early differentiation lineages in the human embryoid body system. We present a cellular-state landscape for hPSC early differentiation that covers multiple cellular lineages, including neural, muscle, endothelial, stromal, liver, and epithelial cells. Through pseudotime analysis, we construct the developmental trajectories of these progenitor cells and reveal the gene expression dynamics in the process of cell differentiation. We further reprogram primed H9 cells into naïve-like H9 cells to study the cellular-state transition process. We find that genes related to hemogenic endothelium development are enriched in naïve-like H9. Functionally, naïve-like H9 show higher potency for differentiation into hematopoietic lineages than primed cells. Our single-cell analysis reveals the cellular-state landscape of hPSC early differentiation, offering new insights that can be harnessed for optimization of differentiation protocols.

  15. Mediator binding to UASs is broadly uncoupled from transcription and cooperative with TFIID recruitment to promoters.

    PubMed

    Grünberg, Sebastian; Henikoff, Steven; Hahn, Steven; Zentner, Gabriel E

    2016-11-15

    Mediator is a conserved, essential transcriptional coactivator complex, but its in vivo functions have remained unclear due to conflicting data regarding its genome-wide binding pattern obtained by genome-wide ChIP Here, we used ChEC-seq, a method orthogonal to ChIP, to generate a high-resolution map of Mediator binding to the yeast genome. We find that Mediator associates with upstream activating sequences (UASs) rather than the core promoter or gene body under all conditions tested. Mediator occupancy is surprisingly correlated with transcription levels at only a small fraction of genes. Using the same approach to map TFIID, we find that TFIID is associated with both TFIID- and SAGA-dependent genes and that TFIID and Mediator occupancy is cooperative. Our results clarify Mediator recruitment and binding to the genome, showing that Mediator binding to UASs is widespread, partially uncoupled from transcription, and mediated in part by TFIID. © 2016 The Authors.

  16. PCR cycles above routine numbers do not compromise high-throughput DNA barcoding results.

    PubMed

    Vierna, J; Doña, J; Vizcaíno, A; Serrano, D; Jovani, R

    2017-10-01

    High-throughput DNA barcoding has become essential in ecology and evolution, but some technical questions still remain. Increasing the number of PCR cycles above the routine 20-30 cycles is a common practice when working with old-type specimens, which provide little amounts of DNA, or when facing annealing issues with the primers. However, increasing the number of cycles can raise the number of artificial mutations due to polymerase errors. In this work, we sequenced 20 COI libraries in the Illumina MiSeq platform. Libraries were prepared with 40, 45, 50, 55, and 60 PCR cycles from four individuals belonging to four species of four genera of cephalopods. We found no relationship between the number of PCR cycles and the number of mutations despite using a nonproofreading polymerase. Moreover, even when using a high number of PCR cycles, the resulting number of mutations was low enough not to be an issue in the context of high-throughput DNA barcoding (but may still remain an issue in DNA metabarcoding due to chimera formation). We conclude that the common practice of increasing the number of PCR cycles should not negatively impact the outcome of a high-throughput DNA barcoding study in terms of the occurrence of point mutations.

  17. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    PubMed Central

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and trouble shooting approaches we implemented to optimize sequencing of multiplexed libraries on an a RRBS background. PMID:23193365

  18. Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Version 3.0 User Guide

    EPA Science Inventory

    User Guide to describe the complete functionality of the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Version 3.0 online tool. The US Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility tool (SeqAPASS; https://seqa...

  19. Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

    PubMed

    Liu, Xuejun; Shi, Xinxin; Chen, Chunlin; Zhang, Li

    2015-10-16

    The high-throughput sequencing technology, RNA-Seq, has been widely used to quantify gene and isoform expression in the study of transcriptome in recent years. Accurate expression measurement from the millions or billions of short generated reads is obstructed by difficulties. One is ambiguous mapping of reads to reference transcriptome caused by alternative splicing. This increases the uncertainty in estimating isoform expression. The other is non-uniformity of read distribution along the reference transcriptome due to positional, sequencing, mappability and other undiscovered sources of biases. This violates the uniform assumption of read distribution for many expression calculation approaches, such as the direct RPKM calculation and Poisson-based models. Many methods have been proposed to address these difficulties. Some approaches employ latent variable models to discover the underlying pattern of read sequencing. However, most of these methods make bias correction based on surrounding sequence contents and share the bias models by all genes. They therefore cannot estimate gene- and isoform-specific biases as revealed by recent studies. We propose a latent variable model, NLDMseq, to estimate gene and isoform expression. Our method adopts latent variables to model the unknown isoforms, from which reads originate, and the underlying percentage of multiple spliced variants. The isoform- and exon-specific read sequencing biases are modeled to account for the non-uniformity of read distribution, and are identified by utilizing the replicate information of multiple lanes of a single library run. We employ simulation and real data to verify the performance of our method in terms of accuracy in the calculation of gene and isoform expression. Results show that NLDMseq obtains competitive gene and isoform expression compared to popular alternatives. Finally, the proposed method is applied to the detection of differential expression (DE) to show its usefulness in the downstream analysis. The proposed NLDMseq method provides an approach to accurately estimate gene and isoform expression from RNA-Seq data by modeling the isoform- and exon-specific read sequencing biases. It makes use of a latent variable model to discover the hidden pattern of read sequencing. We have shown that it works well in both simulations and real datasets, and has competitive performance compared to popular methods. The method has been implemented as a freely available software which can be found at https://github.com/PUGEA/NLDMseq.

  20. RNA-Seq Analysis Using De Novo Transcriptome Assembly as a Reference for the Salmon Louse Caligus rogercresseyi

    PubMed Central

    Gallardo-Escárate, Cristian; Valenzuela-Muñoz, Valentina; Nuñez-Acuña, Gustavo

    2014-01-01

    Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for different developmental stages, sexes, and strains of sea lice is still limited or unknown. In this study, RNA-seq analysis was performed using de novo transcriptome assembly as a reference for evidenced transcriptional changes from six developmental stages of the salmon louse Caligus rogercresseyi. EST-datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using MiSeq Illumina sequencing. A total of 151,788,682 transcripts were yielded, which were assembled into 83,444 high quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Herein, four clusters were differentially expressed between nauplius I–II and copepodid stages (604 transcripts), five clusters between copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that the nauplius I–II, copepodid and chalimus stages are mainly annotated to aminoacid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were highly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provides the most comprehensive transcriptome resource available for C. rogercresseyi, which should be used for future genomic studies linked to host-parasite interactions. PMID:24691066

  1. RNA-Seq analysis using de novo transcriptome assembly as a reference for the salmon louse Caligus rogercresseyi.

    PubMed

    Gallardo-Escárate, Cristian; Valenzuela-Muñoz, Valentina; Nuñez-Acuña, Gustavo

    2014-01-01

    Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for different developmental stages, sexes, and strains of sea lice is still limited or unknown. In this study, RNA-seq analysis was performed using de novo transcriptome assembly as a reference for evidenced transcriptional changes from six developmental stages of the salmon louse Caligus rogercresseyi. EST-datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using MiSeq Illumina sequencing. A total of 151,788,682 transcripts were yielded, which were assembled into 83,444 high quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Herein, four clusters were differentially expressed between nauplius I-II and copepodid stages (604 transcripts), five clusters between copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that the nauplius I-II, copepodid and chalimus stages are mainly annotated to aminoacid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were highly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provides the most comprehensive transcriptome resource available for C. rogercresseyi, which should be used for future genomic studies linked to host-parasite interactions.

  2. Characterization of limes (Citrus aurantifolia) grown in Bhutan and Indonesia using high-throughput sequencing

    PubMed Central

    Penjor, Tshering; Mimura, Takashi; Matsumoto, Ryoji; Yamamoto, Masashi; Nagano, Yukio

    2014-01-01

    Lime [Citrus aurantifolia (Cristm.) Swingle] is a Citrus species that is a popular ingredient in many cuisines. Some citrus plants are known to originate in the area ranging from northeastern India to southwestern China. In the current study, we characterized and compared limes grown in Bhutan (n = 5 accessions) and Indonesia (n = 3 accessions). The limes were separated into two groups based on their morphology. Restriction site-associated DNA sequencing (RAD-seq) separated the eight accessions into two clusters. One cluster contained four accessions from Bhutan, whereas the other cluster contained one accession from Bhutan and the three accessions from Indonesia. This genetic classification supported the morphological classification of limes. The analysis suggests that the properties associated with asexual reproduction, and somatic homologous recombination, have contributed to the genetic diversification of limes. PMID:24781859

  3. Guidelines for whole genome bisulphite sequencing of intact and FFPET DNA on the Illumina HiSeq X Ten.

    PubMed

    Nair, Shalima S; Luu, Phuc-Loi; Qu, Wenjia; Maddugoda, Madhavi; Huschtscha, Lily; Reddel, Roger; Chenevix-Trench, Georgia; Toso, Martina; Kench, James G; Horvath, Lisa G; Hayes, Vanessa M; Stricker, Phillip D; Hughes, Timothy P; White, Deborah L; Rasko, John E J; Wong, Justin J-L; Clark, Susan J

    2018-05-28

    Comprehensive genome-wide DNA methylation profiling is critical to gain insights into epigenetic reprogramming during development and disease processes. Among the different genome-wide DNA methylation technologies, whole genome bisulphite sequencing (WGBS) is considered the gold standard for assaying genome-wide DNA methylation at single base resolution. However, the high sequencing cost to achieve the optimal depth of coverage limits its application in both basic and clinical research. To achieve 15× coverage of the human methylome, using WGBS, requires approximately three lanes of 100-bp-paired-end Illumina HiSeq 2500 sequencing. It is important, therefore, for advances in sequencing technologies to be developed to enable cost-effective high-coverage sequencing. In this study, we provide an optimised WGBS methodology, from library preparation to sequencing and data processing, to enable 16-20× genome-wide coverage per single lane of HiSeq X Ten, HCS 3.3.76. To process and analyse the data, we developed a WGBS pipeline (METH10X) that is fast and can call SNPs. We performed WGBS on both high-quality intact DNA and degraded DNA from formalin-fixed paraffin-embedded tissue. First, we compared different library preparation methods on the HiSeq 2500 platform to identify the best method for sequencing on the HiSeq X Ten. Second, we optimised the PhiX and genome spike-ins to achieve higher quality and coverage of WGBS data on the HiSeq X Ten. Third, we performed integrated whole genome sequencing (WGS) and WGBS of the same DNA sample in a single lane of HiSeq X Ten to improve data output. Finally, we compared methylation data from the HiSeq 2500 and HiSeq X Ten and found high concordance (Pearson r > 0.9×). Together we provide a systematic, efficient and complete approach to perform and analyse WGBS on the HiSeq X Ten. Our protocol allows for large-scale WGBS studies at reasonable processing time and cost on the HiSeq X Ten platform.

  4. SeqAPASS to evaluate conservation of high-throughput screening targets across non-mammalian species

    EPA Science Inventory

    Cell-based high-throughput screening (HTS) and computational technologies are being applied as tools for toxicity testing in the 21st century. The U.S. Environmental Protection Agency (EPA) embraced these technologies and created the ToxCast Program in 2007, which has served as a...

  5. Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets.

    PubMed

    Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L

    2013-07-01

    The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.

  6. Microfluidic affinity and ChIP-seq analyses converge on a conserved FOXP2-binding motif in chimp and human, which enables the detection of evolutionarily novel targets

    PubMed Central

    Nelson, Christopher S.; Fuller, Chris K.; Fordyce, Polly M.; Greninger, Alexander L.; Li, Hao; DeRisi, Joseph L.

    2013-01-01

    The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein’s DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2’s-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved. PMID:23625967

  7. Whole-Exome Sequencing in a South American Cohort Links ALDH1A3, FOXN1 and Retinoic Acid Regulation Pathways to Autism Spectrum Disorders.

    PubMed

    Moreno-Ramos, Oscar A; Olivares, Ana María; Haider, Neena B; de Autismo, Liga Colombiana; Lattig, María Claudia

    2015-01-01

    Autism spectrum disorders (ASDs) are a range of complex neurodevelopmental conditions principally characterized by dysfunctions linked to mental development. Previous studies have shown that there are more than 1000 genes likely involved in ASD, expressed mainly in brain and highly interconnected among them. We applied whole exome sequencing in Colombian-South American trios. Two missense novel SNVs were found in the same child: ALDH1A3 (RefSeq NM_000693: c.1514T>C (p.I505T)) and FOXN1 (RefSeq NM_003593: c.146C>T (p.S49L)). Gene expression studies reveal that Aldh1a3 and Foxn1 are expressed in ~E13.5 mouse embryonic brain, as well as in adult piriform cortex (PC; ~P30). Conserved Retinoic Acid Response Elements (RAREs) upstream of human ALDH1A3 and FOXN1 and in mouse Aldh1a3 and Foxn1 genes were revealed using bioinformatic approximation. Chromatin immunoprecipitation (ChIP) assay using Retinoid Acid Receptor B (Rarb) as the immunoprecipitation target suggests RA regulation of Aldh1a3 and Foxn1 in mice. Our results frame a possible link of RA regulation in brain to ASD etiology, and a feasible non-additive effect of two apparently unrelated variants in ALDH1A3 and FOXN1 recognizing that every result given by next generation sequencing should be cautiously analyzed, as it might be an incidental finding.

  8. Clinical validation of an ultra high-throughput spiral microfluidics for the detection and enrichment of viable circulating tumor cells.

    PubMed

    Khoo, Bee Luan; Warkiani, Majid Ebrahimi; Tan, Daniel Shao-Weng; Bhagat, Ali Asgar S; Irwin, Darryl; Lau, Dawn Pingxi; Lim, Alvin S T; Lim, Kiat Hon; Krisna, Sai Sakktee; Lim, Wan-Teck; Yap, Yoon Sim; Lee, Soo Chin; Soo, Ross A; Han, Jongyoon; Lim, Chwee Teck

    2014-01-01

    Circulating tumor cells (CTCs) are cancer cells that can be isolated via liquid biopsy from blood and can be phenotypically and genetically characterized to provide critical information for guiding cancer treatment. Current analysis of CTCs is hindered by the throughput, selectivity and specificity of devices or assays used in CTC detection and isolation. Here, we enriched and characterized putative CTCs from blood samples of patients with both advanced stage metastatic breast and lung cancers using a novel multiplexed spiral microfluidic chip. This system detected putative CTCs under high sensitivity (100%, n = 56) (Breast cancer samples: 12-1275 CTCs/ml; Lung cancer samples: 10-1535 CTCs/ml) rapidly from clinically relevant blood volumes (7.5 ml under 5 min). Blood samples were completely separated into plasma, CTCs and PBMCs components and each fraction were characterized with immunophenotyping (Pan-cytokeratin/CD45, CD44/CD24, EpCAM), fluorescence in-situ hybridization (FISH) (EML4-ALK) or targeted somatic mutation analysis. We used an ultra-sensitive mass spectrometry based system to highlight the presence of an EGFR-activating mutation in both isolated CTCs and plasma cell-free DNA (cf-DNA), and demonstrate concordance with the original tumor-biopsy samples. We have clinically validated our multiplexed microfluidic chip for the ultra high-throughput, low-cost and label-free enrichment of CTCs. Retrieved cells were unlabeled and viable, enabling potential propagation and real-time downstream analysis using next generation sequencing (NGS) or proteomic analysis.

  9. BrAD-seq: Breath Adapter Directional sequencing: a streamlined, ultra-simple and fast library preparation protocol for strand specific mRNA library construction.

    PubMed

    Townsley, Brad T; Covington, Michael F; Ichihashi, Yasunori; Zumstein, Kristina; Sinha, Neelima R

    2015-01-01

    Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand specific RNA-seq libraries utilizing the terminal breathing of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries and can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.

  10. Step-by-Step Construction of Gene Co-expression Networks from High-Throughput Arabidopsis RNA Sequencing Data.

    PubMed

    Contreras-López, Orlando; Moyano, Tomás C; Soto, Daniela C; Gutiérrez, Rodrigo A

    2018-01-01

    The rapid increase in the availability of transcriptomics data generated by RNA sequencing represents both a challenge and an opportunity for biologists without bioinformatics training. The challenge is handling, integrating, and interpreting these data sets. The opportunity is to use this information to generate testable hypothesis to understand molecular mechanisms controlling gene expression and biological processes (Fig. 1). A successful strategy to generate tractable hypotheses from transcriptomics data has been to build undirected network graphs based on patterns of gene co-expression. Many examples of new hypothesis derived from network analyses can be found in the literature, spanning different organisms including plants and specific fields such as root developmental biology.In order to make the process of constructing a gene co-expression network more accessible to biologists, here we provide step-by-step instructions using published RNA-seq experimental data obtained from a public database. Similar strategies have been used in previous studies to advance root developmental biology. This guide includes basic instructions for the operation of widely used open source platforms such as Bio-Linux, R, and Cytoscape. Even though the data we used in this example was obtained from Arabidopsis thaliana, the workflow developed in this guide can be easily adapted to work with RNA-seq data from any organism.

  11. Unearthing microbial diversity of Taxus rhizosphere via MiSeq high-throughput amplicon sequencing and isolate characterization

    PubMed Central

    Hao, Da Cheng; Song, Si Meng; Mu, Jun; Hu, Wen Li; Xiao, Pei Gen

    2016-01-01

    The species variability and potential environmental functions of Taxus rhizosphere microbial community were studied by comparative analyses of 15 16S rRNA and 15 ITS MiSeq sequencing libraries from Taxus rhizospheres in subtropical and temperate regions of China, as well as by isolating laccase-producing strains and polycyclic aromatic hydrocarbon (PAH)-degrading strains. Total reads could be assigned to 2,141 Operational Taxonomic Units (OTUs) belonging to 31 bacteria phyla and 2,904 OTUs of at least seven fungi phyla. The abundance of Planctomycetes, Actinobacteria, and Chloroflexi was higher in T. cuspidata var. nana and T. × media rhizospheres than in T. mairei rhizosphere (NF), while Acidobacteria, Proteobacteria, Nitrospirae, and unclassified bacteria were more abundant in the latter. Ascomycota and Zygomycota were predominant in NF, while two temperate Taxus rhizospheres had more unclassified fungi, Basidiomycota, and Chytridiomycota. The bacterial/fungal community richness and diversity were lower in NF than in other two. Three dye decolorizing fungal isolates were shown to be highly efficient in removing three classes of reactive dye, while two PAH-degrading fungi were able to degrade recalcitrant benzo[a]pyrene. The present studies extend the knowledge pedigree of the microbial diversity populating rhizospheres, and exemplify the method shift in research and development of resource plant rhizosphere. PMID:27080869

  12. Unearthing microbial diversity of Taxus rhizosphere via MiSeq high-throughput amplicon sequencing and isolate characterization

    NASA Astrophysics Data System (ADS)

    Hao, Da Cheng; Song, Si Meng; Mu, Jun; Hu, Wen Li; Xiao, Pei Gen

    2016-04-01

    The species variability and potential environmental functions of Taxus rhizosphere microbial community were studied by comparative analyses of 15 16S rRNA and 15 ITS MiSeq sequencing libraries from Taxus rhizospheres in subtropical and temperate regions of China, as well as by isolating laccase-producing strains and polycyclic aromatic hydrocarbon (PAH)-degrading strains. Total reads could be assigned to 2,141 Operational Taxonomic Units (OTUs) belonging to 31 bacteria phyla and 2,904 OTUs of at least seven fungi phyla. The abundance of Planctomycetes, Actinobacteria, and Chloroflexi was higher in T. cuspidata var. nana and T. × media rhizospheres than in T. mairei rhizosphere (NF), while Acidobacteria, Proteobacteria, Nitrospirae, and unclassified bacteria were more abundant in the latter. Ascomycota and Zygomycota were predominant in NF, while two temperate Taxus rhizospheres had more unclassified fungi, Basidiomycota, and Chytridiomycota. The bacterial/fungal community richness and diversity were lower in NF than in other two. Three dye decolorizing fungal isolates were shown to be highly efficient in removing three classes of reactive dye, while two PAH-degrading fungi were able to degrade recalcitrant benzo[a]pyrene. The present studies extend the knowledge pedigree of the microbial diversity populating rhizospheres, and exemplify the method shift in research and development of resource plant rhizosphere.

  13. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    PubMed

    Rhodes, Johanna; Beale, Mathew A; Fisher, Matthew C

    2014-01-01

    The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungi Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  14. SAMSA2: a standalone metatranscriptome analysis pipeline.

    PubMed

    Westreich, Samuel T; Treiber, Michelle L; Mills, David A; Korf, Ian; Lemay, Danielle G

    2018-05-21

    Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.

  15. An innovative SNP genotyping method adapting to multiple platforms and throughputs.

    PubMed

    Long, Y M; Chao, W S; Ma, G J; Xu, S S; Qi, L L

    2017-03-01

    An innovative genotyping method designated as semi-thermal asymmetric reverse PCR (STARP) was developed for genotyping individual SNPs with improved accuracy, flexible throughputs, low operational costs, and high platform compatibility. Multiplex chip-based technology for genome-scale genotyping of single nucleotide polymorphisms (SNPs) has made great progress in the past two decades. However, PCR-based genotyping of individual SNPs still remains problematic in accuracy, throughput, simplicity, and/or operational costs as well as the compatibility with multiple platforms. Here, we report a novel SNP genotyping method designated semi-thermal asymmetric reverse PCR (STARP). In this method, genotyping assay was performed under unique PCR conditions using two universal priming element-adjustable primers (PEA-primers) and one group of three locus-specific primers: two asymmetrically modified allele-specific primers (AMAS-primers) and their common reverse primer. The two AMAS-primers each were substituted one base in different positions at their 3' regions to significantly increase the amplification specificity of the two alleles and tailed at 5' ends to provide priming sites for PEA-primers. The two PEA-primers were developed for common use in all genotyping assays to stringently target the PCR fragments generated by the two AMAS-primers with similar PCR efficiencies and for flexible detection using either gel-free fluorescence signals or gel-based size separation. The state-of-the-art primer design and unique PCR conditions endowed STARP with all the major advantages of high accuracy, flexible throughputs, simple assay design, low operational costs, and platform compatibility. In addition to SNPs, STARP can also be employed in genotyping of indels (insertion-deletion polymorphisms). As vast variations in DNA sequences are being unearthed by many genome sequencing projects and genotyping by sequencing, STARP will have wide applications across all biological organisms in agriculture, medicine, and forensics.

  16. Experimental design and data analysis of Ago-RIP-Seq experiments for the identification of microRNA targets.

    PubMed

    Tichy, Diana; Pickl, Julia Maria Anna; Benner, Axel; Sültmann, Holger

    2017-03-31

    The identification of microRNA (miRNA) target genes is crucial for understanding miRNA function. Many methods for the genome-wide miRNA target identification have been developed in recent years; however, they have several limitations including the dependence on low-confident prediction programs and artificial miRNA manipulations. Ago-RNA immunoprecipitation combined with high-throughput sequencing (Ago-RIP-Seq) is a promising alternative. However, appropriate statistical data analysis algorithms taking into account the experimental design and the inherent noise of such experiments are largely lacking.Here, we investigate the experimental design for Ago-RIP-Seq and examine biostatistical methods to identify de novo miRNA target genes. Statistical approaches considered are either based on a negative binomial model fit to the read count data or applied to transformed data using a normal distribution-based generalized linear model. We compare them by a real data simulation study using plasmode data sets and evaluate the suitability of the approaches to detect true miRNA targets by sensitivity and false discovery rates. Our results suggest that simple approaches like linear regression models on (appropriately) transformed read count data are preferable. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  17. Identification of Development-Related Genes in the Ovaries of Adult Harmonia axyridis (Pallas) Lady Beetles Using a Time- Series Analysis by RNA-seq.

    PubMed

    Du, Wenxiao; Zeng, Fanrong

    2016-12-14

    Adults of the lady beetle species Harmonia axyridis (Pallas) are bred artificially en masse for classic biological control, which requires egg-laying by the H. axyridis ovary. Development-related genes may impact the growth of the H. axyridis adult ovary but have not been reported. Here, we used integrative time-series RNA-seq analysis of the ovary in H. axyridis adults to detect development-related genes. A total of 28,558 unigenes were functionally annotated using seven types of databases to obtain an annotated unigene database for ovaries in H. axyridis adults. We also analysed differentially expressed genes (DEGs) between samples. Based on a combination of the results of this bioinformatics analysis with literature reports and gene expression level changes in four different stages, we focused on the development of oocyte reproductive stem cell and yolk formation process and identified 26 genes with high similarity to development-related genes. 20 DEGs were randomly chosen for quantitative real-time PCR (qRT-PCR) to validate the accuracy of the RNA-seq results. This study establishes a robust pipeline for the discovery of key genes using high-throughput sequencing and the identification of a class of development-related genes for characterization.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanism and functions of most of lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulatemore » gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms resided in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.« less

  19. Robustness encoded across essential and accessory replicons of the ecologically versatile bacterium Sinorhizobium meliloti

    PubMed Central

    Walker, Graham C.; Finan, Turlough M.; Mengoni, Alessio; Griffitts, Joel S.

    2018-01-01

    Bacterial genome evolution is characterized by gains, losses, and rearrangements of functional genetic segments. The extent to which large-scale genomic alterations influence genotype-phenotype relationships has not been investigated in a high-throughput manner. In the symbiotic soil bacterium Sinorhizobium meliloti, the genome is composed of a chromosome and two large extrachromosomal replicons (pSymA and pSymB, which together constitute 45% of the genome). Massively parallel transposon insertion sequencing (Tn-seq) was employed to evaluate the contributions of chromosomal genes to growth fitness in both the presence and absence of these extrachromosomal replicons. Ten percent of chromosomal genes from diverse functional categories are shown to genetically interact with pSymA and pSymB. These results demonstrate the pervasive robustness provided by the extrachromosomal replicons, which is further supported by constraint-based metabolic modeling. A comprehensive picture of core S. meliloti metabolism was generated through a Tn-seq-guided in silico metabolic network reconstruction, producing a core network encompassing 726 genes. This integrated approach facilitated functional assignments for previously uncharacterized genes, while also revealing that Tn-seq alone missed over a quarter of wild-type metabolism. This work highlights the many functional dependencies and epistatic relationships that may arise between bacterial replicons and across a genome, while also demonstrating how Tn-seq and metabolic modeling can be used together to yield insights not obtainable by either method alone. PMID:29672509

  20. Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data.

    PubMed

    Zhang, Yun; Baheti, Saurabh; Sun, Zhifu

    2018-05-01

    High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base resolution methylome research. These data are represented either by the ratio of methylated cytosine versus total coverage at a CpG site or numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the base for the next step of differentially methylated region identification. The ratio data have a flexibility of fitting to many linear models, but the raw count data take consideration of coverage information. There is an array of options in each datatype for DMC detection; however, it is not clear which is an optimal statistical method. In this study, we systematically evaluated four statistic methods on methylation ratio data and four methods on count-based data and compared their performances with regard to type I error control, sensitivity and specificity of DMC detection and computational resource demands using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.

  1. High-volume production of single and compound emulsions in a microfluidic parallelization arrangement coupled with coaxial annular world-to-chip interfaces.

    PubMed

    Nisisako, Takasi; Ando, Takuya; Hatsuzawa, Takeshi

    2012-09-21

    This study describes a microfluidic platform with coaxial annular world-to-chip interfaces for high-throughput production of single and compound emulsion droplets, having controlled sizes and internal compositions. The production module consists of two distinct elements: a planar square chip on which many copies of a microfluidic droplet generator (MFDG) are arranged circularly, and a cubic supporting module with coaxial annular channels for supplying fluids evenly to the inlets of the mounted chip, assembled from blocks with cylinders and holes. Three-dimensional flow was simulated to evaluate the distribution of flow velocity in the coaxial multiple annular channels. By coupling a 1.5 cm × 1.5 cm microfluidic chip with parallelized 144 MFDGs and a supporting module with two annular channels, for example, we could produce simple oil-in-water (O/W) emulsion droplets having a mean diameter of 90.7 μm and a coefficient of variation (CV) of 2.2% at a throughput of 180.0 mL h(-1). Furthermore, we successfully demonstrated high-throughput production of Janus droplets, double emulsions and triple emulsions, by coupling 1.5 cm × 1.5 cm - 4.5 cm × 4.5 cm microfluidic chips with parallelized 32-128 MFDGs of various geometries and supporting modules with 3-4 annular channels.

  2. Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof

    DOEpatents

    Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

    2016-02-16

    The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  3. Polypeptide having beta-glucosidase activity and uses thereof

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well asmore » the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.« less

  4. Polypeptide having swollenin activity and uses thereof

    DOEpatents

    Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius

    2015-11-04

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  5. Polypeptide having beta-glucosidase activity and uses thereof

    DOEpatents

    Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius

    2015-09-01

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  6. Polypeptide having cellobiohydrolase activity and uses thereof

    DOEpatents

    Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

    2015-09-15

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  7. Polypeptide having acetyl xylan esterase activity and uses thereof

    DOEpatents

    Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

    2015-10-20

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  8. Polypeptide having carbohydrate degrading activity and uses thereof

    DOEpatents

    Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius

    2015-08-18

    The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.

  9. BioVLAB-MMIA-NGS: microRNA-mRNA integrated analysis using high-throughput sequencing data.

    PubMed

    Chae, Heejoon; Rhee, Sungmin; Nephew, Kenneth P; Kim, Sun

    2015-01-15

    It is now well established that microRNAs (miRNAs) play a critical role in regulating gene expression in a sequence-specific manner, and genome-wide efforts are underway to predict known and novel miRNA targets. However, the integrated miRNA-mRNA analysis remains a major computational challenge, requiring powerful informatics systems and bioinformatics expertise. The objective of this study was to modify our widely recognized Web server for the integrated mRNA-miRNA analysis (MMIA) and its subsequent deployment on the Amazon cloud (BioVLAB-MMIA) to be compatible with high-throughput platforms, including next-generation sequencing (NGS) data (e.g. RNA-seq). We developed a new version called the BioVLAB-MMIA-NGS, deployed on both Amazon cloud and on a high-performance publicly available server called MAHA. By using NGS data and integrating various bioinformatics tools and databases, BioVLAB-MMIA-NGS offers several advantages. First, sequencing data is more accurate than array-based methods for determining miRNA expression levels. Second, potential novel miRNAs can be detected by using various computational methods for characterizing miRNAs. Third, because miRNA-mediated gene regulation is due to hybridization of an miRNA to its target mRNA, sequencing data can be used to identify many-to-many relationship between miRNAs and target genes with high accuracy. http://epigenomics.snu.ac.kr/biovlab_mmia_ngs/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. A comparison of sequencing platforms and bioinformatics pipelines for compositional analysis of the gut microbiome.

    PubMed

    Allali, Imane; Arnold, Jason W; Roach, Jeffrey; Cadenas, Maria Belen; Butz, Natasha; Hassan, Hosni M; Koci, Matthew; Ballou, Anne; Mendoza, Mary; Ali, Rizwana; Azcarate-Peril, M Andrea

    2017-09-13

    Advancements in Next Generation Sequencing (NGS) technologies regarding throughput, read length and accuracy had a major impact on microbiome research by significantly improving 16S rRNA amplicon sequencing. As rapid improvements in sequencing platforms and new data analysis pipelines are introduced, it is essential to evaluate their capabilities in specific applications. The aim of this study was to assess whether the same project-specific biological conclusions regarding microbiome composition could be reached using different sequencing platforms and bioinformatics pipelines. Chicken cecum microbiome was analyzed by 16S rRNA amplicon sequencing using Illumina MiSeq, Ion Torrent PGM, and Roche 454 GS FLX Titanium platforms, with standard and modified protocols for library preparation. We labeled the bioinformatics pipelines included in our analysis QIIME1 and QIIME2 (de novo OTU picking [not to be confused with QIIME version 2 commonly referred to as QIIME2]), QIIME3 and QIIME4 (open reference OTU picking), UPARSE1 and UPARSE2 (each pair differs only in the use of chimera depletion methods), and DADA2 (for Illumina data only). GS FLX+ yielded the longest reads and highest quality scores, while MiSeq generated the largest number of reads after quality filtering. Declines in quality scores were observed starting at bases 150-199 for GS FLX+ and bases 90-99 for MiSeq. Scores were stable for PGM-generated data. Overall microbiome compositional profiles were comparable between platforms; however, average relative abundance of specific taxa varied depending on sequencing platform, library preparation method, and bioinformatics analysis. Specifically, QIIME with de novo OTU picking yielded the highest number of unique species and alpha diversity was reduced with UPARSE and DADA2 compared to QIIME. The three platforms compared in this study were capable of discriminating samples by treatment, despite differences in diversity and abundance, leading to similar biological conclusions. Our results demonstrate that while there were differences in depth of coverage and phylogenetic diversity, all workflows revealed comparable treatment effects on microbial diversity. To increase reproducibility and reliability and to retain consistency between similar studies, it is important to consider the impact on data quality and relative abundance of taxa when selecting NGS platforms and analysis tools for microbiome studies.

  11. [Complete genome sequencing of polymalic acid-producing strain Aureobasidium pullulans CCTCC M2012223].

    PubMed

    Wang, Yongkang; Song, Xiaodan; Li, Xiaorong; Yang, Sang-tian; Zou, Xiang

    2017-01-04

    To explore the genome sequence of Aureobasidium pullulans CCTCC M2012223, analyze the key genes related to the biosynthesis of important metabolites, and provide genetic background for metabolic engineering. Complete genome of A. pullulans CCTCC M2012223 was sequenced by Illumina HiSeq high throughput sequencing platform. Then, fragment assembly, gene prediction, functional annotation, and GO/COG cluster were analyzed in comparison with those of other five A. pullulans varieties. The complete genome sequence of A. pullulans CCTCC M2012223 was 30756831 bp with an average GC content of 47.49%, and 9452 genes were successfully predicted. Genome-wide analysis showed that A. pullulans CCTCC M2012223 had the biggest genome assembly size. Protein sequences involved in the pullulan and polymalic acid pathway were highly conservative in all of six A. pullulans varieties. Although both A. pullulans CCTCC M2012223 and A. pullulans var. melanogenum have a close affinity, some point mutation and inserts were occurred in protein sequences involved in melanin biosynthesis. Genome information of A. pullulans CCTCC M2012223 was annotated and genes involved in melanin, pullulan and polymalic acid pathway were compared, which would provide a theoretical basis for genetic modification of metabolic pathway in A. pullulans.

  12. Seq2Logo: a method for construction and visualization of amino acid binding motifs and sequence profiles including sequence weighting, pseudo counts and two-sided representation of amino acid enrichment and depletion

    PubMed Central

    Thomsen, Martin Christen Frølund; Nielsen, Morten

    2012-01-01

    Seq2Logo is a web-based sequence logo generator. Sequence logos are a graphical representation of the information content stored in a multiple sequence alignment (MSA) and provide a compact and highly intuitive representation of the position-specific amino acid composition of binding motifs, active sites, etc. in biological sequences. Accurate generation of sequence logos is often compromised by sequence redundancy and low number of observations. Moreover, most methods available for sequence logo generation focus on displaying the position-specific enrichment of amino acids, discarding the equally valuable information related to amino acid depletion. Seq2logo aims at resolving these issues allowing the user to include sequence weighting to correct for data redundancy, pseudo counts to correct for low number of observations and different logotype representations each capturing different aspects related to amino acid enrichment and depletion. Besides allowing input in the format of peptides and MSA, Seq2Logo accepts input as Blast sequence profiles, providing easy access for non-expert end-users to characterize and identify functionally conserved/variable amino acids in any given protein of interest. The output from the server is a sequence logo and a PSSM. Seq2Logo is available at http://www.cbs.dtu.dk/biotools/Seq2Logo (14 May 2012, date last accessed). PMID:22638583

  13. Combinatorial materials synthesis and high-throughput screening: an integrated materials chip approach to mapping phase diagrams and discovery and optimization of functional materials.

    PubMed

    Xiang, X D

    Combinatorial materials synthesis methods and high-throughput evaluation techniques have been developed to accelerate the process of materials discovery and optimization and phase-diagram mapping. Analogous to integrated circuit chips, integrated materials chips containing thousands of discrete different compositions or continuous phase diagrams, often in the form of high-quality epitaxial thin films, can be fabricated and screened for interesting properties. Microspot x-ray method, various optical measurement techniques, and a novel evanescent microwave microscope have been used to characterize the structural, optical, magnetic, and electrical properties of samples on the materials chips. These techniques are routinely used to discover/optimize and map phase diagrams of ferroelectric, dielectric, optical, magnetic, and superconducting materials.

  14. In silico site-directed mutagenesis informs species-specific predictions of chemical susceptibility derived from the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool

    EPA Science Inventory

    The Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to address needs for rapid, cost effective methods of species extrapolation of chemical susceptibility. Specifically, the SeqAPASS tool compares the primary sequence (Level 1), functiona...

  15. Dopamine Signaling Leads to Loss of Polycomb Repression and Aberrant Gene Activation in Experimental Parkinsonism

    PubMed Central

    Lerdrup, Mads; Gomes, Ana-Luisa; Kryh, Hanna; Spigolon, Giada; Caboche, Jocelyne; Fisone, Gilberto; Hansen, Klaus

    2014-01-01

    Polycomb group (PcG) proteins bind to and repress genes in embryonic stem cells through lineage commitment to the terminal differentiated state. PcG repressed genes are commonly characterized by the presence of the epigenetic histone mark H3K27me3, catalyzed by the Polycomb repressive complex 2. Here, we present in vivo evidence for a previously unrecognized plasticity of PcG-repressed genes in terminally differentiated brain neurons of parkisonian mice. We show that acute administration of the dopamine precursor, L-DOPA, induces a remarkable increase in H3K27me3S28 phosphorylation. The induction of the H3K27me3S28p histone mark specifically occurs in medium spiny neurons expressing dopamine D1 receptors and is dependent on Msk1 kinase activity and DARPP-32-mediated inhibition of protein phosphatase-1. Chromatin immunoprecipitation (ChIP) experiments showed that increased H3K27me3S28p was accompanied by reduced PcG binding to regulatory regions of genes. An analysis of the genome wide distribution of L-DOPA-induced H3K27me3S28 phosphorylation by ChIP sequencing (ChIP-seq) in combination with expression analysis by RNA-sequencing (RNA-seq) showed that the induction of H3K27me3S28p correlated with increased expression of a subset of PcG repressed genes. We found that induction of H3K27me3S28p persisted during chronic L-DOPA administration to parkisonian mice and correlated with aberrant gene expression. We propose that dopaminergic transmission can activate PcG repressed genes in the adult brain and thereby contribute to long-term maladaptive responses including the motor complications, or dyskinesia, caused by prolonged administration of L-DOPA in Parkinson's disease. PMID:25254549

  16. PRADA: pipeline for RNA sequencing data analysis.

    PubMed

    Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F; Weinstein, John N; Getz, Gad; Verhaak, Roel G W

    2014-08-01

    Technological advances in high-throughput sequencing necessitate improved computational tools for processing and analyzing large-scale datasets in a systematic automated manner. For that purpose, we have developed PRADA (Pipeline for RNA-Sequencing Data Analysis), a flexible, modular and highly scalable software platform that provides many different types of information available by multifaceted analysis starting from raw paired-end RNA-seq data: gene expression levels, quality metrics, detection of unsupervised and supervised fusion transcripts, detection of intragenic fusion variants, homology scores and fusion frame classification. PRADA uses a dual-mapping strategy that increases sensitivity and refines the analytical endpoints. PRADA has been used extensively and successfully in the glioblastoma and renal clear cell projects of The Cancer Genome Atlas program.  http://sourceforge.net/projects/prada/  gadgetz@broadinstitute.org or rverhaak@mdanderson.org  Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Ethanol prefermentation of food waste in sequencing batch methane fermentation for improved buffering capacity and microbial community analysis.

    PubMed

    Yu, Miao; Wu, Chuanfu; Wang, Qunhui; Sun, Xiaohong; Ren, Yuanyuan; Li, Yu-You

    2018-01-01

    This study investigates the effects of ethanol prefermentation (EP) on methane fermentation. Yeast was added to the substrate for EP in the sequencing batch methane fermentation of food waste. An Illumina MiSeq high-throughput sequencing system was used to analyze changes in the microbial community. Methane production in the EP group (254mL/g VS) was higher than in the control group (35mL/g VS) because EP not only increased the buffering capacity of the system, but also increased hydrolytic acidification. More carbon source was converted to ethanol in the EP group than in the control group, and neutral ethanol could be converted continuously to acetic acid, which promoted the growth of Methanobacterium and Methanosarcina. As a result, the relative abundance of methane-producing bacteria was significantly higher than that of the control group. Kinetic modeling indicated that the EP group had a higher hydrolysis efficiency and shorter lag phase. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. A universal protocol to generate consensus level genome sequences for foot-and-mouth disease virus and other positive-sense polyadenylated RNA viruses using the Illumina MiSeq.

    PubMed

    Logan, Grace; Freimanis, Graham L; King, David J; Valdazo-González, Begoña; Bachanek-Bankowska, Katarzyna; Sanderson, Nicholas D; Knowles, Nick J; King, Donald P; Cottam, Eleanor M

    2014-09-30

    Next-Generation Sequencing (NGS) is revolutionizing molecular epidemiology by providing new approaches to undertake whole genome sequencing (WGS) in diagnostic settings for a variety of human and veterinary pathogens. Previous sequencing protocols have been subject to biases such as those encountered during PCR amplification and cell culture, or are restricted by the need for large quantities of starting material. We describe here a simple and robust methodology for the generation of whole genome sequences on the Illumina MiSeq. This protocol is specific for foot-and-mouth disease virus (FMDV) or other polyadenylated RNA viruses and circumvents both the use of PCR and the requirement for large amounts of initial template. The protocol was successfully validated using five FMDV positive clinical samples from the 2001 epidemic in the United Kingdom, as well as a panel of representative viruses from all seven serotypes. In addition, this protocol was successfully used to recover 94% of an FMDV genome that had previously been identified as cell culture negative. Genome sequences from three other non-FMDV polyadenylated RNA viruses (EMCV, ERAV, VESV) were also obtained with minor protocol amendments. We calculated that a minimum coverage depth of 22 reads was required to produce an accurate consensus sequence for FMDV O. This was achieved in 5 FMDV/O/UKG isolates and the type O FMDV from the serotype panel with the exception of the 5' genomic termini and area immediately flanking the poly(C) region. We have developed a universal WGS method for FMDV and other polyadenylated RNA viruses. This method works successfully from a limited quantity of starting material and eliminates the requirement for genome-specific PCR amplification. This protocol has the potential to generate consensus-level sequences within a routine high-throughput diagnostic environment.

  19. Genome wide transcriptional profiling of Herbaspirillum seropedicae SmR1 grown in the presence of naringenin

    PubMed Central

    Tadra-Sfeir, Michelle Z.; Faoro, Helisson; Camilios-Neto, Doumit; Brusamarello-Santos, Liziane; Balsanelli, Eduardo; Weiss, Vinicius; Baura, Valter A.; Wassem, Roseli; Cruz, Leonardo M.; De Oliveira Pedrosa, Fábio; Souza, Emanuel M.; Monteiro, Rose A.

    2015-01-01

    Herbaspirillum seropedicae is a diazotrophic bacterium which associates endophytically with economically important gramineae. Flavonoids such as naringenin have been shown to have an effect on the interaction between H. seropedicae and its host plants. We used a high-throughput sequencing based method (RNA-Seq) to access the influence of naringenin on the whole transcriptome profile of H. seropedicae. Three hundred and four genes were downregulated and seventy seven were upregulated by naringenin. Data analysis revealed that genes related to bacterial flagella biosynthesis, chemotaxis and biosynthesis of peptidoglycan were repressed by naringenin. Moreover, genes involved in aromatic metabolism and multidrug transport efllux were actived. PMID:26052319

  20. Application of next-generation sequencing for rapid marker development in molecular plant breeding: a case study on anthracnose disease resistance in Lupinus angustifolius L.

    PubMed Central

    2012-01-01

    Background In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding. Results Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on a F8 population consisting of 186 individual plants confirmed that all these five markers were linked to the R gene. Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. Conclusions We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species. PMID:22805587

  1. Application of next-generation sequencing for rapid marker development in molecular plant breeding: a case study on anthracnose disease resistance in Lupinus angustifolius L.

    PubMed

    Yang, Huaan; Tao, Ye; Zheng, Zequn; Li, Chengdao; Sweetingham, Mark W; Howieson, John G

    2012-07-17

    In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding. Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on a F8 population consisting of 186 individual plants confirmed that all these five markers were linked to the R gene. Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species.

  2. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation.

    PubMed

    O'Leary, Nuala A; Wright, Mathew W; Brister, J Rodney; Ciufo, Stacy; Haddad, Diana; McVeigh, Rich; Rajput, Bhanu; Robbertse, Barbara; Smith-White, Brian; Ako-Adjei, Danso; Astashyn, Alexander; Badretdin, Azat; Bao, Yiming; Blinkova, Olga; Brover, Vyacheslav; Chetvernin, Vyacheslav; Choi, Jinna; Cox, Eric; Ermolaeva, Olga; Farrell, Catherine M; Goldfarb, Tamara; Gupta, Tripti; Haft, Daniel; Hatcher, Eneida; Hlavina, Wratko; Joardar, Vinita S; Kodali, Vamsi K; Li, Wenjun; Maglott, Donna; Masterson, Patrick; McGarvey, Kelly M; Murphy, Michael R; O'Neill, Kathleen; Pujar, Shashikant; Rangwala, Sanjida H; Rausch, Daniel; Riddick, Lillian D; Schoch, Conrad; Shkeda, Andrei; Storz, Susan S; Sun, Hanzhen; Thibaud-Nissen, Francoise; Tolstoy, Igor; Tully, Raymond E; Vatsan, Anjana R; Wallin, Craig; Webb, David; Wu, Wendy; Landrum, Melissa J; Kimchi, Avi; Tatusova, Tatiana; DiCuccio, Michael; Kitts, Paul; Murphy, Terence D; Pruitt, Kim D

    2016-01-04

    The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project leverages the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) against a combination of computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  3. Metabolic enzyme microarray coupled with miniaturized cell-culture array technology for high-throughput toxicity screening.

    PubMed

    Lee, Moo-Yeal; Dordick, Jonathan S; Clark, Douglas S

    2010-01-01

    Due to poor drug candidate safety profiles that are often identified late in the drug development process, the clinical progression of new chemical entities to pharmaceuticals remains hindered, thus resulting in the high cost of drug discovery. To accelerate the identification of safer drug candidates and improve the clinical progression of drug candidates to pharmaceuticals, it is important to develop high-throughput tools that can provide early-stage predictive toxicology data. In particular, in vitro cell-based systems that can accurately mimic the human in vivo response and predict the impact of drug candidates on human toxicology are needed to accelerate the assessment of drug candidate toxicity and human metabolism earlier in the drug development process. The in vitro techniques that provide a high degree of human toxicity prediction will be perhaps more important in cosmetic and chemical industries in Europe, as animal toxicity testing is being phased out entirely in the immediate future.We have developed a metabolic enzyme microarray (the Metabolizing Enzyme Toxicology Assay Chip, or MetaChip) and a miniaturized three-dimensional (3D) cell-culture array (the Data Analysis Toxicology Assay Chip, or DataChip) for high-throughput toxicity screening of target compounds and their metabolic enzyme-generated products. The human or rat MetaChip contains an array of encapsulated metabolic enzymes that is designed to emulate the metabolic reactions in the human or rat liver. The human or rat DataChip contains an array of 3D human or rat cells encapsulated in alginate gels for cell-based toxicity screening. By combining the DataChip with the complementary MetaChip, in vitro toxicity results are obtained that correlate well with in vivo rat data.

  4. From genes to protein mechanics on a chip.

    PubMed

    Otten, Marcus; Ott, Wolfgang; Jobst, Markus A; Milles, Lukas F; Verdorfer, Tobias; Pippig, Diana A; Nash, Michael A; Gaub, Hermann E

    2014-11-01

    Single-molecule force spectroscopy enables mechanical testing of individual proteins, but low experimental throughput limits the ability to screen constructs in parallel. We describe a microfluidic platform for on-chip expression, covalent surface attachment and measurement of single-molecule protein mechanical properties. A dockerin tag on each protein molecule allowed us to perform thousands of pulling cycles using a single cohesin-modified cantilever. The ability to synthesize and mechanically probe protein libraries enables high-throughput mechanical phenotyping.

  5. ePIANNO: ePIgenomics ANNOtation tool.

    PubMed

    Liu, Chia-Hsin; Ho, Bing-Ching; Chen, Chun-Ling; Chang, Ya-Hsuan; Hsu, Yi-Chiung; Li, Yu-Cheng; Yuan, Shin-Sheng; Huang, Yi-Huan; Chang, Chi-Sheng; Li, Ker-Chau; Chen, Hsuan-Yu

    2016-01-01

    Recently, with the development of next generation sequencing (NGS), the combination of chromatin immunoprecipitation (ChIP) and NGS, namely ChIP-seq, has become a powerful technique to capture potential genomic binding sites of regulatory factors, histone modifications and chromatin accessible regions. For most researchers, additional information including genomic variations on the TF binding site, allele frequency of variation between different populations, variation associated disease, and other neighbour TF binding sites are essential to generate a proper hypothesis or a meaningful conclusion. Many ChIP-seq datasets had been deposited on the public domain to help researchers make new discoveries. However, researches are often intimidated by the complexity of data structure and largeness of data volume. Such information would be more useful if they could be combined or downloaded with ChIP-seq data. To meet such demands, we built a webtool: ePIgenomic ANNOtation tool (ePIANNO, http://epianno.stat.sinica.edu.tw/index.html). ePIANNO is a web server that combines SNP information of populations (1000 Genomes Project) and gene-disease association information of GWAS (NHGRI) with ChIP-seq (hmChIP, ENCODE, and ROADMAP epigenomics) data. ePIANNO has a user-friendly website interface allowing researchers to explore, navigate, and extract data quickly. We use two examples to demonstrate how users could use functions of ePIANNO webserver to explore useful information about TF related genomic variants. Users could use our query functions to search target regions, transcription factors, or annotations. ePIANNO may help users to generate hypothesis or explore potential biological functions for their studies.

  6. Single-cell RNA-sequencing: The future of genome biology is now

    PubMed Central

    Picelli, Simone

    2017-01-01

    ABSTRACT Genome-wide single-cell analysis represents the ultimate frontier of genomics research. In particular, single-cell RNA-sequencing (scRNA-seq) studies have been boosted in the last few years by an explosion of new technologies enabling the study of the transcriptomic landscape of thousands of single cells in complex multicellular organisms. More sensitive and automated methods are being continuously developed and promise to deliver better data quality and higher throughput with less hands-on time. The outstanding amount of knowledge that is going to be gained from present and future studies will have a profound impact in many aspects of our society, from the introduction of truly tailored cancer treatments, to a better understanding of antibiotic resistance and host-pathogen interactions; from the discovery of the mechanisms regulating stem cell differentiation to the characterization of the early event of human embryogenesis. PMID:27442339

  7. Accurate clinical genetic testing for autoinflammatory diseases using the next-generation sequencing platform MiSeq.

    PubMed

    Nakayama, Manabu; Oda, Hirotsugu; Nakagawa, Kenji; Yasumi, Takahiro; Kawai, Tomoki; Izawa, Kazushi; Nishikomori, Ryuta; Heike, Toshio; Ohara, Osamu

    2017-03-01

    Autoinflammatory diseases occupy one of a group of primary immunodeficiency diseases that are generally thought to be caused by mutation of genes responsible for innate immunity, rather than by acquired immunity. Mutations related to autoinflammatory diseases occur in 12 genes. For example, low-level somatic mosaic NLRP3 mutations underlie chronic infantile neurologic, cutaneous, articular syndrome (CINCA), also known as neonatal-onset multisystem inflammatory disease (NOMID). In current clinical practice, clinical genetic testing plays an important role in providing patients with quick, definite diagnoses. To increase the availability of such testing, low-cost high-throughput gene-analysis systems are required, ones that not only have the sensitivity to detect even low-level somatic mosaic mutations, but also can operate simply in a clinical setting. To this end, we developed a simple method that employs two-step tailed PCR and an NGS system, MiSeq platform, to detect mutations in all coding exons of the 12 genes responsible for autoinflammatory diseases. Using this amplicon sequencing system, we amplified a total of 234 amplicons derived from the 12 genes with multiplex PCR. This was done simultaneously and in one test tube. Each sample was distinguished by an index sequence of second PCR primers following PCR amplification. With our procedure and tips for reducing PCR amplification bias, we were able to analyze 12 genes from 25 clinical samples in one MiSeq run. Moreover, with the certified primers designed by our short program-which detects and avoids common SNPs in gene-specific PCR primers-we used this system for routine genetic testing. Our optimized procedure uses a simple protocol, which can easily be followed by virtually any office medical staff. Because of the small PCR amplification bias, we can analyze simultaneously several clinical DNA samples with low cost and can obtain sufficient read numbers to detect a low level of somatic mosaic mutations.

  8. BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads.

    PubMed

    Hong, Lewis Z; Hong, Shuzhen; Wong, Han Teng; Aw, Pauline P K; Cheng, Yan; Wilm, Andreas; de Sessions, Paola F; Lim, Seng Gee; Nagarajan, Niranjan; Hibberd, Martin L; Quake, Stephen R; Burkholder, William F

    2014-01-01

    We present a method for obtaining long haplotypes, of over 3 kb in length, using a short-read sequencer, Barcode-directed Assembly for Extra-long Sequences (BAsE-Seq). BAsE-Seq relies on transposing a template-specific barcode onto random segments of the template molecule and assembling the barcoded short reads into complete haplotypes. We applied BAsE-Seq on mixed clones of hepatitis B virus and accurately identified haplotypes occurring at frequencies greater than or equal to 0.4%, with >99.9% specificity. Applying BAsE-Seq to a clinical sample, we obtained over 9,000 viral haplotypes, which provided an unprecedented view of hepatitis B virus population structure during chronic infection. BAsE-Seq is readily applicable for monitoring quasispecies evolution in viral diseases.

  9. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs.

    PubMed

    Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi

    2018-02-12

    Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.

  10. RNA-Seq for Bacterial Gene Expression.

    PubMed

    Poulsen, Line Dahl; Vinther, Jeppe

    2018-06-01

    RNA sequencing (RNA-seq) has become the preferred method for global quantification of bacterial gene expression. With the continued improvements in sequencing technology and data analysis tools, the most labor-intensive and expensive part of an RNA-seq experiment is the preparation of sequencing libraries, which is also essential for the quality of the data obtained. Here, we present a straightforward and inexpensive basic protocol for preparation of strand-specific RNA-seq libraries from bacterial RNA as well as a computational pipeline for the data analysis of sequencing reads. The protocol is based on the Illumina platform and allows easy multiplexing of samples and the removal of sequencing reads that are PCR duplicates. © 2018 by John Wiley & Sons, Inc. © 2018 John Wiley & Sons, Inc.

  11. Inferring nucleosome positions with their histone mark annotation from ChIP data

    PubMed Central

    Mammana, Alessandro; Vingron, Martin; Chung, Ho-Ryun

    2013-01-01

    Motivation: The nucleosome is the basic repeating unit of chromatin. It contains two copies each of the four core histones H2A, H2B, H3 and H4 and about 147 bp of DNA. The residues of the histone proteins are subject to numerous post-translational modifications, such as methylation or acetylation. Chromatin immunoprecipitiation followed by sequencing (ChIP-seq) is a technique that provides genome-wide occupancy data of these modified histone proteins, and it requires appropriate computational methods. Results: We present NucHunter, an algorithm that uses the data from ChIP-seq experiments directed against many histone modifications to infer positioned nucleosomes. NucHunter annotates each of these nucleosomes with the intensities of the histone modifications. We demonstrate that these annotations can be used to infer nucleosomal states with distinct correlations to underlying genomic features and chromatin-related processes, such as transcriptional start sites, enhancers, elongation by RNA polymerase II and chromatin-mediated repression. Thus, NucHunter is a versatile tool that can be used to predict positioned nucleosomes from a panel of histone modification ChIP-seq experiments and infer distinct histone modification patterns associated to different chromatin states. Availability: The software is available at http://epigen.molgen.mpg.de/nuchunter/. Contact: chung@molgen.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23981350

  12. A Feature-Based Approach to Modeling Protein–DNA Interactions

    PubMed Central

    Segal, Eran

    2008-01-01

    Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/. PMID:18725950

  13. Kleptoplast Regulation by an Antarctic Dinoflagellate

    NASA Astrophysics Data System (ADS)

    Gast, R. J.; Hehenberger, E.; Keeling, P.

    2016-02-01

    We are studying the evolutionary history and expression of plastid- targeted genes in an Antarctic dinoflagellate that steals chloroplasts from the haptophyte, Phaeocystis. Our project seeks to determine whether the kleptoplastidic dinoflagellate utilizes ancestral plastid proteins to regulate its stolen plastid, and how their transcription is related to environmental factors that are relevant to the Southern Ocean environment (temperature and light). To accomplish our goals, we have utilized high throughput transciptome analysis and RNA-Seq experiments of the dinoflagellate and Phaeocystis. Analysis of the dinoflagellate transcriptome has revealed complete mevalonic acid-independent and heme plastid-associated pathways as well as petF and petH transcripts with peridinin-plastid targeting sequences. In contrast, the proteins psaE, petJ, petC show similarity to non-Phaeocystis haptophyte homologs in their respective trees, and potentially carry haptophyte transit peptides. Anaylsis of RNA-Seq temperature and light experiments for the dinoflagellate indicate that there are significant differences in gene expression under the different environmental conditions, and we are in the process of identifying the genes associated with these changes. This work will help us to understand the environmental success of this alternative nutritional strategy.

  14. ASPeak: an abundance sensitive peak detection algorithm for RIP-Seq.

    PubMed

    Kucukural, Alper; Özadam, Hakan; Singh, Guramrit; Moore, Melissa J; Cenik, Can

    2013-10-01

    Unlike DNA, RNA abundances can vary over several orders of magnitude. Thus, identification of RNA-protein binding sites from high-throughput sequencing data presents unique challenges. Although peak identification in ChIP-Seq data has been extensively explored, there are few bioinformatics tools tailored for peak calling on analogous datasets for RNA-binding proteins. Here we describe ASPeak (abundance sensitive peak detection algorithm), an implementation of an algorithm that we previously applied to detect peaks in exon junction complex RNA immunoprecipitation in tandem experiments. Our peak detection algorithm yields stringent and robust target sets enabling sensitive motif finding and downstream functional analyses. ASPeak is implemented in Perl as a complete pipeline that takes bedGraph files as input. ASPeak implementation is freely available at https://sourceforge.net/projects/as-peak under the GNU General Public License. ASPeak can be run on a personal computer, yet is designed to be easily parallelizable. ASPeak can also run on high performance computing clusters providing efficient speedup. The documentation and user manual can be obtained from http://master.dl.sourceforge.net/project/as-peak/manual.pdf.

  15. Epigenomic Landscape of Human Fetal Brain, Heart, and Liver.

    PubMed

    Yan, Liying; Guo, Hongshan; Hu, Boqiang; Li, Rong; Yong, Jun; Zhao, Yangyu; Zhi, Xu; Fan, Xiaoying; Guo, Fan; Wang, Xiaoye; Wang, Wei; Wei, Yuan; Wang, Yan; Wen, Lu; Qiao, Jie; Tang, Fuchou

    2016-02-26

    The epigenetic regulation of spatiotemporal gene expression is crucial for human development. Here, we present whole-genome chromatin immunoprecipitation followed by high throughput DNA sequencing (ChIP-seq) analyses of a wide variety of histone markers in the brain, heart, and liver of early human embryos shortly after their formation. We identified 40,181 active enhancers, with a large portion showing tissue-specific and developmental stage-specific patterns, pointing to their roles in controlling the ordered spatiotemporal expression of the developmental genes in early human embryos. Moreover, using sequential ChIP-seq, we showed that all three organs have hundreds to thousands of bivalent domains that are marked by both H3K4me3 and H3K27me3, probably to keep the progenitor cells in these organs ready for immediate differentiation into diverse cell types during subsequent developmental processes. Our work illustrates the potentially critical roles of tissue-specific and developmental stage-specific epigenomes in regulating the spatiotemporal expression of developmental genes during early human embryonic development. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  16. Pervasive Targeting of Nascent Transcripts by Hfq.

    PubMed

    Kambara, Tracy K; Ramsey, Kathryn M; Dove, Simon L

    2018-05-01

    Hfq is an RNA chaperone and an important post-transcriptional regulator in bacteria. Using chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq), we show that Hfq associates with hundreds of different regions of the Pseudomonas aeruginosa chromosome. These associations are abolished when transcription is inhibited, indicating that they reflect Hfq binding to transcripts during their synthesis. Analogous ChIP-seq analyses with the post-transcriptional regulator Crc reveal that it associates with many of the same nascent transcripts as Hfq, an activity we show is Hfq dependent. Our findings indicate that Hfq binds many transcripts co-transcriptionally in P. aeruginosa, often in concert with Crc, and uncover direct regulatory targets of these proteins. They also highlight a general approach for studying the interactions of RNA-binding proteins with nascent transcripts in bacteria. The binding of post-transcriptional regulators to nascent mRNAs may represent a prevalent means of controlling translation in bacteria where transcription and translation are coupled. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Comparison of Next-Generation Sequencing Systems

    PubMed Central

    Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

    2012-01-01

    With fast development and wide applications of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to aid the achievement of goals to decode life mysteries, make better crops, detect pathogens, and improve life qualities. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. At last, applications of NGS are summarized. PMID:22829749

  18. In vivo Monitoring of Transcriptional Dynamics After Lower-Limb Muscle Injury Enables Quantitative Classification of Healing

    PubMed Central

    Aguilar, Carlos A.; Shcherbina, Anna; Ricke, Darrell O.; Pop, Ramona; Carrigan, Christopher T.; Gifford, Casey A.; Urso, Maria L.; Kottke, Melissa A.; Meissner, Alexander

    2015-01-01

    Traumatic lower-limb musculoskeletal injuries are pervasive amongst athletes and the military and typically an individual returns to activity prior to fully healing, increasing a predisposition for additional injuries and chronic pain. Monitoring healing progression after a musculoskeletal injury typically involves different types of imaging but these approaches suffer from several disadvantages. Isolating and profiling transcripts from the injured site would abrogate these shortcomings and provide enumerative insights into the regenerative potential of an individual’s muscle after injury. In this study, a traumatic injury was administered to a mouse model and healing progression was examined from 3 hours to 1 month using high-throughput RNA-Sequencing (RNA-Seq). Comprehensive dissection of the genome-wide datasets revealed the injured site to be a dynamic, heterogeneous environment composed of multiple cell types and thousands of genes undergoing significant expression changes in highly regulated networks. Four independent approaches were used to determine the set of genes, isoforms, and genetic pathways most characteristic of different time points post-injury and two novel approaches were developed to classify injured tissues at different time points. These results highlight the possibility to quantitatively track healing progression in situ via transcript profiling using high- throughput sequencing. PMID:26381351

  19. Transcriptomic Crosstalk between Fungal Invasive Pathogens and Their Host Cells: Opportunities and Challenges for Next-Generation Sequencing Methods

    PubMed Central

    Enguita, Francisco J.; Costa, Marina C.; Fusco-Almeida, Ana Marisa; Mendes-Giannini, Maria José; Leitão, Ana Lúcia

    2016-01-01

    Fungal invasive infections are an increasing health problem. The intrinsic complexity of pathogenic fungi and the unmet clinical need for new and more effective treatments requires a detailed knowledge of the infection process. During infection, fungal pathogens are able to trigger a specific transcriptional program in their host cells. The detailed knowledge of this transcriptional program will allow for a better understanding of the infection process and consequently will help in the future design of more efficient therapeutic strategies. Simultaneous transcriptomic studies of pathogen and host by high-throughput sequencing (dual RNA-seq) is an unbiased protocol to understand the intricate regulatory networks underlying the infectious process. This protocol is starting to be applied to the study of the interactions between fungal pathogens and their hosts. To date, our knowledge of the molecular basis of infection for fungal pathogens is still very limited, and the putative role of regulatory players such as non-coding RNAs or epigenetic factors remains elusive. The wider application of high-throughput transcriptomics in the near future will help to understand the fungal mechanisms for colonization and survival, as well as to characterize the molecular responses of the host cell against a fungal infection. PMID:29376924

  20. 32 x 16 CMOS smart pixel array for optical interconnects

    NASA Astrophysics Data System (ADS)

    Kim, Jongwoo; Guilfoyle, Peter S.; Stone, Richard V.; Hessenbruch, John M.; Choquette, Kent D.; Kiamilev, Fouad E.

    2000-05-01

    Free space optical interconnects can increase throughput capacities and eliminate much of the energy consumption required for `all electronic' systems. High speed optical interconnects can be achieved by integrating optoelectronic devices with conventional electronics. Smart pixel arrays have been developed which use optical interconnects. An individual smart pixel cell is composed of a vertical cavity surface emitting laser (VCSEL), a photodetector, an optical receiver, a laser driver, and digital logic circuitry. Oxide-confined VCSELs are being developed to operate at 850 nm with a threshold current of approximately 1 mA. Multiple quantum well photodetectors are being fabricated from AlGaAs for use with the 850 nm VCSELs. The VCSELs and photodetectors are being integrated with complementary metal oxide semiconductor (CMOS) circuitry using flip-chip bonding. CMOS circuitry is being integrated with a 32 X 16 smart pixel array. The 512 smart pixels are serially linked. Thus, an entire data stream may be clocked through the chip and output electrically by the last pixel. Electrical testing is being performed on the CMOS smart pixel array. Using an on-chip pseudo random number generator, a digital data sequence was cycled through the chip verifying operation of the digital circuitry. Although, the prototype chip was fabricated in 1.2 micrometers technology, simulations have demonstrated that the array can operate at 1 Gb/s per pixel using 0.5 micrometers technology.

  1. Whole-Exome Sequencing in a South American Cohort Links ALDH1A3, FOXN1 and Retinoic Acid Regulation Pathways to Autism Spectrum Disorders

    PubMed Central

    Moreno-Ramos, Oscar A.; Olivares, Ana María; Haider, Neena B.; de Autismo, Liga Colombiana; Lattig, María Claudia

    2015-01-01

    Autism spectrum disorders (ASDs) are a range of complex neurodevelopmental conditions principally characterized by dysfunctions linked to mental development. Previous studies have shown that there are more than 1000 genes likely involved in ASD, expressed mainly in brain and highly interconnected among them. We applied whole exome sequencing in Colombian—South American trios. Two missense novel SNVs were found in the same child: ALDH1A3 (RefSeq NM_000693: c.1514T>C (p.I505T)) and FOXN1 (RefSeq NM_003593: c.146C>T (p.S49L)). Gene expression studies reveal that Aldh1a3 and Foxn1 are expressed in ~E13.5 mouse embryonic brain, as well as in adult piriform cortex (PC; ~P30). Conserved Retinoic Acid Response Elements (RAREs) upstream of human ALDH1A3 and FOXN1 and in mouse Aldh1a3 and Foxn1 genes were revealed using bioinformatic approximation. Chromatin immunoprecipitation (ChIP) assay using Retinoid Acid Receptor B (Rarb) as the immunoprecipitation target suggests RA regulation of Aldh1a3 and Foxn1 in mice. Our results frame a possible link of RA regulation in brain to ASD etiology, and a feasible non-additive effect of two apparently unrelated variants in ALDH1A3 and FOXN1 recognizing that every result given by next generation sequencing should be cautiously analyzed, as it might be an incidental finding. PMID:26352270

  2. QR-on-a-chip: a computer-recognizable micro-pattern engraved microfluidic device for high-throughput image acquisition.

    PubMed

    Yun, Kyungwon; Lee, Hyunjae; Bang, Hyunwoo; Jeon, Noo Li

    2016-02-21

    This study proposes a novel way to achieve high-throughput image acquisition based on a computer-recognizable micro-pattern implemented on a microfluidic device. We integrated the QR code, a two-dimensional barcode system, onto the microfluidic device to simplify imaging of multiple ROIs (regions of interest). A standard QR code pattern was modified to arrays of cylindrical structures of polydimethylsiloxane (PDMS). Utilizing the recognition of the micro-pattern, the proposed system enables: (1) device identification, which allows referencing additional information of the device, such as device imaging sequences or the ROIs and (2) composing a coordinate system for an arbitrarily located microfluidic device with respect to the stage. Based on these functionalities, the proposed method performs one-step high-throughput imaging for data acquisition in microfluidic devices without further manual exploration and locating of the desired ROIs. In our experience, the proposed method significantly reduced the time for the preparation of an acquisition. We expect that the method will innovatively improve the prototype device data acquisition and analysis.

  3. Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology: An Introduction to the Symposium.

    PubMed

    Mykles, Donald L; Burnett, Karen G; Durica, David S; Stillman, Jonathon H

    2016-12-01

    Crustaceans, and decapods in particular (i.e., crabs, shrimp, and lobsters), are a diverse and ecologically and commercially important group of organisms. Understanding responses to abiotic and biotic factors is critical for developing best practices in aquaculture and assessing the effects of changing environments on the biology of these important animals. A relatively small number of decapod crustacean species have been intensively studied at the molecular level; the availability, experimental tractability, and economic relevance factor into the selection of a particular species as a model. Transcriptomics, using high-throughput next generation sequencing (NGS, coupled with RNA sequencing or RNA-seq) is revolutionizing crustacean biology. The 11 symposium papers in this volume illustrate how RNA-seq is being used to study stress response, molting and limb regeneration, immunity and disease, reproduction and development, neurobiology, and ecology and evolution. This symposium occurred on the 10th anniversary of the symposium, "Genomic and Proteomic Approaches to Crustacean Biology", held at the Society for Integrative and Comparative Biology 2006 meeting. Two participants in the 2006 symposium, the late Paul Gross and David Towle, were recognized as leaders who pioneered the use of molecular techniques that would ultimately foster the transcriptomics research reviewed in this volume. RNA-seq is a powerful tool for hypothesis-driven research, as well as an engine for discovery. It has eclipsed the technologies available in 2006, such as microarrays, expressed sequence tags, and subtractive hybridization screening, as the millions of "reads" from NGS enable researchers to de novo assemble a comprehensive transcriptome without a complete genome sequence. The symposium series concludes with a policy paper that gives an overview of the resources available and makes recommendations for developing better tools for functional annotation and pathway and network analysis in organisms in which the genome is not available or is incomplete. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.

  4. Characterization of relative abundance of lactic acid bacteria species in French organic sourdough by cultural, qPCR and MiSeq high-throughput sequencing methods.

    PubMed

    Michel, Elisa; Monfort, Clarisse; Deffrasnes, Marion; Guezenec, Stéphane; Lhomme, Emilie; Barret, Matthieu; Sicard, Delphine; Dousset, Xavier; Onno, Bernard

    2016-12-19

    In order to contribute to the description of sourdough LAB composition, MiSeq sequencing and qPCR methods were performed in association with cultural methods. A panel of 16 French organic bakers and farmer-bakers were selected for this work. The lactic acid bacteria (LAB) diversity of their organic sourdoughs was investigated quantitatively and qualitatively combining (i) Lactobacillus sanfranciscensis-specific qPCR, (ii) global sequencing with MiSeq Illumina technology and (iii) molecular isolates identification. In addition, LAB and yeast enumeration, pH, Total Titratable Acidity, organic acids and bread specific volume were analyzed. Microbial and physico-chemical data were statistically treated by Principal Component Analysis (PCA) and Hierarchical Ascendant Classification (HAC). Total yeast counts were 6 log 10 to 7.6 log 10 CFU/g while LAB counts varied from 7.2 log 10 to 9.6 log 10 CFU/g. Values obtained by L. sanfranciscensis-specific qPCR were estimated between 7.2 and 10.3 log 10 CFU/g, except for one sample at 4.4 log 10 CFU/g. HAC and PCA clustered the sixteen sourdoughs into three classes described by their variables but without links to bakers' practices. L. sanfranciscensis was the dominant species in 13 of the 16 sourdoughs analyzed by Next Generation Sequencing (NGS), by the culture dependent method this species was dominant only in only 10 samples. Based on isolates identification, LAB diversity was higher for 7 sourdoughs with the recovery of L. curvatus, L. brevis, L. heilongjiangensis, L. xiangfangensis, L. koreensis, L. pontis, Weissella sp. and Pediococcus pentosaceus, as the most representative species. L. koreensis, L. heilongjiangensis and L. xiangfangensis were identified in traditional Asian food and here for the first time as dominant in organic sourdough. This study highlighted that L. sanfranciscensis was not the major species in 6/16 sourdough samples and that a relatively high LAB diversity can be observed in French organic sourdough. Copyright © 2016. Published by Elsevier B.V.

  5. NetMiner-an ensemble pipeline for building genome-wide and high-quality gene co-expression network using massive-scale RNA-seq samples.

    PubMed

    Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei

    2018-01-01

    Accurately reconstructing gene co-expression network is of great importance for uncovering the genetic architecture underlying complex and various phenotypes. The recent availability of high-throughput RNA-seq sequencing has made genome-wide detecting and quantifying of the novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression network have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building genome-scale and high-quality Gene Co-expression Network (GCN) by integrating three frequently used inference algorithms. We constructed a RNA-seq-based GCN in one species of monocot rice. The quality of network obtained by our method was verified and evaluated by the curated gene functional association data sets, which obviously outperformed each single method. In addition, the powerful capability of network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provided a valuable and highly reliable data source to select key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.

  6. MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.

    PubMed

    Behr, Jonas; Kahles, André; Zhong, Yi; Sreedharan, Vipin T; Drewe, Philipp; Rätsch, Gunnar

    2013-10-15

    High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques lead to significant improvements over the state-of-the-art in transcriptome reconstruction. MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.

  7. Identification of Prostate Cancer-Specific microDNAs

    DTIC Science & Technology

    2014-12-01

    displacement amplification (MDA). 2 adopted multiple displacement amplification (MDA) with random primers for enriched circular DNA by rolling circle ... amplification (RCA) (Fig. 1) and then amplified DNA fragments were subject to deep sequencing. Sequence NO of Reads seq 1 184 seq 2 133 seq 3 2407 seq...prostate cancer cells through multiple displacement amplification .  Clone #7 is the top candidate which has been cloned in an expression vector and it

  8. De novo assembly of the Japanese lawngrass (Zoysia japonica Steud.) root transcriptome and identification of candidate unigenes related to early responses under salt stress

    PubMed Central

    Xie, Qi; Niu, Jun; Xu, Xilin; Xu, Lixin; Zhang, Yinbing; Fan, Bo; Liang, Xiaohong; Zhang, Lijuan; Yin, Shuxia; Han, Liebao

    2015-01-01

    Japanese lawngrass (Zoysia japonica Steud.) is an important warm-season turfgrass that is able to survive in a range of soils, from infertile sands to clays, and to grow well under saline conditions. However, little is known about the molecular mechanisms involved in its resistance to salt stress. Here, we used high-throughput RNA sequencing (RNA-seq) to investigate the changes in gene expression of Zoysia grass at high NaCl concentrations. We first constructed two sequencing libraries, including control and NaCl-treated samples, and sequenced them using the Illumina HiSeq™ 2000 platform. Approximately 157.20 million paired-end reads with a total length of 68.68 Mb were obtained. Subsequently, 32,849 unigenes with an N50 length of 1781 bp were assembled using Trinity. Furthermore, three public databases, the Kyoto Encyclopedia of Genes and Genomes (KEGG), Swiss-prot, and Clusters of Orthologous Groups (COGs), were used for gene function analysis and enrichment. The annotated genes included 57 Gene Ontology (GO) terms, 120 KEGG pathways, and 24 COGs. Compared with the control, 1455 genes were significantly different (false discovery rate ≤0.01, |log2Ratio |≥1) in the NaCl-treated samples. These genes were enriched in 10 KEGG pathways and 73 GO terms, and subjected to 25 COG categories. Using high-throughput next-generation sequencing, we built a database as a global transcript resource for Z. japonica Steud. roots. The results of this study will advance our understanding of the early salt response in Japanese lawngrass roots. PMID:26347751

  9. Fungal diversity in deep-sea sediments of a hydrothermal vent system in the Southwest Indian Ridge

    NASA Astrophysics Data System (ADS)

    Xu, Wei; Gong, Lin-feng; Pang, Ka-Lai; Luo, Zhu-Hua

    2018-01-01

    Deep-sea hydrothermal sediment is known to support remarkably diverse microbial consortia. In deep sea environments, fungal communities remain less studied despite their known taxonomic and functional diversity. High-throughput sequencing methods have augmented our capacity to assess eukaryotic diversity and their functions in microbial ecology. Here we provide the first description of the fungal community diversity found in deep sea sediments collected at the Southwest Indian Ridge (SWIR) using culture-dependent and high-throughput sequencing approaches. A total of 138 fungal isolates were cultured from seven different sediment samples using various nutrient media, and these isolates were identified to 14 fungal taxa, including 11 Ascomycota taxa (7 genera) and 3 Basidiomycota taxa (2 genera) based on internal transcribed spacers (ITS1, ITS2 and 5.8S) of rDNA. Using illumina HiSeq sequencing, a total of 757,467 fungal ITS2 tags were recovered from the samples and clustered into 723 operational taxonomic units (OTUs) belonging to 79 taxa (Ascomycota and Basidiomycota contributed to 99% of all samples) based on 97% sequence similarity. Results from both approaches suggest that there is a high fungal diversity in the deep-sea sediments collected in the SWIR and fungal communities were shown to be slightly different by location, although all were collected from adjacent sites at the SWIR. This study provides baseline data of the fungal diversity and biogeography, and a glimpse to the microbial ecology associated with the deep-sea sediments of the hydrothermal vent system of the Southwest Indian Ridge.

  10. Additional annotation of the pig transcriptome using integrated Iso-seq and Illumina RNA-seq analysis

    USDA-ARS?s Scientific Manuscript database

    Alternative splicing is a well-known phenomenon that dramatically increases eukaryotic transcriptome diversity. The extent of mRNA isoform diversity among porcine tissues was assessed using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) and Illumina short read sequencing ...

  11. The emerging genomics and systems biology research lead to systems genomics studies.

    PubMed

    Yang, Mary Qu; Yoshigoe, Kenji; Yang, William; Tong, Weida; Qin, Xiang; Dunker, A; Chen, Zhongxue; Arbania, Hamid R; Liu, Jun S; Niemierko, Andrzej; Yang, Jack Y

    2014-01-01

    Synergistically integrating multi-layer genomic data at systems level not only can lead to deeper insights into the molecular mechanisms related to disease initiation and progression, but also can guide pathway-based biomarker and drug target identification. With the advent of high-throughput next-generation sequencing technologies, sequencing both DNA and RNA has generated multi-layer genomic data that can provide DNA polymorphism, non-coding RNA, messenger RNA, gene expression, isoform and alternative splicing information. Systems biology on the other hand studies complex biological systems, particularly systematic study of complex molecular interactions within specific cells or organisms. Genomics and molecular systems biology can be merged into the study of genomic profiles and implicated biological functions at cellular or organism level. The prospectively emerging field can be referred to as systems genomics or genomic systems biology. The Mid-South Bioinformatics Centre (MBC) and Joint Bioinformatics Ph.D. Program of University of Arkansas at Little Rock and University of Arkansas for Medical Sciences are particularly interested in promoting education and research advancement in this prospectively emerging field. Based on past investigations and research outcomes, MBC is further utilizing differential gene and isoform/exon expression from RNA-seq and co-regulation from the ChiP-seq specific for different phenotypes in combination with protein-protein interactions, and protein-DNA interactions to construct high-level gene networks for an integrative genome-phoneme investigation at systems biology level.

  12. Prunus transcription factors: breeding perspectives

    PubMed Central

    Bianchi, Valmor J.; Rubio, Manuel; Trainotti, Livio; Verde, Ignazio; Bonghi, Claudio; Martínez-Gómez, Pedro

    2015-01-01

    Many plant processes depend on differential gene expression, which is generally controlled by complex proteins called transcription factors (TFs). In peach, 1533 TFs have been identified, accounting for about 5.5% of the 27,852 protein-coding genes. These TFs are the reference for the rest of the Prunus species. TF studies in Prunus have been performed on the gene expression analysis of different agronomic traits, including control of the flowering process, fruit quality, and biotic and abiotic stress resistance. These studies, using quantitative RT-PCR, have mainly been performed in peach, and to a lesser extent in other species, including almond, apricot, black cherry, Fuji cherry, Japanese apricot, plum, and sour and sweet cherry. Other tools have also been used in TF studies, including cDNA-AFLP, LC-ESI-MS, RNA, and DNA blotting or mapping. More recently, new tools assayed include microarray and high-throughput DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq). New functional genomics opportunities include genome resequencing and the well-known synteny among Prunus genomes and transcriptomes. These new functional studies should be applied in breeding programs in the development of molecular markers. With the genome sequences available, some strategies that have been used in model systems (such as SNP genotyping assays and genotyping-by-sequencing) may be applicable in the functional analysis of Prunus TFs as well. In addition, the knowledge of the gene functions and position in the peach reference genome of the TFs represents an additional advantage. These facts could greatly facilitate the isolation of genes via QTL (quantitative trait loci) map-based cloning in the different Prunus species, following the association of these TFs with the identified QTLs using the peach reference genome. PMID:26124770

  13. A technical assessment of the porcine ejaculated spermatozoa for a sperm-specific RNA-seq analysis.

    PubMed

    Gòdia, Marta; Mayer, Fabiana Quoos; Nafissi, Julieta; Castelló, Anna; Rodríguez-Gil, Joan Enric; Sánchez, Armand; Clop, Alex

    2018-04-26

    The study of the boar sperm transcriptome by RNA-seq can provide relevant information on sperm quality and fertility and might contribute to animal breeding strategies. However, the analysis of the spermatozoa RNA is challenging as these cells harbor very low amounts of highly fragmented RNA, and the ejaculates also contain other cell types with larger amounts of non-fragmented RNA. Here, we describe a strategy for a successful boar sperm purification, RNA extraction and RNA-seq library preparation. Using these approaches our objectives were: (i) to evaluate the sperm recovery rate (SRR) after boar spermatozoa purification by density centrifugation using the non-porcine-specific commercial reagent BoviPure TM ; (ii) to assess the correlation between SRR and sperm quality characteristics; (iii) to evaluate the relationship between sperm cell RNA load and sperm quality traits and (iv) to compare different library preparation kits for both total RNA-seq (SMARTer Universal Low Input RNA and TruSeq RNA Library Prep kit) and small RNA-seq (NEBNext Small RNA and TailorMix miRNA Sample Prep v2) for high-throughput sequencing. Our results show that pig SRR (~22%) is lower than in other mammalian species and that it is not significantly dependent of the sperm quality parameters analyzed in our study. Moreover, no relationship between the RNA yield per sperm cell and sperm phenotypes was found. We compared a RNA-seq library preparation kit optimized for low amounts of fragmented RNA with a standard kit designed for high amount and quality of input RNA and found that for sperm, a protocol designed to work on low-quality RNA is essential. We also compared two small RNA-seq kits and did not find substantial differences in their performance. We propose the methodological workflow described for the RNA-seq screening of the boar spermatozoa transcriptome. FPKM: fragments per kilobase of transcript per million mapped reads; KRT1: keratin 1; miRNA: micro-RNA; miscRNA: miscellaneous RNA; Mt rRNA: mitochondrial ribosomal RNA; Mt tRNA: mitochondrial transference RNA; OAZ3: ornithine decarboxylase antizyme 3; ORT: osmotic resistance test; piRNA: Piwi-interacting RNA; PRM1: protamine 1; PTPRC: protein tyrosine phosphatase receptor type C; rRNA: ribosomal RNA; snoRNA: small nucleolar RNA; snRNA: small nuclear RNA; SRR: sperm recovery rate; tRNA: transfer RNA.

  14. The development of a general purpose ARM-based processing unit for the ATLAS TileCal sROD

    NASA Astrophysics Data System (ADS)

    Cox, M. A.; Reed, R.; Mellado, B.

    2015-01-01

    After Phase-II upgrades in 2022, the data output from the LHC ATLAS Tile Calorimeter will increase significantly. ARM processors are common in mobile devices due to their low cost, low energy consumption and high performance. It is proposed that a cost-effective, high data throughput Processing Unit (PU) can be developed by using several consumer ARM processors in a cluster configuration to allow aggregated processing performance and data throughput while maintaining minimal software design difficulty for the end-user. This PU could be used for a variety of high-level functions on the high-throughput raw data such as spectral analysis and histograms to detect possible issues in the detector at a low level. High-throughput I/O interfaces are not typical in consumer ARM System on Chips but high data throughput capabilities are feasible via the novel use of PCI-Express as the I/O interface to the ARM processors. An overview of the PU is given and the results for performance and throughput testing of four different ARM Cortex System on Chips are presented.

  15. Human jagged polypeptide, encoding nucleic acids and methods of use

    DOEpatents

    Li, Linheng; Hood, Leroy

    2000-01-01

    The present invention provides an isolated polypeptide exhibiting substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the polypeptide does not have the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. The invention further provides an isolated nucleic acid molecule containing a nucleotide sequence encoding substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the nucleotide sequence does not encode the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. Also provided herein is a method of inhibiting differentiation of hematopoietic progenitor cells by contacting the progenitor cells with an isolated JAGGED polypeptide, or active fragment thereof. The invention additionally provides a method of diagnosing Alagille Syndrome in an individual. The method consists of detecting an Alagille Syndrome disease-associated mutation linked to a JAGGED locus.

  16. Methods of diagnosing alagille syndrome

    DOEpatents

    Li, Linheng; Hood, Leroy; Krantz, Ian D.; Spinner, Nancy B.

    2004-03-09

    The present invention provides an isolated polypeptide exhibiting substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the polypeptide does not have the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. The invention further provides an isolated nucleic acid molecule containing a nucleotide sequence encoding substantially the same amino acid sequence as JAGGED, or an active fragment thereof, provided that the nucleotide sequence does not encode the amino acid sequence of SEQ ID NO:5 or SEQ ID NO:6. Also provided herein is a method of inhibiting differentiation of hematopoietic progenitor cells by contacting the progenitor cells with an isolated JAGGED polypeptide, or active fragment thereof. The invention additionally provides a method of diagnosing Alagille Syndrome in an individual. The method consists of detecting an Alagille Syndrome disease-associated mutation linked to a JAGGED locus.

  17. A flexible and economical barcoding approach for highly multiplexed amplicon sequencing of diverse target genes

    PubMed Central

    Herbold, Craig W.; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander

    2015-01-01

    High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it overall requires lower number of primers and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305

  18. Robustness of Massively Parallel Sequencing Platforms

    PubMed Central

    Kavak, Pınar; Yüksel, Bayram; Aksu, Soner; Kulekci, M. Oguzhan; Güngör, Tunga; Hach, Faraz; Şahinalp, S. Cenk; Alkan, Can; Sağıroğlu, Mahmut Şamil

    2015-01-01

    The improvements in high throughput sequencing technologies (HTS) made clinical sequencing projects such as ClinSeq and Genomics England feasible. Although there are significant improvements in accuracy and reproducibility of HTS based analyses, the usability of these types of data for diagnostic and prognostic applications necessitates a near perfect data generation. To assess the usability of a widely used HTS platform for accurate and reproducible clinical applications in terms of robustness, we generated whole genome shotgun (WGS) sequence data from the genomes of two human individuals in two different genome sequencing centers. After analyzing the data to characterize SNPs and indels using the same tools (BWA, SAMtools, and GATK), we observed significant number of discrepancies in the call sets. As expected, the most of the disagreements between the call sets were found within genomic regions containing common repeats and segmental duplications, albeit only a small fraction of the discordant variants were within the exons and other functionally relevant regions such as promoters. We conclude that although HTS platforms are sufficiently powerful for providing data for first-pass clinical tests, the variant predictions still need to be confirmed using orthogonal methods before using in clinical applications. PMID:26382624

  19. High-Throughput Sequencing and Copy Number Variation Detection Using Formalin Fixed Embedded Tissue in Metastatic Gastric Cancer

    PubMed Central

    Hong, Min Eui; Do, In-Gu; Kang, So Young; Ha, Sang Yun; Kim, Seung Tae; Park, Se Hoon; Kang, Won Ki; Choi, Min-Gew; Lee, Jun Ho; Sohn, Tae Sung; Bae, Jae Moon; Kim, Sung; Kim, Duk-Hwan; Kim, Kyoung-Mee

    2014-01-01

    In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable in archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes. PMID:25372287

  20. DNA Microarray for Rapid Detection and Identification of Food and Water Borne Bacteria: From Dry to Wet Lab.

    PubMed

    Ranjbar, Reza; Behzadi, Payam; Najafi, Ali; Roudi, Raheleh

    2017-01-01

    A rapid, accurate, flexible and reliable diagnostic method may significantly decrease the costs of diagnosis and treatment. Designing an appropriate microarray chip reduces noises and probable biases in the final result. The aim of this study was to design and construct a DNA Microarray Chip for a rapid detection and identification of 10 important bacterial agents. In the present survey, 10 unique genomic regions relating to 10 pathogenic bacterial agents including Escherichia coli (E.coli), Shigella boydii, Sh.dysenteriae, Sh.flexneri, Sh.sonnei, Salmonella typhi, S.typhimurium, Brucella sp., Legionella pneumophila, and Vibrio cholera were selected for designing specific long oligo microarray probes. For this reason, the in-silico operations including utilization of the NCBI RefSeq database, Servers of PanSeq and Gview, AlleleID 7.7 and Oligo Analyzer 3.1 was done. On the other hand, the in-vitro part of the study comprised stages of robotic microarray chip probe spotting, bacterial DNAs extraction and DNA labeling, hybridization and microarray chip scanning. In wet lab section, different tools and apparatus such as Nexterion® Slide E, Qarray mini spotter, NimbleGen kit, TrayMix TM S4, and Innoscan 710 were used. A DNA microarray chip including 10 long oligo microarray probes was designed and constructed for detection and identification of 10 pathogenic bacteria. The DNA microarray chip was capable to identify all 10 bacterial agents tested simultaneously. The presence of a professional bioinformatician as a probe designer is needed to design appropriate multifunctional microarray probes to increase the accuracy of the outcomes.

  1. A weighted U-statistic for genetic association analyses of sequencing data.

    PubMed

    Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

    2014-12-01

    With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL 4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.

  2. De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease.

    PubMed

    Marchant, A; Mougel, F; Almeida, C; Jacquin-Joly, E; Costa, J; Harry, M

    2015-04-01

    High throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied on RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads for nearly 45 Gb. For the 454 reads, we used three assemblers, Newbler, CAP3 and/or MIRA and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80 %, <1 % chimeric contigs) was a hybrid assembly leading to recommend the use of (1) a single individual with large representation of biological tissues, (2) merging both long reads and short paired-end Illumina reads, (3) several assemblers in order to combine the specific advantages of each.

  3. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.

    PubMed

    Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong

    2012-07-24

    Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.

  4. Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis.

    PubMed

    Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich

    2015-12-16

    Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications led by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, Matthew's correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods demonstrate AmpliSeq as a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variations. Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.

  5. iSeq: Web-Based RNA-seq Data Analysis and Visualization.

    PubMed

    Zhang, Chao; Fan, Caoqi; Gan, Jingbo; Zhu, Ping; Kong, Lei; Li, Cheng

    2018-01-01

    Transcriptome sequencing (RNA-seq) is becoming a standard experimental methodology for genome-wide characterization and quantification of transcripts at single base-pair resolution. However, downstream analysis of massive amount of sequencing data can be prohibitively technical for wet-lab researchers. A functionally integrated and user-friendly platform is required to meet this demand. Here, we present iSeq, an R-based Web server, for RNA-seq data analysis and visualization. iSeq is a streamlined Web-based R application under the Shiny framework, featuring a simple user interface and multiple data analysis modules. Users without programming and statistical skills can analyze their RNA-seq data and construct publication-level graphs through a standardized yet customizable analytical pipeline. iSeq is accessible via Web browsers on any operating system at http://iseq.cbi.pku.edu.cn .

  6. Temporal dynamics of soil microbial communities under different moisture regimes: high-throughput sequencing and bioinformatics analysis

    NASA Astrophysics Data System (ADS)

    Semenov, Mikhail; Zhuravleva, Anna; Semenov, Vyacheslav; Yevdokimov, Ilya; Larionova, Alla

    2017-04-01

    Recent climate scenarios predict not only continued global warming but also an increased frequency and intensity of extreme climatic events such as strong changes in temperature and precipitation regimes. Microorganisms are well known to be more sensitive to changes in environmental conditions than to other soil chemical and physical parameters. In this study, we determined the shifts in soil microbial community structure as well as indicative taxa in soils under three moisture regimes using high-throughput Illumina sequencing and range of bioinformatics approaches for the assessment of sequence data. Incubation experiments were performed in soil-filled (Greyic Phaeozems Albic) rhizoboxes with maize and without plants. Three contrasting moisture regimes were being simulated: 1) optimal wetting (OW), a watering 2-3 times per week to maintain soil moisture of 20-25% by weight; 2) periodic wetting (PW), with alternating periods of wetting and drought; and 3) constant insufficient wetting (IW), while soil moisture of 12% by weight was permanently maintained. Sampled fresh soils were homogenized, and the total DNA of three replicates was extracted using the FastDNA® SPIN kit for Soil. DNA replicates were combined in a pooled sample and the DNA was used for PCR with specific primers for the 16S V3 and V4 regions. In order to compare variability between different samples and replicates within a single sample, some DNA replicates treated separately. The products were purified and submitted to Illumina MiSeq sequencing. Sequence data were evaluated by alpha-diversity (Chao1 and Shannon H' diversity indexes), beta-diversity (UniFrac and Bray-Curtis dissimilarity), heatmap, tagcloud, and plot-bar analyses using the MiSeq Reporter Metagenomics Workflow and R packages (phyloseq, vegan, tagcloud). Shannon index varied in a rather narrow range (4.4-4.9) with the lowest values for microbial communities under PW treatment. Chao1 index varied from 385 to 480, being a more flexible indicator than Shannon index. Chao1 had similar values for OW and IW communities, but alpha-diversity of microbial communities has sharply decreased under PW treatment. There was no visible difference in beta-diversity depending on sampling date and wetting regime, however, it could be possible to distinguish microbial communities in soils with maize and without plants. The presence of maize was acting as scattering agent, making microbial communities more distinguished. In all studied samples, the most dominant phyla were Proteobacteria, Firmicutes, Verrucomicrobia, Actinobacteria, and Acidobacteria. Chthoniobacter, Bacillus, Alicyclobacillus, Rhodoplanes, Cohnella, Kaistobacter, and Solibacter were the most abundant genera. Moreover, these genera were found as the most reactive and variable taxa in microbial community. Thus, DNA high-throughput sequencing revealed no dramatic shifts in bacterial community structure in soils under different moisture regimes. However, this technique allowed us to determine the effect of wetting regime and the presence of plants on soil microbial community which were adaptable to insufficient wetting, but lost diversity under periodic wetting. Furthermore, we detected the indicative taxa which dominate in microbial communities and at the same time strongly react to environmental changes.

  7. ChIP-seq and RNA-seq methods to study circadian control of transcription in mammals

    PubMed Central

    Takahashi, Joseph S.; Kumar, Vivek; Nakashe, Prachi; Koike, Nobuya; Huang, Hung-Chung; Green, Carla B.; Kim, Tae-Kyung

    2015-01-01

    Genome-wide analyses have revolutionized our ability to study the transcriptional regulation of circadian rhythms. The advent of next-generation sequencing methods has facilitated the use of two such technologies, ChIP-seq and RNA-seq. In this chapter, we describe detailed methods and protocols for these two techniques, with emphasis on their usage in circadian rhythm experiments in the mouse liver, a major target organ of the circadian clock system. Critical factors for these methods are highlighted and issues arising with time series samples for ChIP-seq and RNA-seq are discussed. Finally detailed protocols for library preparation suitable for Illumina sequencing platforms are presented. PMID:25662462

  8. Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

    PubMed

    Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

    2016-11-23

    The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.

  9. A Concise Atlas of Thyroid Cancer Next-Generation Sequencing Panel ThyroSeq v.2

    PubMed Central

    Alsina, Jorge; Alsina, Raul; Gulec, Seza

    2017-01-01

    The next-generation sequencing technology allows high out-put genomic analysis. An innovative assay in thyroid cancer, ThyroSeq® was developed for targeted mutation detection by next generation sequencing technology in fine needle aspiration and tissue samples. ThyroSeq v.2 next generation sequencing panel offers simultaneous sequencing and detection in >1000 hotspots of 14 thyroid cancer-related genes and for 42 types of gene fusions known to occur in thyroid cancer. ThyroSeq is being increasingly used to further narrow the indeterminate category defined by cytology for thyroid nodules. From a surgical perspective, genomic profiling also provides prognostic and predictive information and closely relates to determination of surgical strategy. Both the genomic analysis technology and the informatics for the cancer genome data base are rapidly developing. In this paper, we have gathered existing information on the thyroid cancer-related genes involved in the initiation and progression of thyroid cancer. Our goal is to assemble a glossary for the current ThyroSeq genomic panel that can help elucidate the role genomics play in thyroid cancer oncogenesis. PMID:28117295

  10. Analysis and Prediction of Exon Skipping Events from RNA-Seq with Sequence Information Using Rotation Forest.

    PubMed

    Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping

    2017-12-12

    In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.

  11. LipidSeq: a next-generation clinical resequencing panel for monogenic dyslipidemias.

    PubMed

    Johansen, Christopher T; Dubé, Joseph B; Loyzer, Melissa N; MacDonald, Austin; Carter, David E; McIntyre, Adam D; Cao, Henian; Wang, Jian; Robinson, John F; Hegele, Robert A

    2014-04-01

    We report the design of a targeted resequencing panel for monogenic dyslipidemias, LipidSeq, for the purpose of replacing Sanger sequencing in the clinical detection of dyslipidemia-causing variants. We also evaluate the performance of the LipidSeq approach versus Sanger sequencing in 84 patients with a range of phenotypes including extreme blood lipid concentrations as well as additional dyslipidemias and related metabolic disorders. The panel performs well, with high concordance (95.2%) in samples with known mutations based on Sanger sequencing and a high detection rate (57.9%) of mutations likely to be causative for disease in samples not previously sequenced. Clinical implementation of LipidSeq has the potential to aid in the molecular diagnosis of patients with monogenic dyslipidemias with a high degree of speed and accuracy and at lower cost than either Sanger sequencing or whole exome sequencing. Furthermore, LipidSeq will help to provide a more focused picture of monogenic and polygenic contributors that underlie dyslipidemia while excluding the discovery of incidental pathogenic clinically actionable variants in nonmetabolism-related genes, such as oncogenes, that would otherwise be identified by a whole exome approach, thus minimizing potential ethical issues.

  12. LipidSeq: a next-generation clinical resequencing panel for monogenic dyslipidemias[S

    PubMed Central

    Johansen, Christopher T.; Dubé, Joseph B.; Loyzer, Melissa N.; MacDonald, Austin; Carter, David E.; McIntyre, Adam D.; Cao, Henian; Wang, Jian; Robinson, John F.; Hegele, Robert A.

    2014-01-01

    We report the design of a targeted resequencing panel for monogenic dyslipidemias, LipidSeq, for the purpose of replacing Sanger sequencing in the clinical detection of dyslipidemia-causing variants. We also evaluate the performance of the LipidSeq approach versus Sanger sequencing in 84 patients with a range of phenotypes including extreme blood lipid concentrations as well as additional dyslipidemias and related metabolic disorders. The panel performs well, with high concordance (95.2%) in samples with known mutations based on Sanger sequencing and a high detection rate (57.9%) of mutations likely to be causative for disease in samples not previously sequenced. Clinical implementation of LipidSeq has the potential to aid in the molecular diagnosis of patients with monogenic dyslipidemias with a high degree of speed and accuracy and at lower cost than either Sanger sequencing or whole exome sequencing. Furthermore, LipidSeq will help to provide a more focused picture of monogenic and polygenic contributors that underlie dyslipidemia while excluding the discovery of incidental pathogenic clinically actionable variants in nonmetabolism-related genes, such as oncogenes, that would otherwise be identified by a whole exome approach, thus minimizing potential ethical issues. PMID:24503134

  13. Evaluation of the utility of the new rainbow trout genome assembly for analyzing RNA-seq data from stress response experiments

    USDA-ARS?s Scientific Manuscript database

    The newly released rainbow trout genome assembly in NCBI RefSeq has greatly expanded our abilities for analyzing rainbow trout sequencing data. In this poster, we evaluate the utility of this genome assembly for analyzing RNA sequencing (RNA-seq) data of rainbow trout responses to various stressors,...

  14. Biotransformation of nitrogen- and sulfur-containing pollutants during coking wastewater treatment: Correspondence of performance to microbial community functional structure.

    PubMed

    Joshi, Dev Raj; Zhang, Yu; Gao, Yinxin; Liu, Yuan; Yang, Min

    2017-09-15

    Although coking wastewater is generally considered to contain high concentration of nitrogen- and sulfur-containing pollutants, the biotransformation processes of these compounds have not been well understood. Herein, a high throughput functional gene array (GeoChip 5.0) in combination with Illumina MiSeq sequencing of the 16S rRNA gene were used to identify microbial functional traits and their role in biotransformation of nitrogen- and sulfur-containing compounds in a bench-scale aerobic coking wastewater treatment system operated for 488 days. Biotransformation of nitrogen and sulfur-containing pollutants deteriorated when pH of the bioreactor was increased to >8.0, and the microbial community functional structure was significantly associated with pH (Mantels test, P < 0.05). The release of ammonia nitrogen and sulfate was correlated with both the taxonomic and functional microbial community structure (P < 0.05). Considering the abundance and correlation with the release of ammonia nitrogen and sulfate, aromatic dioxygenases (e.g. xylXY, nagG), nitrilases (e.g. nhh, nitrilase), dibenzothiophene oxidase (DbtAc), and thiocyanate hydrolase (scnABC) were important functional genes for biotransformation of nitrogen- and sulfur-containing pollutants. Functional characterization of taxa and network analysis suggested that Burkholderiales, Actinomycetales, Rhizobiales, Pseudomonadales, and Hydrogenophiliales (Thiobacillus) were key functional taxa. Variance partitioning analysis showed that pH and influent ammonia nitrogen jointly explained 25.9% and 35.5% of variation in organic pollutant degrading genes and microbial community structure, respectively. This study revealed a linkage between microbial community functional structure and the likely biotransformation of nitrogen- and sulfur-containing pollutants, along with a suitable range of pH (7.0-7.5) for stability of the biological system treating coking wastewater. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Membrane-on-a-chip: microstructured silicon/silicon-dioxide chips for high-throughput screening of membrane transport and viral membrane fusion.

    PubMed

    Kusters, Ilja; van Oijen, Antoine M; Driessen, Arnold J M

    2014-04-22

    Screening of transport processes across biological membranes is hindered by the challenge to establish fragile supported lipid bilayers and the difficulty to determine at which side of the membrane reactants reside. Here, we present a method for the generation of suspended lipid bilayers with physiological relevant lipid compositions on microstructured Si/SiO2 chips that allow for high-throughput screening of both membrane transport and viral membrane fusion. Simultaneous observation of hundreds of single-membrane channels yields statistical information revealing population heterogeneities of the pore assembly and conductance of the bacterial toxin α-hemolysin (αHL). The influence of lipid composition and ionic strength on αHL pore formation was investigated at the single-channel level, resolving features of the pore-assembly pathway. Pore formation is inhibited by a specific antibody, demonstrating the applicability of the platform for drug screening of bacterial toxins and cell-penetrating agents. Furthermore, fusion of H3N2 influenza viruses with suspended lipid bilayers can be observed directly using a specialized chip architecture. The presented micropore arrays are compatible with fluorescence readout from below using an air objective, thus allowing high-throughput screening of membrane transport in multiwell formats in analogy to plate readers.

  16. Fishing on chips: up-and-coming technological advances in analysis of zebrafish and Xenopus embryos.

    PubMed

    Zhu, Feng; Skommer, Joanna; Huang, Yushi; Akagi, Jin; Adams, Dany; Levin, Michael; Hall, Chris J; Crosier, Philip S; Wlodkowic, Donald

    2014-11-01

    Biotests performed on small vertebrate model organisms provide significant investigative advantages as compared with bioassays that employ cell lines, isolated primary cells, or tissue samples. The main advantage offered by whole-organism approaches is that the effects under study occur in the context of intact physiological milieu, with all its intercellular and multisystem interactions. The gap between the high-throughput cell-based in vitro assays and low-throughput, disproportionally expensive and ethically controversial mammal in vivo tests can be closed by small model organisms such as zebrafish or Xenopus. The optical transparency of their tissues, the ease of genetic manipulation and straightforward husbandry, explain the growing popularity of these model organisms. Nevertheless, despite the potential for miniaturization, automation and subsequent increase in throughput of experimental setups, the manipulation, dispensing and analysis of living fish and frog embryos remain labor-intensive. Recently, a new generation of miniaturized chip-based devices have been developed for zebrafish and Xenopus embryo on-chip culture and experimentation. In this work, we review the critical developments in the field of Lab-on-a-Chip devices designed to alleviate the limits of traditional platforms for studies on zebrafish and clawed frog embryo and larvae. © 2014 International Society for Advancement of Cytometry. © 2014 International Society for Advancement of Cytometry.

  17. Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

    PubMed

    Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

    2017-10-06

    Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publically available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 millions (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.

  18. Sequence Capture versus Restriction Site Associated DNA Sequencing for Shallow Systematics.

    PubMed

    Harvey, Michael G; Smith, Brian Tilston; Glenn, Travis C; Faircloth, Brant C; Brumfield, Robb T

    2016-09-01

    Sequence capture and restriction site associated DNA sequencing (RAD-Seq) are two genomic enrichment strategies for applying next-generation sequencing technologies to systematics studies. At shallow timescales, such as within species, RAD-Seq has been widely adopted among researchers, although there has been little discussion of the potential limitations and benefits of RAD-Seq and sequence capture. We discuss a series of issues that may impact the utility of sequence capture and RAD-Seq data for shallow systematics in non-model species. We review prior studies that used both methods, and investigate differences between the methods by re-analyzing existing RAD-Seq and sequence capture data sets from a Neotropical bird (Xenops minutus). We suggest that the strengths of RAD-Seq data sets for shallow systematics are the wide dispersion of markers across the genome, the relative ease and cost of laboratory work, the deep coverage and read overlap at recovered loci, and the high overall information that results. Sequence capture's benefits include flexibility and repeatability in the genomic regions targeted, success using low-quality samples, more straightforward read orthology assessment, and higher per-locus information content. The utility of a method in systematics, however, rests not only on its performance within a study, but on the comparability of data sets and inferences with those of prior work. In RAD-Seq data sets, comparability is compromised by low overlap of orthologous markers across species and the sensitivity of genetic diversity in a data set to an interaction between the level of natural heterozygosity in the samples examined and the parameters used for orthology assessment. In contrast, sequence capture of conserved genomic regions permits interrogation of the same loci across divergent species, which is preferable for maintaining comparability among data sets and studies for the purpose of drawing general conclusions about the impact of historical processes across biotas. We argue that sequence capture should be given greater attention as a method of obtaining data for studies in shallow systematics and comparative phylogeography. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. On-chip Magnetic Separation and Cell Encapsulation in Droplets

    NASA Astrophysics Data System (ADS)

    Chen, A.; Byvank, T.; Bharde, A.; Miller, B. L.; Chalmers, J. J.; Sooryakumar, R.; Chang, W.-J.; Bashir, R.

    2012-02-01

    The demand for high-throughput single cell assays is gaining importance because of the heterogeneity of many cell suspensions, even after significant initial sorting. These suspensions may display cell-to-cell variability at the gene expression level that could impact single cell functional genomics, cancer, stem-cell research and drug screening. The on-chip monitoring of individual cells in an isolated environment could prevent cross-contamination, provide high recovery yield and ability to study biological traits at a single cell level These advantages of on-chip biological experiments contrast to conventional methods, which require bulk samples that provide only averaged information on cell metabolism. We report on a device that integrates microfluidic technology with a magnetic tweezers array to combine the functionality of separation and encapsulation of objects such as immunomagnetically labeled cells or magnetic beads into pico-liter droplets on the same chip. The ability to control the separation throughput that is independent of the hydrodynamic droplet generation rate allows the encapsulation efficiency to be optimized. The device can potentially be integrated with on-chip labeling and/or bio-detection to become a powerful single-cell analysis device.

  20. Identification of differentially expressed genes in cucumber (Cucumis sativus L.) root under waterlogging stress by digital gene expression profile.

    PubMed

    Qi, Xiao-Hua; Xu, Xue-Wen; Lin, Xiao-Jian; Zhang, Wen-Jie; Chen, Xue-Hao

    2012-03-01

    High-throughput tag-sequencing (Tag-seq) analysis based on the Solexa Genome Analyzer platform was applied to analyze the gene expression profiling of cucumber plant at 5 time points over a 24h period of waterlogging treatment. Approximately 5.8 million total clean sequence tags per library were obtained with 143013 distinct clean tag sequences. Approximately 23.69%-29.61% of the distinct clean tags were mapped unambiguously to the unigene database, and 53.78%-60.66% of the distinct clean tags were mapped to the cucumber genome database. Analysis of the differentially expressed genes revealed that most of the genes were down-regulated in the waterlogging stages, and the differentially expressed genes mainly linked to carbon metabolism, photosynthesis, reactive oxygen species generation/scavenging, and hormone synthesis/signaling. Finally, quantitative real-time polymerase chain reaction using nine genes independently verified the tag-mapped results. This present study reveals the comprehensive mechanisms of waterlogging-responsive transcription in cucumber. Copyright © 2011 Elsevier Inc. All rights reserved.

Top