Sample records for rna-seq transcriptome analysis

  1. Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis.

    PubMed

    Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich

    2015-12-16

    Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications led by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, Matthew's correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods demonstrate AmpliSeq as a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variations. Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.

  2. RNA-Seq Technology and Its Application in Fish Transcriptomics

    PubMed Central

    Ba, Yi; Zhuang, Qianfeng

    2014-01-01

    Abstract High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing method. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptome but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species. PMID:24380445

  3. RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome

    USDA-ARS?s Scientific Manuscript database

    A first analysis of the Glycine max (L.) Merr. (soybean) transcriptome using next generation sequencing technology and RNA-Sequencing (RNA-Seq) is presented. This analysis will provide an important resource for understanding transcription and gene co-regulatory networks in soybean, the most economic...

  4. CLIP-seq analysis of multi-mapped reads discovers novel functional RNA regulatory sites in the human transcriptome.

    PubMed

    Zhang, Zijun; Xing, Yi

    2017-09-19

    Crosslinking or RNA immunoprecipitation followed by sequencing (CLIP-seq or RIP-seq) allows transcriptome-wide discovery of RNA regulatory sites. As CLIP-seq/RIP-seq reads are short, existing computational tools focus on uniquely mapped reads, while reads mapped to multiple loci are discarded. We present CLAM (CLIP-seq Analysis of Multi-mapped reads). CLAM uses an expectation-maximization algorithm to assign multi-mapped reads and calls peaks combining uniquely and multi-mapped reads. To demonstrate the utility of CLAM, we applied it to a wide range of public CLIP-seq/RIP-seq datasets involving numerous splicing factors, microRNAs and m6A RNA methylation. CLAM recovered a large number of novel RNA regulatory sites inaccessible by uniquely mapped reads. The functional significance of these sites was demonstrated by consensus motif patterns and association with alternative splicing (splicing factors), transcript abundance (AGO2) and mRNA half-life (m6A). CLAM provides a useful tool to discover novel protein-RNA interactions and RNA modification sites from CLIP-seq and RIP-seq data, and reveals the significant contribution of repetitive elements to the RNA regulatory landscape of the human transcriptome. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Additional annotation of the pig transcriptome using integrated Iso-seq and Illumina RNA-seq analysis

    USDA-ARS?s Scientific Manuscript database

    Alternative splicing is a well-known phenomenon that dramatically increases eukaryotic transcriptome diversity. The extent of mRNA isoform diversity among porcine tissues was assessed using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) and Illumina short read sequencing ...

  6. Optimized approach for Ion Proton RNA sequencing reveals details of RNA splicing and editing features of the transcriptome.

    PubMed

    Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A

    2017-01-01

    RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.

  7. ChloroSeq, an optimized chloroplast RNA-Seq bioinformatic pipeline, reveals remodeling of the organellar transcriptome under heat stress

    DOE PAGES

    Castandet, Benoît; Hotto, Amber M.; Strickler, Susan R.; ...

    2016-07-06

    Although RNA-Seq has revolutionized transcript analysis, organellar transcriptomes are rarely assessed even when present in published datasets. Here, we describe the development and application of a rapid and convenient method, ChloroSeq, to delineate qualitative and quantitative features of chloroplast RNA metabolism from strand-specific RNA-Seq datasets, including processing, editing, splicing, and relative transcript abundance. The use of a single experiment to analyze systematically chloroplast transcript maturation and abundance is of particular interest due to frequent pleiotropic effects observed in mutants that affect chloroplast gene expression and/or photosynthesis. To illustrate its utility, ChloroSeq was applied to published RNA-Seq datasets derived from Arabidopsismore » thaliana grown under control and abiotic stress conditions, where the organellar transcriptome had not been examined. The most appreciable effects were found for heat stress, which induces a global reduction in splicing and editing efficiency, and leads to increased abundance of chloroplast transcripts, including genic, intergenic, and antisense transcripts. Moreover, by concomitantly analyzing nuclear transcripts that encode chloroplast gene expression regulators from the same libraries, we demonstrate the possibility of achieving a holistic understanding of the nucleus-organelle system. In conclusion, ChloroSeq thus represents a unique method for streamlining RNA-Seq data interpretation of the chloroplast transcriptome and its regulators.« less

  8. ChloroSeq, an optimized chloroplast RNA-Seq bioinformatic pipeline, reveals remodeling of the organellar transcriptome under heat stress

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castandet, Benoît; Hotto, Amber M.; Strickler, Susan R.

    Although RNA-Seq has revolutionized transcript analysis, organellar transcriptomes are rarely assessed even when present in published datasets. Here, we describe the development and application of a rapid and convenient method, ChloroSeq, to delineate qualitative and quantitative features of chloroplast RNA metabolism from strand-specific RNA-Seq datasets, including processing, editing, splicing, and relative transcript abundance. The use of a single experiment to analyze systematically chloroplast transcript maturation and abundance is of particular interest due to frequent pleiotropic effects observed in mutants that affect chloroplast gene expression and/or photosynthesis. To illustrate its utility, ChloroSeq was applied to published RNA-Seq datasets derived from Arabidopsismore » thaliana grown under control and abiotic stress conditions, where the organellar transcriptome had not been examined. The most appreciable effects were found for heat stress, which induces a global reduction in splicing and editing efficiency, and leads to increased abundance of chloroplast transcripts, including genic, intergenic, and antisense transcripts. Moreover, by concomitantly analyzing nuclear transcripts that encode chloroplast gene expression regulators from the same libraries, we demonstrate the possibility of achieving a holistic understanding of the nucleus-organelle system. In conclusion, ChloroSeq thus represents a unique method for streamlining RNA-Seq data interpretation of the chloroplast transcriptome and its regulators.« less

  9. Single-cell Transcriptome Study as Big Data

    PubMed Central

    Yu, Pingjian; Lin, Wei

    2016-01-01

    The rapid growth of single-cell RNA-seq studies (scRNA-seq) demands efficient data storage, processing, and analysis. Big-data technology provides a framework that facilitates the comprehensive discovery of biological signals from inter-institutional scRNA-seq datasets. The strategies to solve the stochastic and heterogeneous single-cell transcriptome signal are discussed in this article. After extensively reviewing the available big-data applications of next-generation sequencing (NGS)-based studies, we propose a workflow that accounts for the unique characteristics of scRNA-seq data and primary objectives of single-cell studies. PMID:26876720

  10. Single-nucleus RNA-seq of differentiating human myoblasts reveals the extent of fate heterogeneity

    PubMed Central

    Zeng, Weihua; Jiang, Shan; Kong, Xiangduo; El-Ali, Nicole; Ball, Alexander R.; Ma, Christopher I-Hsing; Hashimoto, Naohiro; Yokomori, Kyoko; Mortazavi, Ali

    2016-01-01

    Myoblasts are precursor skeletal muscle cells that differentiate into fused, multinucleated myotubes. Current single-cell microfluidic methods are not optimized for capturing very large, multinucleated cells such as myotubes. To circumvent the problem, we performed single-nucleus transcriptome analysis. Using immortalized human myoblasts, we performed RNA-seq analysis of single cells (scRNA-seq) and single nuclei (snRNA-seq) and found them comparable, with a distinct enrichment for long non-coding RNAs (lncRNAs) in snRNA-seq. We then compared snRNA-seq of myoblasts before and after differentiation. We observed the presence of mononucleated cells (MNCs) that remained unfused and analyzed separately from multi-nucleated myotubes. We found that while the transcriptome profiles of myoblast and myotube nuclei are relatively homogeneous, MNC nuclei exhibited significant heterogeneity, with the majority of them adopting a distinct mesenchymal state. Primary transcripts for microRNAs (miRNAs) that participate in skeletal muscle differentiation were among the most differentially expressed lncRNAs, which we validated using NanoString. Our study demonstrates that snRNA-seq provides reliable transcriptome quantification for cells that are otherwise not amenable to current single-cell platforms. Our results further indicate that snRNA-seq has unique advantage in capturing nucleus-enriched lncRNAs and miRNA precursors that are useful in mapping and monitoring differential miRNA expression during cellular differentiation. PMID:27566152

  11. Transcriptome landscape of Synechococcus elongatus PCC 7942 for nitrogen starvation responses using RNA-seq

    PubMed Central

    Choi, Sun Young; Park, Byeonghyeok; Choi, In-Geol; Sim, Sang Jun; Lee, Sun-Mi; Um, Youngsoon; Woo, Han Min

    2016-01-01

    The development of high-throughput technology using RNA-seq has allowed understanding of cellular mechanisms and regulations of bacterial transcription. In addition, transcriptome analysis with RNA-seq has been used to accelerate strain improvement through systems metabolic engineering. Synechococcus elongatus PCC 7942, a photosynthetic bacterium, has remarkable potential for biochemical and biofuel production due to photoautotrophic cell growth and direct CO2 conversion. Here, we performed a transcriptome analysis of S. elongatus PCC 7942 using RNA-seq to understand the changes of cellular metabolism and regulation for nitrogen starvation responses. As a result, differentially expressed genes (DEGs) were identified and functionally categorized. With mapping onto metabolic pathways, we probed transcriptional perturbation and regulation of carbon and nitrogen metabolisms relating to nitrogen starvation responses. Experimental evidence such as chlorophyll a and phycobilisome content and the measurement of CO2 uptake rate validated the transcriptome analysis. The analysis suggests that S. elongatus PCC 7942 reacts to nitrogen starvation by not only rearranging the cellular transport capacity involved in carbon and nitrogen assimilation pathways but also by reducing protein synthesis and photosynthesis activities. PMID:27488818

  12. SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

    PubMed

    Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B

    2016-02-04

    Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.

  13. Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.

    PubMed

    Davidson, Nadia M; Oshlack, Alicia

    2018-05-01

    RNA sequencing (RNA-seq) analyses can benefit from performing a genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating an assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Necklace allows a comprehensive transcriptome to be built from a combination of assembled and annotated transcripts, which results in a more comprehensive transcriptome for the majority of organisms. In addition RNA-seq data are mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods.

  14. Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons

    PubMed Central

    Krishnaswami, Suguna Rani; Grindberg, Rashel V; Novotny, Mark; Venepally, Pratap; Lacar, Benjamin; Bhutani, Kunal; Linker, Sara B; Pham, Son; Erwin, Jennifer A; Miller, Jeremy A; Hodge, Rebecca; McCarthy, James K; Kelder, Martin; McCorrison, Jamison; Aevermann, Brian D; Fuertes, Francisco Diez; Scheuermann, Richard H; Lee, Jun; Lein, Ed S; Schork, Nicholas; McConnell, Michael J; Gage, Fred H; Lasken, Roger S

    2016-01-01

    A protocol is described for sequencing the transcriptome of a cell nucleus. Nuclei are isolated from specimens and sorted by FACS, cDNA libraries are constructed and RNA-seq is performed, followed by data analysis. Some steps follow published methods (Smart-seq2 for cDNA synthesis and Nextera XT barcoded library preparation) and are not described in detail here. Previous single-cell approaches for RNA-seq from tissues include cell dissociation using protease treatment at 30 °C, which is known to alter the transcriptome. We isolate nuclei at 4 °C from tissue homogenates, which cause minimal damage. Nuclear transcriptomes can be obtained from postmortem human brain tissue stored at −80 °C, making brain archives accessible for RNA-seq from individual neurons. The method also allows investigation of biological features unique to nuclei, such as enrichment of certain transcripts and precursors of some noncoding RNAs. By following this procedure, it takes about 4 d to construct cDNA libraries that are ready for sequencing. PMID:26890679

  15. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.

    PubMed

    Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin

    2013-09-22

    High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.

  16. Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome

    PubMed Central

    Chaudhuri, Roy R.; Yu, Lu; Kanji, Alpa; Perkins, Timothy T.; Gardner, Paul P.; Choudhary, Jyoti; Maskell, Duncan J.

    2011-01-01

    Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community. PMID:21816880

  17. Transcriptome Analysis Based on RNA-Seq in Understanding Pathogenic Mechanisms of Diseases and the Immune System of Fish: A Comprehensive Review

    PubMed Central

    Sudhagar, Arun; El-Matbouli, Mansour

    2018-01-01

    In recent years, with the advent of next-generation sequencing along with the development of various bioinformatics tools, RNA sequencing (RNA-Seq)-based transcriptome analysis has become much more affordable in the field of biological research. This technique has even opened up avenues to explore the transcriptome of non-model organisms for which a reference genome is not available. This has made fish health researchers march towards this technology to understand pathogenic processes and immune reactions in fish during the event of infection. Recent studies using this technology have altered and updated the previous understanding of many diseases in fish. RNA-Seq has been employed in the understanding of fish pathogens like bacteria, virus, parasites, and oomycetes. Also, it has been helpful in unraveling the immune mechanisms in fish. Additionally, RNA-Seq technology has made its way for future works, such as genetic linkage mapping, quantitative trait analysis, disease-resistant strain or broodstock selection, and the development of effective vaccines and therapies. Until now, there are no reviews that comprehensively summarize the studies which made use of RNA-Seq to explore the mechanisms of infection of pathogens and the defense strategies of fish hosts. This review aims to summarize the contemporary understanding and findings with regard to infectious pathogens and the immune system of fish that have been achieved through RNA-Seq technology. PMID:29342931

  18. Transcriptome Analysis at the Single-Cell Level Using SMART Technology.

    PubMed

    Fish, Rachel N; Bostick, Magnolia; Lehman, Alisa; Farmer, Andrew

    2016-10-10

    RNA sequencing (RNA-seq) is a powerful method for analyzing cell state, with minimal bias, and has broad applications within the biological sciences. However, transcriptome analysis of seemingly homogenous cell populations may in fact overlook significant heterogeneity that can be uncovered at the single-cell level. The ultra-low amount of RNA contained in a single cell requires extraordinarily sensitive and reproducible transcriptome analysis methods. As next-generation sequencing (NGS) technologies mature, transcriptome profiling by RNA-seq is increasingly being used to decipher the molecular signature of individual cells. This unit describes an ultra-sensitive and reproducible protocol to generate cDNA and sequencing libraries directly from single cells or RNA inputs ranging from 10 pg to 10 ng. Important considerations for working with minute RNA inputs are given. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  19. Gene expression and splicing alterations analyzed by high throughput RNA sequencing of chronic lymphocytic leukemia specimens.

    PubMed

    Liao, Wei; Jordaan, Gwen; Nham, Phillipp; Phan, Ryan T; Pelegrini, Matteo; Sharma, Sanjai

    2015-10-16

    To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia specimens a high throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Ten CLL specimens and five normal peripheral blood CD19+ B cells were analyzed by HTS RNA-seq. The library preparation was performed with Illumina TrueSeq RNA kit and analyzed by Illumina HiSeq 2000 sequencing system. An average of 48.5 million reads for B cells, and 50.6 million reads for CLL specimens were obtained with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEG) between B cells and CLL specimens based on FPKM (fragments per kilobase of transcript per million reads and false discovery rate, FDR q < 0.05, fold change >2) were identified. Expression of selected DEGs (n = 32) with up regulated and down regulated expression in CLL from RNA-seq data were also analyzed by qRT-PCR in a test cohort of CLL specimens. Even though there was a variation in fold expression of DEG genes between RNA-seq and qRT-PCR; more than 90 % of analyzed genes were validated by qRT-PCR analysis. Analysis of RNA-seq data for splicing alterations in CLL and B cells was performed by Multivariate Analysis of Transcript Splicing (MATS analysis). Skipped exon was the most frequent splicing alteration in CLL specimens with 128 significant events (P-value <0.05, minimum inclusion level difference >0.1). The RNA-seq analysis of CLL specimens identifies novel DEG and alternatively spliced genes that are potential prognostic markers and therapeutic targets. High level of validation by qRT-PCR for a number of DEG genes supports the accuracy of this analysis. Global comparison of transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL specimens (M-CLL) with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes but the M-CLL and U-CLL transcriptomes were indistinguishable. The analysis of HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome wide analysis.

  20. Customized workflow development and data modularization concepts for RNA-Sequencing and metatranscriptome experiments.

    PubMed

    Lott, Steffen C; Wolfien, Markus; Riege, Konstantin; Bagnacani, Andrea; Wolkenhauer, Olaf; Hoffmann, Steve; Hess, Wolfgang R

    2017-11-10

    RNA-Sequencing (RNA-Seq) has become a widely used approach to study quantitative and qualitative aspects of transcriptome data. The variety of RNA-Seq protocols, experimental study designs and the characteristic properties of the organisms under investigation greatly affect downstream and comparative analyses. In this review, we aim to explain the impact of structured pre-selection, classification and integration of best-performing tools within modularized data analysis workflows and ready-to-use computing infrastructures towards experimental data analyses. We highlight examples for workflows and use cases that are presented for pro-, eukaryotic and mixed dual RNA-Seq (meta-transcriptomics) experiments. In addition, we are summarizing the expertise of the laboratories participating in the project consortium "Structured Analysis and Integration of RNA-Seq experiments" (de.STAIR) and its integration with the Galaxy-workbench of the RNA Bioinformatics Center (RBC). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  1. RNA-Skim: a rapid method for RNA-Seq quantification at transcript level

    PubMed Central

    Zhang, Zhaojun; Wang, Wei

    2014-01-01

    Motivation: RNA-Seq technique has been demonstrated as a revolutionary means for exploring transcriptome because it provides deep coverage and base pair-level resolution. RNA-Seq quantification is proven to be an efficient alternative to Microarray technique in gene expression study, and it is a critical component in RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require the alignments of fragments to either a genome or a transcriptome, entailing a time-consuming and intricate alignment step. To improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, has been recently proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of designing an efficient alignment-free method for transcriptome quantification. Even though Sailfish is substantially faster than alternative alignment-dependent methods such as Cufflinks, using all k-mers in the transcriptome quantification impedes the scalability of the method. Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity, and introduces the notion of sig-mers, which are a special type of k-mers uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable with any state-of-the-art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem into smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses <4% of the k-mers and <10% of the CPU time required by Sailfish. It is able to finish transcriptome quantification in <10 min per sample by using just a single thread on a commodity computer, which represents >100 speedup over the state-of-the-art alignment-based methods, while delivering comparable or higher accuracy. Availability and implementation: The software is available at http://www.csbio.unc.edu/rs. Contact: weiwang@cs.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24931995

  2. Comparative evaluation of rRNA depletion procedures for the improved analysis of bacterial biofilm and mixed pathogen culture transcriptomes

    PubMed Central

    Petrova, Olga E.; Garcia-Alcalde, Fernando; Zampaloni, Claudia; Sauer, Karin

    2017-01-01

    Global transcriptomic analysis via RNA-seq is often hampered by the high abundance of ribosomal (r)RNA in bacterial cells. To remove rRNA and enrich coding sequences, subtractive hybridization procedures have become the approach of choice prior to RNA-seq, with their efficiency varying in a manner dependent on sample type and composition. Yet, despite an increasing number of RNA-seq studies, comparative evaluation of bacterial rRNA depletion methods has remained limited. Moreover, no such study has utilized RNA derived from bacterial biofilms, which have potentially higher rRNA:mRNA ratios and higher rRNA carryover during RNA-seq analysis. Presently, we evaluated the efficiency of three subtractive hybridization-based kits in depleting rRNA from samples derived from biofilm, as well as planktonic cells of the opportunistic human pathogen Pseudomonas aeruginosa. Our results indicated different rRNA removal efficiency for the three procedures, with the Ribo-Zero kit yielding the highest degree of rRNA depletion, which translated into enhanced enrichment of non-rRNA transcripts and increased depth of RNA-seq coverage. The results indicated that, in addition to improving RNA-seq sensitivity, efficient rRNA removal enhanced detection of low abundance transcripts via qPCR. Finally, we demonstrate that the Ribo-Zero kit also exhibited the highest efficiency when P. aeruginosa/Staphylococcus aureus co-culture RNA samples were tested. PMID:28117413

  3. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity

    PubMed Central

    Yassour, Moran; Grabherr, Manfred; Blood, Philip D.; Bowden, Joshua; Couger, Matthew Brian; Eccles, David; Li, Bo; Lieber, Matthias; MacManes, Matthew D.; Ott, Michael; Orvis, Joshua; Pochet, Nathalie; Strozzi, Francesco; Weeks, Nathan; Westerman, Rick; William, Thomas; Dewey, Colin N.; Henschel, Robert; LeDuc, Richard D.; Friedman, Nir; Regev, Aviv

    2013-01-01

    De novo assembly of RNA-Seq data allows us to study transcriptomes without the need for a genome sequence, such as in non-model organisms of ecological and evolutionary importance, cancer samples, or the microbiome. In this protocol, we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms. We also present Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples, and approaches to identify protein coding genes. In an included tutorial we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sf.net. PMID:23845962

  4. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.

    PubMed

    Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias

    2015-06-25

    Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

  5. Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl−/− retinal transcriptomes

    PubMed Central

    Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.

    2011-01-01

    Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R2) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623

  6. High-throughput full-length single-cell mRNA-seq of rare cells.

    PubMed

    Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X

    2017-01-01

    Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single cell suspensions with cell densities on the order of 107 per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data is typically only used for gene expression analysis of transcriptomic data, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells), and mutational profile (H1650 versus H1975 cells), in a single sequencing run. This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.

  7. Comparison of ribosomal RNA removal methods for transcriptome sequencing workflows in teleost fish

    USDA-ARS?s Scientific Manuscript database

    RNA sequencing (RNA-Seq) is becoming the standard for transcriptome analysis. Removal of contaminating ribosomal RNA (rRNA) is a priority in the preparation of libraries suitable for sequencing. rRNAs are commonly removed from total RNA via either mRNA selection or rRNA depletion. These methods have...

  8. Lessons from single-cell transcriptome analysis of oxygen-sensing cells.

    PubMed

    Zhou, Ting; Matsunami, Hiroaki

    2018-05-01

    The advent of single-cell RNA-sequencing (RNA-Seq) technology has enabled transcriptome profiling of individual cells. Comprehensive gene expression analysis at the single-cell level has proven to be effective in characterizing the most fundamental aspects of cellular function and identity. This unbiased approach is revolutionary for small and/or heterogeneous tissues like oxygen-sensing cells in identifying key molecules. Here, we review the major methods of current single-cell RNA-Seq technology. We discuss how this technology has advanced the understanding of oxygen-sensing glomus cells in the carotid body and helped uncover novel oxygen-sensing cells and mechanisms in the mice olfactory system. We conclude by providing our perspective on future single-cell RNA-Seq research directed at oxygen-sensing cells.

  9. Advantages of RNA-seq compared to RNA microarrays for transcriptome profiling of anterior cruciate ligament tears.

    PubMed

    Rai, Muhammad Farooq; Tycksen, Eric D; Sandell, Linda J; Brophy, Robert H

    2018-01-01

    Microarrays and RNA-seq are at the forefront of high throughput transcriptome analyses. Since these methodologies are based on different principles, there are concerns about the concordance of data between the two techniques. The concordance of RNA-seq and microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed in clinically derived ligament tissues. To demonstrate the concordance between RNA-seq and microarrays and to assess potential benefits of RNA-seq over microarrays, we assessed differences in transcript expression in anterior cruciate ligament (ACL) tissues based on time-from-injury. ACL remnants were collected from patients with an ACL tear at the time of ACL reconstruction. RNA prepared from torn ACL remnants was subjected to Agilent microarrays (N = 24) and RNA-seq (N = 8). The correlation of biological replicates in RNA-seq and microarrays data was similar (0.98 vs. 0.97), demonstrating that each platform has high internal reproducibility. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarrays values were moderate. The cross-platform concordance for differentially expressed transcripts or enriched pathways was linearly correlated (r = 0.64). RNA-Seq was superior in detecting low abundance transcripts and differentiating biologically critical isoforms. Additional independent validation of transcript expression was undertaken using microfluidic PCR for selected genes. PCR data showed 100% concordance (in expression pattern) with RNA-seq and microarrays data. These findings demonstrate that RNA-seq has advantages over microarrays for transcriptome profiling of ligament tissues when available and affordable. Furthermore, these findings are likely transferable to other musculoskeletal tissues where tissue collection is challenging and cells are in low abundance. © 2017 Orthopaedic Research Society. Published by Wiley Periodicals, Inc. J Orthop Res 36:484-497, 2018. © 2017 Orthopaedic Research Society. Published by Wiley Periodicals, Inc.

  10. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

    PubMed Central

    Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior

    2012-01-01

    Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time. PMID:22383036

  11. Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq

    PubMed Central

    Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim

    2014-01-01

    The advent of both microarray and massively parallel sequencing have revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.

  12. Whole transcriptome RNA-Seq analysis of breast cancer recurrence risk using formalin-fixed paraffin-embedded tumor tissue.

    PubMed

    Sinicropi, Dominick; Qu, Kunbin; Collin, Francois; Crager, Michael; Liu, Mei-Lan; Pelham, Robert J; Pho, Mylan; Dei Rossi, Andrew; Jeong, Jennie; Scott, Aaron; Ambannavar, Ranjana; Zheng, Christina; Mena, Raul; Esteban, Jose; Stephans, James; Morlan, John; Baker, Joffre

    2012-01-01

    RNA biomarkers discovered by RT-PCR-based gene expression profiling of archival formalin-fixed paraffin-embedded (FFPE) tissue form the basis for widely used clinical diagnostic tests; however, RT-PCR is practically constrained in the number of transcripts that can be interrogated. We have developed and optimized RNA-Seq library chemistry as well as bioinformatics and biostatistical methods for whole transcriptome profiling from FFPE tissue. The chemistry accommodates low RNA inputs and sample multiplexing. These methods both enable rediscovery of RNA biomarkers for disease recurrence risk that were previously identified by RT-PCR analysis of a cohort of 136 patients, and also identify a high percentage of recurrence risk markers that were previously discovered using DNA microarrays in a separate cohort of patients, evidence that this RNA-Seq technology has sufficient precision and sensitivity for biomarker discovery. More than two thousand RNAs are strongly associated with breast cancer recurrence risk in the 136 patient cohort (FDR <10%). Many of these are intronic RNAs for which corresponding exons are not also associated with disease recurrence. A number of the RNAs associated with recurrence risk belong to novel RNA networks. It will be important to test the validity of these novel associations in whole transcriptome RNA-Seq screens of other breast cancer cohorts.

  13. Whole Transcriptome RNA-Seq Analysis of Breast Cancer Recurrence Risk Using Formalin-Fixed Paraffin-Embedded Tumor Tissue

    PubMed Central

    Sinicropi, Dominick; Qu, Kunbin; Collin, Francois; Crager, Michael; Liu, Mei-Lan; Pelham, Robert J.; Pho, Mylan; Rossi, Andrew Dei; Jeong, Jennie; Scott, Aaron; Ambannavar, Ranjana; Zheng, Christina; Mena, Raul; Esteban, Jose; Stephans, James; Morlan, John; Baker, Joffre

    2012-01-01

    RNA biomarkers discovered by RT-PCR-based gene expression profiling of archival formalin-fixed paraffin-embedded (FFPE) tissue form the basis for widely used clinical diagnostic tests; however, RT-PCR is practically constrained in the number of transcripts that can be interrogated. We have developed and optimized RNA-Seq library chemistry as well as bioinformatics and biostatistical methods for whole transcriptome profiling from FFPE tissue. The chemistry accommodates low RNA inputs and sample multiplexing. These methods both enable rediscovery of RNA biomarkers for disease recurrence risk that were previously identified by RT-PCR analysis of a cohort of 136 patients, and also identify a high percentage of recurrence risk markers that were previously discovered using DNA microarrays in a separate cohort of patients, evidence that this RNA-Seq technology has sufficient precision and sensitivity for biomarker discovery. More than two thousand RNAs are strongly associated with breast cancer recurrence risk in the 136 patient cohort (FDR <10%). Many of these are intronic RNAs for which corresponding exons are not also associated with disease recurrence. A number of the RNAs associated with recurrence risk belong to novel RNA networks. It will be important to test the validity of these novel associations in whole transcriptome RNA-Seq screens of other breast cancer cohorts. PMID:22808097

  14. The bench scientist's guide to RNA-Seq analysis

    USDA-ARS?s Scientific Manuscript database

    RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatic specialists. Here we outline a methods strategy desi...

  15. High-confidence coding and noncoding transcriptome maps

    PubMed Central

    2017-01-01

    The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519

  16. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    PubMed

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  17. expVIP: a Customizable RNA-seq Data Analysis and Visualization Platform1[OPEN

    PubMed Central

    2016-01-01

    The majority of transcriptome sequencing (RNA-seq) expression studies in plants remain underutilized and inaccessible due to the use of disparate transcriptome references and the lack of skills and resources to analyze and visualize these data. We have developed expVIP, an expression visualization and integration platform, which allows easy analysis of RNA-seq data combined with an intuitive and interactive interface. Users can analyze public and user-specified data sets with minimal bioinformatics knowledge using the expVIP virtual machine. This generates a custom Web browser to visualize, sort, and filter the RNA-seq data and provides outputs for differential gene expression analysis. We demonstrate expVIP’s suitability for polyploid crops and evaluate its performance across a range of biologically relevant scenarios. To exemplify its use in crop research, we developed a flexible wheat (Triticum aestivum) expression browser (www.wheat-expression.com) that can be expanded with user-generated data in a local virtual machine environment. The open-access expVIP platform will facilitate the analysis of gene expression data from a wide variety of species by enabling the easy integration, visualization, and comparison of RNA-seq data across experiments. PMID:26869702

  18. iSeq: Web-Based RNA-seq Data Analysis and Visualization.

    PubMed

    Zhang, Chao; Fan, Caoqi; Gan, Jingbo; Zhu, Ping; Kong, Lei; Li, Cheng

    2018-01-01

    Transcriptome sequencing (RNA-seq) is becoming a standard experimental methodology for genome-wide characterization and quantification of transcripts at single base-pair resolution. However, downstream analysis of massive amount of sequencing data can be prohibitively technical for wet-lab researchers. A functionally integrated and user-friendly platform is required to meet this demand. Here, we present iSeq, an R-based Web server, for RNA-seq data analysis and visualization. iSeq is a streamlined Web-based R application under the Shiny framework, featuring a simple user interface and multiple data analysis modules. Users without programming and statistical skills can analyze their RNA-seq data and construct publication-level graphs through a standardized yet customizable analytical pipeline. iSeq is accessible via Web browsers on any operating system at http://iseq.cbi.pku.edu.cn .

  19. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud

    PubMed Central

    Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.

    2015-01-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053

  20. Microfluidic single-cell whole-transcriptome sequencing.

    PubMed

    Streets, Aaron M; Zhang, Xiannian; Cao, Chen; Pang, Yuhong; Wu, Xinglong; Xiong, Liang; Yang, Lu; Fu, Yusi; Zhao, Liang; Tang, Fuchou; Huang, Yanyi

    2014-05-13

    Single-cell whole-transcriptome analysis is a powerful tool for quantifying gene expression heterogeneity in populations of cells. Many techniques have, thus, been recently developed to perform transcriptome sequencing (RNA-Seq) on individual cells. To probe subtle biological variation between samples with limiting amounts of RNA, more precise and sensitive methods are still required. We adapted a previously developed strategy for single-cell RNA-Seq that has shown promise for superior sensitivity and implemented the chemistry in a microfluidic platform for single-cell whole-transcriptome analysis. In this approach, single cells are captured and lysed in a microfluidic device, where mRNAs with poly(A) tails are reverse-transcribed into cDNA. Double-stranded cDNA is then collected and sequenced using a next generation sequencing platform. We prepared 94 libraries consisting of single mouse embryonic cells and technical replicates of extracted RNA and thoroughly characterized the performance of this technology. Microfluidic implementation increased mRNA detection sensitivity as well as improved measurement precision compared with tube-based protocols. With 0.2 M reads per cell, we were able to reconstruct a majority of the bulk transcriptome with 10 single cells. We also quantified variation between and within different types of mouse embryonic cells and found that enhanced measurement precision, detection sensitivity, and experimental throughput aided the distinction between biological variability and technical noise. With this work, we validated the advantages of an early approach to single-cell RNA-Seq and showed that the benefits of combining microfluidic technology with high-throughput sequencing will be valuable for large-scale efforts in single-cell transcriptome analysis.

  1. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  2. Tn5Prime, a Tn5 based 5' capture method for single cell RNA-seq.

    PubMed

    Cole, Charles; Byrne, Ashley; Beaudin, Anna E; Forsberg, E Camilla; Vollmers, Christopher

    2018-06-01

    RNA-sequencing (RNA-seq) is a powerful technique to investigate and quantify entire transcriptomes. Recent advances in the field have made it possible to explore the transcriptomes of single cells. However, most widely used RNA-seq protocols fail to provide crucial information regarding transcription start sites. Here we present a protocol, Tn5Prime, that takes advantage of the Tn5 transposase-based Smart-seq2 protocol to create RNA-seq libraries that capture the 5' end of transcripts. The Tn5Prime method dramatically streamlines the 5' capture process and is both cost effective and reliable. By applying Tn5Prime to bulk RNA and single cell samples, we were able to define transcription start sites as well as quantify transcriptomes at high accuracy and reproducibility. Additionally, similar to 3' end-based high-throughput methods like Drop-seq and 10× Genomics Chromium, the 5' capture Tn5Prime method allows the introduction of cellular identifiers during reverse transcription, simplifying the analysis of large numbers of single cells. In contrast to 3' end-based methods, Tn5Prime also enables the assembly of the variable 5' ends of the antibody sequences present in single B-cell data. Therefore, Tn5Prime presents a robust tool for both basic and applied research into the adaptive immune system and beyond.

  3. SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells.

    PubMed

    Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

    2018-01-01

    Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. © 2018 Han et al.; Published by Cold Spring Harbor Laboratory Press.

  4. SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells

    PubMed Central

    Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

    2018-01-01

    Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. PMID:29208629

  5. High-throughput illumina strand-specific RNA sequencing library preparation

    USDA-ARS?s Scientific Manuscript database

    Conventional Illumina RNA-Seq does not have the resolution to decode the complex eukaryote transcriptome due to the lack of RNA polarity information. Strand-specific RNA sequencing (ssRNA-Seq) can overcome these limitations and as such is better suited for genome annotation, de novo transcriptome as...

  6. Experimental Design and Power Calculation for RNA-seq Experiments.

    PubMed

    Wu, Zhijin; Wu, Hao

    2016-01-01

    Power calculation is a critical component of RNA-seq experimental design. The flexibility of RNA-seq experiment and the wide dynamic range of transcription it measures make it an attractive technology for whole transcriptome analysis. These features, in addition to the high dimensionality of RNA-seq data, bring complexity in experimental design, making an analytical power calculation no longer realistic. In this chapter we review the major factors that influence the statistical power of detecting differential expression, and give examples of power assessment using the R package PROPER.

  7. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

    PubMed

    Haas, Brian J; Papanicolaou, Alexie; Yassour, Moran; Grabherr, Manfred; Blood, Philip D; Bowden, Joshua; Couger, Matthew Brian; Eccles, David; Li, Bo; Lieber, Matthias; MacManes, Matthew D; Ott, Michael; Orvis, Joshua; Pochet, Nathalie; Strozzi, Francesco; Weeks, Nathan; Westerman, Rick; William, Thomas; Dewey, Colin N; Henschel, Robert; LeDuc, Richard D; Friedman, Nir; Regev, Aviv

    2013-08-01

    De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

  8. Optimization Of A High-Throughput Transcriptomic (HTTr) Bioactivity Screen In MCF7 Cells Using Targeted RNA-Seq (SOT)

    EPA Science Inventory

    Recent advances in targeted RNA-Seq technology allow researchers to efficiently and cost-effectively obtain whole transcriptome profiles using picograms of mRNA from human cell lysates. Low mRNA input requirements and sample multiplexing capabilities has made time- and concentrat...

  9. RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations

    PubMed Central

    Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.

    2014-01-01

    Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449

  10. Comparative Analysis of Single-Cell RNA Sequencing Methods.

    PubMed

    Ziegenhain, Christoph; Vieth, Beate; Parekh, Swati; Reinius, Björn; Guillaumet-Adkins, Amy; Smets, Martha; Leonhardt, Heinrich; Heyn, Holger; Hellmann, Ines; Enard, Wolfgang

    2017-02-16

    Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. rnaQUAST: a quality assessment tool for de novo transcriptome assemblies.

    PubMed

    Bushmanova, Elena; Antipov, Dmitry; Lapidus, Alla; Suvorov, Vladimir; Prjibelski, Andrey D

    2016-07-15

    Ability to generate large RNA-Seq datasets created a demand for both de novo and reference-based transcriptome assemblers. However, while many transcriptome assemblers are now available, there is still no unified quality assessment tool for RNA-Seq assemblies. We present rnaQUAST-a tool for evaluating RNA-Seq assembly quality and benchmarking transcriptome assemblers using reference genome and gene database. rnaQUAST calculates various metrics that demonstrate completeness and correctness levels of the assembled transcripts, and outputs them in a user-friendly report. rnaQUAST is implemented in Python and is freely available at http://bioinf.spbau.ru/en/rnaquast ap@bioinf.spbau.ru Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  12. RNA-Seq Transcriptome Profiling of Upland Cotton (Gossypium hirsutum L.) Root Tissue under Water-Deficit Stress

    PubMed Central

    Bowman, Megan J.; Park, Wonkeun; Bauer, Philip J.; Udall, Joshua A.; Page, Justin T.; Raney, Joshua; Scheffler, Brian E.; Jones, Don. C.; Campbell, B. Todd

    2013-01-01

    An RNA-Seq experiment was performed using field grown well-watered and naturally rain fed cotton plants to identify differentially expressed transcripts under water-deficit stress. Our work constitutes the first application of the newly published diploid D5 Gossypium raimondii sequence in the study of tetraploid AD1 upland cotton RNA-seq transcriptome analysis. A total of 1,530 transcripts were differentially expressed between well-watered and water-deficit stressed root tissues, in patterns that confirm the accuracy of this technique for future studies in cotton genomics. Additionally, putative sequence based genome localization of differentially expressed transcripts detected A2 genome specific gene expression under water-deficit stress. These data will facilitate efforts to understand the complex responses governing transcriptomic regulatory mechanisms and to identify candidate genes that may benefit applied plant breeding programs. PMID:24324815

  13. A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae

    PubMed Central

    Nookaew, Intawat; Papini, Marta; Pornputtapong, Natapol; Scalcinati, Gionata; Fagerberg, Linn; Uhlén, Matthias; Nielsen, Jens

    2012-01-01

    RNA-seq, has recently become an attractive method of choice in the studies of transcriptomes, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the Illumina platform, and to perform a cross-platform comparison based on the results obtained through Affymetrix microarray. As a case study for our work we, used the Saccharomyces cerevisiae strain CEN.PK 113-7D, grown under two different conditions (batch and chemostat). Here, we asses the influence of genetic variation on the estimation of gene expression level using three different aligners for read-mapping (Gsnap, Stampy and TopHat) on S288c genome, the capabilities of five different statistical methods to detect differential gene expression (baySeq, Cuffdiff, DESeq, edgeR and NOISeq) and we explored the consistency between RNA-seq analysis using reference genome and de novo assembly approach. High reproducibility among biological replicates (correlation ≥0.99) and high consistency between the two platforms for analysis of gene expression levels (correlation ≥0.91) are reported. The results from differential gene expression identification derived from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microrrays) for gene expression analysis and addresses the contribution of the different steps involved in the analysis of RNA-seq data. PMID:22965124

  14. Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq.

    PubMed

    Macaulay, Iain C; Teng, Mabel J; Haerty, Wilfried; Kumar, Parveen; Ponting, Chris P; Voet, Thierry

    2016-11-01

    Parallel sequencing of a single cell's genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ∼3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.

  15. Spliced synthetic genes as internal controls in RNA sequencing experiments.

    PubMed

    Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R

    2016-09-01

    RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.

  16. From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism.

    PubMed

    Zouari, Inès; Salvioli, Alessandra; Chialva, Matteo; Novero, Mara; Miozzi, Laura; Tenore, Gian Carlo; Bagnaresi, Paolo; Bonfante, Paola

    2014-03-21

    Tomato (Solanum lycopersicum) establishes a beneficial symbiosis with arbuscular mycorrhizal (AM) fungi. The formation of the mycorrhizal association in the roots leads to plant-wide modulation of gene expression. To understand the systemic effect of the fungal symbiosis on the tomato fruit, we used RNA-Seq to perform global transcriptome profiling on Moneymaker tomato fruits at the turning ripening stage. Fruits were collected at 55 days after flowering, from plants colonized with Funneliformis mosseae and from control plants, which were fertilized to avoid responses related to nutrient deficiency. Transcriptome analysis identified 712 genes that are differentially expressed in fruits from mycorrhizal and control plants. Gene Ontology (GO) enrichment analysis of these genes showed 81 overrepresented functional GO classes. Up-regulated GO classes include photosynthesis, stress response, transport, amino acid synthesis and carbohydrate metabolism functions, suggesting a general impact of fungal symbiosis on primary metabolisms and, particularly, on mineral nutrition. Down-regulated GO classes include cell wall, metabolism and ethylene response pathways. Quantitative RT-PCR validated the RNA-Seq results for 12 genes out of 14 when tested at three fruit ripening stages, mature green, breaker and turning. Quantification of fruit nutraceutical and mineral contents produced values consistent with the expression changes observed by RNA-Seq analysis. This RNA-Seq profiling produced a novel data set that explores the intersection of mycorrhization and fruit development. We found that the fruits of mycorrhizal plants show two transcriptomic "signatures": genes characteristic of a climacteric fleshy fruit, and genes characteristic of mycorrhizal status, like phosphate and sulphate transporters. Moreover, mycorrhizal plants under low nutrient conditions produce fruits with a nutrient content similar to those from non-mycorrhizal plants under high nutrient conditions, indicating that AM fungi can help replace exogenous fertilizer for fruit crops.

  17. Resources and Recommendations for Using Transcriptomics to Address Grand Challenges in Comparative Biology

    PubMed Central

    Mykles, Donald L.; Burnett, Karen G.; Durica, David S.; Joyce, Blake L.; McCarthy, Fiona M.; Schmidt, Carl J.; Stillman, Jonathon H.

    2016-01-01

    High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the “Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology” symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. PMID:27639274

  18. Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.

    PubMed

    Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan

    2017-01-01

    Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.

  19. A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

    PubMed Central

    Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

    2016-01-01

    Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumoniae, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, in which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591

  20. Transcriptome analysis of Brassica napus pod using RNA-Seq and identification of lipid-related candidate genes.

    PubMed

    Xu, Hai-Ming; Kong, Xiang-Dong; Chen, Fei; Huang, Ji-Xiang; Lou, Xiang-Yang; Zhao, Jian-Yi

    2015-10-24

    Brassica napus is an important oilseed crop. Dissection of the genetic architecture underlying oil-related biological processes will greatly facilitates the genetic improvement of rapeseed. The differential gene expression during pod development offers a snapshot on the genes responsible for oil accumulation in. To identify candidate genes in the linkage peaks reported previously, we used RNA sequencing (RNA-Seq) technology to analyze the pod transcriptomes of German cultivar Sollux and Chinese inbred line Gaoyou. The RNA samples were collected for RNA-Seq at 5-7, 15-17 and 25-27 days after flowering (DAF). Bioinformatics analysis was performed to investigate differentially expressed genes (DEGs). Gene annotation analysis was integrated with QTL mapping and Brassica napus pod transcriptome profiling to detect potential candidate genes in oilseed. Four hundred sixty five and two thousand, one hundred fourteen candidate DEGs were identified, respectively, between two varieties at the same stages and across different periods of each variety. Then, 33 DEGs between Sollux and Gaoyou were identified as the candidate genes affecting seed oil content by combining those DEGs with the quantitative trait locus (QTL) mapping results, of which, one was found to be homologous to Arabidopsis thaliana lipid-related genes. Intervarietal DEGs of lipid pathways in QTL regions represent important candidate genes for oil-related traits. Integrated analysis of transcriptome profiling, QTL mapping and comparative genomics with other relative species leads to efficient identification of most plausible functional genes underlying oil-content related characters, offering valuable resources for bettering breeding program of Brassica napus. This study provided a comprehensive overview on the pod transcriptomes of two varieties with different oil-contents at the three developmental stages.

  1. Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud.

    PubMed

    Yang, Andrian; Troup, Michael; Lin, Peijie; Ho, Joshua W K

    2017-03-01

    Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable paralellization of existing RNA-seq processing pipelines using big data technologies of Apache Hadoop and Apache Spark for performing massively parallel analysis of large scale transcriptomic data. Using two public scRNA-seq datasets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6-145.4 times faster using Falco than running on a highly optimized standalone computer. Falco also allows users to utilize low-cost spot instances of Amazon Web Services, providing a ∼65% reduction in cost of analysis. Falco is available via a GNU General Public License at https://github.com/VCCRI/Falco/. j.ho@victorchang.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  2. SC3 - consensus clustering of single-cell RNA-Seq data

    PubMed Central

    Kiselev, Vladimir Yu.; Kirschner, Kristina; Schaub, Michael T.; Andrews, Tallulah; Yiu, Andrew; Chandra, Tamir; Natarajan, Kedar N; Reik, Wolf; Barahona, Mauricio; Green, Anthony R; Hemberg, Martin

    2017-01-01

    Single-cell RNA-seq (scRNA-seq) enables a quantitative cell-type characterisation based on global transcriptome profiles. We present Single-Cell Consensus Clustering (SC3), a user-friendly tool for unsupervised clustering which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach. We demonstrate that SC3 is capable of identifying subclones based on the transcriptomes from neoplastic cells collected from patients. PMID:28346451

  3. Directional RNA-seq reveals highly complex condition-dependent transcriptomes in E. coli K12 through accurate full-length transcripts assembling.

    PubMed

    Li, Shan; Dong, Xia; Su, Zhengchang

    2013-07-30

    Although prokaryotic gene transcription has been studied over decades, many aspects of the process remain poorly understood. Particularly, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. Ironically, similar studies have not been conducted in the model bacterium E coli K12, thus it is unknown whether or not the bacterium possesses similar complex transcriptomes. Furthermore, although RNA-seq becomes the major method for analyzing the complexity of prokaryotic transcriptome, it is still a challenging task to accurately assemble full length transcripts using short RNA-seq reads. To fill these gaps, we have profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in the bacterial cells, combined with a highly accurate and robust algorithm and tool TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/) for assembling full length transcripts. We found that 46.9 ~ 63.4% of expressed operons were utilized in their putative alternative forms, 72.23 ~ 89.54% genes had putative asRNA transcripts and 51.37 ~ 72.74% intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases. As has been demonstrated in many other prokaryotes, E. coli K12 also has a highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads.

  4. Directional RNA-seq reveals highly complex condition-dependent transcriptomes in E. coli K12 through accurate full-length transcripts assembling

    PubMed Central

    2013-01-01

    Background Although prokaryotic gene transcription has been studied over decades, many aspects of the process remain poorly understood. Particularly, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. Ironically, similar studies have not been conducted in the model bacterium E coli K12, thus it is unknown whether or not the bacterium possesses similar complex transcriptomes. Furthermore, although RNA-seq becomes the major method for analyzing the complexity of prokaryotic transcriptome, it is still a challenging task to accurately assemble full length transcripts using short RNA-seq reads. Results To fill these gaps, we have profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in the bacterial cells, combined with a highly accurate and robust algorithm and tool TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/) for assembling full length transcripts. We found that 46.9 ~ 63.4% of expressed operons were utilized in their putative alternative forms, 72.23 ~ 89.54% genes had putative asRNA transcripts and 51.37 ~ 72.74% intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases. Conclusions As has been demonstrated in many other prokaryotes, E. coli K12 also has a highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads. PMID:23899370

  5. Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies

    PubMed Central

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance

    2013-01-01

    RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets. PMID:25937948

  6. Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.

    PubMed

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance

    2013-01-01

    RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets.

  7. A RNA-Seq Analysis of the Rat Supraoptic Nucleus Transcriptome: Effects of Salt Loading on Gene Expression

    PubMed Central

    Salinas, Yasmmyn D.; Shi, YiJun; Greenwood, Michael; Hoe, See Ziau; Murphy, David; Gainer, Harold

    2015-01-01

    Magnocellular neurons (MCNs) in the hypothalamo-neurohypophysial system (HNS) are highly specialized to release large amounts of arginine vasopressin (Avp) or oxytocin (Oxt) into the blood stream and play critical roles in the regulation of body fluid homeostasis. The MCNs are osmosensory neurons and are excited by exposure to hypertonic solutions and inhibited by hypotonic solutions. The MCNs respond to systemic hypertonic and hypotonic stimulation with large changes in the expression of their Avp and Oxt genes, and microarray studies have shown that these osmotic perturbations also cause large changes in global gene expression in the HNS. In this paper, we examine gene expression in the rat supraoptic nucleus (SON) under normosmotic and chronic salt-loading SL) conditions by the first time using “new-generation”, RNA sequencing (RNA-Seq) methods. We reliably detect 9,709 genes as present in the SON by RNA-Seq, and 552 of these genes were changed in expression as a result of chronic SL. These genes reflect diverse functions, and 42 of these are involved in either transcriptional or translational processes. In addition, we compare the SON transcriptomes resolved by RNA-Seq methods with the SON transcriptomes determined by Affymetrix microarray methods in rats under the same osmotic conditions, and find that there are 6,466 genes present in the SON that are represented in both data sets, although 1,040 of the expressed genes were found only in the microarray data, and 2,762 of the expressed genes are selectively found in the RNA-Seq data and not the microarray data. These data provide the research community a comprehensive view of the transcriptome in the SON under normosmotic conditions and the changes in specific gene expression evoked by salt loading. PMID:25897513

  8. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing

    PubMed Central

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-01-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome. PMID:20392818

  9. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing.

    PubMed

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-08-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.

  10. Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome

    PubMed Central

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José

    2016-01-01

    RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude. PMID:27377755

  11. Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome.

    PubMed

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José

    2016-07-05

    RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude.

  12. ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data.

    PubMed

    Gonzalez, Sergio; Clavijo, Bernardo; Rivarola, Máximo; Moreno, Patricio; Fernandez, Paula; Dopazo, Joaquín; Paniego, Norma

    2017-02-22

    In the last years, applications based on massively parallelized RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species. We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with new generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results. ATGC transcriptomics provides access to non-expert computer users and small research groups to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of GNU public license at http://atgcinta.sourceforge.net .

  13. RNA-seq based transcriptomic map reveals new insights into mouse salivary gland development and maturation.

    PubMed

    Gluck, Christian; Min, Sangwon; Oyelakin, Akinsola; Smalley, Kirsten; Sinha, Satrajit; Romano, Rose-Anne

    2016-11-16

    Mouse models have served a valuable role in deciphering various facets of Salivary Gland (SG) biology, from normal developmental programs to diseased states. To facilitate such studies, gene expression profiling maps have been generated for various stages of SG organogenesis. However these prior studies fall short of capturing the transcriptional complexity due to the limited scope of gene-centric microarray-based technology. Compared to microarray, RNA-sequencing (RNA-seq) offers unbiased detection of novel transcripts, broader dynamic range and high specificity and sensitivity for detection of genes, transcripts, and differential gene expression. Although RNA-seq data, particularly under the auspices of the ENCODE project, have covered a large number of biological specimens, studies on the SG have been lacking. To better appreciate the wide spectrum of gene expression profiles, we isolated RNA from mouse submandibular salivary glands at different embryonic and adult stages. In parallel, we processed RNA-seq data for 24 organs and tissues obtained from the mouse ENCODE consortium and calculated the average gene expression values. To identify molecular players and pathways likely to be relevant for SG biology, we performed functional gene enrichment analysis, network construction and hierarchal clustering of the RNA-seq datasets obtained from different stages of SG development and maturation, and other mouse organs and tissues. Our bioinformatics-based data analysis not only reaffirmed known modulators of SG morphogenesis but revealed novel transcription factors and signaling pathways unique to mouse SG biology and function. Finally we demonstrated that the unique SG gene signature obtained from our mouse studies is also well conserved and can demarcate features of the human SG transcriptome that is different from other tissues. Our RNA-seq based Atlas has revealed a high-resolution cartographic view of the dynamic transcriptomic landscape of the mouse SG at various stages. These RNA-seq datasets will complement pre-existing microarray based datasets, including the Salivary Gland Molecular Anatomy Project by offering a broader systems-biology based perspective rather than the classical gene-centric view. Ultimately such resources will be valuable in providing a useful toolkit to better understand how the diverse cell population of the SG are organized and controlled during development and differentiation.

  14. Whole-transcriptome, high-throughput RNA sequence analysis of the bovine macrophage response to Mycobacterium bovis infection in vitro.

    PubMed

    Nalpas, Nicolas C; Park, Stephen D E; Magee, David A; Taraktsoglou, Maria; Browne, John A; Conlon, Kevin M; Rue-Albrecht, Kévin; Killick, Kate E; Hokamp, Karsten; Lohan, Amanda J; Loftus, Brendan J; Gormley, Eamonn; Gordon, Stephen V; MacHugh, David E

    2013-04-08

    Mycobacterium bovis, the causative agent of bovine tuberculosis, is an intracellular pathogen that can persist inside host macrophages during infection via a diverse range of mechanisms that subvert the host immune response. In the current study, we have analysed and compared the transcriptomes of M. bovis-infected monocyte-derived macrophages (MDM) purified from six Holstein-Friesian females with the transcriptomes of non-infected control MDM from the same animals over a 24 h period using strand-specific RNA sequencing (RNA-seq). In addition, we compare gene expression profiles generated using RNA-seq with those previously generated by us using the high-density Affymetrix® GeneChip® Bovine Genome Array platform from the same MDM-extracted RNA. A mean of 7.2 million reads from each MDM sample mapped uniquely and unambiguously to single Bos taurus reference genome locations. Analysis of these mapped reads showed 2,584 genes (1,392 upregulated; 1,192 downregulated) and 757 putative natural antisense transcripts (558 upregulated; 119 downregulated) that were differentially expressed based on sense and antisense strand data, respectively (adjusted P-value ≤ 0.05). Of the differentially expressed genes, 694 were common to both the sense and antisense data sets, with the direction of expression (i.e. up- or downregulation) positively correlated for 693 genes and negatively correlated for the remaining gene. Gene ontology analysis of the differentially expressed genes revealed an enrichment of immune, apoptotic and cell signalling genes. Notably, the number of differentially expressed genes identified from RNA-seq sense strand analysis was greater than the number of differentially expressed genes detected from microarray analysis (2,584 genes versus 2,015 genes). Furthermore, our data reveal a greater dynamic range in the detection and quantification of gene transcripts for RNA-seq compared to microarray technology. This study highlights the value of RNA-seq in identifying novel immunomodulatory mechanisms that underlie host-mycobacterial pathogen interactions during infection, including possible complex post-transcriptional regulation of host gene expression involving antisense RNA.

  15. High-throughput detection of RNA processing in bacteria.

    PubMed

    Gill, Erin E; Chan, Luisa S; Winsor, Geoffrey L; Dobson, Neil; Lo, Raymond; Ho Sui, Shannan J; Dhillon, Bhavjinder K; Taylor, Patrick K; Shrestha, Raunak; Spencer, Cory; Hancock, Robert E W; Unrau, Peter J; Brinkman, Fiona S L

    2018-03-27

    Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely - 10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions. This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using Jbrowse on www.pseudomonas.com .

  16. SSP: an interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads.

    PubMed

    Safikhani, Zhaleh; Sadeghi, Mehdi; Pezeshk, Hamid; Eslahchi, Changiz

    2013-01-01

    Recent advances in the sequencing technologies have provided a handful of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts with a low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp. © 2013.

  17. A Single-Cell Approach to the Elusive Latent Human Cytomegalovirus Transcriptome.

    PubMed

    Goodrum, Felicia; McWeeney, Shannon

    2018-06-12

    Herpesvirus latency has been difficult to understand molecularly due to low levels of viral genomes and gene expression. In the case of the betaherpesvirus human cytomegalovirus (HCMV), this is further complicated by the heterogeneity inherent to hematopoietic subpopulations harboring genomes and, as a consequence, the various patterns of infection that simultaneously exist in a host, ranging from latent to lytic. Single-cell RNA sequencing (scRNA-seq) provides tremendous potential in measuring the gene expression profiles of heterogeneous cell populations for a wide range of applications, including in studies of cancer, immunology, and infectious disease. A recent study by Shnayder et al. (mBio 9:e00013-18, 2018, https://doi.org/10.1128/mBio.00013-18) utilized scRNA-seq to define transcriptomal characteristics of HCMV latency. They conclude that latency-associated gene expression is similar to the late lytic viral program but at lower levels of expression. The study highlights the numerous challenges, from the definition of latency to the analysis of scRNA-seq, that exist in defining a latent transcriptome. Copyright © 2018 Goodrum and McWeeney.

  18. A technical assessment of the porcine ejaculated spermatozoa for a sperm-specific RNA-seq analysis.

    PubMed

    Gòdia, Marta; Mayer, Fabiana Quoos; Nafissi, Julieta; Castelló, Anna; Rodríguez-Gil, Joan Enric; Sánchez, Armand; Clop, Alex

    2018-04-26

    The study of the boar sperm transcriptome by RNA-seq can provide relevant information on sperm quality and fertility and might contribute to animal breeding strategies. However, the analysis of the spermatozoa RNA is challenging as these cells harbor very low amounts of highly fragmented RNA, and the ejaculates also contain other cell types with larger amounts of non-fragmented RNA. Here, we describe a strategy for a successful boar sperm purification, RNA extraction and RNA-seq library preparation. Using these approaches our objectives were: (i) to evaluate the sperm recovery rate (SRR) after boar spermatozoa purification by density centrifugation using the non-porcine-specific commercial reagent BoviPure TM ; (ii) to assess the correlation between SRR and sperm quality characteristics; (iii) to evaluate the relationship between sperm cell RNA load and sperm quality traits and (iv) to compare different library preparation kits for both total RNA-seq (SMARTer Universal Low Input RNA and TruSeq RNA Library Prep kit) and small RNA-seq (NEBNext Small RNA and TailorMix miRNA Sample Prep v2) for high-throughput sequencing. Our results show that pig SRR (~22%) is lower than in other mammalian species and that it is not significantly dependent of the sperm quality parameters analyzed in our study. Moreover, no relationship between the RNA yield per sperm cell and sperm phenotypes was found. We compared a RNA-seq library preparation kit optimized for low amounts of fragmented RNA with a standard kit designed for high amount and quality of input RNA and found that for sperm, a protocol designed to work on low-quality RNA is essential. We also compared two small RNA-seq kits and did not find substantial differences in their performance. We propose the methodological workflow described for the RNA-seq screening of the boar spermatozoa transcriptome. FPKM: fragments per kilobase of transcript per million mapped reads; KRT1: keratin 1; miRNA: micro-RNA; miscRNA: miscellaneous RNA; Mt rRNA: mitochondrial ribosomal RNA; Mt tRNA: mitochondrial transference RNA; OAZ3: ornithine decarboxylase antizyme 3; ORT: osmotic resistance test; piRNA: Piwi-interacting RNA; PRM1: protamine 1; PTPRC: protein tyrosine phosphatase receptor type C; rRNA: ribosomal RNA; snoRNA: small nucleolar RNA; snRNA: small nuclear RNA; SRR: sperm recovery rate; tRNA: transfer RNA.

  19. Introduction to Single-Cell RNA Sequencing.

    PubMed

    Olsen, Thale Kristin; Baryawno, Ninib

    2018-04-01

    During the last decade, high-throughput sequencing methods have revolutionized the entire field of biology. The opportunity to study entire transcriptomes in great detail using RNA sequencing (RNA-seq) has fueled many important discoveries and is now a routine method in biomedical research. However, RNA-seq is typically performed in "bulk," and the data represent an average of gene expression patterns across thousands to millions of cells; this might obscure biologically relevant differences between cells. Single-cell RNA-seq (scRNA-seq) represents an approach to overcome this problem. By isolating single cells, capturing their transcripts, and generating sequencing libraries in which the transcripts are mapped to individual cells, scRNA-seq allows assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. Here, we present the most common scRNA-seq protocols in use today and the basics of data analysis and discuss factors that are important to consider before planning and designing an scRNA-seq project. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.

  20. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by increase in neurodegenerative diseases incidence, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that rapidly becomes a commonplace. However, there is lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected controls brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases. PMID:24651478

  1. Removing the needle from the haystack: Enrichment of Wolbachia endosymbiont transcripts from host nematode RNA by Cappable-seq™.

    PubMed

    Luck, Ashley N; Slatko, Barton E; Foster, Jeremy M

    2017-01-01

    Efficient transcriptomic sequencing of microbial mRNA derived from host-microbe associations is often compromised by the much lower relative abundance of microbial RNA in the mixed total RNA sample. One solution to this problem is to perform extensive sequencing until an acceptable level of transcriptome coverage is obtained. More cost-effective methods include use of prokaryotic and/or eukaryotic rRNA depletion strategies, sometimes in conjunction with depletion of polyadenylated eukaryotic mRNA. Here, we report use of Cappable-seq™ to specifically enrich, in a single step, Wolbachia endobacterial mRNA transcripts from total RNA prepared from the parasitic filarial nematode, Brugia malayi. The obligate Wolbachia endosymbiont is a proven drug target for many human filarial infections, yet the precise nature of its symbiosis with the nematode host is poorly understood. Insightful analysis of the expression levels of Wolbachia genes predicted to underpin the mutualistic association and of known drug target genes at different life cycle stages or in response to drug treatments is typically challenged by low transcriptomic coverage. Cappable-seq resulted in up to ~ 5-fold increase in the number of reads mapping to Wolbachia. On average, coverage of Wolbachia transcripts from B. malayi microfilariae was enriched ~40-fold by Cappable-seq. Additionally, this method has an additional benefit of selectively removing abundant prokaryotic ribosomal RNAs.The deeper microbial transcriptome sequencing afforded by Cappable-seq facilitates more detailed characterization of gene expression levels of pathogens and symbionts present in animal tissues.

  2. APAtrap: identification and quantification of alternative polyadenylation sites from RNA-seq data.

    PubMed

    Ye, Congting; Long, Yuqi; Ji, Guoli; Li, Qingshun Quinn; Wu, Xiaohui

    2018-06-01

    Alternative polyadenylation (APA) has been increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and gene expression regulation. As RNA-seq has become a routine protocol for transcriptome analysis, it is of great interest to leverage such unprecedented collection of RNA-seq data by new computational methods to extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely on either transcript assembly to determine transcript 3' ends or annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage considering more than two poly(A) sites. We developed an approach called APAtrap based on the mean squared error model to identify and quantify APA sites from RNA-seq data. APAtrap is capable of identifying novel 3' UTRs and 3' UTR extensions, which contributes to locating potential poly(A) sites in previously overlooked regions and improving genome annotations. APAtrap also aims to tally all potential poly(A) sites and detect genes with differential APA site usages between conditions. Extensive comparisons of APAtrap with two other latest methods, ChangePoint and DaPars, using various RNA-seq datasets from simulation studies, human and Arabidopsis demonstrate the efficacy and flexibility of APAtrap for any organisms with an annotated genome. Freely available for download at https://apatrap.sourceforge.io. liqq@xmu.edu.cn or xhuister@xmu.edu.cn. Supplementary data are available at Bioinformatics online.

  3. The response of Isidorella newcombi to copper exposure: Using an integrated biological framework to interpret transcriptomic responses from RNA-seq analysis.

    PubMed

    Ubrihien, Rodney P; Ezaz, Tariq; Taylor, Anne M; Stevens, Mark M; Krikowa, Frank; Foster, Simon; Maher, William A

    2017-04-01

    This study describes the transcriptomic response of the Australian endemic freshwater gastropod Isidorella newcombi exposed to 80±1μg/L of copper for 3days. Analysis of copper tissue concentration, lysosomal membrane destabilisation and RNA-seq were conducted. Copper tissue concentrations confirmed that copper was bioaccumulated by the snails. Increased lysosomal membrane destabilisation in the copper-exposed snails indicated that the snails were stressed as a result of the exposure. Both copper tissue concentrations and lysosomal destabilisation were significantly greater in snails exposed to copper. In order to interpret the RNA-seq data from an ecotoxicological perspective an integrated biological response model was developed that grouped transcriptomic responses into those associated with copper transport and storage, survival mechanisms and cell death. A conceptual model of expected transcriptomic changes resulting from the copper exposure was developed as a basis to assess transcriptomic responses. Transcriptomic changes were evident at all the three levels of the integrated biological response model. Despite lacking statistical significance, increased expression of the gene encoding copper transporting ATPase provided an indication of increased internal transport of copper. Increased expression of genes associated with endocytosis are associated with increased transport of copper to the lysosome for storage in a detoxified form. Survival mechanisms included metabolic depression and processes associated with cellular repair and recycling. There was transcriptomic evidence of increased cell death by apoptosis in the copper-exposed organisms. Increased apoptosis is supported by the increase in lysosomal membrane destabilisation in the copper-exposed snails. Transcriptomic changes relating to apoptosis, phagocytosis, protein degradation and the lysosome were evident and these processes can be linked to the degradation of post-apoptotic debris. The study identified contaminant specific transcriptomic markers as well as markers of general stress. From an ecotoxicological perspective, the use of a framework to group transcriptomic responses into those associated with copper transport, survival and cell death assisted with the complex process of interpretation of RNA-seq data. The broad adoption of such a framework in ecotoxicology studies would assist in comparison between studies and the identification of reliable transcriptomic markers of contaminant exposure and response. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Isoform Evolution in Primates through Independent Combination of Alternative RNA Processing Events

    PubMed Central

    Zhang, Shi-Jian; Wang, Chenqu; Yan, Shouyu; Fu, Aisi; Luan, Xuke; Li, Yumei; Sunny Shen, Qing; Zhong, Xiaoming; Chen, Jia-Yu; Wang, Xiangfeng; Chin-Ming Tan, Bertrand; He, Aibin; Li, Chuan-Yun

    2017-01-01

    Abstract Recent RNA-seq technology revealed thousands of splicing events that are under rapid evolution in primates, whereas the reliability of these events, as well as their combination on the isoform level, have not been adequately addressed due to its limited sequencing length. Here, we performed comparative transcriptome analyses in human and rhesus macaque cerebellum using single molecule long-read sequencing (Iso-seq) and matched RNA-seq. Besides 359 million RNA-seq reads, 4,165,527 Iso-seq reads were generated with a mean length of 14,875 bp, covering 11,466 human genes, and 10,159 macaque genes. With Iso-seq data, we substantially expanded the repertoire of alternative RNA processing events in primates, and found that intron retention and alternative polyadenylation are surprisingly more prevalent in primates than previously estimated. We then investigated the combinatorial mode of these alternative events at the whole-transcript level, and found that the combination of these events is largely independent along the transcript, leading to thousands of novel isoforms missed by current annotations. Notably, these novel isoforms are selectively constrained in general, and 1,119 isoforms have even higher expression than the previously annotated major isoforms in human, indicating that the complexity of the human transcriptome is still significantly underestimated. Comparative transcriptome analysis further revealed 502 genes encoding selectively constrained, lineage-specific isoforms in human but not in rhesus macaque, linking them to some lineage-specific functions. Overall, we propose that the independent combination of alternative RNA processing events has contributed to complex isoform evolution in primates, which provides a new foundation for the study of phenotypic difference among primates. PMID:28957512

  5. Towards the integration, annotation and association of historical microarray experiments with RNA-seq.

    PubMed

    Chavan, Shweta S; Bauer, Michael A; Peterson, Erich A; Heuck, Christoph J; Johann, Donald J

    2013-01-01

    Transcriptome analysis by microarrays has produced important advances in biomedicine. For instance in multiple myeloma (MM), microarray approaches led to the development of an effective disease subtyping via cluster assignment, and a 70 gene risk score. Both enabled an improved molecular understanding of MM, and have provided prognostic information for the purposes of clinical management. Many researchers are now transitioning to Next Generation Sequencing (NGS) approaches and RNA-seq in particular, due to its discovery-based nature, improved sensitivity, and dynamic range. Additionally, RNA-seq allows for the analysis of gene isoforms, splice variants, and novel gene fusions. Given the voluminous amounts of historical microarray data, there is now a need to associate and integrate microarray and RNA-seq data via advanced bioinformatic approaches. Custom software was developed following a model-view-controller (MVC) approach to integrate Affymetrix probe set-IDs, and gene annotation information from a variety of sources. The tool/approach employs an assortment of strategies to integrate, cross reference, and associate microarray and RNA-seq datasets. Output from a variety of transcriptome reconstruction and quantitation tools (e.g., Cufflinks) can be directly integrated, and/or associated with Affymetrix probe set data, as well as necessary gene identifiers and/or symbols from a diversity of sources. Strategies are employed to maximize the annotation and cross referencing process. Custom gene sets (e.g., MM 70 risk score (GEP-70)) can be specified, and the tool can be directly assimilated into an RNA-seq pipeline. A novel bioinformatic approach to aid in the facilitation of both annotation and association of historic microarray data, in conjunction with richer RNA-seq data, is now assisting with the study of MM cancer biology.

  6. YeATS- a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut

    USDA-ARS?s Scientific Manuscript database

    The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves exist...

  7. A Comparison of RNA-Seq Results from Paired Formalin-Fixed Paraffin-Embedded and Fresh-Frozen Glioblastoma Tissue Samples

    PubMed Central

    Esteve-Codina, Anna; Arpi, Oriol; Martinez-García, Maria; Pineda, Estela; Mallo, Mar; Gut, Marta; Carrato, Cristina; Rovira, Anna; Lopez, Raquel; Tortosa, Avelina; Dabad, Marc; Del Barco, Sonia; Heath, Simon; Bagué, Silvia; Ribalta, Teresa; Alameda, Francesc; de la Iglesia, Nuria

    2017-01-01

    The molecular classification of glioblastoma (GBM) based on gene expression might better explain outcome and response to treatment than clinical factors. Whole transcriptome sequencing using next-generation sequencing platforms is rapidly becoming accepted as a tool for measuring gene expression for both research and clinical use. Fresh frozen (FF) tissue specimens of GBM are difficult to obtain since tumor tissue obtained at surgery is often scarce and necrotic and diagnosis is prioritized over freezing. After diagnosis, leftover tissue is usually stored as formalin-fixed paraffin-embedded (FFPE) tissue. However, RNA from FFPE tissues is usually degraded, which could hamper gene expression analysis. We compared RNA-Seq data obtained from matched pairs of FF and FFPE GBM specimens. Only three FFPE out of eleven FFPE-FF matched samples yielded informative results. Several quality-control measurements showed that RNA from FFPE samples was highly degraded but maintained transcriptomic similarities to RNA from FF samples. Certain issues regarding mutation analysis and subtype prediction were detected. Nevertheless, our results suggest that RNA-Seq of FFPE GBM specimens provides reliable gene expression data that can be used in molecular studies of GBM if the RNA is sufficiently preserved. PMID:28122052

  8. A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms.

    PubMed

    Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M

    2017-12-06

    While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.

  9. An empirical strategy to detect bacterial transcript structure from directional RNA-seq transcriptome data.

    PubMed

    Wang, Yejun; MacKenzie, Keith D; White, Aaron P

    2015-05-07

    As sequencing costs are being lowered continuously, RNA-seq has gradually been adopted as the first choice for comparative transcriptome studies with bacteria. Unlike microarrays, RNA-seq can directly detect cDNA derived from mRNA transcripts at a single nucleotide resolution. Not only does this allow researchers to determine the absolute expression level of genes, but it also conveys information about transcript structure. Few automatic software tools have yet been established to investigate large-scale RNA-seq data for bacterial transcript structure analysis. In this study, 54 directional RNA-seq libraries from Salmonella serovar Typhimurium (S. Typhimurium) 14028s were examined for potential relationships between read mapping patterns and transcript structure. We developed an empirical method, combined with statistical tests, to automatically detect key transcript features, including transcriptional start sites (TSSs), transcriptional termination sites (TTSs) and operon organization. Using our method, we obtained 2,764 TSSs and 1,467 TTSs for 1331 and 844 different genes, respectively. Identification of TSSs facilitated further discrimination of 215 putative sigma 38 regulons and 863 potential sigma 70 regulons. Combining the TSSs and TTSs with intergenic distance and co-expression information, we comprehensively annotated the operon organization in S. Typhimurium 14028s. Our results show that directional RNA-seq can be used to detect transcriptional borders at an acceptable resolution of ±10-20 nucleotides. Technical limitations of the RNA-seq procedure may prevent single nucleotide resolution. The automatic transcript border detection methods, statistical models and operon organization pipeline that we have described could be widely applied to RNA-seq studies in other bacteria. Furthermore, the TSSs, TTSs, operons, promoters and unstranslated regions that we have defined for S. Typhimurium 14028s may constitute valuable resources that can be used for comparative analyses with other Salmonella serotypes.

  10. Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments

    PubMed Central

    Maza, Elie; Frasse, Pierre; Senin, Pavel; Bouzayen, Mondher; Zouine, Mohamed

    2013-01-01

    In recent years, RNA-Seq technologies became a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of a normalization procedure leads to a great variability in results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of studied transcriptomes related to their relative size. Comparisons of the normalization procedures are performed on real and simulated data sets. Real RNA-Seq data sets analyses, performed with all the different normalization methods, show that only 50% of significantly differentially expressed genes are common. This result highlights the influence of the normalization step on the differential expression analysis. Real and simulated data sets analyses give similar results showing 3 different groups of procedures having the same behavior. The group including the novel method named “Median Ratio Normalization” (MRN) gives the lower number of false discoveries. Within this group the MRN method is less sensitive to the modification of parameters related to the relative size of transcriptomes such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with intrinsic bias resulting from relative size of studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods. PMID:26442135

  11. DTWscore: differential expression and cell clustering analysis for time-series single-cell RNA-seq data.

    PubMed

    Wang, Zhuo; Jin, Shuilin; Liu, Guiyou; Zhang, Xiurui; Wang, Nan; Wu, Deliang; Hu, Yang; Zhang, Chiping; Jiang, Qinghua; Xu, Li; Wang, Yadong

    2017-05-23

    The development of single-cell RNA sequencing has enabled profound discoveries in biology, ranging from the dissection of the composition of complex tissues to the identification of novel cell types and dynamics in some specialized cellular environments. However, the large-scale generation of single-cell RNA-seq (scRNA-seq) data collected at multiple time points remains a challenge to effective measurement gene expression patterns in transcriptome analysis. We present an algorithm based on the Dynamic Time Warping score (DTWscore) combined with time-series data, that enables the detection of gene expression changes across scRNA-seq samples and recovery of potential cell types from complex mixtures of multiple cell types. The DTWscore successfully classify cells of different types with the most highly variable genes from time-series scRNA-seq data. The study was confined to methods that are implemented and available within the R framework. Sample datasets and R packages are available at https://github.com/xiaoxiaoxier/DTWscore .

  12. Analysis, annotation, and profiling of the oat seed transcriptome

    USDA-ARS?s Scientific Manuscript database

    Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...

  13. Resources and Recommendations for Using Transcriptomics to Address Grand Challenges in Comparative Biology.

    PubMed

    Mykles, Donald L; Burnett, Karen G; Durica, David S; Joyce, Blake L; McCarthy, Fiona M; Schmidt, Carl J; Stillman, Jonathon H

    2016-12-01

    High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the "Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology" symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.

  14. Isolation of Polysomal RNA for Analyzing Stress-Responsive Genes Regulated at the Translational Level in Plants.

    PubMed

    Li, Yong-Fang; Mahalingam, Ramamurthy; Sunkar, Ramanjulu

    2017-01-01

    Alteration of gene expression is an essential mechanism, which allows plants to respond and adapt to adverse environmental conditions. Transcriptome and proteome analyses in plants exposed to abiotic stresses revealed that protein levels are not correlated with the changes in corresponding mRNAs, indicating regulation at translational level is another major regulator for gene expression. Analysis of translatome, which refers to all mRNAs associated with ribosomes, thus has the potential to bridge the gap between transcriptome and proteome. Polysomal RNA profiling and recently developed ribosome profiling (Ribo-seq) are two main methods for translatome analysis at global level. Here, we describe the classical procedure for polysomal RNA isolation by sucrose gradient ultracentrifugation followed by highthroughput RNA-seq to identify genes regulated at translational level. Polysomal RNA can be further used for a variety of downstream applications including Northern blot analysis, qRT-PCR, RNase protection assay, and microarray-based gene expression profiling.

  15. Stranded Whole Transcriptome RNA-Seq for All RNA Types

    PubMed Central

    Yan, Pearlly X.; Fang, Fang; Buechlein, Aaron; Ford, James B.; Tang, Haixu; Huang, Tim H.; Burow, Matthew E.; Liu, Yunlong; Rusch, Douglas B.

    2015-01-01

    Stranded whole transcriptome RNA-Seq described in this unit captures quantitative expression data for all types of RNA including, but not limited to miRNA (microRNA), piRNA (Piwi-interacting RNA), snoRNA (small nucleolar RNA), lincRNA (large non-coding intergenic RNA), SRP RNA (signal recognition particle RNA), tRNA (transfer RNA), mtRNA (mitochondrial RNA) and mRNA (messenger RNA). The size and nature of these types of RNA are irrelevant to the approach described here. Barcoded libraries for multiplexing on the Illumina platform are generated with this approach but it can be applied to other platforms with a few modifications. PMID:25599667

  16. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine.

    PubMed

    Xu, Joshua; Gong, Binsheng; Wu, Leihong; Thakkar, Shraddha; Hong, Huixiao; Tong, Weida

    2016-03-15

    Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.

  17. Time Series Expression Analyses Using RNA-seq: A Statistical Approach

    PubMed Central

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021

  18. Time series expression analyses using RNA-seq: a statistical approach.

    PubMed

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.

  19. Characterization of the platelet transcriptome by RNA sequencing in patients with acute myocardial infarction

    PubMed Central

    Eicher, John D.; Wakabayashi, Yoshiyuki; Vitseva, Olga; Esa, Nada; Yang, Yanqin; Zhu, Jun; Freedman, Jane E.; McManus, David D.; Johnson, Andrew D.

    2016-01-01

    Transcripts in platelets are largely produced in precursor megakaryocytes but remain physiologically-active as platelets translate RNAs and regulate protein/RNA levels. Recent studies using transcriptome sequencing (RNA-seq) characterized the platelet transcriptome in limited numbers of non-diseased individuals. Here, we expand upon these RNA-seq studies by completing RNA-seq in platelets from 32 patients with acute myocardial infarction (MI). Our goals were to characterize the platelet transcriptome using a population of patients with acute MI and relate gene expression to platelet aggregation measures and ST-segment elevation MI (STEMI) (n=16) versus non-STEMI (NSTEMI) (n=16) subtypes. Similar to other studies, we detected 9,565 expressed transcripts, including several known platelet-enriched markers (e.g., PPBP, OST4). Our RNA-seq data strongly correlated with independently ascertained platelet expression data and showed enrichment for platelet-related pathways (e.g., wound response, hemostasis, and platelet activation), as well as actin-related and post-transcriptional processes. Several transcripts displayed suggestively higher (FBXL4, ECHDC3, KCNE1, TAOK2, AURKB, ERG, and FKBP5) and lower (MIAT, PVRL3and PZP) expression in STEMI platelets compared to NSTEMI. We also identified transcripts correlated with platelet aggregation to TRAP (ATP6V1G2, SLC2A3), collagen (CEACAM1, ITGA2), and ADP (PDGFB, PDGFC, ST3GAL6). Our study adds to current platelet gene expression resources by providing transcriptome-wide analyses in platelets isolated from patients with acute MI. In concert with prior studies, we identify various genes for further study in regards to platelet function and acute MI. Future platelet RNA-seq studies examining more diverse sets of healthy and diseased samples will add to our understanding of platelet thrombotic and non-thrombotic functions. PMID:26367242

  20. Transcriptome In Vivo Analysis (TIVA) of spatially defined single cells in intact live mouse and human brain tissue

    PubMed Central

    Lovatt, Ditte; Ruble, Brittani K.; Lee, Jaehee; Dueck, Hannah; Kim, Tae Kyung; Fisher, Stephen; Francis, Chantal; Spaethling, Jennifer M.; Wolf, John A.; Grady, M. Sean; Ulyanova, Alexandra V.; Yeldell, Sean B.; Griepenburg, Julianne C.; Buckley, Peter T.; Kim, Junhyong; Sul, Jai-Yoon; Dmochowski, Ivan J.; Eberwine, James

    2014-01-01

    Transcriptome profiling is an indispensable tool in advancing the understanding of single cell biology, but depends upon methods capable of isolating mRNA at the spatial resolution of a single cell. Current capture methods lack sufficient spatial resolution to isolate mRNA from individual in vivo resident cells without damaging adjacent tissue. Because of this limitation, it has been difficult to assess the influence of the microenvironment on the transcriptome of individual neurons. Here, we engineered a Transcriptome In Vivo Analysis (TIVA)-tag, which upon photoactivation enables mRNA capture from single cells in live tissue. Using the TIVA-tag in combination with RNA-seq to analyze transcriptome variance among single dispersed cells and in vivo resident mouse and human neurons, we show that the tissue microenvironment shapes the transcriptomic landscape of individual cells. The TIVA methodology provides the first noninvasive approach for capturing mRNA from single cells in their natural microenvironment. PMID:24412976

  1. Integrating single-cell transcriptomic data across different conditions, technologies, and species.

    PubMed

    Butler, Andrew; Hoffman, Paul; Smibert, Peter; Papalexi, Efthymia; Satija, Rahul

    2018-06-01

    Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

  2. Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.

    PubMed

    Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A

    2017-11-01

    To detect diverse and novel RNA species comprehensively, we compared deep small RNA and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We were able to discover previously unannotated small RNAs using deep sequencing of a library method using broader insert size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.

  3. Insights into organ-specific pathogen defense responses in plants: RNA-seq analysis of potato tuber-Phytophthora infestans interactions.

    PubMed

    Gao, Liangliang; Tu, Zheng Jin; Millett, Benjamin P; Bradeen, James M

    2013-05-23

    The late blight pathogen Phytophthora infestans can attack both potato foliage and tubers. Although interaction transcriptome dynamics between potato foliage and various pathogens have been reported, no transcriptome study has focused specifically upon how potato tubers respond to pathogen infection. When inoculated with P. infestans, tubers of nontransformed 'Russet Burbank' (WT) potato develop late blight disease while those of transgenic 'Russet Burbank' line SP2211 (+RB), which expresses the potato late blight resistance gene RB (Rpi-blb1), do not. We compared transcriptome responses to P. infestans inoculation in tubers of these two lines. We demonstrated the practicality of RNA-seq to study tetraploid potato and present the first RNA-seq study of potato tuber diseases. A total of 483 million paired end Illumina RNA-seq reads were generated, representing the transcription of around 30,000 potato genes. Differentially expressed genes, gene groups and ontology bins that exhibited differences between the WT and +RB lines were identified. P. infestans transcripts, including those of known effectors, were also identified. Faster and stronger activation of defense related genes, gene groups and ontology bins correlate with successful tuber resistance against P. infestans. Our results suggest that the hypersensitive response is likely a general form of resistance against the hemibiotrophic P. infestans-even in potato tubers, organs that develop below ground.

  4. Insights into organ-specific pathogen defense responses in plants: RNA-seq analysis of potato tuber-Phytophthora infestans interactions

    PubMed Central

    2013-01-01

    Background The late blight pathogen Phytophthora infestans can attack both potato foliage and tubers. Although interaction transcriptome dynamics between potato foliage and various pathogens have been reported, no transcriptome study has focused specifically upon how potato tubers respond to pathogen infection. When inoculated with P. infestans, tubers of nontransformed ‘Russet Burbank’ (WT) potato develop late blight disease while those of transgenic ‘Russet Burbank’ line SP2211 (+RB), which expresses the potato late blight resistance gene RB (Rpi-blb1), do not. We compared transcriptome responses to P. infestans inoculation in tubers of these two lines. Results We demonstrated the practicality of RNA-seq to study tetraploid potato and present the first RNA-seq study of potato tuber diseases. A total of 483 million paired end Illumina RNA-seq reads were generated, representing the transcription of around 30,000 potato genes. Differentially expressed genes, gene groups and ontology bins that exhibited differences between the WT and +RB lines were identified. P. infestans transcripts, including those of known effectors, were also identified. Conclusion Faster and stronger activation of defense related genes, gene groups and ontology bins correlate with successful tuber resistance against P. infestans. Our results suggest that the hypersensitive response is likely a general form of resistance against the hemibiotrophic P. infestans—even in potato tubers, organs that develop below ground. PMID:23702331

  5. ASAP: a web-based platform for the analysis and interactive visualization of single-cell RNA-seq data

    PubMed Central

    Gardeux, Vincent; David, Fabrice P. A.; Shajkofci, Adrian; Schwalie, Petra C.; Deplancke, Bart

    2017-01-01

    Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) allows whole transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups in the world, yet these groups often lack the expertise to handle complex scRNA-seq datasets. Results We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, researchers (including those lacking computational expertise) are able to interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how we can use ASAP to simply reproduce the results from a single-cell study of 91 mouse cells involving five distinct cell types. Availability and implementation The tool is freely available at asap.epfl.ch and R/Python scripts are available at github.com/DeplanckeLab/ASAP. Contact bart.deplancke@epfl.ch Supplementary information Supplementary data are available at Bioinformatics online. PMID:28541377

  6. ASAP: a web-based platform for the analysis and interactive visualization of single-cell RNA-seq data.

    PubMed

    Gardeux, Vincent; David, Fabrice P A; Shajkofci, Adrian; Schwalie, Petra C; Deplancke, Bart

    2017-10-01

    Single-cell RNA-sequencing (scRNA-seq) allows whole transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups in the world, yet these groups often lack the expertise to handle complex scRNA-seq datasets. We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data post genome alignment: from the parsing, filtering and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, researchers (including those lacking computational expertise) are able to interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how we can use ASAP to simply reproduce the results from a single-cell study of 91 mouse cells involving five distinct cell types. The tool is freely available at asap.epfl.ch and R/Python scripts are available at github.com/DeplanckeLab/ASAP. bart.deplancke@epfl.ch. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  7. Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs

    PubMed Central

    Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv

    2010-01-01

    RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462

  8. Comparison of software packages for detecting differential expression in RNA-seq studies

    PubMed Central

    Seyednasrollah, Fatemeh; Laiho, Asta

    2015-01-01

    RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. PMID:24300110

  9. Comparison of software packages for detecting differential expression in RNA-seq studies.

    PubMed

    Seyednasrollah, Fatemeh; Laiho, Asta; Elo, Laura L

    2015-01-01

    RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. © The Author 2013. Published by Oxford University Press.

  10. Transcriptome Analysis of Flounder (Paralichthys olivaceus) Gill in Response to Lymphocystis Disease Virus (LCDV) Infection: Novel Insights into Fish Defense Mechanisms

    PubMed Central

    Wu, Ronghua; Sheng, Xiuzhen; Tang, Xiaoqian; Xing, Jing; Zhan, Wenbin

    2018-01-01

    Lymphocystis disease virus (LCDV) infection may induce a variety of host gene expression changes associated with disease development; however, our understanding of the molecular mechanisms underlying host-virus interactions is limited. In this study, RNA sequencing (RNA-seq) was employed to investigate differentially expressed genes (DEGs) in the gill of the flounder (Paralichthys olivaceus) at one week post LCDV infection. Transcriptome sequencing of the gill with and without LCDV infection was performed using the Illumina HiSeq 2500 platform. In total, RNA-seq analysis generated 193,225,170 clean reads aligned with 106,293 unigenes. Among them, 1812 genes were up-regulated and 1626 genes were down-regulated after LCDV infection. The DEGs related to cellular process and metabolism occupied the dominant position involved in the LCDV infection. A further function analysis demonstrated that the genes related to inflammation, the ubiquitin-proteasome pathway, cell proliferation, apoptosis, tumor formation, and anti-viral defense showed a differential expression. Several DEGs including β actin, toll-like receptors, cytokine-related genes, antiviral related genes, and apoptosis related genes were involved in LCDV entry and immune response. In addition, RNA-seq data was validated by quantitative real-time PCR. For the first time, the comprehensive gene expression study provided valuable insights into the host-pathogen interaction between flounder and LCDV. PMID:29304016

  11. Transcriptome Analysis of Flounder (Paralichthys olivaceus) Gill in Response to Lymphocystis Disease Virus (LCDV) Infection: Novel Insights into Fish Defense Mechanisms.

    PubMed

    Wu, Ronghua; Sheng, Xiuzhen; Tang, Xiaoqian; Xing, Jing; Zhan, Wenbin

    2018-01-05

    Lymphocystis disease virus (LCDV) infection may induce a variety of host gene expression changes associated with disease development; however, our understanding of the molecular mechanisms underlying host-virus interactions is limited. In this study, RNA sequencing (RNA-seq) was employed to investigate differentially expressed genes (DEGs) in the gill of the flounder ( Paralichthys olivaceus ) at one week post LCDV infection. Transcriptome sequencing of the gill with and without LCDV infection was performed using the Illumina HiSeq 2500 platform. In total, RNA-seq analysis generated 193,225,170 clean reads aligned with 106,293 unigenes. Among them, 1812 genes were up-regulated and 1626 genes were down-regulated after LCDV infection. The DEGs related to cellular process and metabolism occupied the dominant position involved in the LCDV infection. A further function analysis demonstrated that the genes related to inflammation, the ubiquitin-proteasome pathway, cell proliferation, apoptosis, tumor formation, and anti-viral defense showed a differential expression. Several DEGs including β actin , toll-like receptors, cytokine-related genes, antiviral related genes, and apoptosis related genes were involved in LCDV entry and immune response. In addition, RNA-seq data was validated by quantitative real-time PCR. For the first time, the comprehensive gene expression study provided valuable insights into the host-pathogen interaction between flounder and LCDV.

  12. De novo transcriptomic analysis of hydrogen production in the green alga Chlamydomonas moewusii through RNA-Seq

    PubMed Central

    2013-01-01

    Background Microalgae can make a significant contribution towards meeting global renewable energy needs in both carbon-based and hydrogen (H2) biofuel. The development of energy-related products from algae could be accelerated with improvements in systems biology tools, and recent advances in sequencing technology provide a platform for enhanced transcriptomic analyses. However, these techniques are still heavily reliant upon available genomic sequence data. Chlamydomonas moewusii is a unicellular green alga capable of evolving molecular H2 under both dark and light anaerobic conditions, and has high hydrogenase activity that can be rapidly induced. However, to date, there is no systematic investigation of transcriptomic profiling during induction of H2 photoproduction in this organism. Results In this work, RNA-Seq was applied to investigate transcriptomic profiles during the dark anaerobic induction of H2 photoproduction. 156 million reads generated from 7 samples were then used for de novo assembly after data trimming. BlastX results against NCBI database and Blast2GO results were used to interpret the functions of the assembled 34,136 contigs, which were then used as the reference contigs for RNA-Seq analysis. Our results indicated that more contigs were differentially expressed during the period of early and higher H2 photoproduction, and fewer contigs were differentially expressed when H2-photoproduction rates decreased. In addition, C. moewusii and C. reinhardtii share core functional pathways, and transcripts for H2 photoproduction and anaerobic metabolite production were identified in both organisms. C. moewusii also possesses similar metabolic flexibility as C. reinhardtii, and the difference between C. moewusii and C. reinhardtii on hydrogenase expression and anaerobic fermentative pathways involved in redox balancing may explain their different profiles of hydrogenase activity and secreted anaerobic metabolites. Conclusions Herein, we have described a workflow using commercial software to analyze RNA-Seq data without reference genome sequence information, which can be applied to other unsequenced microorganisms. This study provided biological insights into the anaerobic fermentation and H2 photoproduction of C. moewusii, and the first transcriptomic RNA-Seq dataset of C. moewusii generated in this study also offer baseline data for further investigation (e.g. regulatory proteins related to fermentative pathway discussed in this study) of this organism as a H2-photoproduction strain. PMID:23971877

  13. Swine transcriptome characterization by combined Iso-Seq and RNA-seq for annotating the emerging long read-based reference genome

    USDA-ARS?s Scientific Manuscript database

    PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...

  14. Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.

    PubMed

    Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D

    2015-07-30

    RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove challenging.

  15. Construction of Pará rubber tree genome and multi-transcriptome database accelerates rubber researches.

    PubMed

    Makita, Yuko; Kawashima, Mika; Lau, Nyok Sean; Othman, Ahmad Sofiman; Matsui, Minami

    2018-01-19

    Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies brought draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publically available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers for rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .

  16. RNA-Seq analysis of yak ovary: improving yak gene structure information and mining reproduction-related genes.

    PubMed

    Lan, DaoLiang; Xiong, XianRong; Wei, YanLi; Xu, Tong; Zhong, JinCheng; Zhi, XiangDong; Wang, Yong; Li, Jian

    2014-09-01

    RNA-Seq, a high-throughput (HT) sequencing technique, has been used effectively in large-scale transcriptomic studies, and is particularly useful for improving gene structure information and mining of new genes. In this study, RNA-Seq HT technology was employed to analyze the transcriptome of yak ovary. After Illumina-Solexa deep sequencing, 26826516 clean reads with a total of 4828772880 bp were obtained from the ovary library. Alignment analysis showed that 16992 yak genes mapped to the yak genome and 3734 of these genes were involved in alternative splicing. Gene structure refinement analysis showed that 7340 genes that were annotated in the yak genome could be extended at the 5' or 3' ends based on the alignments been the transcripts and the genome sequence. Novel transcript prediction analysis identified 6321 new transcripts with lengths ranging from 180 to 14884 bp, and 2267 of them were predicted to code proteins. BLAST analysis of the new transcripts showed that 1200?4933 mapped to the non-redundant (nr), nucleotide (nt) and/or SwissProt sequence databases. Comparative statistical analysis of the new mapped transcripts showed that the majority of them were similar to genes in Bos taurus (41.4%), Bos grunniens mutus (33.0%), Ovis aries (6.3%), Homo sapiens (2.8%), Mus musculus (1.6%) and other species. Functional analysis showed that these expressed genes were involved in various Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes pathways. GO analysis of the new transcripts found that the largest proportion of them was associated with reproduction. The results of this study will provide a basis for describing the normal transcriptome map of yak ovary and for future studies on yak breeding performance. Moreover, the results confirmed that RNA-Seq HT technology is highly advantageous in improving gene structure information and mining of new genes, as well as in providing valuable data to expand the yak genome information.

  17. Dose-Response Analysis of RNA-Seq Profiles in Archival Formalin-Fixed Paraffin-Embedded (FFPE) Samples.

    EPA Science Inventory

    Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-sequencing offers a promising way to address this problem. Here we evaluated transcriptomic dose responses us...

  18. Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq

    PubMed Central

    Shepard, Peter J.; Choi, Eun-A; Lu, Jente; Flanagan, Lisa A.; Hertel, Klemens J.; Shi, Yongsheng

    2011-01-01

    Alternative polyadenylation (APA) of mRNAs has emerged as an important mechanism for post-transcriptional gene regulation in higher eukaryotes. Although microarrays have recently been used to characterize APA globally, they have a number of serious limitations that prevents comprehensive and highly quantitative analysis. To better characterize APA and its regulation, we have developed a deep sequencing-based method called Poly(A) Site Sequencing (PAS-Seq) for quantitatively profiling RNA polyadenylation at the transcriptome level. PAS-Seq not only accurately and comprehensively identifies poly(A) junctions in mRNAs and noncoding RNAs, but also provides quantitative information on the relative abundance of polyadenylated RNAs. PAS-Seq analyses of human and mouse transcriptomes showed that 40%–50% of all expressed genes produce alternatively polyadenylated mRNAs. Furthermore, our study detected evolutionarily conserved polyadenylation of histone mRNAs and revealed novel features of mitochondrial RNA polyadenylation. Finally, PAS-Seq analyses of mouse embryonic stem (ES) cells, neural stem/progenitor (NSP) cells, and neurons not only identified more poly(A) sites than what was found in the entire mouse EST database, but also detected significant changes in the global APA profile that lead to lengthening of 3′ untranslated regions (UTR) in many mRNAs during stem cell differentiation. Together, our PAS-Seq analyses revealed a complex landscape of RNA polyadenylation in mammalian cells and the dynamic regulation of APA during stem cell differentiation. PMID:21343387

  19. TruSeq Stranded mRNA and Total RNA Sample Preparation Kits

    Cancer.gov

    Total RNA-Seq enabled by ribosomal RNA (rRNA) reduction is compatible with formalin-fixed paraffin embedded (FFPE) samples, which contain potentially critical biological information. The family of TruSeq Stranded Total RNA sample preparation kits provides a unique combination of unmatched data quality for both mRNA and whole-transcriptome analyses, robust interrogation of both standard and low-quality samples and workflows compatible with a wide range of study designs.

  20. Replicates, read numbers, and other important experimental design considerations for microbial RNA-seq identified using Bacillus thuringiensis datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Tse -Yuan; Mehlhorn, Tonia L; Pelletier, Dale A.

    RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, whichmore » were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes, were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). Furthermore, this study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost effective microbial transcriptomics study.« less

  1. Replicates, read numbers, and other important experimental design considerations for microbial RNA-seq identified using Bacillus thuringiensis datasets

    DOE PAGES

    Lu, Tse -Yuan; Mehlhorn, Tonia L; Pelletier, Dale A.; ...

    2016-05-31

    RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, whichmore » were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes, were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). Furthermore, this study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost effective microbial transcriptomics study.« less

  2. Replicates, Read Numbers, and Other Important Experimental Design Considerations for Microbial RNA-seq Identified Using Bacillus thuringiensis Datasets.

    PubMed

    Manga, Punita; Klingeman, Dawn M; Lu, Tse-Yuan S; Mehlhorn, Tonia L; Pelletier, Dale A; Hauser, Loren J; Wilson, Charlotte M; Brown, Steven D

    2016-01-01

    RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, which were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes, were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). This study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost effective microbial transcriptomics study.

  3. Advances in single-cell RNA sequencing and its applications in cancer research.

    PubMed

    Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming

    2017-08-08

    Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years' development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise target therapy by identifying druggable subclones. Indeed, progresses have been made regarding solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances and progresses of single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5.

  4. Advances in single-cell RNA sequencing and its applications in cancer research

    PubMed Central

    Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming

    2017-01-01

    Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years’ development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise target therapy by identifying druggable subclones. Indeed, progresses have been made regarding solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances and progresses of single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5. Perspectives PMID:28881849

  5. SEASTAR: systematic evaluation of alternative transcription start sites in RNA.

    PubMed

    Qin, Zhiyi; Stoilov, Peter; Zhang, Xuegong; Xing, Yi

    2018-05-04

    Alternative first exons diversify the transcriptomes of eukaryotes by producing variants of the 5' Untranslated Regions (5'UTRs) and N-terminal coding sequences. Accurate transcriptome-wide detection of alternative first exons typically requires specialized experimental approaches that are designed to identify the 5' ends of transcripts. We developed a computational pipeline SEASTAR that identifies first exons from RNA-seq data alone then quantifies and compares alternative first exon usage across multiple biological conditions. The exons inferred by SEASTAR coincide with transcription start sites identified directly by CAGE experiments and bear epigenetic hallmarks of active promoters. To determine if differential usage of alternative first exons can yield insights into the mechanism controlling gene expression, we applied SEASTAR to an RNA-seq dataset that tracked the reprogramming of mouse fibroblasts into induced pluripotent stem cells. We observed dynamic temporal changes in the usage of alternative first exons, along with correlated changes in transcription factor expression. Using a combined sequence motif and gene set enrichment analysis we identified N-Myc as a regulator of alternative first exon usage in the pluripotent state. Our results demonstrate that SEASTAR can leverage the available RNA-seq data to gain insights into the control of gene expression and alternative transcript variation in eukaryotic transcriptomes.

  6. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases

    PubMed Central

    Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. PMID:26554030

  7. RNA-seq analysis of broiler liver transcriptome reveals novel responses to high ambient temperature.

    PubMed

    Coble, Derrick J; Fleming, Damarius; Persia, Michael E; Ashwell, Chris M; Rothschild, Max F; Schmidt, Carl J; Lamont, Susan J

    2014-12-10

    In broilers, high ambient temperature can result in reduced feed consumption, digestive inefficiency, impaired metabolism, and even death. The broiler sector of the U.S. poultry industry incurs approximately $52 million in heat-related losses annually. The objective of this study is to characterize the effects of cyclic high ambient temperature on the transcriptome of a metabolically active organ, the liver. This study provides novel insight into the effects of high ambient temperature on metabolism in broilers, because it is the first reported RNA-seq study to characterize the effect of heat on the transcriptome of a metabolic-related tissue. This information provides a platform for future investigations to further elucidate physiologic responses to high ambient temperature and seek methods to ameliorate the negative impacts of heat. Transcriptome sequencing of the livers of 8 broiler males using Illumina HiSeq 2000 technology resulted in 138 million, 100-base pair single end reads, yielding a total of 13.8 gigabases of sequence. Forty genes were differentially expressed at a significance level of P-value < 0.05 and a fold-change ≥ 2 in response to a week of cyclic high ambient temperature with 27 down-regulated and 13 up-regulated genes. Two gene networks were created from the function-based Ingenuity Pathway Analysis (IPA) of the differentially expressed genes: "Cell Signaling" and "Endocrine System Development and Function". The gene expression differences in the liver transcriptome of the heat-exposed broilers reflected physiological responses to decrease internal temperature, reduce hyperthermia-induced apoptosis, and promote tissue repair. Additionally, the differential gene expression revealed a physiological response to regulate the perturbed cellular calcium levels that can result from high ambient temperature exposure. Exposure to cyclic high ambient temperature results in changes at the metabolic, physiologic, and cellular level that can be characterized through RNA-seq analysis of the liver transcriptome of broilers. The findings highlight specific physiologic mechanisms by which broilers reduce the effects of exposure to high ambient temperature. This information provides a foundation for future investigations into the gene networks involved in the broiler stress response and for development of strategies to ameliorate the negative impacts of heat on animal production and welfare.

  8. BLIND ordering of large-scale transcriptomic developmental timecourses.

    PubMed

    Anavy, Leon; Levin, Michal; Khair, Sally; Nakanishi, Nagayasu; Fernandez-Valverde, Selene L; Degnan, Bernard M; Yanai, Itai

    2014-03-01

    RNA-Seq enables the efficient transcriptome sequencing of many samples from small amounts of material, but the analysis of these data remains challenging. In particular, in developmental studies, RNA-Seq is challenged by the morphological staging of samples, such as embryos, since these often lack clear markers at any particular stage. In such cases, the automatic identification of the stage of a sample would enable previously infeasible experimental designs. Here we present the 'basic linear index determination of transcriptomes' (BLIND) method for ordering samples comprising different developmental stages. The method is an implementation of a traveling salesman algorithm to order the transcriptomes according to their inter-relationships as defined by principal components analysis. To establish the direction of the ordered samples, we show that an appropriate indicator is the entropy of transcriptomic gene expression levels, which increases over developmental time. Using BLIND, we correctly recover the annotated order of previously published embryonic transcriptomic timecourses for frog, mosquito, fly and zebrafish. We further demonstrate the efficacy of BLIND by collecting 59 embryos of the sponge Amphimedon queenslandica and ordering their transcriptomes according to developmental stage. BLIND is thus useful in establishing the temporal order of samples within large datasets and is of particular relevance to the study of organisms with asynchronous development and when morphological staging is difficult.

  9. Integrated analyses using RNA-Seq data reveal viral genomes, single nucleotide variations, the phylogenetic relationship, and recombination for Apple stem grooving virus.

    PubMed

    Jo, Yeonhwa; Choi, Hoseong; Kim, Sang-Min; Kim, Sun-Lim; Lee, Bong Choon; Cho, Won Kyong

    2016-08-09

    Next-generation sequencing (NGS) provides many possibilities for plant virology research. In this study, we performed integrated analyses using plant transcriptome data for plant virus identification using Apple stem grooving virus (ASGV) as an exemplar virus. We used 15 publicly available transcriptome libraries from three different studies, two mRNA-Seq studies and a small RNA-Seq study. We de novo assembled nearly complete genomes of ASGV isolates Fuji and Cuiguan from apple and pear transcriptomes, respectively, and identified single nucleotide variations (SNVs) of ASGV within the transcriptomes. We demonstrated the application of NGS raw data to confirm viral infections in the plant transcriptomes. In addition, we compared the usability of two de novo assemblers, Trinity and Velvet, for virus identification and genome assembly. A phylogenetic tree revealed that ASGV and Citrus tatter leaf virus (CTLV) are the same virus, which was divided into two clades. Recombination analyses identified six recombination events from 21 viral genomes. Taken together, our in silico analyses using NGS data provide a successful application of plant transcriptomes to reveal extensive information associated with viral genome assembly, SNVs, phylogenetic relationships, and genetic recombination.

  10. Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

    PubMed Central

    2010-01-01

    Background Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals. PMID:20707909

  11. Global Analysis of Transcriptome Responses and Gene Expression Profiles to Cold Stress of Jatropha curcas L.

    PubMed Central

    Wang, Haibo; Zou, Zhurong; Wang, Shasha; Gong, Ming

    2013-01-01

    Background Jatropha curcas L., also called the Physic nut, is an oil-rich shrub with multiple uses, including biodiesel production, and is currently exploited as a renewable energy resource in many countries. Nevertheless, because of its origin from the tropical MidAmerican zone, J. curcas confers an inherent but undesirable characteristic (low cold resistance) that may seriously restrict its large-scale popularization. This adaptive flaw can be genetically improved by elucidating the mechanisms underlying plant tolerance to cold temperatures. The newly developed Illumina Hiseq™ 2000 RNA-seq and Digital Gene Expression (DGE) are deep high-throughput approaches for gene expression analysis at the transcriptome level, using which we carefully investigated the gene expression profiles in response to cold stress to gain insight into the molecular mechanisms of cold response in J. curcas. Results In total, 45,251 unigenes were obtained by assembly of clean data generated by RNA-seq analysis of the J. curcas transcriptome. A total of 33,363 and 912 complete or partial coding sequences (CDSs) were determined by protein database alignments and ESTScan prediction, respectively. Among these unigenes, more than 41.52% were involved in approximately 128 known metabolic or signaling pathways, and 4,185 were possibly associated with cold resistance. DGE analysis was used to assess the changes in gene expression when exposed to cold condition (12°C) for 12, 24, and 48 h. The results showed that 3,178 genes were significantly upregulated and 1,244 were downregulated under cold stress. These genes were then functionally annotated based on the transcriptome data from RNA-seq analysis. Conclusions This study provides a global view of transcriptome response and gene expression profiling of J. curcas in response to cold stress. The results can help improve our current understanding of the mechanisms underlying plant cold resistance and favor the screening of crucial genes for genetically enhancing cold resistance in J. curcas. PMID:24349370

  12. Global analysis of transcriptome responses and gene expression profiles to cold stress of Jatropha curcas L.

    PubMed

    Wang, Haibo; Zou, Zhurong; Wang, Shasha; Gong, Ming

    2013-01-01

    Jatropha curcas L., also called the Physic nut, is an oil-rich shrub with multiple uses, including biodiesel production, and is currently exploited as a renewable energy resource in many countries. Nevertheless, because of its origin from the tropical MidAmerican zone, J. curcas confers an inherent but undesirable characteristic (low cold resistance) that may seriously restrict its large-scale popularization. This adaptive flaw can be genetically improved by elucidating the mechanisms underlying plant tolerance to cold temperatures. The newly developed Illumina Hiseq™ 2000 RNA-seq and Digital Gene Expression (DGE) are deep high-throughput approaches for gene expression analysis at the transcriptome level, using which we carefully investigated the gene expression profiles in response to cold stress to gain insight into the molecular mechanisms of cold response in J. curcas. In total, 45,251 unigenes were obtained by assembly of clean data generated by RNA-seq analysis of the J. curcas transcriptome. A total of 33,363 and 912 complete or partial coding sequences (CDSs) were determined by protein database alignments and ESTScan prediction, respectively. Among these unigenes, more than 41.52% were involved in approximately 128 known metabolic or signaling pathways, and 4,185 were possibly associated with cold resistance. DGE analysis was used to assess the changes in gene expression when exposed to cold condition (12°C) for 12, 24, and 48 h. The results showed that 3,178 genes were significantly upregulated and 1,244 were downregulated under cold stress. These genes were then functionally annotated based on the transcriptome data from RNA-seq analysis. This study provides a global view of transcriptome response and gene expression profiling of J. curcas in response to cold stress. The results can help improve our current understanding of the mechanisms underlying plant cold resistance and favor the screening of crucial genes for genetically enhancing cold resistance in J. curcas.

  13. Comprehensive RNA-Seq Expression Analysis of Sensory Ganglia with a Focus on Ion Channels and GPCRs in Trigeminal Ganglia

    PubMed Central

    Manteniotis, Stavros; Lehmann, Ramona; Flegel, Caroline; Vogel, Felix; Hofreuter, Adrian; Schreiner, Benjamin S. P.; Altmüller, Janine; Becker, Christian; Schöbel, Nicole; Hatt, Hanns; Gisselmann, Günter

    2013-01-01

    The specific functions of sensory systems depend on the tissue-specific expression of genes that code for molecular sensor proteins that are necessary for stimulus detection and membrane signaling. Using the Next Generation Sequencing technique (RNA-Seq), we analyzed the complete transcriptome of the trigeminal ganglia (TG) and dorsal root ganglia (DRG) of adult mice. Focusing on genes with an expression level higher than 1 FPKM (fragments per kilobase of transcript per million mapped reads), we detected the expression of 12984 genes in the TG and 13195 in the DRG. To analyze the specific gene expression patterns of the peripheral neuronal tissues, we compared their gene expression profiles with that of the liver, brain, olfactory epithelium, and skeletal muscle. The transcriptome data of the TG and DRG were scanned for virtually all known G-protein-coupled receptors (GPCRs) as well as for ion channels. The expression profile was ranked with regard to the level and specificity for the TG. In total, we detected 106 non-olfactory GPCRs and 33 ion channels that had not been previously described as expressed in the TG. To validate the RNA-Seq data, in situ hybridization experiments were performed for several of the newly detected transcripts. To identify differences in expression profiles between the sensory ganglia, the RNA-Seq data of the TG and DRG were compared. Among the differentially expressed genes (> 1 FPKM), 65 and 117 were expressed at least 10-fold higher in the TG and DRG, respectively. Our transcriptome analysis allows a comprehensive overview of all ion channels and G protein-coupled receptors that are expressed in trigeminal ganglia and provides additional approaches for the investigation of trigeminal sensing as well as for the physiological and pathophysiological mechanisms of pain. PMID:24260241

  14. RNA-seq reveals more consistent reference genes for gene expression studies in human non-melanoma skin cancers

    PubMed Central

    Tan, Jean-Marie; Payne, Elizabeth J.; Lin, Lynlee L.; Sinnya, Sudipta; Raphael, Anthony P.; Lambie, Duncan; Frazer, Ian H.; Dinger, Marcel E.; Soyer, H. Peter

    2017-01-01

    Identification of appropriate reference genes (RGs) is critical to accurate data interpretation in quantitative real-time PCR (qPCR) experiments. In this study, we have utilised next generation RNA sequencing (RNA-seq) to analyse the transcriptome of a panel of non-melanoma skin cancer lesions, identifying genes that are consistently expressed across all samples. Genes encoding ribosomal proteins were amongst the most stable in this dataset. Validation of this RNA-seq data was examined using qPCR to confirm the suitability of a set of highly stable genes for use as qPCR RGs. These genes will provide a valuable resource for the normalisation of qPCR data for the analysis of non-melanoma skin cancer. PMID:28852586

  15. Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki)

    PubMed Central

    2013-01-01

    Background The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. Results We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense stands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Conclusions Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools. PMID:24209455

  16. Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki).

    PubMed

    Sturgill, David; Malone, John H; Sun, Xia; Smith, Harold E; Rabinow, Leonard; Samson, Marie-Laure; Oliver, Brian

    2013-11-09

    The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense stands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools.

  17. Genome improvement of the acarbose producer Actinoplanes sp. SE50/110 and annotation refinement based on RNA-seq analysis.

    PubMed

    Wolf, Timo; Schneiker-Bekel, Susanne; Neshat, Armin; Ortseifen, Vera; Wibberg, Daniel; Zemke, Till; Pühler, Alfred; Kalinowski, Jörn

    2017-06-10

    Actinoplanes sp. SE50/110 is the natural producer of acarbose, which is used in the treatment of diabetes mellitus type II. However, until now the transcriptional organization and regulation of the acarbose biosynthesis are only understood rudimentarily. The genome sequence of Actinoplanes sp. SE50/110 was known before, but was resequenced in this study to remove assembly artifacts and incorrect base callings. The annotation of the genome was refined in a multi-step approach, including modern bioinformatic pipelines, transcriptome and proteome data. A whole transcriptome RNA-seq library as well as an RNA-seq library enriched for primary 5'-ends were used for the detection of transcription start sites, to correct tRNA predictions, to identify novel transcripts like small RNAs and to improve the annotation through the correction of falsely annotated translation start sites. The transcriptome data sets were also applied to identify 31 cis-regulatory RNA structures, such as riboswitches or RNA thermometers as well as three leaderless transcribed short peptides found in putative attenuators upstream of genes for amino acid biosynthesis. The transcriptional organization of the acarbose biosynthetic gene cluster was elucidated in detail and fourteen novel biosynthetic gene clusters were suggested. The accurate genome sequence and precise annotation of the Actinoplanes sp. SE50/110 genome will be the foundation for future genetic engineering and systems biology studies. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

    PubMed Central

    Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amendable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676

  19. Spatial transcriptomic analysis of cryosectioned tissue samples with Geo-seq.

    PubMed

    Chen, Jun; Suo, Shengbao; Tam, Patrick Pl; Han, Jing-Dong J; Peng, Guangdun; Jing, Naihe

    2017-03-01

    Conventional gene expression studies analyze multiple cells simultaneously or single cells, for which the exact in vivo or in situ position is unknown. Although cellular heterogeneity can be discerned when analyzing single cells, any spatially defined attributes that underpin the heterogeneous nature of the cells cannot be identified. Here, we describe how to use Geo-seq, a method that combines laser capture microdissection (LCM) and single-cell RNA-seq technology. The combination of these two methods enables the elucidation of cellular heterogeneity and spatial variance simultaneously. The Geo-seq protocol allows the profiling of transcriptome information from only a small number cells and retains their native spatial information. This protocol has wide potential applications to address biological and pathological questions of cellular properties such as prospective cell fates, biological function and the gene regulatory network. Geo-seq has been applied to investigate the spatial transcriptome of mouse early embryo, mouse brain, and pathological liver and sperm tissues. The entire protocol from tissue collection and microdissection to sequencing requires ∼5 d, Data analysis takes another 1 or 2 weeks, depending on the amount of data and the speed of the processor.

  20. Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.

    PubMed

    Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A

    2018-04-24

    mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.

  1. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data

    PubMed Central

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie

    2018-01-01

    Abstract Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. PMID:29106630

  2. DrImpute: imputing dropout events in single cell RNA sequencing data.

    PubMed

    Gong, Wuming; Kwak, Il-Youp; Pota, Pruthvi; Koyano-Nakagawa, Naoko; Garry, Daniel J

    2018-06-08

    The single cell RNA sequencing (scRNA-seq) technique begin a new era by allowing the observation of gene expression at the single cell level. However, there is also a large amount of technical and biological noise. Because of the low number of RNA transcriptomes and the stochastic nature of the gene expression pattern, there is a high chance of missing nonzero entries as zero, which are called dropout events. We develop DrImpute to impute dropout events in scRNA-seq data. We show that DrImpute has significantly better performance on the separation of the dropout zeros from true zeros than existing imputation algorithms. We also demonstrate that DrImpute can significantly improve the performance of existing tools for clustering, visualization and lineage reconstruction of nine published scRNA-seq datasets. DrImpute can serve as a very useful addition to the currently existing statistical tools for single cell RNA-seq analysis. DrImpute is implemented in R and is available at https://github.com/gongx030/DrImpute .

  3. TSSAR: TSS annotation regime for dRNA-seq data.

    PubMed

    Amman, Fabian; Wolfinger, Michael T; Lorenz, Ronny; Hofacker, Ivo L; Stadler, Peter F; Findeiß, Sven

    2014-03-27

    Differential RNA sequencing (dRNA-seq) is a high-throughput screening technique designed to examine the architecture of bacterial operons in general and the precise position of transcription start sites (TSS) in particular. Hitherto, dRNA-seq data were analyzed by visualizing the sequencing reads mapped to the reference genome and manually annotating reliable positions. This is very labor intensive and, due to the subjectivity, biased. Here, we present TSSAR, a tool for automated de novo TSS annotation from dRNA-seq data that respects the statistics of dRNA-seq libraries. TSSAR uses the premise that the number of sequencing reads starting at a certain genomic position within a transcriptional active region follows a Poisson distribution with a parameter that depends on the local strength of expression. The differences of two dRNA-seq library counts thus follow a Skellam distribution. This provides a statistical basis to identify significantly enriched primary transcripts.We assessed the performance by analyzing a publicly available dRNA-seq data set using TSSAR and two simple approaches that utilize user-defined score cutoffs. We evaluated the power of reproducing the manual TSS annotation. Furthermore, the same data set was used to reproduce 74 experimentally validated TSS in H. pylori from reliable techniques such as RACE or primer extension. Both analyses showed that TSSAR outperforms the static cutoff-dependent approaches. Having an automated and efficient tool for analyzing dRNA-seq data facilitates the use of the dRNA-seq technique and promotes its application to more sophisticated analysis. For instance, monitoring the plasticity and dynamics of the transcriptomal architecture triggered by different stimuli and growth conditions becomes possible.The main asset of a novel tool for dRNA-seq analysis that reaches out to a broad user community is usability. As such, we provide TSSAR both as intuitive RESTful Web service ( http://rna.tbi.univie.ac.at/TSSAR) together with a set of post-processing and analysis tools, as well as a stand-alone version for use in high-throughput dRNA-seq data analysis pipelines.

  4. De-novo assembly and characterization of the transcriptome of Metschnikowia fructicola reveals differences in gene expression following interaction with Penicillium digitatum and grapefruit peel

    USDA-ARS?s Scientific Manuscript database

    The yeast, Metschnikowia fructicola, is an antagonist with biological control activity against postharvest diseases of several fruits. We performed a transcriptome analysis, using RNA-Seq technology, to examine the response of M. fructicola with citrus fruit and with the postharvest pathogen, Penic...

  5. Identification of innate lymphoid cells in single-cell RNA-Seq data.

    PubMed

    Suffiotti, Madeleine; Carmona, Santiago J; Jandus, Camilla; Gfeller, David

    2017-07-01

    Innate lymphoid cells (ILCs) consist of natural killer (NK) cells and non-cytotoxic ILCs that are broadly classified into ILC1, ILC2, and ILC3 subtypes. These cells recently emerged as important early effectors of innate immunity for their roles in tissue homeostasis and inflammation. Over the last few years, ILCs have been extensively studied in mouse and human at the functional and molecular level, including gene expression profiling. However, sorting ILCs with flow cytometry for gene expression analysis is a delicate and time-consuming process. Here we propose and validate a novel framework for studying ILCs at the transcriptomic level using single-cell RNA-Seq data. Our approach combines unsupervised clustering and a new cell type classifier trained on mouse ILC gene expression data. We show that this approach can accurately identify different ILCs, especially ILC2 cells, in human lymphocyte single-cell RNA-Seq data. Our new model relies only on genes conserved across vertebrates, thereby making it in principle applicable in any vertebrate species. Considering the rapid increase in throughput of single-cell RNA-Seq technology, our work provides a computational framework for studying ILC2 cells in single-cell transcriptomic data and may help exploring their conservation in distant vertebrate species.

  6. Transcriptome Analysis of Orbital Adipose Tissue in Active Thyroid Eye Disease Using Next Generation RNA Sequencing Technology

    PubMed Central

    Lee, Bradford W.; Kumar, Virender B.; Biswas, Pooja; Ko, Audrey C.; Alameddine, Ramzi M.; Granet, David B.; Ayyagari, Radha; Kikkawa, Don O.; Korn, Bobby S.

    2018-01-01

    Objective: This study utilized Next Generation Sequencing (NGS) to identify differentially expressed transcripts in orbital adipose tissue from patients with active Thyroid Eye Disease (TED) versus healthy controls. Method: This prospective, case-control study enrolled three patients with severe, active thyroid eye disease undergoing orbital decompression, and three healthy controls undergoing routine eyelid surgery with removal of orbital fat. RNA Sequencing (RNA-Seq) was performed on freshly obtained orbital adipose tissue from study patients to analyze the transcriptome. Bioinformatics analysis was performed to determine pathways and processes enriched for the differential expression profile. Quantitative Reverse Transcriptase-Polymerase Chain Reaction (qRT-PCR) was performed to validate the differential expression of selected genes identified by RNA-Seq. Results: RNA-Seq identified 328 differentially expressed genes associated with active thyroid eye disease, many of which were responsible for mediating inflammation, cytokine signaling, adipogenesis, IGF-1 signaling, and glycosaminoglycan binding. The IL-5 and chemokine signaling pathways were highly enriched, and very-low-density-lipoprotein receptor activity and statin medications were implicated as having a potential role in TED. Conclusion: This study is the first to use RNA-Seq technology to elucidate differential gene expression associated with active, severe TED. This study suggests a transcriptional basis for the role of statins in modulating differentially expressed genes that mediate the pathogenesis of thyroid eye disease. Furthermore, the identification of genes with altered levels of expression in active, severe TED may inform the molecular pathways central to this clinical phenotype and guide the development of novel therapeutic agents. PMID:29760827

  7. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    USDA-ARS?s Scientific Manuscript database

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  8. Dual Analysis of the Murine Cytomegalovirus and Host Cell Transcriptomes Reveal New Aspects of the Virus-Host Cell Interface

    PubMed Central

    Juranic Lisnic, Vanda; Babic Cac, Marina; Lisnic, Berislav; Trsan, Tihana; Mefferd, Adam; Das Mukhopadhyay, Chitrangada; Cook, Charles H.; Jonjic, Stipan; Trgovcich, Joanne

    2013-01-01

    Major gaps in our knowledge of pathogen genes and how these gene products interact with host gene products to cause disease represent a major obstacle to progress in vaccine and antiviral drug development for the herpesviruses. To begin to bridge these gaps, we conducted a dual analysis of Murine Cytomegalovirus (MCMV) and host cell transcriptomes during lytic infection. We analyzed the MCMV transcriptome during lytic infection using both classical cDNA cloning and sequencing of viral transcripts and next generation sequencing of transcripts (RNA-Seq). We also investigated the host transcriptome using RNA-Seq combined with differential gene expression analysis, biological pathway analysis, and gene ontology analysis. We identify numerous novel spliced and unspliced transcripts of MCMV. Unexpectedly, the most abundantly transcribed viral genes are of unknown function. We found that the most abundant viral transcript, recently identified as a noncoding RNA regulating cellular microRNAs, also codes for a novel protein. To our knowledge, this is the first viral transcript that functions both as a noncoding RNA and an mRNA. We also report that lytic infection elicits a profound cellular response in fibroblasts. Highly upregulated and induced host genes included those involved in inflammation and immunity, but also many unexpected transcription factors and host genes related to development and differentiation. Many top downregulated and repressed genes are associated with functions whose roles in infection are obscure, including host long intergenic noncoding RNAs, antisense RNAs or small nucleolar RNAs. Correspondingly, many differentially expressed genes cluster in biological pathways that may shed new light on cytomegalovirus pathogenesis. Together, these findings provide new insights into the molecular warfare at the virus-host interface and suggest new areas of research to advance the understanding and treatment of cytomegalovirus-associated diseases. PMID:24086132

  9. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.

    PubMed

    Song, Li; Florea, Liliana

    2015-01-01

    Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.

  10. A transcriptome resource for the Antarctic pteropod Limacina helicina antarctica.

    PubMed

    Johnson, Kevin M; Hofmann, Gretchen E

    2016-08-01

    The pteropod Limacina helicina antarctica is a dominant member of the zooplankton assemblage in the Antarctic marine ecosystem, and is part of a relatively simple food web in nearshore marine Antarctic waters. As a shelled pteropod, Limacina has been suggested as a candidate sentinel organism for the impacts of ocean acidification, due to the potential for shell dissolution in undersaturated waters. In this study, our goal was to develop a transcriptomic resource for Limacina that would support mechanistic studies to explore the physiological response of Limacina to abiotic stressors such as ocean acidification and ocean warming. To this end, RNA sequencing libraries were prepared from Limacina that had been exposed to a range of pH levels and an elevated temperature to maximize the diversity of expressed genes. RNA sequencing (RNA-seq) was conducted on an Illumina NextSeq500 which produced 339,000,000 150bp paired-end reads. The de novo transcriptome was produced using Trinity and annotation of the assembled transcriptome resulted in the identification of 81,229 transcripts in 137 KEGG pathways. This RNA-seq effort resulted in a transcriptome for the Antarctic pteropod, Limacina helicina antarctica, that is a major resource for an international marine science research community studying these pelagic molluscs in a global change context. Copyright © 2016 Elsevier B.V. All rights reserved.

  11. Transcriptome Dynamics during Maize Endosperm Development

    PubMed Central

    Feng, Jiaojiao; Xu, Shutu; Wang, Lei; Li, Feifei; Li, Yibo; Zhang, Renhe; Zhang, Xinghua; Xue, Jiquan; Guo, Dongwei

    2016-01-01

    The endosperm is a major organ of the seed that plays vital roles in determining seed weight and quality. However, genome-wide transcriptome patterns throughout maize endosperm development have not been comprehensively investigated to date. Accordingly, we performed a high-throughput RNA sequencing (RNA-seq) analysis of the maize endosperm transcriptome at 5, 10, 15 and 20 days after pollination (DAP). We found that more than 11,000 protein-coding genes underwent alternative splicing (AS) events during the four developmental stages studied. These genes were mainly involved in intracellular protein transport, signal transmission, cellular carbohydrate metabolism, cellular lipid metabolism, lipid biosynthesis, protein modification, histone modification, cellular amino acid metabolism, and DNA repair. Additionally, 7,633 genes, including 473 transcription factors (TFs), were differentially expressed among the four developmental stages. The differentially expressed TFs were from 50 families, including the bZIP, WRKY, GeBP and ARF families. Further analysis of the stage-specific TFs showed that binding, nucleus and ligand-dependent nuclear receptor activities might be important at 5 DAP, that immune responses, signalling, binding and lumen development are involved at 10 DAP, that protein metabolic processes and the cytoplasm might be important at 15 DAP, and that the responses to various stimuli are different at 20 DAP compared with the other developmental stages. This RNA-seq analysis provides novel, comprehensive insights into the transcriptome dynamics during early endosperm development in maize. PMID:27695101

  12. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing.

    PubMed

    Marinov, Georgi K; Williams, Brian A; McCue, Ken; Schroth, Gary P; Gertz, Jason; Myers, Richard M; Wold, Barbara J

    2014-03-01

    Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30-100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.

  13. Recovering complete mitochondrial genome sequences from RNA-Seq: A case study of Polytomella non-photosynthetic green algae.

    PubMed

    Tian, Yao; Smith, David Roy

    2016-05-01

    Thousands of mitochondrial genomes have been sequenced, but there are comparatively few available mitochondrial transcriptomes. This might soon be changing. High-throughput RNA sequencing (RNA-Seq) techniques have made it fast and cheap to generate massive amounts of mitochondrial transcriptomic data. Here, we explore the utility of RNA-Seq for assembling mitochondrial genomes and studying their expression patterns. Specifically, we investigate the mitochondrial transcriptomes from Polytomella non-photosynthetic green algae, which have among the smallest, most reduced mitochondrial genomes from the Archaeplastida as well as fragmented rRNA-coding regions, palindromic genes, and linear chromosomes with telomeres. Isolation of whole genomic RNA from the four known Polytomella species followed by Illumina paired-end sequencing generated enough mitochondrial-derived reads to easily recover almost-entire mitochondrial genome sequences. Read-mapping and coverage statistics also gave insights into Polytomella mitochondrial transcriptional architecture, revealing polycistronic transcripts and the expression of telomeres and palindromic genes. Ultimately, RNA-Seq is a promising, cost-effective technique for studying mitochondrial genetics, but it does have drawbacks, which are discussed. One of its greatest potentials, as shown here, is that it can be used to generate near-complete mitochondrial genome sequences, which could be particularly useful in situations where there is a lack of available mtDNA data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  14. Octopus-toolkit: a workflow to automate mining of public epigenomic and transcriptomic next-generation sequencing data

    PubMed Central

    Kim, Taemook; Seo, Hogyu David; Hennighausen, Lothar; Lee, Daeyoup

    2018-01-01

    Abstract Octopus-toolkit is a stand-alone application for retrieving and processing large sets of next-generation sequencing (NGS) data with a single step. Octopus-toolkit is an automated set-up-and-analysis pipeline utilizing the Aspera, SRA Toolkit, FastQC, Trimmomatic, HISAT2, STAR, Samtools, and HOMER applications. All the applications are installed on the user's computer when the program starts. Upon the installation, it can automatically retrieve original files of various epigenomic and transcriptomic data sets, including ChIP-seq, ATAC-seq, DNase-seq, MeDIP-seq, MNase-seq and RNA-seq, from the gene expression omnibus data repository. The downloaded files can then be sequentially processed to generate BAM and BigWig files, which are used for advanced analyses and visualization. Currently, it can process NGS data from popular model genomes such as, human (Homo sapiens), mouse (Mus musculus), dog (Canis lupus familiaris), plant (Arabidopsis thaliana), zebrafish (Danio rerio), fruit fly (Drosophila melanogaster), worm (Caenorhabditis elegans), and budding yeast (Saccharomyces cerevisiae) genomes. With the processed files from Octopus-toolkit, the meta-analysis of various data sets, motif searches for DNA-binding proteins, and the identification of differentially expressed genes and/or protein-binding sites can be easily conducted with few commands by users. Overall, Octopus-toolkit facilitates the systematic and integrative analysis of available epigenomic and transcriptomic NGS big data. PMID:29420797

  15. Comprehensive RNA-Seq transcriptomic profiling across 11 organs, 4 ages, and 2 sexes of Fischer 344 rats.

    PubMed

    Yu, Ying; Zhao, Chen; Su, Zhenqiang; Wang, Charles; Fuscoe, James C; Tong, Weida; Shi, Leming

    2014-01-01

    The rat is used extensively by the pharmaceutical, regulatory, and academic communities for safety assessment of drugs and chemicals and for studying human diseases; however, its transcriptome has not been well studied. As part of the SEQC (i.e., MAQC-III) consortium efforts, a comprehensive RNA-Seq data set was constructed using 320 RNA samples isolated from 10 organs (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, and testes or uterus) from both sexes of Fischer 344 rats across four ages (2-, 6-, 21-, and 104-week-old) with four biological replicates for each of the 80 sample groups (organ-sex-age). With the Ribo-Zero rRNA removal and Illumina RNA-Seq protocols, 41 million 50 bp single-end reads were generated per sample, yielding a total of 13.4 billion reads. This data set could be used to identify and validate new rat genes and transcripts, develop a more comprehensive rat transcriptome annotation system, identify novel gene regulatory networks related to tissue specific gene expression and development, and discover genes responsible for disease and drug toxicity and efficacy.

  16. RNA-SeQC: RNA-seq metrics for quality control and process optimization.

    PubMed

    DeLuca, David S; Levin, Joshua Z; Sivachenko, Andrey; Fennell, Timothy; Nazaire, Marc-Danie; Williams, Chris; Reich, Michael; Winckler, Wendy; Getz, Gad

    2012-06-01

    RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program which provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3'/5' bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis. See www.genepattern.org to run online, or www.broadinstitute.org/rna-seqc/ for a command line tool.

  17. Transcriptome analysis of root response to citrus blight based on the newly assembled Swingle citrumelo draft genome.

    PubMed

    Zhang, Yunzeng; Barthe, Gary; Grosser, Jude W; Wang, Nian

    2016-07-08

    Citrus blight is a citrus tree overall decline disease and causes serious losses in the citrus industry worldwide. Although it was described more than one hundred years ago, its causal agent remains unknown and its pathophysiology is not well determined, which hampers our understanding of the disease and design of suitable disease management. In this study, we sequenced and assembled the draft genome for Swingle citrumelo, one important citrus rootstock. The draft genome is approximately 280 Mb, which covers 74 % of the estimated Swingle citrumelo genome and the average coverage is around 15X. The draft genome of Swingle citrumelo enabled us to conduct transcriptome analysis of roots of blight and healthy Swingle citrumelo using RNA-seq. The RNA-seq was reliable as evidenced by the high consistence of RNA-seq analysis and quantitative reverse transcription PCR results (R(2) = 0.966). Comparison of the gene expression profiles between blight and healthy root samples revealed the molecular mechanism underneath the characteristic blight phenotypes including decline, starch accumulation, and drought stress. The JA and ET biosynthesis and signaling pathways showed decreased transcript abundance, whereas SA-mediated defense-related genes showed increased transcript abundance in blight trees, suggesting unclassified biotrophic pathogen was involved in this disease. Overall, the Swingle citrumelo draft genome generated in this study will advance our understanding of plant biology and contribute to the citrus breeding. Transcriptome analysis of blight and healthy trees deepened our understanding of the pathophysiology of citrus blight.

  18. Unique transcriptomic signature of omental adipose tissue in Ossabaw swine: a model of childhood obesity.

    PubMed

    Toedebusch, Ryan G; Roberts, Michael D; Wells, Kevin D; Company, Joseph M; Kanosky, Kayla M; Padilla, Jaume; Jenkins, Nathan T; Perfield, James W; Ibdah, Jamal A; Booth, Frank W; Rector, R Scott

    2014-05-15

    To better understand the impact of childhood obesity on intra-abdominal adipose tissue phenotype, a complete transcriptomic analysis using deep RNA-sequencing (RNA-seq) was performed on omental adipose tissue (OMAT) obtained from lean and Western diet-induced obese juvenile Ossabaw swine. Obese animals had 88% greater body mass, 49% greater body fat content, and a 60% increase in OMAT adipocyte area (all P < 0.05) compared with lean pigs. RNA-seq revealed a 37% increase in the total transcript number in the OMAT of obese pigs. Ingenuity Pathway Analysis showed transcripts in obese OMAT were primarily enriched in the following categories: 1) development, 2) cellular function and maintenance, and 3) connective tissue development and function, while transcripts associated with RNA posttranslational modification, lipid metabolism, and small molecule biochemistry were reduced. DAVID and Gene Ontology analyses showed that many of the classically recognized gene pathways associated with adipose tissue dysfunction in obese adults including hypoxia, inflammation, angiogenesis were not altered in OMAT in our model. The current study indicates that obesity in juvenile Ossabaw swine is characterized by increases in overall OMAT transcript number and provides novel data describing early transcriptomic alterations that occur in response to excess caloric intake in visceral adipose tissue in a pig model of childhood obesity.

  19. Unique transcriptomic signature of omental adipose tissue in Ossabaw swine: a model of childhood obesity

    PubMed Central

    Toedebusch, Ryan G.; Roberts, Michael D.; Wells, Kevin D.; Company, Joseph M.; Kanosky, Kayla M.; Padilla, Jaume; Jenkins, Nathan T.; Perfield, James W.; Ibdah, Jamal A.; Booth, Frank W.

    2014-01-01

    To better understand the impact of childhood obesity on intra-abdominal adipose tissue phenotype, a complete transcriptomic analysis using deep RNA-sequencing (RNA-seq) was performed on omental adipose tissue (OMAT) obtained from lean and Western diet-induced obese juvenile Ossabaw swine. Obese animals had 88% greater body mass, 49% greater body fat content, and a 60% increase in OMAT adipocyte area (all P < 0.05) compared with lean pigs. RNA-seq revealed a 37% increase in the total transcript number in the OMAT of obese pigs. Ingenuity Pathway Analysis showed transcripts in obese OMAT were primarily enriched in the following categories: 1) development, 2) cellular function and maintenance, and 3) connective tissue development and function, while transcripts associated with RNA posttranslational modification, lipid metabolism, and small molecule biochemistry were reduced. DAVID and Gene Ontology analyses showed that many of the classically recognized gene pathways associated with adipose tissue dysfunction in obese adults including hypoxia, inflammation, angiogenesis were not altered in OMAT in our model. The current study indicates that obesity in juvenile Ossabaw swine is characterized by increases in overall OMAT transcript number and provides novel data describing early transcriptomic alterations that occur in response to excess caloric intake in visceral adipose tissue in a pig model of childhood obesity. PMID:24642759

  20. Toward reliable biomarker signatures in the age of liquid biopsies - how to standardize the small RNA-Seq workflow

    PubMed Central

    Buschmann, Dominik; Haberberger, Anna; Kirchner, Benedikt; Spornraft, Melanie; Riedmaier, Irmgard; Schelling, Gustav; Pfaffl, Michael W.

    2016-01-01

    Small RNA-Seq has emerged as a powerful tool in transcriptomics, gene expression profiling and biomarker discovery. Sequencing cell-free nucleic acids, particularly microRNA (miRNA), from liquid biopsies additionally provides exciting possibilities for molecular diagnostics, and might help establish disease-specific biomarker signatures. The complexity of the small RNA-Seq workflow, however, bears challenges and biases that researchers need to be aware of in order to generate high-quality data. Rigorous standardization and extensive validation are required to guarantee reliability, reproducibility and comparability of research findings. Hypotheses based on flawed experimental conditions can be inconsistent and even misleading. Comparable to the well-established MIQE guidelines for qPCR experiments, this work aims at establishing guidelines for experimental design and pre-analytical sample processing, standardization of library preparation and sequencing reactions, as well as facilitating data analysis. We highlight bottlenecks in small RNA-Seq experiments, point out the importance of stringent quality control and validation, and provide a primer for differential expression analysis and biomarker discovery. Following our recommendations will encourage better sequencing practice, increase experimental transparency and lead to more reproducible small RNA-Seq results. This will ultimately enhance the validity of biomarker signatures, and allow reliable and robust clinical predictions. PMID:27317696

  1. Expanding frontiers in plant transcriptomics in aid of functional genomics and molecular breeding.

    PubMed

    Agarwal, Pinky; Parida, Swarup K; Mahto, Arunima; Das, Sweta; Mathew, Iny Elizebeth; Malik, Naveen; Tyagi, Akhilesh K

    2014-12-01

    The transcript pool of a plant part, under any given condition, is a collection of mRNAs that will pave the way for a biochemical reaction of the plant to stimuli. Over the past decades, transcriptome study has advanced from Northern blotting to RNA sequencing (RNA-seq), through other techniques, of which real-time quantitative polymerase chain reaction (PCR) and microarray are the most significant ones. The questions being addressed by such studies have also matured from a solitary process to expression atlas and marker-assisted genetic enhancement. Not only genes and their networks involved in various developmental processes of plant parts have been elucidated, but also stress tolerant genes have been highlighted. The transcriptome of a plant with altered expression of a target gene has given information about the downstream genes. Marker information has been used for breeding improved varieties. Fortunately, the data generated by transcriptome analysis has been made freely available for ample utilization and comparison. The review discusses this wide variety of transcriptome data being generated in plants, which includes developmental stages, abiotic and biotic stress, effect of altered gene expression, as well as comparative transcriptomics, with a special emphasis on microarray and RNA-seq. Such data can be used to determine the regulatory gene networks, which can subsequently be utilized for generating improved plant varieties. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  2. seq-ImmuCC: Cell-Centric View of Tissue Transcriptome Measuring Cellular Compositions of Immune Microenvironment From Mouse RNA-Seq Data.

    PubMed

    Chen, Ziyi; Quan, Lijun; Huang, Anfei; Zhao, Qiang; Yuan, Yao; Yuan, Xuye; Shen, Qin; Shang, Jingzhe; Ben, Yinyin; Qin, F Xiao-Feng; Wu, Aiping

    2018-01-01

    The RNA sequencing approach has been broadly used to provide gene-, pathway-, and network-centric analyses for various cell and tissue samples. However, thus far, rich cellular information carried in tissue samples has not been thoroughly characterized from RNA-Seq data. Therefore, it would expand our horizons to better understand the biological processes of the body by incorporating a cell-centric view of tissue transcriptome. Here, a computational model named seq-ImmuCC was developed to infer the relative proportions of 10 major immune cells in mouse tissues from RNA-Seq data. The performance of seq-ImmuCC was evaluated among multiple computational algorithms, transcriptional platforms, and simulated and experimental datasets. The test results showed its stable performance and superb consistency with experimental observations under different conditions. With seq-ImmuCC, we generated the comprehensive landscape of immune cell compositions in 27 normal mouse tissues and extracted the distinct signatures of immune cell proportion among various tissue types. Furthermore, we quantitatively characterized and compared 18 different types of mouse tumor tissues of distinct cell origins with their immune cell compositions, which provided a comprehensive and informative measurement for the immune microenvironment inside tumor tissues. The online server of seq-ImmuCC are freely available at http://wap-lab.org:3200/immune/.

  3. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.

    PubMed

    D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano

    2015-01-01

    The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways make RNA-Seq one of most complex field of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer to the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call for statistically significant differences in genes and transcripts expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows to access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user needs.

  4. Multimodal RNA-seq using single-strand, double-strand, and CircLigase-based capture yields a refined and extended description of the C. elegans transcriptome.

    PubMed

    Lamm, Ayelet T; Stadler, Michael R; Zhang, Huibin; Gent, Jonathan I; Fire, Andrew Z

    2011-02-01

    We have used a combination of three high-throughput RNA capture and sequencing methods to refine and augment the transcriptome map of a well-studied genetic model, Caenorhabditis elegans. The three methods include a standard (non-directional) library preparation protocol relying on cDNA priming and foldback that has been used in several previous studies for transcriptome characterization in this species, and two directional protocols, one involving direct capture of single-stranded RNA fragments and one involving circular-template PCR (CircLigase). We find that each RNA-seq approach shows specific limitations and biases, with the application of multiple methods providing a more complete map than was obtained from any single method. Of particular note in the analysis were substantial advantages of CircLigase-based and ssRNA-based capture for defining sequences and structures of the precise 5' ends (which were lost using the double-strand cDNA capture method). Of the three methods, ssRNA capture was most effective in defining sequences to the poly(A) junction. Using data sets from a spectrum of C. elegans strains and stages and the UCSC Genome Browser, we provide a series of tools, which facilitate rapid visualization and assignment of gene structures.

  5. Comparative transcriptomic analysis provides insights into antibacterial mechanisms of Branchiostoma belcheri under Vibrio parahaemolyticus infection.

    PubMed

    Zhang, Qi-Lin; Zhu, Qian-Hua; Liang, Ming-Zhong; Wang, Feng; Guo, Jun; Deng, Xian-Yu; Chen, Jun-Yuan; Wang, Yu-Jun; Lin, Lian-Bing

    2018-05-01

    Amphioxus, a basal chordate, is widely considered to be an existing proxy of the invertebrate ancestor of vertebrates, and it exhibits susceptibility to various pathogen infections and pathogenic mimic challenges. Here, in order to understand more clearly its antibacterial mechanisms, we analyzed the ribosomal RNA (rRNA)-depleted transcriptome of Chinese amphioxus (Branchiostoma belcheri) infected with Vibrio parahaemolyticus (V. p.) via next-generation deep sequencing technology (RNA-seq). We identified a total of 3214 differentially expressed genes (DEGs) by comparing V. p.-infected and control transcriptome libraries, including 2219 significantly up-regulated and 995 significantly down-regulated DEGs in V. p.-infected amphioxus. The DEGs with the top 10 most dramatic expression fold changes after V. p. infection, as well as 53 immune-related DEGs (IRDs) belonging to four primary categories of innate immunity were analyzed further. Through gene ontology (GO) and pathway enrichment analysis, DEGs were found to be primarily related to immune processes, apoptosis, catabolic and metabolic processes, binding and enzyme activity, while pathways involving bacterial infection, immune signaling, immune response, cancer, and apoptosis were overrepresented. We validated the RNA-seq results by detecting the expression levels of 10 IRDs using qRT-PCR, and we surveyed the dynamic variation in gene expression for these IRDs at 0, 6, 12, 24, and 48 h after V. p. Subsequently, according to the RNA-seq results, the presence of a primitive Toll-like receptor (TLR)-mediated antibacterial immune signaling pathway was predicted in B. belcheri. This study provides valuable information regarding antibacterial immunity for further research into the evolution of immunity in vertebrates and broadens our understanding of the innate immune response against bacterial invasion in amphioxus. Copyright © 2018 Elsevier Ltd. All rights reserved.

  6. Triterpenoid Saponin Biosynthetic Pathway Profiling and Candidate Gene Mining of the Ilex asprella Root Using RNA-Seq

    PubMed Central

    Zheng, Xiasheng; Xu, Hui; Ma, Xinye; Zhan, Ruoting; Chen, Weiwen

    2014-01-01

    Ilex asprella, which contains abundant α-amyrin type triterpenoid saponins, is an anti-influenza herbal drug widely used in south China. In this work, we first analysed the transcriptome of the I. asprella root using RNA-Seq, which provided a dataset for functional gene mining. mRNA was isolated from the total RNA of the I. asprella root and reverse-transcribed into cDNA. Then, the cDNA library was sequenced using an Illumina HiSeq™ 2000, which generated 55,028,452 clean reads. De novo assembly of these reads generated 51,865 unigenes, in which 39,269 unigenes were annotated (75.71% yield). According to the structures of the triterpenoid saponins of I. asprella, a putative biosynthetic pathway downstream of 2,3-oxidosqualene was proposed and candidate unigenes in the transcriptome data that were potentially involved in the pathway were screened using homology-based BLAST and phylogenetic analysis. Further amplification and functional analysis of these putative unigenes will provide insight into the biosynthesis of Ilex triterpenoid saponins. PMID:24722569

  7. Variant discovery in the sheepmeat odour and flavour in javanese fat tailed sheep using RNA sequencing

    NASA Astrophysics Data System (ADS)

    Abuzahra, M. A. M.; Jakaria; Listyarini, K.; Furqon, A.; Sumantri, C.; Uddin, M. J.; Gunawan, A.

    2018-05-01

    High-throughput RNA sequencing (RNA-Seq) reveals new challenges for the detection of transcriptome variants (SNPs) in different tissues and species. The aims of this study was to characterize a SNP discovery analysis in the sheep meat odour and flavour transcriptome using RNA-Seq. Six liver samples from divergent sheep meat odour and flavour were analyzed using the Illumina Genome Hiseq 2500 Analyzer. The SNP detection analysis revealed 142 SNPs in sheep meat samples, and a large number of those corresponded to differences between high and low sheep meat odour and flavour ovis genome assembly OAR v4.0. Among them, about 90.4% of genes had multiple polymorphisms within 12 genes (JAML, ANGPTL8, LOC101103463, SEPW1, SCN5A, LOC101113036, DOCK6, GTSE1, KIF12, KCTD17, KANK2, CYP2A6). Several of the SNPs (JAML, CYP2A6, SEPW1, and KIF12) found in this study could be included as suitable markers in genotyping platforms to perform association analyses in commercial populations and apply genomic selection protocols in the sheep meat production.

  8. RNA sequencing enables systematic identification of platelet transcriptomic alterations in NSCLC patients.

    PubMed

    Zhang, Qun; Hu, Huan; Liu, Hongda; Jin, Jiajia; Zhu, Peiyuan; Wang, Shujun; Shen, Kaikai; Hu, Yangbo; Li, Zhou; Zhan, Ping; Zhu, Suhua; Fan, Hang; Zhang, Jianya; Lv, Tangfeng; Song, Yong

    2018-05-29

    Platelets are implicated as key players in the metastatic dissemination of tumor cells. Previous evidence demonstrated platelets retained cytoplasmic RNAs with physiologically activity, splicing pre-mRNA to mRNA and translating into functional proteins in response to external stimulation. Recently, platelets gene profile of healthy or diseased individuals were characterized with the help of RNA sequencing (RNA-Seq) in some studies, leading to new insights into the mechanisms underlying disease pathogenesis. In this study, we performed RNA-seq in platelets from 7 healthy individuals and 15 non-small cell lung cancer (NSCLC) patients. Our data revealed a subset of near universal differently expressed gene (DEG) profiles in platelets of metastatic NSCLC compared to healthy individuals, including 626 up-regulated RNAs (mRNAs and ncRNAs) and 1497 down-regulated genes. The significant over-expressed genes showed enrichment in focal adhesion, platelets activation, gap junction and adherens junction pathways. The DEGs also included previously reported tumor-related genes such as PDGFR, VEGF, EGF, etc., verifying the consistence and significance of platelet RNA-Seq in oncology study. We also validated several up-regulated DEGs involved in tumor cell-induced platelet aggregation (TCIPA) and tumorigenesis. Additionally, transcriptomic comparison analyses of NSCLC subgroups were conducted. Between non-metastatic and metastatic NSCLC patients, 526 platelet DEGs were identified with the most altered expression. The outcomes from subgroup analysis between lung adenocarcinoma and lung squamous cell carcinoma demonstrated the diagnostic potential of platelet RNA-Seq on distinguishing tumor histological types. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  9. Transcriptome analysis of woodland strawberry (Fragaria vesca) response to the infection by Strawberry vein banding virus (SVBV).

    PubMed

    Chen, Jing; Zhang, Hanping; Feng, Mingfeng; Zuo, Dengpan; Hu, Yahui; Jiang, Tong

    2016-07-13

    Woodland strawberry (Fragaria vesca) infected with Strawberry vein banding virus (SVBV) exhibits chlorotic symptoms along the leaf veins. However, little is known about the molecular mechanism of strawberry disease caused by SVBV. We performed the next-generation sequencing (RNA-Seq) study to identify gene expression changes induced by SVBV in woodland strawberry using mock-inoculated plants as a control. Using RNA-Seq, we have identified 36,850 unigenes, of which 517 were differentially expressed in the virus-infected plants (DEGs). The unigenes were annotated and classified with Gene Ontology (GO), Clusters of Orthologous Group (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. The KEGG pathway analysis of these genes suggested that strawberry disease caused by SVBV may affect multiple processes including pigment metabolism, photosynthesis and plant-pathogen interactions. Our research provides comprehensive transcriptome information regarding SVBV infection in strawberry.

  10. A Novel Computational Strategy to Identify A-to-I RNA Editing Sites by RNA-Seq Data: De Novo Detection in Human Spinal Cord Tissue

    PubMed Central

    Picardi, Ernesto; Gallo, Angela; Galeano, Federica; Tomaselli, Sara; Pesole, Graziano

    2012-01-01

    RNA editing is a post-transcriptional process occurring in a wide range of organisms. In human brain, the A-to-I RNA editing, in which individual adenosine (A) bases in pre-mRNA are modified to yield inosine (I), is the most frequent event. Modulating gene expression, RNA editing is essential for cellular homeostasis. Indeed, its deregulation has been linked to several neurological and neurodegenerative diseases. To date, many RNA editing sites have been identified by next generation sequencing technologies employing massive transcriptome sequencing together with whole genome or exome sequencing. While genome and transcriptome reads are not always available for single individuals, RNA-Seq data are widespread through public databases and represent a relevant source of yet unexplored RNA editing sites. In this context, we propose a simple computational strategy to identify genomic positions enriched in novel hypothetical RNA editing events by means of a new two-steps mapping procedure requiring only RNA-Seq data and no a priori knowledge of RNA editing characteristics and genomic reads. We assessed the suitability of our procedure by confirming A-to-I candidates using conventional Sanger sequencing and performing RNA-Seq as well as whole exome sequencing of human spinal cord tissue from a single individual. PMID:22957051

  11. A biochemical landscape of A-to-I RNA editing in the human brain transcriptome

    PubMed Central

    Sakurai, Masayuki; Ueda, Hiroki; Yano, Takanori; Okada, Shunpei; Terajima, Hideki; Mitsuyama, Toutai; Toyoda, Atsushi; Fujiyama, Asao; Kawabata, Hitomi; Suzuki, Tsutomu

    2014-01-01

    Inosine is an abundant RNA modification in the human transcriptome and is essential for many biological processes in modulating gene expression at the post-transcriptional level. Adenosine deaminases acting on RNA (ADARs) catalyze the hydrolytic deamination of adenosines to inosines (A-to-I editing) in double-stranded regions. We previously established a biochemical method called “inosine chemical erasing” (ICE) to directly identify inosines on RNA strands with high reliability. Here, we have applied the ICE method combined with deep sequencing (ICE-seq) to conduct an unbiased genome-wide screening of A-to-I editing sites in the transcriptome of human adult brain. Taken together with the sites identified by the conventional ICE method, we mapped 19,791 novel sites and newly found 1258 edited mRNAs, including 66 novel sites in coding regions, 41 of which cause altered amino acid assignment. ICE-seq detected novel editing sites in various repeat elements as well as in short hairpins. Gene ontology analysis revealed that these edited mRNAs are associated with transcription, energy metabolism, and neurological disorders, providing new insights into various aspects of human brain functions. PMID:24407955

  12. Abiotic Stresses Modulate Landscape of Poplar Transcriptome via Alternative Splicing, Differential Intron Retention, and Isoform Ratio Switching

    PubMed Central

    Filichkin, Sergei A.; Hamilton, Michael; Dharmawardhana, Palitha D.; Singh, Sunil K.; Sullivan, Christopher; Ben-Hur, Asa; Reddy, Anireddy S. N.; Jaiswal, Pankaj

    2018-01-01

    Abiotic stresses affect plant physiology, development, growth, and alter pre-mRNA splicing. Western poplar is a model woody tree and a potential bioenergy feedstock. To investigate the extent of stress-regulated alternative splicing (AS), we conducted an in-depth survey of leaf, root, and stem xylem transcriptomes under drought, salt, or temperature stress. Analysis of approximately one billion of genome-aligned RNA-Seq reads from tissue- or stress-specific libraries revealed over fifteen millions of novel splice junctions. Transcript models supported by both RNA-Seq and single molecule isoform sequencing (Iso-Seq) data revealed a broad array of novel stress- and/or tissue-specific isoforms. Analysis of Iso-Seq data also resulted in the discovery of 15,087 novel transcribed regions of which 164 show AS. Our findings demonstrate that abiotic stresses profoundly perturb transcript isoform profiles and trigger widespread intron retention (IR) events. Stress treatments often increased or decreased retention of specific introns – a phenomenon described here as differential intron retention (DIR). Many differentially retained introns were regulated in a stress- and/or tissue-specific manner. A subset of transcripts harboring super stress-responsive DIR events showed persisting fluctuations in the degree of IR across all treatments and tissue types. To investigate coordinated dynamics of intron-containing transcripts in the study we quantified absolute copy number of isoforms of two conserved transcription factors (TFs) using Droplet Digital PCR. This case study suggests that stress treatments can be associated with coordinated switches in relative ratios between fully spliced and intron-retaining isoforms and may play a role in adjusting transcriptome to abiotic stresses. PMID:29483921

  13. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.

  14. Use of Partial Least Squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data.

    PubMed

    Chakraborty, Sutirtha

    2018-05-26

    RNA-Seq technology has revolutionized the face of gene expression profiling by generating read count data measuring the transcript abundances for each queried gene on multiple experimental subjects. But on the downside, the underlying technical artefacts and hidden biological profiles of the samples generate a wide variety of latent effects that may potentially distort the actual transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables and lead to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (built as an R package 'SVAPLSseq') to correct for the traces of extraneous variability in RNA-Seq data. A novel and thorough comparative analysis of the PLS based method is presented along with some of the other popularly used approaches for latent variable correction in RNA-Seq. Overall, the method is found to achieve a substantially improved estimation of the hidden effect signatures in the RNA-Seq transcriptome expression landscape compared to other available techniques. Copyright © 2017. Published by Elsevier Inc.

  15. Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea

    DOE PAGES

    Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai; ...

    2015-10-28

    We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC).more » To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.« less

  16. Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai

    We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC).more » To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.« less

  17. Transcriptomic signatures in cartilage ageing

    PubMed Central

    2013-01-01

    Introduction Age is an important factor in the development of osteoarthritis. Microarray studies provide insight into cartilage aging but do not reveal the full transcriptomic phenotype of chondrocytes such as small noncoding RNAs, pseudogenes, and microRNAs. RNA-Seq is a powerful technique for the interrogation of large numbers of transcripts including nonprotein coding RNAs. The aim of the study was to characterise molecular mechanisms associated with age-related changes in gene signatures. Methods RNA for gene expression analysis using RNA-Seq and real-time PCR analysis was isolated from macroscopically normal cartilage of the metacarpophalangeal joints of eight horses; four young donors (4 years old) and four old donors (>15 years old). RNA sequence libraries were prepared following ribosomal RNA depletion and sequencing was undertaken using the Illumina HiSeq 2000 platform. Differentially expressed genes were defined using Benjamini-Hochberg false discovery rate correction with a generalised linear model likelihood ratio test (P < 0.05, expression ratios ± 1.4 log2 fold-change). Ingenuity pathway analysis enabled networks, functional analyses and canonical pathways from differentially expressed genes to be determined. Results In total, the expression of 396 transcribed elements including mRNAs, small noncoding RNAs, pseudogenes, and a single microRNA was significantly different in old compared with young cartilage (± 1.4 log2 fold-change, P < 0.05). Of these, 93 were at higher levels in the older cartilage and 303 were at lower levels in the older cartilage. There was an over-representation of genes with reduced expression relating to extracellular matrix, degradative proteases, matrix synthetic enzymes, cytokines and growth factors in cartilage derived from older donors compared with young donors. In addition, there was a reduction in Wnt signalling in ageing cartilage. Conclusion There was an age-related dysregulation of matrix, anabolic and catabolic cartilage factors. This study has increased our knowledge of transcriptional networks in cartilage ageing by providing a global view of the transcriptome. PMID:23971731

  18. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.

    PubMed

    Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin

    2011-03-24

    The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism.

  19. Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing

    PubMed Central

    2011-01-01

    Background The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Results Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Conclusions Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism. PMID:21435219

  20. Using RNA-seq and targeted nucleases to identify mechanisms of drug resistance in acute myeloid leukemia.

    PubMed

    Rathe, Susan K; Moriarity, Branden S; Stoltenberg, Christopher B; Kurata, Morito; Aumann, Natalie K; Rahrmann, Eric P; Bailey, Natashay J; Melrose, Ellen G; Beckmann, Dominic A; Liska, Chase R; Largaespada, David A

    2014-08-13

    The evolution from microarrays to transcriptome deep-sequencing (RNA-seq) and from RNA interference to gene knockouts using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and Transcription Activator-Like Effector Nucleases (TALENs) has provided a new experimental partnership for identifying and quantifying the effects of gene changes on drug resistance. Here we describe the results from deep-sequencing of RNA derived from two cytarabine (Ara-C) resistance acute myeloid leukemia (AML) cell lines, and present CRISPR and TALEN based methods for accomplishing complete gene knockout (KO) in AML cells. We found protein modifying loss-of-function mutations in Dck in both Ara-C resistant cell lines. CRISPR and TALEN-based KO of Dck dramatically increased the IC₅₀ of Ara-C and introduction of a DCK overexpression vector into Dck KO clones resulted in a significant increase in Ara-C sensitivity. This effort demonstrates the power of using transcriptome analysis and CRISPR/TALEN-based KOs to identify and verify genes associated with drug resistance.

  1. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

    PubMed

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

    2018-01-04

    Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices translation ratios, elongation velocity index and translational efficiency can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyzes. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”

    PubMed Central

    Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu

    2012-01-01

    Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113

  3. RNA-seq based transcriptional map of bovine respiratory disease pathogen "Histophilus somni 2336".

    PubMed

    Kumar, Ranjit; Lawrence, Mark L; Watt, James; Cooksey, Amanda M; Burgess, Shane C; Nanduri, Bindu

    2012-01-01

    Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method.The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.

  4. The power and promise of RNA-seq in ecology and evolution.

    PubMed

    Todd, Erica V; Black, Michael A; Gemmell, Neil J

    2016-03-01

    Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the necessary power to address a research question with statistical robustness. In the rush to adopt new and improved genomic research methods, limitations of technology and experimental design may be initially neglected. Here, we review these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in nonmodel species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies and means that larger sample sizes are often needed to achieve the same degree of statistical power as clinical studies based on data from cell lines or inbred animal models. Sequencing costs have plummeted, yet RNA-seq studies still underutilize biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off, while taking into account study-specific factors affecting power, are currently lacking. Study designs that prioritize sequencing depth over replication fail to capitalize on the power of RNA-seq technology for DE inference. Significant recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in the context of RNA-seq DE analysis. We synthesize progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments relevant in eco-evolutionary and clinical settings alike. © 2016 John Wiley & Sons Ltd.

  5. De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq

    PubMed Central

    2010-01-01

    Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes were assembled and annotated. Analysis of assembly quality, length and diversity show that this dataset represent the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097

  6. RNA-Seq Analysis Using De Novo Transcriptome Assembly as a Reference for the Salmon Louse Caligus rogercresseyi

    PubMed Central

    Gallardo-Escárate, Cristian; Valenzuela-Muñoz, Valentina; Nuñez-Acuña, Gustavo

    2014-01-01

    Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for different developmental stages, sexes, and strains of sea lice is still limited or unknown. In this study, RNA-seq analysis was performed using de novo transcriptome assembly as a reference for evidenced transcriptional changes from six developmental stages of the salmon louse Caligus rogercresseyi. EST-datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using MiSeq Illumina sequencing. A total of 151,788,682 transcripts were yielded, which were assembled into 83,444 high quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Herein, four clusters were differentially expressed between nauplius I–II and copepodid stages (604 transcripts), five clusters between copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that the nauplius I–II, copepodid and chalimus stages are mainly annotated to aminoacid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were highly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provides the most comprehensive transcriptome resource available for C. rogercresseyi, which should be used for future genomic studies linked to host-parasite interactions. PMID:24691066

  7. RNA-Seq analysis using de novo transcriptome assembly as a reference for the salmon louse Caligus rogercresseyi.

    PubMed

    Gallardo-Escárate, Cristian; Valenzuela-Muñoz, Valentina; Nuñez-Acuña, Gustavo

    2014-01-01

    Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for different developmental stages, sexes, and strains of sea lice is still limited or unknown. In this study, RNA-seq analysis was performed using de novo transcriptome assembly as a reference for evidenced transcriptional changes from six developmental stages of the salmon louse Caligus rogercresseyi. EST-datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using MiSeq Illumina sequencing. A total of 151,788,682 transcripts were yielded, which were assembled into 83,444 high quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Herein, four clusters were differentially expressed between nauplius I-II and copepodid stages (604 transcripts), five clusters between copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that the nauplius I-II, copepodid and chalimus stages are mainly annotated to aminoacid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were highly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provides the most comprehensive transcriptome resource available for C. rogercresseyi, which should be used for future genomic studies linked to host-parasite interactions.

  8. Isoform Sequencing and State-of-Art Applications for Unravelling Complexity of Plant Transcriptomes

    PubMed Central

    An, Dong; Li, Changsheng; Humbeck, Klaus

    2018-01-01

    Single-molecule real-time (SMRT) sequencing developed by PacBio, also called third-generation sequencing (TGS), offers longer reads than the second-generation sequencing (SGS). Given its ability to obtain full-length transcripts without assembly, isoform sequencing (Iso-Seq) of transcriptomes by PacBio is advantageous for genome annotation, identification of novel genes and isoforms, as well as the discovery of long non-coding RNA (lncRNA). In addition, Iso-Seq gives access to the direct detection of alternative splicing, alternative polyadenylation (APA), gene fusion, and DNA modifications. Such applications of Iso-Seq facilitate the understanding of gene structure, post-transcriptional regulatory networks, and subsequently proteomic diversity. In this review, we summarize its applications in plant transcriptome study, specifically pointing out challenges associated with each step in the experimental design and highlight the development of bioinformatic pipelines. We aim to provide the community with an integrative overview and a comprehensive guidance to Iso-Seq, and thus to promote its applications in plant research. PMID:29346292

  9. Analytical workflow profiling gene expression in murine macrophages

    PubMed Central

    Nixon, Scott E.; González-Peña, Dianelys; Lawson, Marcus A.; McCusker, Robert H.; Hernandez, Alvaro G.; O’Connor, Jason C.; Dantzer, Robert; Kelley, Keith W.

    2015-01-01

    Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution), then tested for differential transcript expression (False Discovery Rate-adjusted p-value < 0.05). Enrichment of functional categories utilized the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. Enrichment of 92 categories from Gene Ontology Biological Processes and Molecular Functions, and KEGG pathways were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides a context to the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge with high throughput RNA-Seq. PMID:25708305

  10. Translational genomics for analysis of complex traits in peanut and sorghum

    USDA-ARS?s Scientific Manuscript database

    The integration of sequencing and genotype data from natural variation studies (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) facilitated the development of DNA markers in the form of single nucleotide polymorphic (SNP)...

  11. Integrated translational genomics for analysis of complex traits in sorghum

    USDA-ARS?s Scientific Manuscript database

    We will report on the integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of identifying genes controlling important agronomic traits and tran...

  12. Trinity: Transcriptome Assembly for Genetic and Functional Analysis of Cancer | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    The cancer transcriptome is shaped by genetic changes, variation in gene transcription, mRNA processing, editing and stability, and the cancer microbiome. Deciphering this variation and understanding its implications on tumorigenesis requires sophisticated computational analyses. Most RNA-Seq analyses rely on methods that first map short reads to a reference genome, and then compare them to annotated transcripts or assemble them. However, this strategy can be limited when the cancer genome is substantially different than the reference or for detecting sequences from the cancer microbiome.

  13. Transcriptomic Analysis of Paulownia Infected by Paulownia Witches'-Broom Phytoplasma

    PubMed Central

    Zhu, Shui-Fang; Lin, Cai-Li; Tian, Guo-Zhong; Xu, Xia; Zhao, Wen-Jun

    2013-01-01

    Phytoplasmas are plant pathogenic bacteria that have no cell wall and are responsible for major crop losses throughout the world. Phytoplasma-infected plants show a variety of symptoms and the mechanisms they use to physiologically alter the host plants are of considerable interest, but poorly understood. In this study we undertook a detailed analysis of Paulownia infected by Paulownia witches’-broom (PaWB) Phytoplasma using high-throughput mRNA sequencing (RNA-Seq) and digital gene expression (DGE). RNA-Seq analysis identified 74,831 unigenes, which were subsequently used as reference sequences for DGE analysis of diseased and healthy Paulownia in field grown and tissue cultured plants. Our study revealed that dramatic changes occurred in the gene expression profile of Paulownia after PaWB Phytoplasma infection. Genes encoding key enzymes in cytokinin biosynthesis, such as isopentenyl diphosphate isomerase and isopentenyltransferase, were significantly induced in the infected Paulownia. Genes involved in cell wall biosynthesis and degradation were largely up-regulated and genes related to photosynthesis were down-regulated after PaWB Phytoplasma infection. Our systematic analysis provides comprehensive transcriptomic data about plants infected by Phytoplasma. This information will help further our understanding of the detailed interaction mechanisms between plants and Phytoplasma. PMID:24130859

  14. Application of D-Crustacean Hyperglycemic Hormone Induces Peptidases Transcription and Suppresses Glycolysis-Related Transcripts in the Hepatopancreas of the Crayfish Pontastacus leptodactylus — Results of a Transcriptomic Study

    PubMed Central

    De Moro, Gianluca; Gerdol, Marco; Guarnaccia, Corrado; Mosco, Alessandro; Pallavicini, Alberto; Giulianini, Piero Giulio

    2013-01-01

    The crustacean Hyperglycemic Hormone (cHH) is a neuropeptide present in many decapods. Two different chiral isomers are simultaneously present in Astacid crayfish and their specific biological functions are still poorly understood. The present study is aimed at better understanding the potentially different effect of each of the isomers on the hepatopancreatic gene expression profile in the crayfish Pontastacus leptodactylus, in the context of short term hyperglycemia. Hence, two different chemically synthesized cHH enantiomers, containing either L- or D-Phe3, were injected to the circulation of intermolt females following removal of their X organ-Sinus gland complex. The effects triggered by the injection of the two alternate isomers were detected after one hour through measurement of circulating glucose levels. Triggered changes of the transcriptome expression profile in the hepatopancreas were analyzed by RNA-seq. A whole transcriptome shotgun sequence assembly provided the assumedly complete transcriptome of P. leptodactylus hepatopancreas, followed by RNA-seq analysis of changes in the expression level of many genes caused by the application of each of the hormone isomers. Circulating glucose levels were much higher in response to the D-isoform than to the L-isoform injection, one hour from injection. Similarly, the RNA-seq analysis confirmed a stronger effect on gene expression following the administration of D-cHH, while just limited alterations were caused by the L-isomer. These findings demonstrated a more prominent short term effect of the D-cHH on the transcription profile and shed light on the effect of the D-isomer on specific functional gene groups. Another contribution of the study is the construction of a de novo assembly of the hepatopancreas transcriptome, consisting of 39,935 contigs, that dramatically increases the molecular information available for this species and for crustaceans in general, providing an efficient tool for studying gene expression patterns in this organ. PMID:23840318

  15. High-Throughput Single-Cell RNA Sequencing and Data Analysis.

    PubMed

    Sagar; Herman, Josip Stefan; Pospisilik, John Andrew; Grün, Dominic

    2018-01-01

    Understanding biological systems at a single cell resolution may reveal several novel insights which remain masked by the conventional population-based techniques providing an average readout of the behavior of cells. Single-cell transcriptome sequencing holds the potential to identify novel cell types and characterize the cellular composition of any organ or tissue in health and disease. Here, we describe a customized high-throughput protocol for single-cell RNA-sequencing (scRNA-seq) combining flow cytometry and a nanoliter-scale robotic system. Since scRNA-seq requires amplification of a low amount of endogenous cellular RNA, leading to substantial technical noise in the dataset, downstream data filtering and analysis require special care. Therefore, we also briefly describe in-house state-of-the-art data analysis algorithms developed to identify cellular subpopulations including rare cell types as well as to derive lineage trees by ordering the identified subpopulations of cells along the inferred differentiation trajectories.

  16. Whole blood transcriptional profiling comparison between different milk yield of Chinese Holstein cows using RNA-seq data.

    PubMed

    Bai, Xue; Zheng, Zhuqing; Liu, Bin; Ji, Xiaoyang; Bai, Yongsheng; Zhang, Wenguang

    2016-08-22

    The objective of this research was to investigate the variation of gene expression in the blood transcriptome profile of Chinese Holstein cows associated to the milk yield traits. We used RNA-seq to generate the bovine transcriptome from the blood of 23 lactating Chinese Holstein cows with extremely high and low milk yield. A total of 100 differentially expressed genes (DEGs) (p < 0.05, FDR < 0.05) were revealed between the high and low groups. Gene ontology (GO) analysis demonstrated that the 100 DEGs were enriched in specific biological processes with regard to defense response, immune response, inflammatory response, icosanoid metabolic process, and fatty acid metabolic process (p < 0.05). The KEGG pathway analysis with 100 DEGs revealed that the most statistically-significant metabolic pathway was related with Toll-like receptor signaling pathway (p < 0.05). The expression level of four selected DEGs was analyzed by qRT-PCR, and the results indicated that the expression patterns were consistent with the deep sequencing results by RNA-Seq. Furthermore, alternative splicing analysis of 100 DEGs demonstrated that there were different splicing pattern between high and low yielders. The alternative 3' splicing site was the major splicing pattern detected in high yielders. However, in low yielders the major type was exon skipping. This study provides a non-invasive method to identify the DEGs in cattle blood using RNA-seq for milk yield. The revealed 100 DEGs between Holstein cows with extremely high and low milk yield, and immunological pathway are likely involved in milk yield trait. Finally, this study allowed us to explore associations between immune traits and production traits related to milk production.

  17. Genome-wide mapping of alternative splicing in Arabidopsis thaliana

    PubMed Central

    Filichkin, Sergei A.; Priest, Henry D.; Givan, Scott A.; Shen, Rongkun; Bryant, Douglas W.; Fox, Samuel E.; Wong, Weng-Keen; Mockler, Todd C.

    2010-01-01

    Alternative splicing can enhance transcriptome plasticity and proteome diversity. In plants, alternative splicing can be manifested at different developmental stages, and is frequently associated with specific tissue types or environmental conditions such as abiotic stress. We mapped the Arabidopsis transcriptome at single-base resolution using the Illumina platform for ultrahigh-throughput RNA sequencing (RNA-seq). Deep transcriptome sequencing confirmed a majority of annotated introns and identified thousands of novel alternatively spliced mRNA isoforms. Our analysis suggests that at least ∼42% of intron-containing genes in Arabidopsis are alternatively spliced; this is significantly higher than previous estimates based on cDNA/expressed sequence tag sequencing. Random validation confirmed that novel splice isoforms empirically predicted by RNA-seq can be detected in vivo. Novel introns detected by RNA-seq were substantially enriched in nonconsensus terminal dinucleotide splice signals. Alternative isoforms with premature termination codons (PTCs) comprised the majority of alternatively spliced transcripts. Using an example of an essential circadian clock gene, we show that intron retention can generate relatively abundant PTC+ isoforms and that this specific event is highly conserved among diverse plant species. Alternatively spliced PTC+ isoforms can be potentially targeted for degradation by the nonsense mediated mRNA decay (NMD) surveillance machinery or regulate the level of functional transcripts by the mechanism of regulated unproductive splicing and translation (RUST). We demonstrate that the relative ratios of the PTC+ and reference isoforms for several key regulatory genes can be considerably shifted under abiotic stress treatments. Taken together, our results suggest that like in animals, NMD and RUST may be widespread in plants and may play important roles in regulating gene expression. PMID:19858364

  18. Integrated and translational genomics for analysis of complex traits in crops

    USDA-ARS?s Scientific Manuscript database

    We report here on integration of sequencing and genotype data from natural variation (by whole genome resequencing [wgs] or genotype by sequencing [gbs]), transcriptome (RNA-seq) and mutant analysis (also by wgs) with the goal of translating gems from these resources into useable DNA markers in the ...

  19. A highly efficient method for extracting next-generation sequencing quality RNA from adipose tissue of recalcitrant animal species.

    PubMed

    Sharma, Davinder; Golla, Naresh; Singh, Dheer; Onteru, Suneel K

    2018-03-01

    The next-generation sequencing (NGS) based RNA sequencing (RNA-Seq) and transcriptome profiling offers an opportunity to unveil complex biological processes. Successful RNA-Seq and transcriptome profiling requires a large amount of high-quality RNA. However, NGS-quality RNA isolation is extremely difficult from recalcitrant adipose tissue (AT) with high lipid content and low cell numbers. Further, the amount and biochemical composition of AT lipid varies depending upon the animal species which can pose different degree of resistance to RNA extraction. Currently available approaches may work effectively in one species but can be almost unproductive in another species. Herein, we report a two step protocol for the extraction of NGS quality RNA from AT across a broad range of animal species. © 2017 Wiley Periodicals, Inc.

  20. Quantification of differential gene expression by multiplexed targeted resequencing of cDNA

    PubMed Central

    Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.

    2017-01-01

    Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677

  1. Genome-wide retinal transcriptome analysis of endotoxin-induced uveitis in mice with next-generation sequencing

    PubMed Central

    Qiu, Yiguo; Yu, Peng; Lin, Ru; Fu, Xinyu; Hao, Bingtao

    2017-01-01

    Purpose Endotoxin-induced uveitis (EIU) is a well-established mouse model for studying human acute inflammatory uveitis. The purpose of this study is to investigate the genome-wide retinal transcriptome profile of EIU. Methods The anterior segment of the mice was examined with a slit-lamp, and clinical scores were evaluated simultaneously. The histological changes in the posterior segment of the eyes were evaluated with hematoxylin and eosin (H&E) staining. A high throughput RNA sequencing (RNA-seq) strategy using the Illumina Hiseq 2500 platform was applied to characterize the retinal transcriptome profile from lipopolysaccharide (LPS)-treated and untreated mice. The validation of the differentially expressed genes (DEGs) was analyzed with real-time PCR. Results At the 24th hour after challenge, the clinical score of the LPS group was significantly higher (3.83±0.75, mean ± standard deviation [SD]) than that of the control group (0.08±0.20, mean ± SD; p<0.001). The histological evaluation showed a large number of inflammatory cells infiltrated into the vitreous cavity in the LPS group compared with the control group. A total of 478 DEGs were identified with RNA-seq. Among these genes, 406 were upregulated and 72 were downregulated in the LPS group. Gene Ontology (GO) enrichment showed three significantly enriched upregulated terms. Twenty-one upregulated and seven downregulated pathways were remarkably enriched by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Eleven inflammatory response–, complement system–, fibrinolytic system–, and cell stress–related genes were validated to show similar results as the RNA-seq. Conclusions We first reported the retinal transcriptome profile of the EIU mouse with RNA-seq. The results indicate that the abnormal changes in the inflammatory response–, complement system–, fibrinolytic system–, and cell stress–related genes occurred concurrently in EIU. These genes may play an important role in the pathogenesis of EIU. This study will lead to a better understanding of the underlying mechanisms and shed light on discovering novel therapeutic targets for ocular inflammation. PMID:28706439

  2. Genome-wide retinal transcriptome analysis of endotoxin-induced uveitis in mice with next-generation sequencing.

    PubMed

    Qiu, Yiguo; Yu, Peng; Lin, Ru; Fu, Xinyu; Hao, Bingtao; Lei, Bo

    2017-01-01

    Endotoxin-induced uveitis (EIU) is a well-established mouse model for studying human acute inflammatory uveitis. The purpose of this study is to investigate the genome-wide retinal transcriptome profile of EIU. The anterior segment of the mice was examined with a slit-lamp, and clinical scores were evaluated simultaneously. The histological changes in the posterior segment of the eyes were evaluated with hematoxylin and eosin (H&E) staining. A high throughput RNA sequencing (RNA-seq) strategy using the Illumina Hiseq 2500 platform was applied to characterize the retinal transcriptome profile from lipopolysaccharide (LPS)-treated and untreated mice. The validation of the differentially expressed genes (DEGs) was analyzed with real-time PCR. At the 24th hour after challenge, the clinical score of the LPS group was significantly higher (3.83±0.75, mean ± standard deviation [SD]) than that of the control group (0.08±0.20, mean ± SD; p<0.001). The histological evaluation showed a large number of inflammatory cells infiltrated into the vitreous cavity in the LPS group compared with the control group. A total of 478 DEGs were identified with RNA-seq. Among these genes, 406 were upregulated and 72 were downregulated in the LPS group. Gene Ontology (GO) enrichment showed three significantly enriched upregulated terms. Twenty-one upregulated and seven downregulated pathways were remarkably enriched by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Eleven inflammatory response-, complement system-, fibrinolytic system-, and cell stress-related genes were validated to show similar results as the RNA-seq. We first reported the retinal transcriptome profile of the EIU mouse with RNA-seq. The results indicate that the abnormal changes in the inflammatory response-, complement system-, fibrinolytic system-, and cell stress-related genes occurred concurrently in EIU. These genes may play an important role in the pathogenesis of EIU. This study will lead to a better understanding of the underlying mechanisms and shed light on discovering novel therapeutic targets for ocular inflammation.

  3. A long and abundant non-coding RNA in Lactobacillus salivarius.

    PubMed

    Cousin, Fabien J; Lynch, Denise B; Chuat, Victoria; Bourin, Maxence J B; Casey, Pat G; Dalmasso, Marion; Harris, Hugh M B; McCann, Angela; O'Toole, Paul W

    2017-09-01

    Lactobacillus salivarius , found in the intestinal microbiota of humans and animals, is studied as an example of the sub-dominant intestinal commensals that may impart benefits upon their host. Strains typically harbour at least one megaplasmid that encodes functions contributing to contingency metabolism and environmental adaptation. RNA sequencing (RNA-seq)transcriptomic analysis of L. salivarius strain UCC118 identified the presence of a novel unusually abundant long non-coding RNA (lncRNA) encoded by the megaplasmid, and which represented more than 75 % of the total RNA-seq reads after depletion of rRNA species. The expression level of this 520 nt lncRNA in L. salivarius UCC118 exceeded that of the 16S rRNA, it accumulated during growth, was very stable over time and was also expressed during intestinal transit in a mouse. This lncRNA sequence is specific to the L. salivarius species; however, among 45 L . salivarius genomes analysed, not all (only 34) harboured the sequence for the lncRNA. This lncRNA was produced in 27 tested L. salivarius strains, but at strain-specific expression levels. High-level lncRNA expression correlated with high megaplasmid copy number. Transcriptome analysis of a deletion mutant lacking this lncRNA identified altered expression levels of genes in a number of pathways, but a definitive function of this new lncRNA was not identified. This lncRNA presents distinctive and unique properties, and suggests potential basic and applied scientific developments of this phenomenon.

  4. Transcriptome Analysis of Mango (Mangifera indica L.) Fruit Epidermal Peel to Identify Putative Cuticle-Associated Genes

    PubMed Central

    Tafolla-Arellano, Julio C.; Zheng, Yi; Sun, Honghe; Jiao, Chen; Ruiz-May, Eliel; Hernández-Oñate, Miguel A.; González-León, Alberto; Báez-Sañudo, Reginaldo; Fei, Zhangjun; Domozych, David; Rose, Jocelyn K. C.; Tiznado-Hernández, Martín E.

    2017-01-01

    Mango fruit (Mangifera indica L.) are highly perishable and have a limited shelf life, due to postharvest desiccation and senescence, which limits their global distribution. Recent studies of tomato fruit suggest that these traits are influenced by the expression of genes that are associated with cuticle metabolism. However, studies of these phenomena in mango fruit are limited by the lack of genome-scale data. In order to gain insight into the mango cuticle biogenesis and identify putative cuticle-associated genes, we analyzed the transcriptomes of peels from ripe and overripe mango fruit using RNA-Seq. Approximately 400 million reads were generated and de novo assembled into 107,744 unigenes, with a mean length of 1,717 bp and with this information an online Mango RNA-Seq Database (http://bioinfo.bti.cornell.edu/cgi-bin/mango/index.cgi) which is a valuable genomic resource for molecular research into the biology of mango fruit was created. RNA-Seq analysis suggested that the pathway leading to biosynthesis of the cuticle component, cutin, is up-regulated during overripening. This data was supported by analysis of the expression of several putative cuticle-associated genes and by gravimetric and microscopic studies of cuticle deposition, revealing a complex continuous pattern of cuticle deposition during fruit development and involving substantial accumulation during ripening/overripening. PMID:28425468

  5. Transcriptome Analysis of Mango (Mangifera indica L.) Fruit Epidermal Peel to Identify Putative Cuticle-Associated Genes.

    PubMed

    Tafolla-Arellano, Julio C; Zheng, Yi; Sun, Honghe; Jiao, Chen; Ruiz-May, Eliel; Hernández-Oñate, Miguel A; González-León, Alberto; Báez-Sañudo, Reginaldo; Fei, Zhangjun; Domozych, David; Rose, Jocelyn K C; Tiznado-Hernández, Martín E

    2017-04-20

    Mango fruit (Mangifera indica L.) are highly perishable and have a limited shelf life, due to postharvest desiccation and senescence, which limits their global distribution. Recent studies of tomato fruit suggest that these traits are influenced by the expression of genes that are associated with cuticle metabolism. However, studies of these phenomena in mango fruit are limited by the lack of genome-scale data. In order to gain insight into the mango cuticle biogenesis and identify putative cuticle-associated genes, we analyzed the transcriptomes of peels from ripe and overripe mango fruit using RNA-Seq. Approximately 400 million reads were generated and de novo assembled into 107,744 unigenes, with a mean length of 1,717 bp and with this information an online Mango RNA-Seq Database (http://bioinfo.bti.cornell.edu/cgi-bin/mango/index.cgi) which is a valuable genomic resource for molecular research into the biology of mango fruit was created. RNA-Seq analysis suggested that the pathway leading to biosynthesis of the cuticle component, cutin, is up-regulated during overripening. This data was supported by analysis of the expression of several putative cuticle-associated genes and by gravimetric and microscopic studies of cuticle deposition, revealing a complex continuous pattern of cuticle deposition during fruit development and involving substantial accumulation during ripening/overripening.

  6. Transcriptome Analysis of Mango (Mangifera indica L.) Fruit Epidermal Peel to Identify Putative Cuticle-Associated Genes

    NASA Astrophysics Data System (ADS)

    Tafolla-Arellano, Julio C.; Zheng, Yi; Sun, Honghe; Jiao, Chen; Ruiz-May, Eliel; Hernández-Oñate, Miguel A.; González-León, Alberto; Báez-Sañudo, Reginaldo; Fei, Zhangjun; Domozych, David; Rose, Jocelyn K. C.; Tiznado-Hernández, Martín E.

    2017-04-01

    Mango fruit (Mangifera indica L.) are highly perishable and have a limited shelf life, due to postharvest desiccation and senescence, which limits their global distribution. Recent studies of tomato fruit suggest that these traits are influenced by the expression of genes that are associated with cuticle metabolism. However, studies of these phenomena in mango fruit are limited by the lack of genome-scale data. In order to gain insight into the mango cuticle biogenesis and identify putative cuticle-associated genes, we analyzed the transcriptomes of peels from ripe and overripe mango fruit using RNA-Seq. Approximately 400 million reads were generated and de novo assembled into 107,744 unigenes, with a mean length of 1,717 bp and with this information an online Mango RNA-Seq Database (http://bioinfo.bti.cornell.edu/cgi-bin/mango/index.cgi) which is a valuable genomic resource for molecular research into the biology of mango fruit was created. RNA-Seq analysis suggested that the pathway leading to biosynthesis of the cuticle component, cutin, is up-regulated during overripening. This data was supported by analysis of the expression of several putative cuticle-associated genes and by gravimetric and microscopic studies of cuticle deposition, revealing a complex continuous pattern of cuticle deposition during fruit development and involving substantial accumulation during ripening/overripening.

  7. RNA-Seq for gene identification and transcript profiling of three Stevia rebaudiana genotypes.

    PubMed

    Chen, Junwen; Hou, Kai; Qin, Peng; Liu, Hongchang; Yi, Bin; Yang, Wenting; Wu, Wei

    2014-07-07

    Stevia (Stevia rebaudiana) is an important medicinal plant that yields diterpenoid steviol glycosides (SGs). SGs are currently used in the preparation of medicines, food products and neutraceuticals because of its sweetening property (zero calories and about 300 times sweeter than sugar). Recently, some progress has been made in understanding the biosynthesis of SGs in Stevia, but little is known about the molecular mechanisms underlying this process. Additionally, the genomics of Stevia, a non-model species, remains uncharacterized. The recent advent of RNA-Seq, a next generation sequencing technology, provides an opportunity to expand the identification of Stevia genes through in-depth transcript profiling. We present a comprehensive landscape of the transcriptome profiles of three genotypes of Stevia with divergent SG compositions characterized using RNA-seq. 191,590,282 high-quality reads were generated and then assembled into 171,837 transcripts with an average sequence length of 969 base pairs. A total of 80,160 unigenes were annotated, and 14,211 of the unique sequences were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. Gene sequences of all enzymes known to be involved in SG synthesis were examined. A total of 143 UDP-glucosyltransferase (UGT) unigenes were identified, some of which might be involved in SG biosynthesis. The expression patterns of eight of these genes were further confirmed by RT-QPCR. RNA-seq analysis identified candidate genes encoding enzymes responsible for the biosynthesis of SGs in Stevia, a non-model plant without a reference genome. The transcriptome data from this study yielded new insights into the process of SG accumulation in Stevia. Our results demonstrate that RNA-Seq can be successfully used for gene identification and transcript profiling in a non-model species.

  8. Peregrine: A rapid and unbiased method to produce strand-specific RNA-Seq libraries from small quantities of starting material.

    PubMed

    Langevin, Stanley A; Bent, Zachary W; Solberg, Owen D; Curtis, Deanna J; Lane, Pamela D; Williams, Kelly P; Schoeniger, Joseph S; Sinha, Anupama; Lane, Todd W; Branda, Steven S

    2013-04-01

    Use of second generation sequencing (SGS) technologies for transcriptional profiling (RNA-Seq) has revolutionized transcriptomics, enabling measurement of RNA abundances with unprecedented specificity and sensitivity and the discovery of novel RNA species. Preparation of RNA-Seq libraries requires conversion of the RNA starting material into cDNA flanked by platform-specific adaptor sequences. Each of the published methods and commercial kits currently available for RNA-Seq library preparation suffers from at least one major drawback, including long processing times, large starting material requirements, uneven coverage, loss of strand information and high cost. We report the development of a new RNA-Seq library preparation technique that produces representative, strand-specific RNA-Seq libraries from small amounts of starting material in a fast, simple and cost-effective manner. Additionally, we have developed a new quantitative PCR-based assay for precisely determining the number of PCR cycles to perform for optimal enrichment of the final library, a key step in all SGS library preparation workflows.

  9. Ultra-low input transcriptomics reveal the spore functional content and phylogenetic affiliations of poorly studied arbuscular mycorrhizal fungi

    PubMed Central

    Beaudet, Denis; Chen, Eric C H; Mathieu, Stephanie; Yildirir, Gokalp; Ndikumana, Steve; Dalpé, Yolande; Séguin, Sylvie; Farinelli, Laurent; Stajich, Jason E; Corradi, Nicolas

    2018-01-01

    Abstract Arbuscular mycorrhizal fungi (AMF) are a group of soil microorganisms that establish symbioses with the vast majority of land plants. To date, generation of AMF coding information has been limited to model genera that grow well axenically; Rhizoglomus and Gigaspora. Meanwhile, data on the functional gene repertoire of most AMF families is non-existent. Here, we provide primary large-scale transcriptome data from eight poorly studied AMF species (Acaulospora morrowiae, Diversispora versiforme, Scutellospora calospora, Racocetra castanea, Paraglomus brasilianum, Ambispora leptoticha, Claroideoglomus claroideum and Funneliformis mosseae) using ultra-low input ribonucleic acid (RNA)-seq approaches. Our analyses reveals that quiescent spores of many AMF species harbour a diverse functional diversity and solidify known evolutionary relationships within the group. Our findings demonstrate that RNA-seq data obtained from low-input RNA are reliable in comparison to conventional RNA-seq experiments. Thus, our methodology can potentially be used to deepen our understanding of fungal microbial function and phylogeny using minute amounts of RNA material. PMID:29211832

  10. RNA-seq analysis of Rubus idaeus cv. Nova: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

    PubMed

    Hyun, Tae Kyung; Lee, Sarah; Kumar, Dhinesh; Rim, Yeonggil; Kumar, Ritesh; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

    2014-10-01

    Using Illumina sequencing technology, we have generated the large-scale transcriptome sequencing data containing abundant information on genes involved in the metabolic pathways in R. idaeus cv. Nova fruits. Rubus idaeus (Red raspberry) is one of the important economical crops that possess numerous nutrients, micronutrients and phytochemicals with essential health benefits to human. The molecular mechanism underlying the ripening process and phytochemical biosynthesis in red raspberry is attributed to the changes in gene expression, but very limited transcriptomic and genomic information in public databases is available. To address this issue, we generated more than 51 million sequencing reads from R. idaeus cv. Nova fruit using Illumina RNA-Seq technology. After de novo assembly, we obtained 42,604 unigenes with an average length of 812 bp. At the protein level, Nova fruit transcriptome showed 77 and 68 % sequence similarities with Rubus coreanus and Fragaria versa, respectively, indicating the evolutionary relationship between them. In addition, 69 % of assembled unigenes were annotated using public databases including NCBI non-redundant, Cluster of Orthologous Groups and Gene ontology database, suggesting that our transcriptome dataset provides a valuable resource for investigating metabolic processes in red raspberry. To analyze the relationship between several novel transcripts and the amounts of metabolites such as γ-aminobutyric acid and anthocyanins, real-time PCR and target metabolite analysis were performed on two different ripening stages of Nova. This is the first attempt using Illumina sequencing platform for RNA sequencing and de novo assembly of Nova fruit without reference genome. Our data provide the most comprehensive transcriptome resource available for Rubus fruits, and will be useful for understanding the ripening process and for breeding R. idaeus cultivars with improved fruit quality.

  11. Peregrine

    PubMed Central

    Langevin, Stanley A.; Bent, Zachary W.; Solberg, Owen D.; Curtis, Deanna J.; Lane, Pamela D.; Williams, Kelly P.; Schoeniger, Joseph S.; Sinha, Anupama; Lane, Todd W.; Branda, Steven S.

    2013-01-01

    Use of second generation sequencing (SGS) technologies for transcriptional profiling (RNA-Seq) has revolutionized transcriptomics, enabling measurement of RNA abundances with unprecedented specificity and sensitivity and the discovery of novel RNA species. Preparation of RNA-Seq libraries requires conversion of the RNA starting material into cDNA flanked by platform-specific adaptor sequences. Each of the published methods and commercial kits currently available for RNA-Seq library preparation suffers from at least one major drawback, including long processing times, large starting material requirements, uneven coverage, loss of strand information and high cost. We report the development of a new RNA-Seq library preparation technique that produces representative, strand-specific RNA-Seq libraries from small amounts of starting material in a fast, simple and cost-effective manner. Additionally, we have developed a new quantitative PCR-based assay for precisely determining the number of PCR cycles to perform for optimal enrichment of the final library, a key step in all SGS library preparation workflows. PMID:23558773

  12. An RNA-Seq-based reference transcriptome for Citrus.

    PubMed

    Terol, Javier; Tadeo, Francisco; Ventimilla, Daniel; Talon, Manuel

    2016-03-01

    Previous RNA-Seq studies in citrus have been focused on physiological processes relevant to fruit quality and productivity of the major species, especially sweet orange. Less attention has been paid to vegetative or reproductive tissues, while most Citrus species have never been analysed. In this work, we characterized the transcriptome of vegetative and reproductive tissues from 12 Citrus species from all main phylogenetic groups. Our aims were to acquire a complete view of the citrus transcriptome landscape, to improve previous functional annotations and to obtain genetic markers associated with genes of agronomic interest. 28 samples were used for RNA-Seq analysis, obtained from 12 Citrus species: C. medica, C. aurantifolia, C. limon, C. bergamia, C. clementina, C. deliciosa, C. reshni, C. maxima, C. paradisi, C. aurantium, C. sinensis and Poncirus trifoliata. Four different organs were analysed: root, phloem, leaf and flower. A total of 3421 million Illumina reads were produced and mapped against the reference C. clementina genome sequence. Transcript discovery pipeline revealed 3326 new genes, the number of genes with alternative splicing was increased to 19,739, and a total of 73,797 transcripts were identified. Differential expression studies between the four tissues showed that gene expression is overall related to the physiological function of the specific organs above any other variable. Variants discovery analysis revealed the presence of indels and SNPs in genes associated with fruit quality and productivity. Pivotal pathways in citrus such as those of flavonoids, flavonols, ethylene and auxin were also analysed in detail. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  13. Transcriptome Profiling of Rust Resistance in Switchgrass Using RNA-Seq Analysis

    DOE PAGES

    Serba, Desalegn D.; Uppalapati, Srinivasa Rao; Mukherjee, Shreyartha; ...

    2015-03-16

    Switchgrass rust caused by Puccinia emaculata is a major limiting factor for switchgrass (Panicum virgatum L.) production, especially in monoculture. Natural populations of switchgrass displayed diverse reactions to P. emaculata when evaluated in an Ardmore, OK, field. In order to identify the differentially expressed genes during the rust infection process and the mechanisms of switchgrass rust resistance, transcriptome analysis using RNA-Seq was conducted in two pseudo-F 1 parents ('PV281' and 'NFGA472'), and three moderately resistant and three susceptible progenies selected from a three-generation, four-founder switchgrass population (K5 x A4) x (AP13 x VS16). On average, 23.5 million reads per samplemore » (leaf tissue was collected at 0, 24, and 60 h post-inoculation (hpi)) were obtained from paired-end (2 x 100 bp) sequencing on the Illumina HiSeq2000 platform. Furthermore, mapping of the RNA-Seq reads to the switchgrass reference genome (AP13 ver. 1.1 assembly) constructed a total of 84,209 transcripts from 98,007 gene loci among all of the samples. Further analysis revealed that host defense- related genes, including the nucleotide binding site-leucinerich repeat domain containing disease resistance gene analogs, play an important role in resistance to rust infection. Rust-induced gene (RIG) transcripts inherited across generations were identified. The rust-resistant gene transcripts can be a valuable resource for developing molecular markers for rust resistance. Finally we identified the rust-resistant genotypes and gene transcripts which can expedite rust-resistant cultivar development in switchgrass.« less

  14. Transcriptome Profiling of Rust Resistance in Switchgrass Using RNA-Seq Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Serba, Desalegn D.; Uppalapati, Srinivasa Rao; Mukherjee, Shreyartha

    Switchgrass rust caused by Puccinia emaculata is a major limiting factor for switchgrass (Panicum virgatum L.) production, especially in monoculture. Natural populations of switchgrass displayed diverse reactions to P. emaculata when evaluated in an Ardmore, OK, field. In order to identify the differentially expressed genes during the rust infection process and the mechanisms of switchgrass rust resistance, transcriptome analysis using RNA-Seq was conducted in two pseudo-F 1 parents ('PV281' and 'NFGA472'), and three moderately resistant and three susceptible progenies selected from a three-generation, four-founder switchgrass population (K5 x A4) x (AP13 x VS16). On average, 23.5 million reads per samplemore » (leaf tissue was collected at 0, 24, and 60 h post-inoculation (hpi)) were obtained from paired-end (2 x 100 bp) sequencing on the Illumina HiSeq2000 platform. Furthermore, mapping of the RNA-Seq reads to the switchgrass reference genome (AP13 ver. 1.1 assembly) constructed a total of 84,209 transcripts from 98,007 gene loci among all of the samples. Further analysis revealed that host defense- related genes, including the nucleotide binding site-leucinerich repeat domain containing disease resistance gene analogs, play an important role in resistance to rust infection. Rust-induced gene (RIG) transcripts inherited across generations were identified. The rust-resistant gene transcripts can be a valuable resource for developing molecular markers for rust resistance. Finally we identified the rust-resistant genotypes and gene transcripts which can expedite rust-resistant cultivar development in switchgrass.« less

  15. rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

    PubMed

    Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui

    2015-07-01

    High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to access statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package with its source code and documentation are freely available at http://www-personal.umich.edu/∼jianghui/rseqnp/. jianghui@umich.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  16. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128

  17. RNA-Seq-based toxicogenomic assessment of fresh frozen and formalin-fixed tissues yields similar mechanistic insights.

    PubMed

    Auerbach, Scott S; Phadke, Dhiral P; Mav, Deepak; Holmgren, Stephanie; Gao, Yuan; Xie, Bin; Shin, Joo Heon; Shah, Ruchir R; Merrick, B Alex; Tice, Raymond R

    2015-07-01

    Formalin-fixed, paraffin-embedded (FFPE) pathology specimens represent a potentially vast resource for transcriptomic-based biomarker discovery. We present here a comparison of results from a whole transcriptome RNA-Seq analysis of RNA extracted from fresh frozen and FFPE livers. The samples were derived from rats exposed to aflatoxin B1 (AFB1 ) and a corresponding set of control animals. Principal components analysis indicated that samples were separated in the two groups representing presence or absence of chemical exposure, both in fresh frozen and FFPE sample types. Sixty-five percent of the differentially expressed transcripts (AFB1 vs. controls) in fresh frozen samples were also differentially expressed in FFPE samples (overlap significance: P < 0.0001). Genomic signature and gene set analysis of AFB1 differentially expressed transcript lists indicated highly similar results between fresh frozen and FFPE at the level of chemogenomic signatures (i.e., single chemical/dose/duration elicited transcriptomic signatures), mechanistic and pathology signatures, biological processes, canonical pathways and transcription factor networks. Overall, our results suggest that similar hypotheses about the biological mechanism of toxicity would be formulated from fresh frozen and FFPE samples. These results indicate that phenotypically anchored archival specimens represent a potentially informative resource for signature-based biomarker discovery and mechanistic characterization of toxicity. Copyright © 2014 John Wiley & Sons, Ltd.

  18. Alternative splicing and trans-splicing events revealed by analysis of the Bombyx mori transcriptome

    PubMed Central

    Shao, Wei; Zhao, Qiong-Yi; Wang, Xiu-Ye; Xu, Xin-Yan; Tang, Qing; Li, Muwang; Li, Xuan; Xu, Yong-Zhen

    2012-01-01

    Alternative splicing and trans-splicing events have not been systematically studied in the silkworm Bombyx mori. Here, the silkworm transcriptome was analyzed by RNA-seq. We identified 320 novel genes, modified 1140 gene models, and found thousands of alternative splicing and 58 trans-splicing events. Studies of three SR proteins show that both their alternative splicing patterns and mRNA products are conserved from insect to human, and one isoform of Srsf6 with a retained intron is expressed sex-specifically in silkworm gonads. Trans-splicing of mod(mdg4) in silkworm was experimentally confirmed. We identified integrations from a common 5′-gene with 46 newly identified alternative 3′-exons that are located on both DNA strands over a 500-kb region. Other trans-splicing events in B. mori were predicted by bioinformatic analysis, in which 12 events were confirmed by RT-PCR, six events were further validated by chimeric SNPs, and two events were confirmed by allele-specific RT-PCR in F1 hybrids from distinct silkworm lines of JS and L10, indicating that trans-splicing is more widespread in insects than previously thought. Analysis of the B. mori transcriptome by RNA-seq provides valuable information of regulatory alternative splicing events. The conservation of splicing events across species and newly identified trans-splicing events suggest that B. mori is a good model for future studies. PMID:22627775

  19. DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.

    PubMed

    Sun, Zhe; Wang, Ting; Deng, Ke; Wang, Xiao-Feng; Lafyatis, Robert; Ding, Ying; Hu, Ming; Chen, Wei

    2018-01-01

    Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technology advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods. DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/∼wec47/singlecell.html. wei.chen@chp.edu or hum@ccf.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  20. A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data.

    PubMed

    Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat

    2016-12-22

    The ability to sequence the transcriptomes of single cells using single-cell RNA-seq sequencing technologies presents a shift in the scientific paradigm where scientists, now, are able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time. However, till date, there has not been a suitable computational methodology for the analysis of such intricate deluge of data, in particular techniques which will aid the identification of the unique transcriptomic profiles difference between the different cellular subtypes. In this paper, we describe the novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector machine (SVM) and Random Forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks where novel interactions could be further validated in web-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases. This novel approach reported for is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles like that of the study of the cancer progression and treatment within a highly heterogeneous tumour.

  1. DBATE: database of alternative transcripts expression.

    PubMed

    Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2013-01-01

    The use of high-throughput RNA sequencing technology (RNA-seq) allows whole transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing variant-specific expression with its functional inference is still an open and difficult issue for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotation of alternative splicing variants. We processed 13 large RNA-seq panels from human healthy tissues and in disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. The possibility to perform complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values that can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, the transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.

  2. CapZyme-Seq Comprehensively Defines Promoter-Sequence Determinants for RNA 5' Capping with NAD.

    PubMed

    Vvedenskaya, Irina O; Bird, Jeremy G; Zhang, Yuanchao; Zhang, Yu; Jiao, Xinfu; Barvík, Ivan; Krásný, Libor; Kiledjian, Megerditch; Taylor, Deanne M; Ebright, Richard H; Nickels, Bryce E

    2018-05-03

    Nucleoside-containing metabolites such as NAD + can be incorporated as 5' caps on RNA by serving as non-canonical initiating nucleotides (NCINs) for transcription initiation by RNA polymerase (RNAP). Here, we report CapZyme-seq, a high-throughput-sequencing method that employs NCIN-decapping enzymes NudC and Rai1 to detect and quantify NCIN-capped RNA. By combining CapZyme-seq with multiplexed transcriptomics, we determine efficiencies of NAD + capping by Escherichia coli RNAP for ∼16,000 promoter sequences. The results define preferred transcription start site (TSS) positions for NAD + capping and define a consensus promoter sequence for NAD + capping: HRRASWW (TSS underlined). By applying CapZyme-seq to E. coli total cellular RNA, we establish that sequence determinants for NCIN capping in vivo match the NAD + -capping consensus defined in vitro, and we identify and quantify NCIN-capped small RNAs (sRNAs). Our findings define the promoter-sequence determinants for NCIN capping with NAD + and provide a general method for analysis of NCIN capping in vitro and in vivo. Copyright © 2018 Elsevier Inc. All rights reserved.

  3. PASTA: splice junction identification from RNA-Sequencing data

    PubMed Central

    2013-01-01

    Background Next generation transcriptome sequencing (RNA-Seq) is emerging as a powerful experimental tool for the study of alternative splicing and its regulation, but requires ad-hoc analysis methods and tools. PASTA (Patterned Alignments for Splicing and Transcriptome Analysis) is a splice junction detection algorithm specifically designed for RNA-Seq data, relying on a highly accurate alignment strategy and on a combination of heuristic and statistical methods to identify exon-intron junctions with high accuracy. Results Comparisons against TopHat and other splice junction prediction software on real and simulated datasets show that PASTA exhibits high specificity and sensitivity, especially at lower coverage levels. Moreover, PASTA is highly configurable and flexible, and can therefore be applied in a wide range of analysis scenarios: it is able to handle both single-end and paired-end reads, it does not rely on the presence of canonical splicing signals, and it uses organism-specific regression models to accurately identify junctions. Conclusions PASTA is a highly efficient and sensitive tool to identify splicing junctions from RNA-Seq data. Compared to similar programs, it has the ability to identify a higher number of real splicing junctions, and provides highly annotated output files containing detailed information about their location and characteristics. Accurate junction data in turn facilitates the reconstruction of the splicing isoforms and the analysis of their expression levels, which will be performed by the remaining modules of the PASTA pipeline, still under development. Use of PASTA can therefore enable the large-scale investigation of transcription and alternative splicing. PMID:23557086

  4. RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats.

    PubMed

    Merrick, B Alex; Phadke, Dhiral P; Auerbach, Scott S; Mav, Deepak; Stiegelmeyer, Suzy M; Shah, Ruchir R; Tice, Raymond R

    2013-01-01

    Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1's carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT's) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. We find the rat transcriptome contains many previously unidentified, AFB1-responsive exons and transcripts supporting RNA-Seq's capabilities to provide new insights into AFB1-mediated gene expression leading to hepatocellular carcinoma.

  5. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application

    PubMed Central

    2015-01-01

    Background The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways make RNA-Seq one of most complex field of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges to efficiently analyze and store sequence data and results. Methods In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer to the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call for statistically significant differences in genes and transcripts expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Results Through a user friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows to access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user needs. PMID:26046471

  6. Gel-seq: A Method for Simultaneous Sequencing Library Preparation of DNA and RNA Using Hydrogel Matrices.

    PubMed

    Hoople, Gordon D; Richards, Andrew; Wu, Yan; Pisano, Albert P; Zhang, Kun

    2018-03-26

    The ability to amplify and sequence either DNA or RNA from small starting samples has only been achieved in the last five years. Unfortunately, the standard protocols for generating genomic or transcriptomic libraries are incompatible and researchers must choose whether to sequence DNA or RNA for a particular sample. Gel-seq solves this problem by enabling researchers to simultaneously prepare libraries for both DNA and RNA starting with 100 - 1000 cells using a simple hydrogel device. This paper presents a detailed approach for the fabrication of the device as well as the biological protocol to generate paired libraries. We designed Gel-seq so that it could be easily implemented by other researchers; many genetics labs already have the necessary equipment to reproduce the Gel-seq device fabrication. Our protocol employs commonly-used kits for both whole-transcript amplification (WTA) and library preparation, which are also likely to be familiar to researchers already versed in generating genomic and transcriptomic libraries. Our approach allows researchers to bring to bear the power of both DNA and RNA sequencing on a single sample without splitting and with negligible added cost.

  7. Transcriptomic Dose-Response Analysis for Mode of Action and Risk Assessment

    EPA Science Inventory

    Microarray and RNA-seq technologies can play an important role in assessing the health risks associated with environmental exposures. The utility of gene expression data to predict hazard has been well documented. Early toxicogenomics studies used relatively high, single doses w...

  8. RNA-Seq-based transcriptome analysis of methicillin-resistant Staphylococcus aureus biofilm inhibition by ursolic acid and resveratrol

    PubMed Central

    Qin, Nan; Tan, Xiaojuan; Jiao, Yinming; Liu, Lin; Zhao, Wangsheng; Yang, Shuang; Jia, Aiqun

    2014-01-01

    Bacterial biofilms are particularly problematic since they become resistant to most available antibiotics. Hence, novel potential antagonists to inhibit biofilm formation are urgent. Here the influences of two natural products, ursolic acid and resveratrol, on biofilm of the clinical methicillin-resistant Staphylococcus aureus (MRSA) isolate were investigated using RNA-seq, and the differentially expressed genes were analyzed using Cuffdiff. The results showed that ursolic acid inhibition of biofilm formation may reduce amino acids metabolism and adhesins expression and resveratrol may disturb quorum sensing (QS) and the synthesis of surface proteins and capsular polysaccharides. In addition, the transcriptome analysis of resveratrol and the combination of resveratrol with vancomycin inhibition of established biofilm revealed that resveratrol would disturb the expression of genes related to QS, surface and secreted proteins, and capsular polysaccharides. These findings suggest that ursolic acid and resveratrol could be useful to be adjunct therapies for the treatment of MRSA biofilm-involved infections. PMID:24970710

  9. Spatial transcriptomic survey of human embryonic cerebral cortex by single-cell RNA-seq analysis.

    PubMed

    Fan, Xiaoying; Dong, Ji; Zhong, Suijuan; Wei, Yuan; Wu, Qian; Yan, Liying; Yong, Jun; Sun, Le; Wang, Xiaoye; Zhao, Yangyu; Wang, Wei; Yan, Jie; Wang, Xiaoqun; Qiao, Jie; Tang, Fuchou

    2018-06-04

    The cellular complexity of human brain development has been intensively investigated, although a regional characterization of the entire human cerebral cortex based on single-cell transcriptome analysis has not been reported. Here, we performed RNA-seq on over 4,000 individual cells from 22 brain regions of human mid-gestation embryos. We identified 29 cell sub-clusters, which showed different proportions in each region and the pons showed especially high percentage of astrocytes. Embryonic neurons were not as diverse as adult neurons, although they possessed important features of their destinies in adults. Neuron development was unsynchronized in the cerebral cortex, as dorsal regions appeared to be more mature than ventral regions at this stage. Region-specific genes were comprehensively identified in each neuronal sub-cluster, and a large proportion of these genes were neural disease related. Our results present a systematic landscape of the regionalized gene expression and neuron maturation of the human cerebral cortex.

  10. Strategy to Identify and Test Putative Light-Sensitive Non-Opsin G-Protein-Coupled Receptors: A Case Study.

    PubMed

    Faggionato, Davide; Serb, Jeanne M

    2017-08-01

    The rise of high-throughput RNA sequencing (RNA-seq) and de novo transcriptome assembly has had a transformative impact on how we identify and study genes in the phototransduction cascade of non-model organisms. But the advantage provided by the nearly automated annotation of RNA-seq transcriptomes may at the same time hinder the possibility for gene discovery and the discovery of new gene functions. For example, standard functional annotation based on domain homology to known protein families can only confirm group membership, not identify the emergence of new biochemical function. In this study, we show the importance of developing a strategy that circumvents the limitations of semiautomated annotation and apply this workflow to photosensitivity as a means to discover non-opsin photoreceptors. We hypothesize that non-opsin G-protein-coupled receptor (GPCR) proteins may have chromophore-binding lysines in locations that differ from opsin. Here, we provide the first case study describing non-opsin light-sensitive GPCRs based on tissue-specific RNA-seq data of the common bay scallop Argopecten irradians (Lamarck, 1819). Using a combination of sequence analysis and three-dimensional protein modeling, we identified two candidate proteins. We tested their photochemical properties and provide evidence showing that these two proteins incorporate 11-cis and/or all-trans retinal and react to light photochemically. Based on this case study, we demonstrate that there is potential for the discovery of new light-sensitive GPCRs, and we have developed a workflow that starts from RNA-seq assemblies to the discovery of new non-opsin, GPCR-based photopigments.

  11. Comprehensive transcriptome profiling reveals long noncoding RNA expression and alternative splicing regulation during fruit development and ripening in kiwifruit (Actinidia chinensis)

    USDA-ARS?s Scientific Manuscript database

    Genomic and transcriptomic data on kiwifruit (Actinidia chinensis) in public databases are very limited despite its nutritional and economic value. Previously, we have constructed and sequenced nine fruit RNA-Seq libraries of A. chinensis cv. 'Hongyang' at immature, mature, and postharvest ripening...

  12. Grape RNA-Seq analysis pipeline environment

    PubMed Central

    Knowles, David G.; Röder, Maik; Merkel, Angelika; Guigó, Roderic

    2013-01-01

    Motivation: The avalanche of data arriving since the development of NGS technologies have prompted the need for developing fast, accurate and easily automated bioinformatic tools capable of dealing with massive datasets. Among the most productive applications of NGS technologies is the sequencing of cellular RNA, known as RNA-Seq. Although RNA-Seq provides similar or superior dynamic range than microarrays at similar or lower cost, the lack of standard and user-friendly pipelines is a bottleneck preventing RNA-Seq from becoming the standard for transcriptome analysis. Results: In this work we present a pipeline for processing and analyzing RNA-Seq data, that we have named Grape (Grape RNA-Seq Analysis Pipeline Environment). Grape supports raw sequencing reads produced by a variety of technologies, either in FASTA or FASTQ format, or as prealigned reads in SAM/BAM format. A minimal Grape configuration consists of the file location of the raw sequencing reads, the genome of the species and the corresponding gene and transcript annotation. Grape first runs a set of quality control steps, and then aligns the reads to the genome, a step that is omitted for prealigned read formats. Grape next estimates gene and transcript expression levels, calculates exon inclusion levels and identifies novel transcripts. Grape can be run on a single computer or in parallel on a computer cluster. It is distributed with specific mapping and quantification tools, but given its modular design, any tool supporting popular data interchange formats can be integrated. Availability: Grape can be obtained from the Bioinformatics and Genomics website at: http://big.crg.cat/services/grape. Contact: david.gonzalez@crg.eu or roderic.guigo@crg.eu PMID:23329413

  13. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads.

    PubMed

    Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi

    2018-03-09

    High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.

  14. RNA-Seq effectively monitors gene expression in Eutrema salsugineum plants growing in an extreme natural habitat and in controlled growth cabinet conditions

    PubMed Central

    2013-01-01

    Background The investigation of extremophile plant species growing in their natural environment offers certain advantages, chiefly that plants adapted to severe habitats have a repertoire of stress tolerance genes that are regulated to maximize plant performance under physiologically challenging conditions. Accordingly, transcriptome sequencing offers a powerful approach to address questions concerning the influence of natural habitat on the physiology of an organism. We used RNA sequencing of Eutrema salsugineum, an extremophile relative of Arabidopsis thaliana, to investigate the extent to which genetic variation and controlled versus natural environments contribute to differences between transcript profiles. Results Using 10 million cDNA reads, we compared transcriptomes from two natural Eutrema accessions (originating from Yukon Territory, Canada and Shandong Province, China) grown under controlled conditions in cabinets and those from Yukon plants collected at a Yukon field site. We assessed the genetic heterogeneity between individuals using single-nucleotide polymorphisms (SNPs) and the expression patterns of 27,016 genes. Over 39,000 SNPs distinguish the Yukon from the Shandong accessions but only 4,475 SNPs differentiated transcriptomes of Yukon field plants from an inbred Yukon line. We found 2,989 genes that were differentially expressed between the three sample groups and multivariate statistical analyses showed that transcriptomes of individual plants from a Yukon field site were as reproducible as those from inbred plants grown under controlled conditions. Predicted functions based upon gene ontology classifications show that the transcriptomes of field plants were enriched by the differential expression of light- and stress-related genes, an observation consistent with the habitat where the plants were found. Conclusion Our expectation that comparative RNA-Seq analysis of transcriptomes from plants originating in natural habitats would be confounded by uncontrolled genetic and environmental factors was not borne out. Moreover, the transcriptome data shows little genetic variation between laboratory Yukon Eutrema plants and those found at a field site. Transcriptomes were reproducible and biological associations meaningful whether plants were grown in cabinets or found in the field. Thus RNA-Seq is a valuable approach to study native plants in natural environments and this technology can be exploited to discover new gene targets for improved crop performance under adverse conditions. PMID:23984645

  15. MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.

    PubMed

    Behr, Jonas; Kahles, André; Zhong, Yi; Sreedharan, Vipin T; Drewe, Philipp; Rätsch, Gunnar

    2013-10-15

    High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques lead to significant improvements over the state-of-the-art in transcriptome reconstruction. MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.

  16. Next-Generation Genomics Facility at C-CAMP: Accelerating Genomic Research in India

    PubMed Central

    S, Chandana; Russiachand, Heikham; H, Pradeep; S, Shilpa; M, Ashwini; S, Sahana; B, Jayanth; Atla, Goutham; Jain, Smita; Arunkumar, Nandini; Gowda, Malali

    2014-01-01

    Next-Generation Sequencing (NGS; http://www.genome.gov/12513162) is a recent life-sciences technological revolution that allows scientists to decode genomes or transcriptomes at a much faster rate with a lower cost. Genomic-based studies are in a relatively slow pace in India due to the non-availability of genomics experts, trained personnel and dedicated service providers. Using NGS there is a lot of potential to study India's national diversity (of all kinds). We at the Centre for Cellular and Molecular Platforms (C-CAMP) have launched the Next Generation Genomics Facility (NGGF) to provide genomics service to scientists, to train researchers and also work on national and international genomic projects. We have HiSeq1000 from Illumina and GS-FLX Plus from Roche454. The long reads from GS FLX Plus, and high sequence depth from HiSeq1000, are the best and ideal hybrid approaches for de novo and re-sequencing of genomes and transcriptomes. At our facility, we have sequenced around 70 different organisms comprising of more than 388 genomes and 615 transcriptomes – prokaryotes and eukaryotes (fungi, plants and animals). In addition we have optimized other unique applications such as small RNA (miRNA, siRNA etc), long Mate-pair sequencing (2 to 20 Kb), Coding sequences (Exome), Methylome (ChIP-Seq), Restriction Mapping (RAD-Seq), Human Leukocyte Antigen (HLA) typing, mixed genomes (metagenomes) and target amplicons, etc. Translating DNA sequence data from NGS sequencer into meaningful information is an important exercise. Under NGGF, we have bioinformatics experts and high-end computing resources to dissect NGS data such as genome assembly and annotation, gene expression, target enrichment, variant calling (SSR or SNP), comparative analysis etc. Our services (sequencing and bioinformatics) have been utilized by more than 45 organizations (academia and industry) both within India and outside, resulting several publications in peer-reviewed journals and several genomic/transcriptomic data is available at NCBI.

  17. Identification of high-confidence RNA regulatory elements by combinatorial classification of RNA-protein binding sites.

    PubMed

    Li, Yang Eric; Xiao, Mu; Shi, Binbin; Yang, Yu-Cheng T; Wang, Dong; Wang, Fei; Marcia, Marco; Lu, Zhi John

    2017-09-08

    Crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled researchers to characterize transcriptome-wide binding sites of RNA-binding protein (RBP) with high resolution. We apply a soft-clustering method, RBPgroup, to various CLIP-seq datasets to group together RBPs that specifically bind the same RNA sites. Such combinatorial clustering of RBPs helps interpret CLIP-seq data and suggests functional RNA regulatory elements. Furthermore, we validate two RBP-RBP interactions in cell lines. Our approach links proteins and RNA motifs known to possess similar biochemical and cellular properties and can, when used in conjunction with additional experimental data, identify high-confidence RBP groups and their associated RNA regulatory elements.

  18. RNA-Seq analysis of isolate- and growth phase-specific differences in the global transcriptomes of enteropathogenic Escherichia coli prototype isolates

    PubMed Central

    Hazen, Tracy H.; Daugherty, Sean C.; Shetty, Amol; Mahurkar, Anup A.; White, Owen; Kaper, James B.; Rasko, David A.

    2015-01-01

    Enteropathogenic Escherichia coli (EPEC) are a leading cause of diarrheal illness among infants in developing countries. E. coli isolates classified as typical EPEC are identified by the presence of the locus of enterocyte effacement (LEE) and the bundle-forming pilus (BFP), and absence of the Shiga-toxin genes, while the atypical EPEC also encode LEE but do not encode BFP or Shiga-toxin. Comparative genomic analyses have demonstrated that EPEC isolates belong to diverse evolutionary lineages and possess lineage- and isolate-specific genomic content. To investigate whether this genomic diversity results in significant differences in global gene expression, we used an RNA sequencing (RNA-Seq) approach to characterize the global transcriptomes of the prototype typical EPEC isolates E2348/69, B171, C581-05, and the prototype atypical EPEC isolate E110019. The global transcriptomes were characterized during laboratory growth in two different media and three different growth phases, as well as during adherence of the EPEC isolates to human cells using in vitro tissue culture assays. Comparison of the global transcriptomes during these conditions was used to identify isolate- and growth phase-specific differences in EPEC gene expression. These analyses resulted in the identification of genes that encode proteins involved in survival and metabolism that were coordinately expressed with virulence factors. These findings demonstrate there are isolate- and growth phase-specific differences in the global transcriptomes of EPEC prototype isolates, and highlight the utility of comparative transcriptomics for identifying additional factors that are directly or indirectly involved in EPEC pathogenesis. PMID:26124752

  19. Analysis of the Bovine Monocyte-Derived Macrophage Response to Mycobacterium avium Subspecies Paratuberculosis Infection Using RNA-seq.

    PubMed

    Casey, Maura E; Meade, Kieran G; Nalpas, Nicolas C; Taraktsoglou, Maria; Browne, John A; Killick, Kate E; Park, Stephen D E; Gormley, Eamonn; Hokamp, Karsten; Magee, David A; MacHugh, David E

    2015-01-01

    Johne's disease, caused by infection with Mycobacterium avium subsp. paratuberculosis, (MAP), is a chronic intestinal disease of ruminants with serious economic consequences for cattle production in the United States and elsewhere. During infection, MAP bacilli are phagocytosed and subvert host macrophage processes, resulting in subclinical infections that can lead to immunopathology and dissemination of disease. Analysis of the host macrophage transcriptome during infection can therefore shed light on the molecular mechanisms and host-pathogen interplay associated with Johne's disease. Here, we describe results of an in vitro study of the bovine monocyte-derived macrophage (MDM) transcriptome response during MAP infection using RNA-seq. MDM were obtained from seven age- and sex-matched Holstein-Friesian cattle and were infected with MAP across a 6-h infection time course with non-infected controls. We observed 245 and 574 differentially expressed (DE) genes in MAP-infected versus non-infected control samples (adjusted P value ≤0.05) at 2 and 6 h post-infection, respectively. Functional analyses of these DE genes, including biological pathway enrichment, highlighted potential functional roles for genes that have not been previously described in the host response to infection with MAP bacilli. In addition, differential expression of pro- and anti-inflammatory cytokine genes, such as those associated with the IL-10 signaling pathway, and other immune-related genes that encode proteins involved in the bovine macrophage response to MAP infection emphasize the balance between protective host immunity and bacilli survival and proliferation. Systematic comparisons of RNA-seq gene expression results with Affymetrix(®) microarray data generated from the same experimental samples also demonstrated that RNA-seq represents a superior technology for studying host transcriptional responses to intracellular infection.

  20. Analysis of the Bovine Monocyte-Derived Macrophage Response to Mycobacterium avium Subspecies Paratuberculosis Infection Using RNA-seq

    PubMed Central

    Casey, Maura E.; Meade, Kieran G.; Nalpas, Nicolas C.; Taraktsoglou, Maria; Browne, John A.; Killick, Kate E.; Park, Stephen D. E.; Gormley, Eamonn; Hokamp, Karsten; Magee, David A.; MacHugh, David E.

    2015-01-01

    Johne’s disease, caused by infection with Mycobacterium avium subsp. paratuberculosis, (MAP), is a chronic intestinal disease of ruminants with serious economic consequences for cattle production in the United States and elsewhere. During infection, MAP bacilli are phagocytosed and subvert host macrophage processes, resulting in subclinical infections that can lead to immunopathology and dissemination of disease. Analysis of the host macrophage transcriptome during infection can therefore shed light on the molecular mechanisms and host-pathogen interplay associated with Johne’s disease. Here, we describe results of an in vitro study of the bovine monocyte-derived macrophage (MDM) transcriptome response during MAP infection using RNA-seq. MDM were obtained from seven age- and sex-matched Holstein-Friesian cattle and were infected with MAP across a 6-h infection time course with non-infected controls. We observed 245 and 574 differentially expressed (DE) genes in MAP-infected versus non-infected control samples (adjusted P value ≤0.05) at 2 and 6 h post-infection, respectively. Functional analyses of these DE genes, including biological pathway enrichment, highlighted potential functional roles for genes that have not been previously described in the host response to infection with MAP bacilli. In addition, differential expression of pro- and anti-inflammatory cytokine genes, such as those associated with the IL-10 signaling pathway, and other immune-related genes that encode proteins involved in the bovine macrophage response to MAP infection emphasize the balance between protective host immunity and bacilli survival and proliferation. Systematic comparisons of RNA-seq gene expression results with Affymetrix® microarray data generated from the same experimental samples also demonstrated that RNA-seq represents a superior technology for studying host transcriptional responses to intracellular infection. PMID:25699042

  1. GC-Content Normalization for RNA-Seq Data

    PubMed Central

    2011-01-01

    Background Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264

  2. Gene expression analysis of induced pluripotent stem cells from aneuploid chromosomal syndromes

    PubMed Central

    2013-01-01

    Background Human aneuploidy is the leading cause of early pregnancy loss, mental retardation, and multiple congenital anomalies. Due to the high mortality associated with aneuploidy, the pathophysiological mechanisms of aneuploidy syndrome remain largely unknown. Previous studies focused mostly on whether dosage compensation occurs, and the next generation transcriptomics sequencing technology RNA-seq is expected to eventually uncover the mechanisms of gene expression regulation and the related pathological phenotypes in human aneuploidy. Results Using next generation transcriptomics sequencing technology RNA-seq, we profiled the transcriptomes of four human aneuploid induced pluripotent stem cell (iPSC) lines generated from monosomy × (Turner syndrome), trisomy 8 (Warkany syndrome 2), trisomy 13 (Patau syndrome), and partial trisomy 11:22 (Emanuel syndrome) as well as two umbilical cord matrix iPSC lines as euploid controls to examine how phenotypic abnormalities develop with aberrant karyotype. A total of 466 M (50-bp) reads were obtained from the six iPSC lines, and over 13,000 mRNAs were identified by gene annotation. Global analysis of gene expression profiles and functional analysis of differentially expressed (DE) genes were implemented. Over 5000 DE genes are determined between aneuploidy and euploid iPSCs respectively while 9 KEGG pathways are overlapped enriched in four aneuploidy samples. Conclusions Our results demonstrate that the extra or missing chromosome has extensive effects on the whole transcriptome. Functional analysis of differentially expressed genes reveals that the genes most affected in aneuploid individuals are related to central nervous system development and tumorigenesis. PMID:24564826

  3. Transcriptomic insights on the ABC transporter gene family in the salmon louse Caligus rogercresseyi.

    PubMed

    Valenzuela-Muñoz, Valentina; Sturm, Armin; Gallardo-Escárate, Cristian

    2015-04-09

    ATP-binding cassette (ABC) protein family encode for membrane proteins involved in the transport of various biomolecules through the cellular membrane. These proteins have been identified in all taxa and present important physiological functions, including the process of insecticide detoxification in arthropods. For that reason the ectoparasite Caligus rogercresseyi represents a model species for understanding the molecular underpinnings involved in insecticide drug resistance. llumina sequencing was performed using sea lice exposed to 2 and 3 ppb of deltamethrin and azamethiphos. Contigs obtained from de novo assembly were annotated by Blastx. RNA-Seq analysis was performed and validated by qPCR analysis. From the transcriptome database of C. rogercresseyi, 57 putative members of ABC protein sequences were identified and phylogenetically classified into the eight subfamilies described for ABC transporters in arthropods. Transcriptomic profiles for ABC proteins subfamilies were evaluated throughout C. rogercresseyi development. Moreover, RNA-Seq analysis was performed for adult male and female salmon lice exposed to the delousing drugs azamethiphos and deltamethrin. High transcript levels of the ABCB and ABCC subfamilies were evidenced. Furthermore, SNPs mining was carried out for the ABC proteins sequences, revealing pivotal genomic information. The present study gives a comprehensive transcriptome analysis of ABC proteins from C. rogercresseyi, providing relevant information about transporter roles during ontogeny and in relation to delousing drug responses in salmon lice. This genomic information represents a valuable tool for pest management in the Chilean salmon aquaculture industry.

  4. Comparison of alternative approaches for analysing multi-level RNA-seq data

    PubMed Central

    Mohorianu, Irina; Bretman, Amanda; Smith, Damian T.; Fowler, Emily K.; Dalmay, Tamas

    2017-01-01

    RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples, The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments. PMID:28792517

  5. Whole-transcriptome RNA-seq, gene set enrichment pathway analysis, and exon coverage analysis of two plastid RNA editing mutants.

    PubMed

    Hackett, Justin B; Lu, Yan

    2017-05-04

    In land plants, plastid and mitochondrial RNAs are subject to post-transcriptional C-to-U RNA editing. T-DNA insertions in the ORGANELLE RNA RECOGNITION MOTIF PROTEIN6 gene resulted in reduced photosystem II (PSII) activity and smaller plant and leaf sizes. Exon coverage analysis of the ORRM6 gene showed that orrm6-1 and orrm6-2 are loss-of-function mutants. Compared to other ORRM proteins, ORRM6 affects a relative small number of RNA editing sites. Sanger sequencing of reverse transcription-PCR products of plastid transcripts revealed 2 plastid RNA editing sites that are substantially affected in the orrm6 mutants: psbF-C77 and accD-C794. The psbF gene encodes the β subunit of cytochrome b 559 , an essential component of PSII. The accD gene encodes the β subunit of acetyl-CoA carboxylase, a protein required in plastid fatty acid biosynthesis. Whole-transcriptome RNA-seq demonstrated that editing at psbF-C77 is nearly absent and the editing extent at accD-C794 was significantly reduced. Gene set enrichment pathway analysis showed that expression of multiple gene sets involved in photosynthesis, especially photosynthetic electron transport, is significantly upregulated in both orrm6 mutants. The upregulation could be a mechanism to compensate for the reduced PSII electron transport rate in the orrm6 mutants. These results further demonstrated that Organelle RNA Recognition Motif protein ORRM6 is required in editing of specific RNAs in the Arabidopsis (Arabidopsis thaliana) plastid.

  6. Transcriptome analysis by strand-specific sequencing of complementary DNA

    PubMed Central

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-01-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online. PMID:19620212

  7. Transcriptome analysis by strand-specific sequencing of complementary DNA.

    PubMed

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-10-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online.

  8. Investigating the Responses of Cronobacter sakazakii to Garlic-Drived Organosulfur Compounds: a Systematic Study of Pathogenic-Bacterium Injury by Use of High-Throughput Whole-Transcriptome Sequencing and Confocal Micro-Raman Spectroscopy

    PubMed Central

    Feng, Shaolong; Eucker, Tyson P.; Holly, Mayumi K.; Konkel, Michael E.

    2014-01-01

    We present the results of a study using high-throughput whole-transcriptome sequencing (RNA-seq) and vibrational spectroscopy to characterize and fingerprint pathogenic-bacterium injury under conditions of unfavorable stress. Two garlic-derived organosulfur compounds were found to be highly effective antimicrobial compounds against Cronobacter sakazakii, a leading pathogen associated with invasive infection of infants and causing meningitis, necrotizing entercolitis, and bacteremia. RNA-seq shows changes in gene expression patterns and transcriptomic response, while confocal micro-Raman spectroscopy characterizes macromolecular changes in the bacterial cell resulting from this chemical stress. RNA-seq analyses showed that the bacterial response to ajoene differed from the response to diallyl sulfide. Specifically, ajoene caused downregulation of motility-related genes, while diallyl sulfide treatment caused an increased expression of cell wall synthesis genes. Confocal micro-Raman spectroscopy revealed that the two compounds appear to have the same phase I antimicrobial mechanism of binding to thiol-containing proteins/enzymes in bacterial cells generating a disulfide stretching band but different phase II antimicrobial mechanisms, showing alterations in the secondary structures of proteins in two different ways. Diallyl sulfide primarily altered the α-helix and β-sheet, as reflected in changes in amide I, while ajoene altered the structures containing phenylalanine and tyrosine. Bayesian probability analysis validated the ability of principal component analysis to differentiate treated and control C. sakazakii cells. Scanning electron microscopy confirmed cell injury, showing significant morphological variations in cells following treatments by these two compounds. Findings from this study aid in the development of effective intervention strategies to reduce the risk of C. sakazakii contamination in the food production environment and on food contact surfaces, reducing the risks to susceptible consumers. PMID:24271174

  9. ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

    PubMed Central

    McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.

    2013-01-01

    Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943

  10. ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.

    PubMed

    Dao, Phuong; Numanagić, Ibrahim; Lin, Yen-Yi; Hach, Faraz; Karakoc, Emre; Donmez, Nilgun; Collins, Colin; Eichler, Evan E; Sahinalp, S Cenk

    2014-03-01

    RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For an accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN ( O ptimal R esolution of M ultimapping A mbiguity of R N A-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. ORMAN is available at http://orman.sf.net

  11. Global transcriptome analysis of Halolamina sp. to decipher the salt tolerance in extremely halophilic archaea.

    PubMed

    Kurt-Kızıldoğan, Aslıhan; Abanoz, Büşra; Okay, Sezer

    2017-02-15

    Extremely halophilic archaea survive in the hypersaline environments such as salt lakes or salt mines. Therefore, these microorganisms are good sources to investigate the molecular mechanisms underlying the tolerance to high salt concentrations. In this study, a global transcriptome analysis was conducted in an extremely halophilic archaeon, Halolamina sp. YKT1, isolated from a salt mine in Turkey. A comparative RNA-seq analysis was performed using YKT1 isolate grown either at 2.7M NaCl or 5.5M NaCl concentrations. A total of 2149 genes were predicted to be up-regulated and 1638 genes were down-regulated in the presence of 5.5M NaCl. The salt tolerance of Halolamina sp. YKT1 involves the up-regulation of genes related with membrane transporters, CRISPR-Cas systems, osmoprotectant solutes, oxidative stress proteins, and iron metabolism. On the other hand, the genes encoding the proteins involved in DNA replication, transcription, translation, mismatch and nucleotide excision repair were down-regulated. The RNA-seq data were verified for seven up-regulated genes as well as six down-regulated genes via qRT-PCR analysis. This comprehensive transcriptome analysis showed that the halophilic archaeon canalizes its energy towards keeping the intracellular osmotic balance minimizing the production of nucleic acids and peptides. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Analysis of differential gene expression in response to handling and confinement stress in rainbow trout using whole transcriptome RNA-seq

    USDA-ARS?s Scientific Manuscript database

    Fish under intensive rearing conditions experience various stress conditions, which have negative impacts on survival, growth and fillet quality. Understanding the molecular mechanisms underlying stress responses will facilitate improvement of animal welfare and production efficiency. Our objective ...

  13. Ultra-low input transcriptomics reveal the spore functional content and phylogenetic affiliations of poorly studied arbuscular mycorrhizal fungi.

    PubMed

    Beaudet, Denis; Chen, Eric C H; Mathieu, Stephanie; Yildirir, Gokalp; Ndikumana, Steve; Dalpé, Yolande; Séguin, Sylvie; Farinelli, Laurent; Stajich, Jason E; Corradi, Nicolas

    2017-12-02

    Arbuscular mycorrhizal fungi (AMF) are a group of soil microorganisms that establish symbioses with the vast majority of land plants. To date, generation of AMF coding information has been limited to model genera that grow well axenically; Rhizoglomus and Gigaspora. Meanwhile, data on the functional gene repertoire of most AMF families is non-existent. Here, we provide primary large-scale transcriptome data from eight poorly studied AMF species (Acaulospora morrowiae, Diversispora versiforme, Scutellospora calospora, Racocetra castanea, Paraglomus brasilianum, Ambispora leptoticha, Claroideoglomus claroideum and Funneliformis mosseae) using ultra-low input ribonucleic acid (RNA)-seq approaches. Our analyses reveals that quiescent spores of many AMF species harbour a diverse functional diversity and solidify known evolutionary relationships within the group. Our findings demonstrate that RNA-seq data obtained from low-input RNA are reliable in comparison to conventional RNA-seq experiments. Thus, our methodology can potentially be used to deepen our understanding of fungal microbial function and phylogeny using minute amounts of RNA material. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  14. Evaluation of sequencing approaches for high-throughput ...

    EPA Pesticide Factsheets

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platforms for potential application to high-throughput screening: 1. TempO-Seq utilizing custom designed paired probes per gene; 2. Targeted sequencing (TSQ) utilizing Illumina’s TruSeq RNA Access Library Prep Kit containing tiled exon-specific probe sets; 3. Low coverage whole transcriptome sequencing (LSQ) using Illumina’s TruSeq Stranded mRNA Kit. Each platform was required to cover the ~20,000 genes of the full transcriptome, operate directly with cell lysates, and be automatable with 384-well plates. Technical reproducibility was assessed using MAQC control RNA samples A and B, while functional utility for chemical screening was evaluated using six treatments at a single concentration after 6 hr in MCF7 breast cancer cells: 10 µM chlorpromazine, 10 µM ciclopriox, 10 µM genistein, 100 nM sirolimus, 1 µM tanespimycin, and 1 µM trichostatin A. All RNA samples and chemical treatments were run with 5 technical replicates. The three platforms achieved different read depths, with the TempO-Seq having ~34M mapped reads per sample, while TSQ and LSQ averaged 20M and 11M aligned reads per sample, respectively. Inter-replicate correlation averaged ≥0.95 for raw log2 expression values i

  15. Comprehensive analysis of RNA-seq data reveals the complexity of the transcriptome in Brassica rapa.

    PubMed

    Tong, Chaobo; Wang, Xiaowu; Yu, Jingyin; Wu, Jian; Li, Wanshun; Huang, Junyan; Dong, Caihua; Hua, Wei; Liu, Shengyi

    2013-10-07

    The species Brassica rapa (2n=20, AA) is an important vegetable and oilseed crop, and serves as an excellent model for genomic and evolutionary research in Brassica species. With the availability of whole genome sequence of B. rapa, it is essential to further determine the activity of all functional elements of the B. rapa genome and explore the transcriptome on a genome-wide scale. Here, RNA-seq data was employed to provide a genome-wide transcriptional landscape and characterization of the annotated and novel transcripts and alternative splicing events across tissues. RNA-seq reads were generated using the Illumina platform from six different tissues (root, stem, leaf, flower, silique and callus) of the B. rapa accession Chiifu-401-42, the same line used for whole genome sequencing. First, these data detected the widespread transcription of the B. rapa genome, leading to the identification of numerous novel transcripts and definition of 5'/3' UTRs of known genes. Second, 78.8% of the total annotated genes were detected as expressed and 45.8% were constitutively expressed across all tissues. We further defined several groups of genes: housekeeping genes, tissue-specific expressed genes and co-expressed genes across tissues, which will serve as a valuable repository for future crop functional genomics research. Third, alternative splicing (AS) is estimated to occur in more than 29.4% of intron-containing B. rapa genes, and 65% of them were commonly detected in more than two tissues. Interestingly, genes with high rate of AS were over-represented in GO categories relating to transcriptional regulation and signal transduction, suggesting potential importance of AS for playing regulatory role in these genes. Further, we observed that intron retention (IR) is predominant in the AS events and seems to preferentially occurred in genes with short introns. The high-resolution RNA-seq analysis provides a global transcriptional landscape as a complement to the B. rapa genome sequence, which will advance our understanding of the dynamics and complexity of the B. rapa transcriptome. The atlas of gene expression in different tissues will be useful for accelerating research on functional genomics and genome evolution in Brassica species.

  16. Time-Course Transcriptome Analysis Reveals Resistance Genes of Panax ginseng Induced by Cylindrocarpon destructans Infection Using RNA-Seq.

    PubMed

    Gao, Yuan; He, Xiaoli; Wu, Bin; Long, Qiliang; Shao, Tianwei; Wang, Zi; Wei, Jianhe; Li, Yong; Ding, Wanlong

    2016-01-01

    Panax ginseng C. A. Meyer is a highly valued medicinal plant. Cylindrocarpon destructans is a destructive pathogen that causes root rot and significantly reduces the quality and yield of P. ginseng. However, an efficient method to control root rot remains unavailable because of insufficient understanding of the molecular mechanism underlying C. destructans-P. ginseng interaction. In this study, C. destructans-induced transcriptomes at different time points were investigated using RNA sequencing (RNA-Seq). De novo assembly produced 73,335 unigenes for the P. ginseng transcriptome after C. destructans infection, in which 3,839 unigenes were up-regulated. Notably, the abundance of the up-regulated unigenes sharply increased at 0.5 d postinoculation to provide effector-triggered immunity. In total, 24 of 26 randomly selected unigenes can be validated using quantitative reverse transcription (qRT)-PCR. Gene ontology enrichment analysis of these unigenes showed that "defense response to fungus", "defense response" and "response to stress" were enriched. In addition, differentially expressed transcription factors involved in the hormone signaling pathways after C. destructans infection were identified. Finally, differentially expressed unigenes involved in reactive oxygen species and ginsenoside biosynthetic pathway during C. destructans infection were indentified. To our knowledge, this study is the first to report on the dynamic transcriptome triggered by C. destructans. These results improve our understanding of disease resistance in P. ginseng and provide a useful resource for quick detection of induced markers in P. ginseng before the comprehensive outbreak of this disease caused by C. destructans.

  17. RNA-seq analysis and de novo transcriptome assembly of Jerusalem artichoke (Helianthus tuberosus Linne).

    PubMed

    Jung, Won Yong; Lee, Sang Sook; Kim, Chul Wook; Kim, Hyun-Soon; Min, Sung Ran; Moon, Jae Sun; Kwon, Suk-Yoon; Jeon, Jae-Heung; Cho, Hye Sun

    2014-01-01

    Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these were confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Exsiting genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.

  18. VESPA: Software to Facilitate Genomic Annotation of Prokaryotic Organisms Through Integration of Proteomic and Transcriptomic Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, Elena S.; McCue, Lee Ann; Rutledge, Alexandra C.

    2012-04-25

    Visual Exploration and Statistics to Promote Annotation (VESPA) is an interactive visual analysis software tool that facilitates the discovery of structural mis-annotations in prokaryotic genomes. VESPA integrates high-throughput peptide-centric proteomics data and oligo-centric or RNA-Seq transcriptomics data into a genomic context. The data may be interrogated via visual analysis across multiple levels of genomic resolution, linked searches, exports and interaction with BLAST to rapidly identify location of interest within the genome and evaluate potential mis-annotations.

  19. Deep RNA-Seq profile reveals biodiversity, plant-microbe interactions and a large family of NBS-LRR resistance genes in walnut (Juglans regia) tissues.

    PubMed

    Chakraborty, Sandeep; Britton, Monica; Martínez-García, P J; Dandekar, Abhaya M

    2016-03-01

    Deep RNA-Seq profiling, a revolutionary method used for quantifying transcriptional levels, often includes non-specific transcripts from other co-existing organisms in spite of stringent protocols. Using the recently published walnut genome sequence as a filter, we present a broad analysis of the RNA-Seq derived transcriptome profiles obtained from twenty different tissues to extract the biodiversity and possible plant-microbe interactions in the walnut ecosystem in California. Since the residual nature of the transcripts being analyzed does not provide sufficient information to identify the exact strain, inferences made are constrained to the genus level. The presence of the pathogenic oomycete Phytophthora was detected in the root through the presence of a glyceraldehyde-3-phosphate dehydrogenase. Cryptococcus, the causal agent of cryptococcosis, was found in the catkins and vegetative buds, corroborating previous work indicating that the plant surface supported the sexual cycle of this human pathogen. The RNA-Seq profile revealed several species of the endophytic nitrogen fixing Actinobacteria. Another bacterial species implicated in aerobic biodegradation of methyl tert-butyl ether (Methylibium petroleiphilum) is also found in the root. RNA encoding proteins from the pea aphid were found in the leaves and vegetative buds, while a serine protease from mosquito with significant homology to a female reproductive tract protease from Drosophila mojavensis in the vegetative bud suggests egg-laying activities. The comprehensive analysis of RNA-seq data present also unraveled detailed, tissue-specific information of ~400 transcripts encoded by the largest family of resistance (R) genes (NBS-LRR), which possibly rationalizes the resistance of the specific walnut plant to the pathogens detected. Thus, we elucidate the biodiversity and possible plant-microbe interactions in several walnut (Juglans regia) tissues in California using deep RNA-Seq profiling.

  20. RNA-seq transcriptome analysis of formalin fixed, paraffin-embedded canine meningioma

    PubMed Central

    Grenier, Jennifer K.; Foureman, Polly A.; Sloma, Erica A.

    2017-01-01

    Meningiomas are the most commonly reported primary intracranial tumor in dogs and humans and between the two species there are similarities in histology and biologic behavior. Due to these similarities, dogs have been proposed as models for meningioma pathobiology. However, little is known about specific pathways and individual genes that are involved in the development and progression of canine meningioma. In addition, studies are lacking that utilize RNAseq to characterize gene expression in clinical cases of canine meningioma. The primary objective of this study was to develop a technique for which high quality RNA can be extracted from formalin-fixed, paraffin embedded tissue and then used for transcriptome analysis to determine patterns of gene expression. RNA was extracted from thirteen canine meningiomas–eleven from formalin fixed and two flash-frozen. These represented six grade I and seven grade II meningiomas based on the World Health Organization classification system for human meningioma. RNA was also extracted from fresh frozen leptomeninges from three control dogs for comparison. RNAseq libraries made from formalin fixed tissue were of sufficient quality to successfully identify 125 significantly differentially expressed genes, the majority of which were related to oncogenic processes. Twelve genes (AQP1, BMPER, FBLN2, FRZB, MEDAG, MYC, PAMR1, PDGFRL, PDPN, PECAM1, PERP, ZC2HC1C) were validated using qPCR. Among the differentially expressed genes were oncogenes, tumor suppressors, transcription factors, VEGF-related genes, and members of the WNT pathway. Our work demonstrates that RNA of sufficient quality can be extracted from FFPE canine meningioma samples to provide biologically relevant transcriptome analyses using a next-generation sequencing technique, such as RNA-seq. PMID:29073243

  1. Technical variations in low-input RNA-seq methodologies.

    PubMed

    Bhargava, Vipul; Head, Steven R; Ordoukhanian, Phillip; Mercola, Mark; Subramaniam, Shankar

    2014-01-14

    Recent advances in RNA-seq methodologies from limiting amounts of mRNA have facilitated the characterization of rare cell-types in various biological systems. So far, however, technical variations in these methods have not been adequately characterized, vis-à-vis sensitivity, starting with reduced levels of mRNA. Here, we generated sequencing libraries from limiting amounts of mRNA using three amplification-based methods, viz. Smart-seq, DP-seq and CEL-seq, and demonstrated significant technical variations in these libraries. Reduction in mRNA levels led to inefficient amplification of the majority of low to moderately expressed transcripts. Furthermore, noise in primer hybridization and/or enzyme incorporation was magnified during the amplification step resulting in significant distortions in fold changes of the transcripts. Consequently, the majority of the differentially expressed transcripts identified were either high-expressed and/or exhibited high fold changes. High technical variations ultimately masked subtle biological differences mandating the development of improved amplification-based strategies for quantitative transcriptomics from limiting amounts of mRNA.

  2. dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments.

    PubMed

    Petukhov, Viktor; Guo, Jimin; Baryawno, Ninib; Severe, Nicolas; Scadden, David T; Samsonova, Maria G; Kharchenko, Peter V

    2018-06-19

    Recent single-cell RNA-seq protocols based on droplet microfluidics use massively multiplexed barcoding to enable simultaneous measurements of transcriptomes for thousands of individual cells. The increasing complexity of such data creates challenges for subsequent computational processing and troubleshooting of these experiments, with few software options currently available. Here, we describe a flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries. We introduce advanced methods for correcting composition bias and sequencing errors affecting cellular and molecular barcodes to provide more accurate estimates of molecular counts in individual cells.

  3. In Silico Functional Networks Identified in Fish Nucleated Red Blood Cells by Means of Transcriptomic and Proteomic Profiling.

    PubMed

    Puente-Marin, Sara; Nombela, Iván; Ciordia, Sergio; Mena, María Carmen; Chico, Verónica; Coll, Julio; Ortega-Villaizan, María Del Mar

    2018-04-09

    Nucleated red blood cells (RBCs) of fish have, in the last decade, been implicated in several immune-related functions, such as antiviral response, phagocytosis or cytokine-mediated signaling. RNA-sequencing (RNA-seq) and label-free shotgun proteomic analyses were carried out for in silico functional pathway profiling of rainbow trout RBCs. For RNA-seq, a de novo assembly was conducted, in order to create a transcriptome database for RBCs. For proteome profiling, we developed a proteomic method that combined: (a) fractionation into cytosolic and membrane fractions, (b) hemoglobin removal of the cytosolic fraction, (c) protein digestion, and (d) a novel step with pH reversed-phase peptide fractionation and final Liquid Chromatography Electrospray Ionization Tandem Mass Spectrometric (LC ESI-MS/MS) analysis of each fraction. Combined transcriptome- and proteome- sequencing data identified, in silico, novel and striking immune functional networks for rainbow trout nucleated RBCs, which are mainly linked to innate and adaptive immunity. Functional pathways related to regulation of hematopoietic cell differentiation, antigen presentation via major histocompatibility complex class II (MHCII), leukocyte differentiation and regulation of leukocyte activation were identified. These preliminary findings further implicate nucleated RBCs in immune function, such as antigen presentation and leukocyte activation.

  4. In Silico Functional Networks Identified in Fish Nucleated Red Blood Cells by Means of Transcriptomic and Proteomic Profiling

    PubMed Central

    Puente-Marin, Sara; Ciordia, Sergio; Mena, María Carmen; Chico, Verónica; Coll, Julio

    2018-01-01

    Nucleated red blood cells (RBCs) of fish have, in the last decade, been implicated in several immune-related functions, such as antiviral response, phagocytosis or cytokine-mediated signaling. RNA-sequencing (RNA-seq) and label-free shotgun proteomic analyses were carried out for in silico functional pathway profiling of rainbow trout RBCs. For RNA-seq, a de novo assembly was conducted, in order to create a transcriptome database for RBCs. For proteome profiling, we developed a proteomic method that combined: (a) fractionation into cytosolic and membrane fractions, (b) hemoglobin removal of the cytosolic fraction, (c) protein digestion, and (d) a novel step with pH reversed-phase peptide fractionation and final Liquid Chromatography Electrospray Ionization Tandem Mass Spectrometric (LC ESI-MS/MS) analysis of each fraction. Combined transcriptome- and proteome- sequencing data identified, in silico, novel and striking immune functional networks for rainbow trout nucleated RBCs, which are mainly linked to innate and adaptive immunity. Functional pathways related to regulation of hematopoietic cell differentiation, antigen presentation via major histocompatibility complex class II (MHCII), leukocyte differentiation and regulation of leukocyte activation were identified. These preliminary findings further implicate nucleated RBCs in immune function, such as antigen presentation and leukocyte activation. PMID:29642539

  5. Deep Sequencing of RNA from Ancient Maize Kernels

    PubMed Central

    Rasmussen, Morten; Cappellini, Enrico; Romero-Navarro, J. Alberto; Wales, Nathan; Alquezar-Planas, David E.; Penfield, Steven; Brown, Terence A.; Vielle-Calzada, Jean-Philippe; Montiel, Rafael; Jørgensen, Tina; Odegaard, Nancy; Jacobs, Michael; Arriaza, Bernardo; Higham, Thomas F. G.; Ramsey, Christopher Bronk; Willerslev, Eske; Gilbert, M. Thomas P.

    2013-01-01

    The characterization of biomolecules from ancient samples can shed otherwise unobtainable insights into the past. Despite the fundamental role of transcriptomal change in evolution, the potential of ancient RNA remains unexploited – perhaps due to dogma associated with the fragility of RNA. We hypothesize that seeds offer a plausible refuge for long-term RNA survival, due to the fundamental role of RNA during seed germination. Using RNA-Seq on cDNA synthesized from nucleic acid extracts, we validate this hypothesis through demonstration of partial transcriptomal recovery from two sources of ancient maize kernels. The results suggest that ancient seed transcriptomics may offer a powerful new tool with which to study plant domestication. PMID:23326310

  6. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    PubMed

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary data are available at Bioinformatics online.

  7. Transcriptomic Studies of the Effect of nod Gene-Inducing Molecules in Rhizobia: Different Weapons, One Purpose

    PubMed Central

    Jiménez-Guerrero, Irene; Acosta-Jurado, Sebastián; Navarro-Gómez, Pilar; López-Baena, Francisco Javier; Ollero, Francisco Javier

    2017-01-01

    Simultaneous quantification of transcripts of the whole bacterial genome allows the analysis of the global transcriptional response under changing conditions. RNA-seq and microarrays are the most used techniques to measure these transcriptomic changes, and both complement each other in transcriptome profiling. In this review, we exhaustively compiled the symbiosis-related transcriptomic reports (microarrays and RNA sequencing) carried out hitherto in rhizobia. This review is specially focused on transcriptomic changes that takes place when five rhizobial species, Bradyrhizobium japonicum (=diazoefficiens) USDA 110, Rhizobium leguminosarum biovar viciae 3841, Rhizobium tropici CIAT 899, Sinorhizobium (=Ensifer) meliloti 1021 and S. fredii HH103, recognize inducing flavonoids, plant-exuded phenolic compounds that activate the biosynthesis and export of Nod factors (NF) in all analysed rhizobia. Interestingly, our global transcriptomic comparison also indicates that each rhizobial species possesses its own arsenal of molecular weapons accompanying the set of NF in order to establish a successful interaction with host legumes. PMID:29267254

  8. Comprehensive Transcriptome Study to Develop Molecular Resources of the Copepod Calanus sinicus for Their Potential Ecological Applications

    PubMed Central

    Yang, Qing; Sun, Fanyue; Yang, Zhi; Li, Hongjun

    2014-01-01

    Calanus sinicus Brodsky (Copepoda, Crustacea) is a dominant zooplanktonic species widely distributed in the margin seas of the Northwest Pacific Ocean. In this study, we utilized an RNA-Seq-based approach to develop molecular resources for C. sinicus. Adult samples were sequenced using the Illumina HiSeq 2000 platform. The sequencing data generated 69,751 contigs from 58.9 million filtered reads. The assembled contigs had an average length of 928.8 bp. Gene annotation allowed the identification of 43,417 unigene hits against the NCBI database. Gene ontology (GO) and KEGG pathway mapping analysis revealed various functional genes related to diverse biological functions and processes. Transcripts potentially involved in stress response and lipid metabolism were identified among these genes. Furthermore, 4,871 microsatellites and 110,137 single nucleotide polymorphisms (SNPs) were identified in the C. sinicus transcriptome sequences. SNP validation by the melting temperature (T m)-shift method suggested that 16 primer pairs amplified target products and showed biallelic polymorphism among 30 individuals. The present work demonstrates the power of Illumina-based RNA-Seq for the rapid development of molecular resources in nonmodel species. The validated SNP set from our study is currently being utilized in an ongoing ecological analysis to support a future study of C. sinicus population genetics. PMID:24982883

  9. An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq.

    PubMed

    Xu, Maoqi; Chen, Liang

    2018-01-01

    The individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice the analysis power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex disease. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  10. Selective amplification and sequencing of cyclic phosphate-containing RNAs by the cP-RNA-seq method.

    PubMed

    Honda, Shozo; Morichika, Keisuke; Kirino, Yohei

    2016-03-01

    RNA digestions catalyzed by many ribonucleases generate RNA fragments that contain a 2',3'-cyclic phosphate (cP) at their 3' termini. However, standard RNA-seq methods are unable to accurately capture cP-containing RNAs because the cP inhibits the adapter ligation reaction. We recently developed a method named cP-RNA-seq that is able to selectively amplify and sequence cP-containing RNAs. Here we describe the cP-RNA-seq protocol in which the 3' termini of all RNAs, except those containing a cP, are cleaved through a periodate treatment after phosphatase treatment; hence, subsequent adapter ligation and cDNA amplification steps are exclusively applied to cP-containing RNAs. cP-RNA-seq takes ∼6 d, excluding the time required for sequencing and bioinformatics analyses, which are not covered in detail in this protocol. Biochemical validation of the existence of cP in the identified RNAs takes ∼3 d. Even though the cP-RNA-seq method was developed to identify angiogenin-generating 5'-tRNA halves as a proof of principle, the method should be applicable to global identification of cP-containing RNA repertoires in various transcriptomes.

  11. Deciphering the Developmental Dynamics of the Mouse Liver Transcriptome

    PubMed Central

    Gunewardena, Sumedha S.; Yoo, Byunggil; Peng, Lai; Lu, Hong; Zhong, Xiaobo; Klaassen, Curtis D.; Cui, Julia Yue

    2015-01-01

    During development, liver undergoes a rapid transition from a hematopoietic organ to a major organ for drug metabolism and nutrient homeostasis. However, little is known on a transcriptome level of the genes and RNA-splicing variants that are differentially regulated with age, and which up-stream regulators orchestrate age-specific biological functions in liver. We used RNA-Seq to interrogate the developmental dynamics of the liver transcriptome in mice at 12 ages from late embryonic stage (2-days before birth) to maturity (60-days after birth). Among 21,889 unique NCBI RefSeq-annotated genes, 9,641 were significantly expressed in at least one age, 7,289 were differently regulated with age, and 859 had multiple (> = 2) RNA splicing-variants. Factor analysis showed that the dynamics of hepatic genes fall into six distinct groups based on their temporal expression. The average expression of cytokines, ion channels, kinases, phosphatases, transcription regulators and translation regulators decreased with age, whereas the average expression of peptidases, enzymes and transmembrane receptors increased with age. The average expression of growth factors peak between Day-3 and Day-10, and decrease thereafter. We identified critical biological functions, upstream regulators, and putative transcription modules that seem to govern age-specific gene expression. We also observed differential ontogenic expression of known splicing variants of certain genes, and 1,455 novel splicing isoform candidates. In conclusion, the hepatic ontogeny of the transcriptome ontogeny has unveiled critical networks and up-stream regulators that orchestrate age-specific biological functions in liver, and suggest that age contributes to the complexity of the alternative splicing landscape of the hepatic transcriptome. PMID:26496202

  12. Deciphering the Developmental Dynamics of the Mouse Liver Transcriptome.

    PubMed

    Gunewardena, Sumedha S; Yoo, Byunggil; Peng, Lai; Lu, Hong; Zhong, Xiaobo; Klaassen, Curtis D; Cui, Julia Yue

    2015-01-01

    During development, liver undergoes a rapid transition from a hematopoietic organ to a major organ for drug metabolism and nutrient homeostasis. However, little is known on a transcriptome level of the genes and RNA-splicing variants that are differentially regulated with age, and which up-stream regulators orchestrate age-specific biological functions in liver. We used RNA-Seq to interrogate the developmental dynamics of the liver transcriptome in mice at 12 ages from late embryonic stage (2-days before birth) to maturity (60-days after birth). Among 21,889 unique NCBI RefSeq-annotated genes, 9,641 were significantly expressed in at least one age, 7,289 were differently regulated with age, and 859 had multiple (> = 2) RNA splicing-variants. Factor analysis showed that the dynamics of hepatic genes fall into six distinct groups based on their temporal expression. The average expression of cytokines, ion channels, kinases, phosphatases, transcription regulators and translation regulators decreased with age, whereas the average expression of peptidases, enzymes and transmembrane receptors increased with age. The average expression of growth factors peak between Day-3 and Day-10, and decrease thereafter. We identified critical biological functions, upstream regulators, and putative transcription modules that seem to govern age-specific gene expression. We also observed differential ontogenic expression of known splicing variants of certain genes, and 1,455 novel splicing isoform candidates. In conclusion, the hepatic ontogeny of the transcriptome ontogeny has unveiled critical networks and up-stream regulators that orchestrate age-specific biological functions in liver, and suggest that age contributes to the complexity of the alternative splicing landscape of the hepatic transcriptome.

  13. Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.

    PubMed

    Lima, Leandro; Sinaimeri, Blerina; Sacomoto, Gustavo; Lopez-Maestre, Helene; Marchet, Camille; Miele, Vincent; Sagot, Marie-France; Lacroix, Vincent

    2017-01-01

    The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work when compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and not read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134-1144, 5) on both real and simulated datasets for detecting chimeras, and therefore is able to capture assembly errors missed by these methods.

  14. Stallion sperm transcriptome comprises functionally coherent coding and regulatory RNAs as revealed by microarray analysis and RNA-seq.

    PubMed

    Das, Pranab J; McCarthy, Fiona; Vishnoi, Monika; Paria, Nandina; Gresham, Cathy; Li, Gang; Kachroo, Priyanka; Sudderth, A Kendrick; Teague, Sheila; Love, Charles C; Varner, Dickson D; Chowdhary, Bhanu P; Raudsepp, Terje

    2013-01-01

    Mature mammalian sperm contain a complex population of RNAs some of which might regulate spermatogenesis while others probably play a role in fertilization and early development. Due to this limited knowledge, the biological functions of sperm RNAs remain enigmatic. Here we report the first characterization of the global transcriptome of the sperm of fertile stallions. The findings improved understanding of the biological significance of sperm RNAs which in turn will allow the discovery of sperm-based biomarkers for stallion fertility. The stallion sperm transcriptome was interrogated by analyzing sperm and testes RNA on a 21,000-element equine whole-genome oligoarray and by RNA-seq. Microarray analysis revealed 6,761 transcripts in the sperm, of which 165 were sperm-enriched, and 155 were differentially expressed between the sperm and testes. Next, 70 million raw reads were generated by RNA-seq of which 50% could be aligned with the horse reference genome. A total of 19,257 sequence tags were mapped to all horse chromosomes and the mitochondrial genome. The highest density of mapped transcripts was in gene-rich ECA11, 12 and 13, and the lowest in gene-poor ECA9 and X; 7 gene transcripts originated from ECAY. Structural annotation aligned sperm transcripts with 4,504 known horse and/or human genes, rRNAs and 82 miRNAs, whereas 13,354 sequence tags remained anonymous. The data were aligned with selected equine gene models to identify additional exons and splice variants. Gene Ontology annotations showed that sperm transcripts were associated with molecular processes (chemoattractant-activated signal transduction, ion transport) and cellular components (membranes and vesicles) related to known sperm functions at fertilization, while some messenger and micro RNAs might be critical for early development. The findings suggest that the rich repertoire of coding and non-coding RNAs in stallion sperm is not a random remnant from spermatogenesis in testes but a selectively retained and functionally coherent collection of RNAs.

  15. Stallion Sperm Transcriptome Comprises Functionally Coherent Coding and Regulatory RNAs as Revealed by Microarray Analysis and RNA-seq

    PubMed Central

    Das, Pranab J.; McCarthy, Fiona; Vishnoi, Monika; Paria, Nandina; Gresham, Cathy; Li, Gang; Kachroo, Priyanka; Sudderth, A. Kendrick; Teague, Sheila; Love, Charles C.; Varner, Dickson D.; Chowdhary, Bhanu P.; Raudsepp, Terje

    2013-01-01

    Mature mammalian sperm contain a complex population of RNAs some of which might regulate spermatogenesis while others probably play a role in fertilization and early development. Due to this limited knowledge, the biological functions of sperm RNAs remain enigmatic. Here we report the first characterization of the global transcriptome of the sperm of fertile stallions. The findings improved understanding of the biological significance of sperm RNAs which in turn will allow the discovery of sperm-based biomarkers for stallion fertility. The stallion sperm transcriptome was interrogated by analyzing sperm and testes RNA on a 21,000-element equine whole-genome oligoarray and by RNA-seq. Microarray analysis revealed 6,761 transcripts in the sperm, of which 165 were sperm-enriched, and 155 were differentially expressed between the sperm and testes. Next, 70 million raw reads were generated by RNA-seq of which 50% could be aligned with the horse reference genome. A total of 19,257 sequence tags were mapped to all horse chromosomes and the mitochondrial genome. The highest density of mapped transcripts was in gene-rich ECA11, 12 and 13, and the lowest in gene-poor ECA9 and X; 7 gene transcripts originated from ECAY. Structural annotation aligned sperm transcripts with 4,504 known horse and/or human genes, rRNAs and 82 miRNAs, whereas 13,354 sequence tags remained anonymous. The data were aligned with selected equine gene models to identify additional exons and splice variants. Gene Ontology annotations showed that sperm transcripts were associated with molecular processes (chemoattractant-activated signal transduction, ion transport) and cellular components (membranes and vesicles) related to known sperm functions at fertilization, while some messenger and micro RNAs might be critical for early development. The findings suggest that the rich repertoire of coding and non-coding RNAs in stallion sperm is not a random remnant from spermatogenesis in testes but a selectively retained and functionally coherent collection of RNAs. PMID:23409192

  16. PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data.

    PubMed

    Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai

    2012-02-15

    RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82 are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.

  17. Radiation-induced alternative transcripts as detected in total and polysome-bound mRNA.

    PubMed

    Wahba, Amy; Ryan, Michael C; Shankavaram, Uma T; Camphausen, Kevin; Tofilon, Philip J

    2018-01-02

    Alternative splicing is a critical event in the posttranscriptional regulation of gene expression. To investigate whether this process influences radiation-induced gene expression we defined the effects of ionizing radiation on the generation of alternative transcripts in total cellular mRNA (the transcriptome) and polysome-bound mRNA (the translatome) of the human glioblastoma stem-like cell line NSC11. For these studies, RNA-Seq profiles from control and irradiated cells were compared using the program SpliceSeq to identify transcripts and splice variations induced by radiation. As compared to the transcriptome (total RNA) of untreated cells, the radiation-induced transcriptome contained 92 splice events suggesting that radiation induced alternative splicing. As compared to the translatome (polysome-bound RNA) of untreated cells, the radiation-induced translatome contained 280 splice events of which only 24 were overlapping with the radiation-induced transcriptome. These results suggest that radiation not only modifies alternative splicing of precursor mRNA, but also results in the selective association of existing mRNA isoforms with polysomes. Comparison of radiation-induced alternative transcripts to radiation-induced gene expression in total RNA revealed little overlap (about 3%). In contrast, in the radiation-induced translatome, about 38% of the induced alternative transcripts corresponded to genes whose expression level was affected in the translatome. This study suggests that whereas radiation induces alternate splicing, the alternative transcripts present at the time of irradiation may play a role in the radiation-induced translational control of gene expression and thus cellular radioresponse.

  18. How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives.

    PubMed

    Dal Molin, Alessandra; Di Camillo, Barbara

    2018-01-31

    The sequencing of the transcriptome of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types in heterogeneous cell populations or for the study of stochastic gene expression. In recent years, various experimental methods and computational tools for analysing single-cell RNA-sequencing data have been proposed. However, most of them are tailored to different experimental designs or biological questions, and in many cases, their performance has not been benchmarked yet, thus increasing the difficulty for a researcher to choose the optimal single-cell transcriptome sequencing (scRNA-seq) experiment and analysis workflow. In this review, we aim to provide an overview of the current available experimental and computational methods developed to handle single-cell RNA-sequencing data and, based on their peculiarities, we suggest possible analysis frameworks depending on specific experimental designs. Together, we propose an evaluation of challenges and open questions and future perspectives in the field. In particular, we go through the different steps of scRNA-seq experimental protocols such as cell isolation, messenger RNA capture, reverse transcription, amplification and use of quantitative standards such as spike-ins and Unique Molecular Identifiers (UMIs). We then analyse the current methodological challenges related to preprocessing, alignment, quantification, normalization, batch effect correction and methods to control for confounding effects. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  19. Inferring Molecular Processes Heterogeneity from Transcriptional Data.

    PubMed

    Gogolewski, Krzysztof; Wronowska, Weronika; Lech, Agnieszka; Lesyng, Bogdan; Gambin, Anna

    2017-01-01

    RNA microarrays and RNA-seq are nowadays standard technologies to study the transcriptional activity of cells. Most studies focus on tracking transcriptional changes caused by specific experimental conditions. Information referring to genes up- and downregulation is evaluated analyzing the behaviour of relatively large population of cells by averaging its properties. However, even assuming perfect sample homogeneity, different subpopulations of cells can exhibit diverse transcriptomic profiles, as they may follow different regulatory/signaling pathways. The purpose of this study is to provide a novel methodological scheme to account for possible internal, functional heterogeneity in homogeneous cell lines, including cancer ones. We propose a novel computational method to infer the proportion between subpopulations of cells that manifest various functional behaviour in a given sample. Our method was validated using two datasets from RNA microarray experiments. Both experiments aimed to examine cell viability in specific experimental conditions. The presented methodology can be easily extended to RNA-seq data as well as other molecular processes. Moreover, it complements standard tools to indicate most important networks from transcriptomic data and in particular could be useful in the analysis of cancer cell lines affected by biologically active compounds or drugs.

  20. Inferring Molecular Processes Heterogeneity from Transcriptional Data

    PubMed Central

    Wronowska, Weronika; Lesyng, Bogdan; Gambin, Anna

    2017-01-01

    RNA microarrays and RNA-seq are nowadays standard technologies to study the transcriptional activity of cells. Most studies focus on tracking transcriptional changes caused by specific experimental conditions. Information referring to genes up- and downregulation is evaluated analyzing the behaviour of relatively large population of cells by averaging its properties. However, even assuming perfect sample homogeneity, different subpopulations of cells can exhibit diverse transcriptomic profiles, as they may follow different regulatory/signaling pathways. The purpose of this study is to provide a novel methodological scheme to account for possible internal, functional heterogeneity in homogeneous cell lines, including cancer ones. We propose a novel computational method to infer the proportion between subpopulations of cells that manifest various functional behaviour in a given sample. Our method was validated using two datasets from RNA microarray experiments. Both experiments aimed to examine cell viability in specific experimental conditions. The presented methodology can be easily extended to RNA-seq data as well as other molecular processes. Moreover, it complements standard tools to indicate most important networks from transcriptomic data and in particular could be useful in the analysis of cancer cell lines affected by biologically active compounds or drugs. PMID:29362714

  1. Single-cell transcriptional dynamics of flavivirus infection

    PubMed Central

    Bekerman, Elena

    2018-01-01

    Dengue and Zika viral infections affect millions of people annually and can be complicated by hemorrhage and shock or neurological manifestations, respectively. However, a thorough understanding of the host response to these viruses is lacking, partly because conventional approaches ignore heterogeneity in virus abundance across cells. We present viscRNA-Seq (virus-inclusive single cell RNA-Seq), an approach to probe the host transcriptome together with intracellular viral RNA at the single cell level. We applied viscRNA-Seq to monitor dengue and Zika virus infection in cultured cells and discovered extreme heterogeneity in virus abundance. We exploited this variation to identify host factors that show complex dynamics and a high degree of specificity for either virus, including proteins involved in the endoplasmic reticulum translocon, signal peptide processing, and membrane trafficking. We validated the viscRNA-Seq hits and discovered novel proviral and antiviral factors. viscRNA-Seq is a powerful approach to assess the genome-wide virus-host dynamics at single cell level. PMID:29451494

  2. RNA-Seq Profiling Reveals Novel Hepatic Gene Expression Pattern in Aflatoxin B1 Treated Rats

    PubMed Central

    Merrick, B. Alex; Phadke, Dhiral P.; Auerbach, Scott S.; Mav, Deepak; Stiegelmeyer, Suzy M.; Shah, Ruchir R.; Tice, Raymond R.

    2013-01-01

    Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1’s carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT’s) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. We find the rat transcriptome contains many previously unidentified, AFB1-responsive exons and transcripts supporting RNA-Seq’s capabilities to provide new insights into AFB1-mediated gene expression leading to hepatocellular carcinoma. PMID:23630614

  3. Host Transcriptional Response to Ebola Virus Infection

    PubMed Central

    Speranza, Emily; Connor, John H

    2017-01-01

    Ebola virus disease (EVD) is a serious illness that causes severe disease in humans and non-human primates (NHPs) and has mortality rates up to 90%. EVD is caused by the Ebolavirus and currently there are no licensed therapeutics or vaccines to treat EVD. Due to its high mortality rates and potential as a bioterrorist weapon, a better understanding of the disease is of high priority. Multiparametric analysis techniques allow for a more complete understanding of a disease and the host response. Analysis of RNA species present in a sample can lead to a greater understanding of activation or suppression of different states of the immune response. Transcriptomic analyses such as microarrays and RNA-Sequencing (RNA-Seq) have been important tools to better understand the global gene expression response to EVD. In this review, we outline the current knowledge gained by transcriptomic analysis of EVD. PMID:28930167

  4. An RNA-seq transcriptome analysis of orthophosphate-deficient white lupin reveals novel insights into phosphorus acclimation in plants

    USDA-ARS?s Scientific Manuscript database

    Phosphorus (P) is one of the most limiting macronutrients in soils for plant growth and development. However, the whole genome molecular mechanisms contributing to plant acclimation to Pi-deficiency remains largely unknown. White lupin (Lupinus albus L.) has evolved unique adaptation systems for gro...

  5. Comparative RNA-seq analysis of transcriptome dynamics during petal development in Rosa chinensis

    PubMed Central

    Han, Yu; Wan, Huihua; Cheng, Tangren; Wang, Jia; Yang, Weiru; Pan, Huitang; Zhang, Qixiang

    2017-01-01

    The developmental process that produces the ornate petals of the China rose (Rosa chinensis) is complex and is thought to depend on the balanced expression of a functionally diverse array of genes; however, the molecular basis of rose petal development is largely unknown. Here, petal growth of the R. chinensis cultivar ‘Old Blush’ was divided into four developmental stages, and RNA-seq technology was used to analyse the dynamic changes in transcription that occur as development progresses. In total, 598 million clean reads and 61,456 successfully annotated unigenes were obtained. Differentially expressed gene (DEG) analysis comparing the transcriptomes of the developmental stages resulted in the identification of several potential candidate genes involved in petal development. DEGs involved in anthocyanin biosynthesis, petal expansion, and phytohormone pathways were considered in depth, in addition to several candidate transcription factors. These results lay a foundation for future studies on the regulatory mechanisms underlying rose petal development and may be used in molecular breeding programs aimed at generating ornamental rose lines with desirable traits. PMID:28225056

  6. Comprehensive analysis of transcriptome variation uncovers known and novel driver events in T-cell acute lymphoblastic leukemia.

    PubMed

    Atak, Zeynep Kalender; Gianfelici, Valentina; Hulselmans, Gert; De Keersmaecker, Kim; Devasia, Arun George; Geerdens, Ellen; Mentens, Nicole; Chiaretti, Sabina; Durinck, Kaat; Uyttebroeck, Anne; Vandenberghe, Peter; Wlodarska, Iwona; Cloos, Jacqueline; Foà, Robin; Speleman, Frank; Cools, Jan; Aerts, Stein

    2013-01-01

    RNA-seq is a promising technology to re-sequence protein coding genes for the identification of single nucleotide variants (SNV), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for the detection of driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors and cooperative point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein altering variations, as well as several new candidates including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalizations and batch effect removal, and used these to classify patients into T-ALL subtypes. Finally, we detected gene fusions, of which several can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1; and others result in novel fusion transcripts encoding activated kinases (SSBP2-FER and TPM3-JAK2) or involving MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and successfully applied these for the detection of translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.

  7. De novo transcriptome assembly databases for the butterfly orchid Phalaenopsis equestris

    PubMed Central

    Niu, Shan-Ce; Xu, Qing; Zhang, Guo-Qiang; Zhang, Yong-Qiang; Tsai, Wen-Chieh; Hsu, Jui-Ling; Liang, Chieh-Kai; Luo, Yi-Bo; Liu, Zhong-Jian

    2016-01-01

    Orchids are renowned for their spectacular flowers and ecological adaptations. After the sequencing of the genome of the tropical epiphytic orchid Phalaenopsis equestris, we combined Illumina HiSeq2000 for RNA-Seq and Trinity for de novo assembly to characterize the transcriptomes for 11 diverse P. equestris tissues representing the root, stem, leaf, flower buds, column, lip, petal, sepal and three developmental stages of seeds. Our aims were to contribute to a better understanding of the molecular mechanisms driving the analysed tissue characteristics and to enrich the available data for P. equestris. Here, we present three databases. The first dataset is the RNA-Seq raw reads, which can be used to execute new experiments with different analysis approaches. The other two datasets allow different types of searches for candidate homologues. The second dataset includes the sets of assembled unigenes and predicted coding sequences and proteins, enabling a sequence-based search. The third dataset consists of the annotation results of the aligned unigenes versus the Nonredundant (Nr) protein database, Kyoto Encyclopaedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups (COG) databases with low e-values, enabling a name-based search. PMID:27673730

  8. The transcriptome landscape of early maize meiosis

    USDA-ARS?s Scientific Manuscript database

    Meiosis, particularly meiotic recombination, is a major factor affecting yield and breeding of plants. To gain insight into the transcriptome landscape during early initiation steps of meiotic recombination, we profiled early prophase I meiocytes from maize using RNA-seq. Our analyses of genes prefe...

  9. EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

    PubMed

    Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J

    2015-09-03

    RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.

  10. Methods for processing high-throughput RNA sequencing data.

    PubMed

    Ares, Manuel

    2014-11-03

    High-throughput sequencing (HTS) methods for analyzing RNA populations (RNA-Seq) are gaining rapid application to many experimental situations. The steps in an RNA-Seq experiment require thought and planning, especially because the expense in time and materials is currently higher and the protocols are far less routine than those used for other high-throughput methods, such as microarrays. As always, good experimental design will make analysis and interpretation easier. Having a clear biological question, an idea about the best way to do the experiment, and an understanding of the number of replicates needed will make the entire process more satisfying. Whether the goal is capturing transcriptome complexity from a tissue or identifying small fragments of RNA cross-linked to a protein of interest, conversion of the RNA to cDNA followed by direct sequencing using the latest methods is a developing practice, with new technical modifications and applications appearing every day. Even more rapid are the development and improvement of methods for analysis of the very large amounts of data that arrive at the end of an RNA-Seq experiment, making considerations regarding reproducibility, validation, visualization, and interpretation increasingly important. This introduction is designed to review and emphasize a pathway of analysis from experimental design through data presentation that is likely to be successful, with the recognition that better methods are right around the corner. © 2014 Cold Spring Harbor Laboratory Press.

  11. A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells.

    PubMed

    Wolff, Alexander; Bayerlová, Michaela; Gaedcke, Jochen; Kube, Dieter; Beißbarth, Tim

    2018-01-01

    Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis or preprocessing, normalization and differential gene expression in case of microarray analysis, in order to give a global insight into pipeline performances. Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff)) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data. The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. Tophat's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had only an overall mapping rate of 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67-0.69) than for the cell line dataset (ρ = 0.87-0.88). An exception were the correlation results of Cufflinks, which were substantially lower (ρ = 0.21-0.29 and 0.34-0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results. In conclusion the combination of STAR aligner with HTSeq-Count followed by STAR aligner with RSEM and Sailfish generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.

  12. Comparative analysis of the blood transcriptomes between wolves and dogs.

    PubMed

    Yang, X; Zhang, H; Shang, J; Liu, G; Xia, T; Zhao, C; Sun, G; Dou, H

    2018-06-28

    Dogs were domesticated by human and originated from wolves. Their evolutionary relationships have attracted much scientific interest due to their genetic affinity but different habitats. To identify the differences between dogs and wolves associated with domestication, we analysed the blood transcriptomes of wolves and dogs by RNA-Seq. We obtained a total of 30.87 Gb of raw reads from two dogs and three wolves using RNA-Seq technology. Comparisons of the wolf and dog transcriptomes revealed 524 genes differentially expressed genes between them. We found that some genes related to immune function (DCK, ICAM4, GAPDH and BSG) and aerobic capacity (HBA1, HBA2 and HBB) were more highly expressed in the wolf. Six differentially expressed genes related to the innate immune response (CCL23, TRIM10, DUSP10, RAB27A, CLEC5A and GCH1) were found in the wolf by a Gene Ontology enrichment analysis. Immune system development was also enriched only in the wolf group. The ALAS2, HMBS and FECH genes, shown to be enriched by the Kyoto Encyclopedia of Genes and Genomes analysis, were associated with the higher aerobic capacity and hypoxia endurance of the wolf. The results suggest that the wolf might have greater resistance to pathogens, hypoxia endurance and aerobic capacity than dogs do. © 2018 Stichting International Foundation for Animal Genetics.

  13. RNA-Seq transcriptome profiling of mouse oocytes after in vitro maturation and/or vitrification.

    PubMed

    Gao, Lei; Jia, Gongxue; Li, Ai; Ma, Haojia; Huang, Zhengyuan; Zhu, Shien; Hou, Yunpeng; Fu, Xiangwei

    2017-10-16

    In vitro maturation (IVM) and vitrification have been widely used to prepare oocytes before fertilization; however, potential effects of these procedures, such as expression profile changes, are poorly understood. In this study, mouse oocytes were divided into four groups and subjected to combinations of in vitro maturation and/or vitrification treatments. RNA-seq and in silico pathway analysis were used to identify differentially expressed genes (DEGs) that may be involved in oocyte viability after in vitro maturation and/or vitrification. Our results showed that 1) 69 genes were differentially expressed after IVM, 66 of which were up-regulated. Atp5e and Atp5o were enriched in the most significant gene ontology term "mitochondrial membrane part"; thus, these genes may be promising candidate biomarkers for oocyte viability after IVM. 2) The influence of vitrification on the transcriptome of oocytes was negligible, as no DEGs were found between vitrified and fresh oocytes. 3) The MII stage is more suitable for oocyte vitrification with respect to the transcriptome. This study provides a valuable new theoretical basis to further improve the efficiency of in vitro maturation and/or oocyte vitrification.

  14. Liver Transcriptome and miRNA Analysis of Silver Carp (Hypophthalmichthys molitrix) Intraperitoneally Injected With Microcystin-LR

    PubMed Central

    Qu, Xiancheng; Hu, Menghong; Shang, Yueyong; Pan, Lisha; Jia, Peixuan; Fu, Chunxue; Liu, Qigen; Wang, Youji

    2018-01-01

    Next-generation sequencing was used to analyze the effects of toxic microcystin-LR (MC-LR) on silver carp (Hypophthalmichthys molitrix). Silver carps were intraperitoneally injected with MC-LR, and RNA-seq and miRNA-seq in the liver were analyzed at 0.25, 0.5, and 1 h. The expression of glutathione S-transferase (GST), which acts as a marker gene for MC-LR, was tested to determine the earliest time point at which GST transcription was initiated in the liver tissues of the MC-LR-treated silver carps. Hepatic RNA-seq/miRNA-seq analysis and data integration analysis were conducted with reference to the identified time point. Quantitative PCR (qPCR) was performed to detect the expression of the following genes at the three time points: heme oxygenase 1 (HO-1), interleukin-10 receptor 1 (IL-10R1), apolipoprotein A-I (apoA-I), and heme binding protein 2 (HBP2). Results showed that the liver GST expression was remarkably decreased at 0.25 h (P < 0.05). RNA-seq at this time point revealed that the liver tissue contained 97,505 unigenes, including 184 significantly different unigenes and 75 unknown genes. Gene Ontology (GO) term enrichment analysis suggested that 35 of the 145 enriched GO terms were significantly enriched and mainly related to the immune system regulation network. KEGG pathway enrichment analysis showed that 18 of the 189 pathways were significantly enriched, and the most significant was a ribosome pathway containing 77 differentially expressed genes. miRNA-seq analysis indicated that the longest miRNA had 22 nucleotides (nt), followed by 21 and 23 nt. A total of 286 known miRNAs, 332 known miRNA precursor sequences, and 438 new miRNAs were predicted. A total of 1,048,575 mRNA–miRNA interaction sites were obtained, and 21,252 and 21,241 target genes were respectively predicted in known and new miRNAs. qPCR revealed that HO-1, IL-10R1, apoA-I, and HBP2 were significantly differentially expressed and might play important roles in the toxicity and liver detoxification of MC-LR in fish. These results were consistent with those of high-throughput sequencing, thereby verifying the accuracy of our sequencing data. RNA-seq and miRNA-seq analyses of silver carp liver injected with MC-LR provided valuable and new insights into the toxic effects of MC-LR and the antitoxic mechanisms of MC-LR in fish. The RNA/miRNA data are available from the NCBI database Registration No. : SRP075165. PMID:29692738

  15. RNA-seq analysis of early enteromyxosis in turbot (Scophthalmus maximus): new insights into parasite invasion and immune evasion strategies.

    PubMed

    Ronza, Paolo; Robledo, Diego; Bermúdez, Roberto; Losada, Ana Paula; Pardo, Belén G; Sitjà-Bobadilla, Ariadna; Quiroga, María Isabel; Martínez, Paulino

    2016-07-01

    Enteromyxum scophthalmi, an intestinal myxozoan parasite, is the causative agent of a threatening disease for turbot (Scophthalmus maximus, L.) aquaculture. The colonisation of the digestive tract by this parasite leads to a cachectic syndrome associated with high morbidity and mortality rates. This myxosporidiosis has a long pre-patent period and the first detectable clinical and histopathological changes are subtle. The pathogenic mechanisms acting in the early stages of infection are still far from being fully understood. Further information on the host-parasite interaction is needed to assist in finding efficient preventive and therapeutic measures. Here, a RNA-seq-based transcriptome analysis of head kidney, spleen and pyloric caeca from experimentally-infected and control turbot was performed. Only infected fish with early signs of infection, determined by histopathology and immunohistochemical detection of E. scophthalmi, were selected. The RNA-seq analysis revealed, as expected, less intense transcriptomic changes than those previously found during later stages of the disease. Several genes involved in IFN-related pathways were up-regulated in the three organs, suggesting that the IFN-mediated immune response plays a main role in this phase of the disease. Interestingly, an opposite expression pattern had been found in a previous study on severely infected turbot. In addition, possible strategies for immune system evasion were suggested by the down-regulation of different genes encoding complement components and acute phase proteins. At the site of infection (pyloric caeca), modulation of genes related to different structural proteins was detected and the expression profile indicated the inhibition of cell proliferation and differentiation. These transcriptomic changes provide indications regarding the mechanisms of parasite attachment to and invasion of the host. The current results contribute to a better knowledge of the events that characterise the early stages of turbot enteromyxosis and provide valuable information to identify molecular markers for early detection and control of this important parasitosis. Copyright © 2016 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.

  16. RNA sequencing, de novo assembly and differential analysis of the gill transcriptome of freshwater climbing perch Anabas testudineus after six days of seawater exposure.

    PubMed

    Chen, X L; Lui, E Y; Ip, Y Kwong; Lam, S H

    2018-06-21

    To obtain transcriptomic insights into branchial responses to salinity challenge in Anabas testudineus, this study employed RNA sequencing (RNA-Seq) to analyse the gill transcriptome of A. testudineus exposed to seawater (SW) for 6 days compared with the freshwater (FW) control group. A combined FW and SW gill transcriptome was de novo assembled from 169.9 million 101 bp paired-end reads. In silico validation employing 17 A. testudineus Sanger full-length coding sequences showed that 15/17 of them had greater than 80% of their sequences aligned to the de novo assembled contigs where 5/17 had their full-length (100%) aligned and 9/17 had greater than 90% of their sequences aligned. The combined FW and SW gill transcriptome was mapped to 13780 unique human identifiers at E-value < 1.0E-20 while 952 and 886 identifiers were determined as up and down-regulated by 1.5 fold, respectively, in the gills of A. testudineus in SW when compared with FW. These genes were found to be associated with at least 23 biological processes. A larger proportion of genes encoding enzymes and transporters associated with molecular transport, energy production, metabolisms were up-regulated, while a larger proportion of genes encoding transmembrane receptors, G-protein coupled receptors, kinases and transcription regulators associated with cell cycle, growth, development, signalling, morphology and gene expression were relatively lower in the gills of A. testudineus in SW when compared with FW. High correlation (R = 0.99) was observed between RNA-Seq data and real-time quantitative PCR validation for 13 selected genes. The transcriptomic sequence information will facilitate development of molecular resources and tools while the findings will provide insights for future studies into branchial iono-osmoregulation and related cellular processes in A. testudineus. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.

  17. Optimizing exosomal RNA isolation for RNA-Seq analyses of archival sera specimens.

    PubMed

    Prendergast, Emily N; de Souza Fonseca, Marcos Abraão; Dezem, Felipe Segato; Lester, Jenny; Karlan, Beth Y; Noushmehr, Houtan; Lin, Xianzhi; Lawrenson, Kate

    2018-01-01

    Exosomes are endosome-derived membrane vesicles that contain proteins, lipids, and nucleic acids. The exosomal transcriptome mediates intercellular communication, and represents an understudied reservoir of novel biomarkers for human diseases. Next-generation sequencing enables complex quantitative characterization of exosomal RNAs from diverse sources. However, detailed protocols describing exosome purification for preparation of exosomal RNA-sequence (RNA-Seq) libraries are lacking. Here we compared methods for isolation of exosomes and extraction of exosomal RNA from human cell-free serum, as well as strategies for attaining equal representation of samples within pooled RNA-Seq libraries. We compared commercial precipitation with ultracentrifugation for exosome purification and confirmed the presence of exosomes via both transmission electron microscopy and immunoblotting. Exosomal RNA extraction was compared using four different RNA purification methods. We determined the minimal starting volume of serum required for exosome preparation and showed that high quality exosomal RNA can be isolated from sera stored for over a decade. Finally, RNA-Seq libraries were successfully prepared with exosomal RNAs extracted from human cell-free serum, cataloguing both coding and non-coding exosomal transcripts. This method provides researchers with strategic options to prepare RNA-Seq libraries and compare RNA-Seq data quantitatively from minimal volumes of fresh and archival human cell-free serum for disease biomarker discovery.

  18. QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

    PubMed

    Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

    2017-08-31

    As a newly emerged research area, RNA epigenetics has drawn increasing attention recently for the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high throughput sequencing techniques, such as, MeRIP-Seq, transcriptome-wide RNA methylation profile is now available in the form of count-based data, with which it is often of interests to study the dynamics at epitranscriptomic layer. However, the sample size of RNA methylation experiment is usually very small due to its costs; and additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as DRME model based on a statistical test covering the IP samples only with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in the way, the input control samples are also properly taken care of. In addition, different from DRME approach, which relies only the input control sample only for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. And the QNB model is also applicable to other datasets related RNA modifications, including but not limited to RNA bisulfite sequencing, m 1 A-Seq, Par-CLIP, RIP-Seq, etc.

  19. Transcriptome analysis of hexaploid hulless oat in response to salinity stress

    PubMed Central

    Wu, Bin; Hu, Yani; Huo, Pengjie; Zhang, Qian; Chen, Xin; Zhang, Zongwen

    2017-01-01

    Background Oat is a cereal crop of global importance used for food, feed, and forage. Understanding salinity stress tolerance mechanisms in plants is an important step towards generating crop varieties that can cope with environmental stresses. To date, little is known about the salt tolerance of oat at the molecular level. To better understand the molecular mechanisms underlying salt tolerance in oat, we investigated the transcriptomes of control and salt-treated oat using RNA-Seq. Results Using Illumina HiSeq 4000 platform, we generated 72,291,032 and 356,891,432 reads from non-stressed control and salt-stressed oat, respectively. Assembly of 64 Gb raw sequence data yielded 128,414 putative unique transcripts with an average length of 1,189 bp. Analysis of the assembled unigenes from the salt stressed and control libraries indicated that about 65,000 unigenes were differentially expressed at different stages. Functional annotation showed that ABC transporters, plant hormone signal transduction, plant-pathogen interactions, starch and sucrose metabolism, arginine and proline metabolism, and other secondary metabolite pathways were enriched under salt stress. Based on the RPKM values of assembled unigenes, 24 differentially expressed genes under salt stress were selected for quantitative RT-PCR validation, which successfully confirmed the results of RNA-Seq. Furthermore, we identified 18,039 simple sequence repeats, which may help further elucidate salt tolerance mechanisms in oat. Conclusions Our global survey of transcriptome profiles of oat plants in response to salt stress provides useful insights into the molecular mechanisms underlying salt tolerance in this crop. These findings also represent a rich resource for further analysis of salt tolerance and for breeding oat with improved salt tolerance through the use of salt-related genes. PMID:28192458

  20. Cell fixation and preservation for droplet-based single-cell transcriptomics.

    PubMed

    Alles, Jonathan; Karaiskos, Nikos; Praktiknjo, Samantha D; Grosswendt, Stefanie; Wahle, Philipp; Ruffault, Pierre-Louis; Ayoub, Salah; Schreyer, Luisa; Boltengagen, Anastasiya; Birchmeier, Carmen; Zinzen, Robert; Kocks, Christine; Rajewsky, Nikolaus

    2017-05-19

    Recent developments in droplet-based microfluidics allow the transcriptional profiling of thousands of individual cells in a quantitative, highly parallel and cost-effective way. A critical, often limiting step is the preparation of cells in an unperturbed state, not altered by stress or ageing. Other challenges are rare cells that need to be collected over several days or samples prepared at different times or locations. Here, we used chemical fixation to address these problems. Methanol fixation allowed us to stabilise and preserve dissociated cells for weeks without compromising single-cell RNA sequencing data. By using mixtures of fixed, cultured human and mouse cells, we first showed that individual transcriptomes could be confidently assigned to one of the two species. Single-cell gene expression from live and fixed samples correlated well with bulk mRNA-seq data. We then applied methanol fixation to transcriptionally profile primary cells from dissociated, complex tissues. Low RNA content cells from Drosophila embryos, as well as mouse hindbrain and cerebellum cells prepared by fluorescence-activated cell sorting, were successfully analysed after fixation, storage and single-cell droplet RNA-seq. We were able to identify diverse cell populations, including neuronal subtypes. As an additional resource, we provide 'dropbead', an R package for exploratory data analysis, visualization and filtering of Drop-seq data. We expect that the availability of a simple cell fixation method will open up many new opportunities in diverse biological contexts to analyse transcriptional dynamics at single-cell resolution.

  1. Spatial reconstruction of single-cell gene expression data.

    PubMed

    Satija, Rahul; Farrell, Jeffrey A; Gennert, David; Schier, Alexander F; Regev, Aviv

    2015-05-01

    Spatial localization is a key determinant of cellular fate and behavior, but methods for spatially resolved, transcriptome-wide gene expression profiling across complex tissues are lacking. RNA staining methods assay only a small number of transcripts, whereas single-cell RNA-seq, which measures global gene expression, separates cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos and generated a transcriptome-wide map of spatial patterning. We confirmed Seurat's accuracy using several experimental approaches, then used the strategy to identify a set of archetypal expression patterns and spatial markers. Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems.

  2. Transcriptator: An Automated Computational Pipeline to Annotate Assembled Reads and Identify Non Coding RNA.

    PubMed

    Tripathi, Kumar Parijat; Evangelista, Daniela; Zuccaro, Antonio; Guarracino, Mario Rosario

    2015-01-01

    RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basis Local Search Alignment Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation's enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/proteins features and protein-protein interactions related informations. It clusters the transcripts based on functional annotations and generates a tabular report for functional and gene ontology annotations for each submitted transcript to the web server. The implementation of QuickGo web-services in our pipeline enable the users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA) by ab initio methods) helps to identify the non coding RNAs and their regulatory role in transcriptome. In summary, Transcriptator is a useful software for both NGS and array data. It helps the users to characterize the de-novo assembled reads, obtained from NGS experiments for non-referenced organisms, while it also performs the functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and micro-array experiments. It generates easy to read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. Web application is freely available at: http://www-labgtp.na.icar.cnr.it/Transcriptator.

  3. Toxicity and Transcriptome Sequencing (RNA-seq) Analyses of Adult Zebrafish in Response to Exposure Carboxymethyl Cellulose Stabilized Iron Sulfide Nanoparticles.

    PubMed

    Zheng, Min; Lu, Jianguo; Zhao, Dongye

    2018-05-24

    Increasing utilization of stabilized iron sulfides (FeS) nanoparticles implies an elevated release of the materials into the environment. To understand potential impacts and underlying mechanisms of nanoparticle-induced stress, we used the transcriptome sequencing (RNA-seq) technique to characterize the transcriptomes from adult zebrafish exposed to 10 mg/L carboxymethyl cellulose (CMC) stabilized FeS nanoparticles for 96 h, demonstrating striking differences in the gene expression profiles in liver. The exposure caused significant expression alterations in genes related to immune and inflammatory responses, detoxification, oxidative stress and DNA damage/repair. The complement and coagulation cascades Kyoto encyclopedia of genes and genomes (KEGG) pathway was found significantly up-regulated under nanoparticle exposure. The quantitative real-time polymerase chain reaction using twelve genes confirmed the RNA-seq results. We identified several candidate genes commonly regulated in liver, which may serve as gene indicators when exposed to the nanoparticles. Hepatic inflammation was further confirmed by histological observation of pyknotic nuclei, and vacuole formation upon exposure. Tissue accumulation tests showed a 2.2 times higher iron concentration in the fish tissue upon exposure. This study provides preliminary mechanistic insights into potential toxic effects of organic matter stabilized FeS nanoparticles, which will improve our understanding of the genotoxicity caused by stabilized nanoparticles.

  4. Strand-specific RNA-seq analysis of the Lactobacillus delbrueckii subsp. bulgaricus transcriptome.

    PubMed

    Zheng, Huajun; Liu, Enuo; Shi, Tao; Ye, Luyi; Konno, Tomonobu; Oda, Munehiro; Ji, Zai-Si

    2016-02-01

    Lactobacillus delbrueckii subsp. bulgaricus 2038 (Lb. bulgaricus 2038) is an industrial bacterium that is used as a starter for dairy products. We proposed several hypotheses concerning its industrial features previously. Here, we utilized RNA-seq to explore the transcriptome of Lb. bulgaricus 2038 from four different growth phases under whey conditions. The most abundantly expressed genes in the four stages were mainly involved in translation (for the logarithmic stage), glycolysis (for control/lag stages), lactic acid production (all the four stages), and 10-formyl tetrahydrofolate production (for the stationary stage). The high expression of genes like d-lactate dehydrogenase was thought as a result of energy production, and consistent expression of EPS synthesis genes, the restriction-modification (RM) system and the CRISPR/Cas system were validated for explaining the advantage of this strain in yoghurt production. Several postulations, like NADPH production through GapN bypass, converting aspartate into carbon-skeleton intermediates, and formate production through degrading GTP, were proved not working under these culture conditions. The high expression of helicase genes and co-expressed amino acids/oligopeptides transporting proteins indicated that the helicase might mediate the strain obtaining nitrogen source from the environment. The transport system of Lb. bulgaricus 2038 was found to be regulated by antisense RNA, hinting the potential application of non-coding RNA in regulating lactic acid bacteria (LAB) gene expression. Our study has primarily uncovered Lb. bulgaricus 2038 transcriptome, which could gain a better understanding of the regulation system in Lb. bulgaricus and promote its industrial application.

  5. Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

    PubMed

    Reid-Bayliss, Kate S; Loeb, Lawrence A

    2017-08-29

    Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.

  6. RNA-seq analysis of the gonadal transcriptome during Alligator mississippiensis temperature-dependent sex determination and differentiation.

    PubMed

    Yatsu, Ryohei; Miyagawa, Shinichi; Kohno, Satomi; Parrott, Benjamin B; Yamaguchi, Katsushi; Ogino, Yukiko; Miyakawa, Hitoshi; Lowers, Russell H; Shigenobu, Shuji; Guillette, Louis J; Iguchi, Taisen

    2016-01-25

    The American alligator (Alligator mississippiensis) displays temperature-dependent sex determination (TSD), in which incubation temperature during embryonic development determines the sexual fate of the individual. However, the molecular mechanisms governing this process remain a mystery, including the influence of initial environmental temperature on the comprehensive gonadal gene expression patterns occurring during TSD. Our characterization of transcriptomes during alligator TSD allowed us to identify novel candidate genes involved in TSD initiation. High-throughput RNA sequencing (RNA-seq) was performed on gonads collected from A. mississippiensis embryos incubated at both a male and a female producing temperature (33.5 °C and 30 °C, respectively) in a time series during sexual development. RNA-seq yielded 375.2 million paired-end reads, which were mapped and assembled, and used to characterize differential gene expression. Changes in the transcriptome occurring as a function of both development and sexual differentiation were extensively profiled. Forty-one differentially expressed genes were detected in response to incubation at male producing temperature, and included genes such as Wnt signaling factor WNT11, histone demethylase KDM6B, and transcription factor C/EBPA. Furthermore, comparative analysis of development- and sex-dependent differential gene expression revealed 230 candidate genes involved in alligator sex determination and differentiation, and early details of the suspected male-fate commitment were profiled. We also discovered sexually dimorphic expression of uncharacterized ncRNAs and other novel elements, such as unique expression patterns of HEMGN and ARX. Twenty-five of the differentially expressed genes identified in our analysis were putative transcriptional regulators, among which were MYBL2, MYCL, and HOXC10, in addition to conventional sex differentiation genes such as SOX9, and FOXL2. Inferred gene regulatory network was constructed, and the gene-gene and temperature-gene interactions were predicted. Gonadal global gene expression kinetics during sex determination has been extensively profiled for the first time in a TSD species. These findings provide insights into the genetic framework underlying TSD, and expand our current understanding of the developmental fate pathways during vertebrate sex determination.

  7. HSA: a heuristic splice alignment tool.

    PubMed

    Bu, Jingde; Chi, Xuebin; Jin, Zhong

    2013-01-01

    RNA-Seq methodology is a revolutionary transcriptomics sequencing technology, which is the representative of Next generation Sequencing (NGS). With the high throughput sequencing of RNA-Seq, we can acquire much more information like differential expression and novel splice variants from deep sequence analysis and data mining. But the short read length brings a great challenge to alignment, especially when the reads span two or more exons. A two steps heuristic splice alignment tool is generated in this investigation. First, map raw reads to reference with unspliced aligner--BWA; second, split initial unmapped reads into three equal short reads (seeds), align each seed to the reference, filter hits, search possible split position of read and extend hits to a complete match. Compare with other splice alignment tools like SOAPsplice and Tophat2, HSA has a better performance in call rate and efficiency, but its results do not as accurate as the other software to some extent. HSA is an effective spliced aligner of RNA-Seq reads mapping, which is available at https://github.com/vlcc/HSA.

  8. Nascent-Seq reveals novel features of mouse circadian transcriptional regulation

    PubMed Central

    Menet, Jerome S; Rodriguez, Joseph; Abruzzi, Katharine C; Rosbash, Michael

    2012-01-01

    A substantial fraction of the metazoan transcriptome undergoes circadian oscillations in many cells and tissues. Based on the transcription feedback loops important for circadian timekeeping, it is commonly assumed that this mRNA cycling reflects widespread transcriptional regulation. To address this issue, we directly measured the circadian dynamics of mouse liver transcription using Nascent-Seq (genome-wide sequencing of nascent RNA). Although many genes are rhythmically transcribed, many rhythmic mRNAs manifest poor transcriptional rhythms, indicating a prominent contribution of post-transcriptional regulation to circadian mRNA expression. This analysis of rhythmic transcription also showed that the rhythmic DNA binding profile of the transcription factors CLOCK and BMAL1 does not determine the transcriptional phase of most target genes. This likely reflects gene-specific collaborations of CLK:BMAL1 with other transcription factors. These insights from Nascent-Seq indicate that it should have broad applicability to many other gene expression regulatory issues. DOI: http://dx.doi.org/10.7554/eLife.00011.001 PMID:23150795

  9. Identification and comparative analysis of differential gene expression in soybean leaf tissue under drought and flooding stress revealed by RNA-Seq

    USDA-ARS?s Scientific Manuscript database

    Soybean is the second largest crop in the US. Its yield directly impacts US agricultural economics. Drought and flooding are two major causes for soybean yield loss. To better understand their underlying molecular regulatory mechanisms, we sequenced the transcriptomes of soybean grown in drought a...

  10. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    PubMed

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes, however with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked as a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed DNMF outweighs others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.

  11. QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization.

    PubMed

    Zhao, Shanrong; Xi, Li; Quan, Jie; Xi, Hualin; Zhang, Ying; von Schack, David; Vincent, Michael; Zhang, Baohong

    2016-01-08

    RNA sequencing (RNA-seq), a next-generation sequencing technique for transcriptome profiling, is being increasingly used, in part driven by the decreasing cost of sequencing. Nevertheless, the analysis of the massive amounts of data generated by large-scale RNA-seq remains a challenge. Multiple algorithms pertinent to basic analyses have been developed, and there is an increasing need to automate the use of these tools so as to obtain results in an efficient and user friendly manner. Increased automation and improved visualization of the results will help make the results and findings of the analyses readily available to experimental scientists. By combing the best open source tools developed for RNA-seq data analyses and the most advanced web 2.0 technologies, we have implemented QuickRNASeq, a pipeline for large-scale RNA-seq data analyses and visualization. The QuickRNASeq workflow consists of three main steps. In Step #1, each individual sample is processed, including mapping RNA-seq reads to a reference genome, counting the numbers of mapped reads, quality control of the aligned reads, and SNP (single nucleotide polymorphism) calling. Step #1 is computationally intensive, and can be processed in parallel. In Step #2, the results from individual samples are merged, and an integrated and interactive project report is generated. All analyses results in the report are accessible via a single HTML entry webpage. Step #3 is the data interpretation and presentation step. The rich visualization features implemented here allow end users to interactively explore the results of RNA-seq data analyses, and to gain more insights into RNA-seq datasets. In addition, we used a real world dataset to demonstrate the simplicity and efficiency of QuickRNASeq in RNA-seq data analyses and interactive visualizations. The seamless integration of automated capabilites with interactive visualizations in QuickRNASeq is not available in other published RNA-seq pipelines. The high degree of automation and interactivity in QuickRNASeq leads to a substantial reduction in the time and effort required prior to further downstream analyses and interpretation of the analyses findings. QuickRNASeq advances primary RNA-seq data analyses to the next level of automation, and is mature for public release and adoption.

  12. Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.

    PubMed

    Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel

    2011-03-04

    Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb-genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 Million reads), and a complete genome re-sequencing (45.3 Million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.

  13. Single-Cell Sequencing for Drug Discovery and Drug Development.

    PubMed

    Wu, Hongjin; Wang, Charles; Wu, Shixiu

    2017-01-01

    Next-generation sequencing (NGS), particularly single-cell sequencing, has revolutionized the scale and scope of genomic and biomedical research. Recent technological advances in NGS and singlecell studies have made the deep whole-genome (DNA-seq), whole epigenome and whole-transcriptome sequencing (RNA-seq) at single-cell level feasible. NGS at the single-cell level expands our view of genome, epigenome and transcriptome and allows the genome, epigenome and transcriptome of any organism to be explored without a priori assumptions and with unprecedented throughput. And it does so with single-nucleotide resolution. NGS is also a very powerful tool for drug discovery and drug development. In this review, we describe the current state of single-cell sequencing techniques, which can provide a new, more powerful and precise approach for analyzing effects of drugs on treated cells and tissues. Our review discusses single-cell whole genome/exome sequencing (scWGS/scWES), single-cell transcriptome sequencing (scRNA-seq), single-cell bisulfite sequencing (scBS), and multiple omics of single-cell sequencing. We also highlight the advantages and challenges of each of these approaches. Finally, we describe, elaborate and speculate the potential applications of single-cell sequencing for drug discovery and drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

    PubMed

    Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

    2015-11-18

    RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains a large room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species)., The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG, demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.

  15. Single-cell analysis of the transcriptome and its application in the characterization of stem cells and early embryos.

    PubMed

    Liu, Na; Liu, Lin; Pan, Xinghua

    2014-07-01

    Cellular heterogeneity within a cell population is a common phenomenon in multicellular organisms, tissues, cultured cells, and even FACS-sorted subpopulations. Important information may be masked if the cells are studied as a mass. Transcriptome profiling is a parameter that has been intensively studied, and relatively easier to address than protein composition. To understand the basis and importance of heterogeneity and stochastic aspects of the cell function and its mechanisms, it is essential to examine transcriptomes of a panel of single cells. High-throughput technologies, starting from microarrays and now RNA-seq, provide a full view of the expression of transcriptomes but are limited by the amount of RNA for analysis. Recently, several new approaches for amplification and sequencing the transcriptome of single cells or a limited low number of cells have been developed and applied. In this review, we summarize these major strategies, such as PCR-based methods, IVT-based methods, phi29-DNA polymerase-based methods, and several other methods, including their principles, characteristics, advantages, and limitations, with representative applications in cancer stem cells, early development, and embryonic stem cells. The prospects for development of future technology and application of transcriptome analysis in a single cell are also discussed.

  16. Variant discovery in the sheep milk transcriptome using RNA sequencing.

    PubMed

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan José

    2017-02-15

    The identification of genetic variation underlying desired phenotypes is one of the main challenges of current livestock genetic research. High-throughput transcriptome sequencing (RNA-Seq) offers new opportunities for the detection of transcriptome variants (SNPs and short indels) in different tissues and species. In this study, we used RNA-Seq on Milk Sheep Somatic Cells (MSCs) with the goal of characterizing the genetic variation within the coding regions of the milk transcriptome in Churra and Assaf sheep, two common dairy sheep breeds farmed in Spain. A total of 216,637 variants were detected in the MSCs transcriptome of the eight ewes analyzed. Among them, a total of 57,795 variants were detected in the regions harboring Quantitative Trait Loci (QTL) for milk yield, protein percentage and fat percentage, of which 21.44% were novel variants. Among the total variants detected, 561 (2.52%) and 1,649 (7.42%) were predicted to produce high or moderate impact changes in the corresponding transcriptional unit, respectively. In the functional enrichment analysis of the genes positioned within selected QTL regions harboring novel relevant functional variants (high and moderate impact), the KEGG pathway with the highest enrichment was "protein processing in endoplasmic reticulum". Additionally, a total of 504 and 1,063 variants were identified in the genes encoding principal milk proteins and molecules involved in the lipid metabolism, respectively. Of these variants, 20 mutations were found to have putative relevant effects on the encoded proteins. We present herein the first transcriptomic approach aimed at identifying genetic variants of the genes expressed in the lactating mammary gland of sheep. Through the transcriptome analysis of variability within regions harboring QTL for milk yield, protein percentage and fat percentage, we have found several pathways and genes that harbor mutations that could affect dairy production traits. Moreover, remarkable variants were also found in candidate genes coding for major milk proteins and proteins related to milk fat metabolism. Several of the SNPs found in this study could be included as suitable markers in genotyping platforms or custom SNP arrays to perform association analyses in commercial populations and apply genomic selection protocols in the dairy production industry.

  17. Unifying cancer and normal RNA sequencing data from different sources

    PubMed Central

    Wang, Qingguo; Armenia, Joshua; Zhang, Chao; Penson, Alexander V.; Reznik, Ed; Zhang, Liguo; Minet, Thais; Ochoa, Angelica; Gross, Benjamin E.; Iacobuzio-Donahue, Christine A.; Betel, Doron; Taylor, Barry S.; Gao, Jianjiong; Schultz, Nikolaus

    2018-01-01

    Driven by the recent advances of next generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of large-scale studies were conducted recently that have resulted in an unprecedented volume of whole transcriptome sequencing (RNA-seq) data, such as the Genotype Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA). While these data offer new opportunities to identify the mechanisms underlying disease, the comparison of data from different sources remains challenging, due to differences in sample and data processing. Here, we developed a pipeline that processes and unifies RNA-seq data from different studies, which includes uniform realignment, gene expression quantification, and batch effect removal. We find that uniform alignment and quantification is not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from GTEx and TCGA and successfully corrected for study-specific biases, enabling comparative analysis between TCGA and GTEx. The normalized datasets are available for download on figshare. PMID:29664468

  18. Transcriptome landscape of Lactococcus lactis reveals many novel RNAs including a small regulatory RNA involved in carbon uptake and metabolism.

    PubMed

    van der Meulen, Sjoerd B; de Jong, Anne; Kok, Jan

    2016-01-01

    RNA sequencing has revolutionized genome-wide transcriptome analyses, and the identification of non-coding regulatory RNAs in bacteria has thus increased concurrently. Here we reveal the transcriptome map of the lactic acid bacterial paradigm Lactococcus lactis MG1363 by employing differential RNA sequencing (dRNA-seq) and a combination of manual and automated transcriptome mining. This resulted in a high-resolution genome annotation of L. lactis and the identification of 60 cis-encoded antisense RNAs (asRNAs), 186 trans-encoded putative regulatory RNAs (sRNAs) and 134 novel small ORFs. Based on the putative targets of asRNAs, a novel classification is proposed. Several transcription factor DNA binding motifs were identified in the promoter sequences of (a)sRNAs, providing insight in the interplay between lactococcal regulatory RNAs and transcription factors. The presence and lengths of 14 putative sRNAs were experimentally confirmed by differential Northern hybridization, including the abundant RNA 6S that is differentially expressed depending on the available carbon source. For another sRNA, LLMGnc_147, functional analysis revealed that it is involved in carbon uptake and metabolism. L. lactis contains 13% leaderless mRNAs (lmRNAs) that, from an analysis of overrepresentation in GO classes, seem predominantly involved in nucleotide metabolism and DNA/RNA binding. Moreover, an A-rich sequence motif immediately following the start codon was uncovered, which could provide novel insight in the translation of lmRNAs. Altogether, this first experimental genome-wide assessment of the transcriptome landscape of L. lactis and subsequent sRNA studies provide an extensive basis for the investigation of regulatory RNAs in L. lactis and related lactococcal species.

  19. Protein Interaction Profile Sequencing (PIP-seq).

    PubMed

    Foley, Shawn W; Gregory, Brian D

    2016-10-10

    Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  20. Obesity modulates inflammation and lipid metabolism oocyte gene expression: A single cell transcriptome perspective

    USDA-ARS?s Scientific Manuscript database

    This study aimed to compare oocyte gene expression profiles and follicular fluid (FF) content from overweight/obese (OW) women and normal weight (NW) women who were undergoing fertility treatments. Using single cell transcriptomic analyses, we investigated oocyte gene expression using RNA-seq. Serum...

  1. Transcriptome sequencing analysis of novel sRNAs of Kineococcus radiotolerans in response to ionizing radiation.

    PubMed

    Chen, Zhouwei; Li, Lufeng; Shan, Zhan; Huang, Hannian; Chen, Huan; Ding, Xianfeng; Guo, Jiangfeng; Liu, Lili

    2016-11-01

    Kineococcus radiotolerans is a Gram-positive, radio-resistant bacterium isolated from a radioactive environment. The small noncoding RNAs (sRNAs) in bacteria are reported to play roles in the immediate response to stress and/or the recovery from stress. The analysis of K. radiotolerans transcriptome sequencing results can identify these sRNAs in a genome-wide detection, using RNA sequencing (RNA-seq) by the deep sequencing technique. In this study, the raw data of radiation-exposed samples (RS) and control samples (CS) were acquired separately from the sequencing platform. There were 217 common sRNA candidates in the two samples screened in the genome-wide scale by bioinformatics analysis. There were 43 differentially expressed sRNA candidates, including 28 up-regulated and 15 down-regulated ones. The down-regulated sRNAs were selected for the sRNA target prediction, of which 12 sRNAs that may modulate the genes related to the transcription regulation and DNA repair were considered as the candidates involved in the radio-resistance regulation system. Copyright © 2016 Elsevier GmbH. All rights reserved.

  2. RNA-Seq-based transcriptome analysis of dormant flower buds of Chinese cherry (Prunus pseudocerasus).

    PubMed

    Zhu, Youyin; Li, Yongqiang; Xin, Dedong; Chen, Wenrong; Shao, Xu; Wang, Yue; Guo, Weidong

    2015-01-25

    Bud dormancy is a critical biological process allowing Chinese cherry (Prunus pseudocerasus) to survive in winter. Due to the lake of genomic information, molecular mechanisms triggering endodormancy release in flower buds have remained unclear. Hence, we used Illumina RNA-Seq technology to carry out de novo transcriptome assembly and digital gene expression profiling of flower buds. Approximately 47million clean reads were assembled into 50,604 sequences with an average length of 837bp. A total of 37,650 unigene sequences were successfully annotated. 128 pathways were annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and metabolic, biosynthesis of second metabolite and plant hormone signal transduction accounted for higher percentage in flower bud. In critical period of endodormancy release, 1644, significantly differentially expressed genes (DEGs) were identified from expression profile. DEGs related to oxidoreductase activity were especially abundant in Gene Ontology (GO) molecular function category. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis demonstrated that DEGs were involved in various metabolic processes, including phytohormone metabolism. Quantitative real-time PCR (qRT-PCR) analysis indicated that levels of DEGs for abscisic acid and gibberellin biosynthesis decreased while the abundance of DEGs encoding their degradation enzymes increased and GID1 was down-regulated. Concomitant with endodormancy release, MADS-box transcription factors including P. pseudocerasus dormancy-associated MADS-box (PpcDAM), Agamous-like2, and APETALA3-like genes, shown remarkably epigenetic roles. The newly generated transcriptome and gene expression profiling data provide valuable genetic information for revealing transcriptomic variation during bud dormancy in Chinese cherry. The uncovered data should be useful for future studies of bud dormancy in Prunus fruit trees lacking genomic information. Copyright © 2014 Elsevier B.V. All rights reserved.

  3. Transcriptomics-based analysis using RNA-Seq of the coconut (Cocos nucifera) leaf in response to yellow decline phytoplasma infection.

    PubMed

    Nejat, Naghmeh; Cahill, David M; Vadamalai, Ganesan; Ziemann, Mark; Rookes, James; Naderali, Neda

    2015-10-01

    Invasive phytoplasmas wreak havoc on coconut palms worldwide, leading to high loss of income, food insecurity and extreme poverty of farmers in producing countries. Phytoplasmas as strictly biotrophic insect-transmitted bacterial pathogens instigate distinct changes in developmental processes and defence responses of the infected plants and manipulate plants to their own advantage; however, little is known about the cellular and molecular mechanisms underlying host-phytoplasma interactions. Further, phytoplasma-mediated transcriptional alterations in coconut palm genes have not yet been identified. This study evaluated the whole transcriptome profiles of naturally infected leaves of Cocos nucifera ecotype Malayan Red Dwarf in response to yellow decline phytoplasma from group 16SrXIV, using RNA-Seq technique. Transcriptomics-based analysis reported here identified genes involved in coconut innate immunity. The number of down-regulated genes in response to phytoplasma infection exceeded the number of genes up-regulated. Of the 39,873 differentially expressed unigenes, 21,860 unigenes were suppressed and 18,013 were induced following infection. Comparative analysis revealed that genes associated with defence signalling against biotic stimuli were significantly overexpressed in phytoplasma-infected leaves versus healthy coconut leaves. Genes involving cell rescue and defence, cellular transport, oxidative stress, hormone stimulus and metabolism, photosynthesis reduction, transcription and biosynthesis of secondary metabolites were differentially represented. Our transcriptome analysis unveiled a core set of genes associated with defence of coconut in response to phytoplasma attack, although several novel defence response candidate genes with unknown function have also been identified. This study constitutes valuable sequence resource for uncovering the resistance genes and/or susceptibility genes which can be used as genetic tools in disease resistance breeding.

  4. Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology: An Introduction to the Symposium.

    PubMed

    Mykles, Donald L; Burnett, Karen G; Durica, David S; Stillman, Jonathon H

    2016-12-01

    Crustaceans, and decapods in particular (i.e., crabs, shrimp, and lobsters), are a diverse and ecologically and commercially important group of organisms. Understanding responses to abiotic and biotic factors is critical for developing best practices in aquaculture and assessing the effects of changing environments on the biology of these important animals. A relatively small number of decapod crustacean species have been intensively studied at the molecular level; the availability, experimental tractability, and economic relevance factor into the selection of a particular species as a model. Transcriptomics, using high-throughput next generation sequencing (NGS, coupled with RNA sequencing or RNA-seq) is revolutionizing crustacean biology. The 11 symposium papers in this volume illustrate how RNA-seq is being used to study stress response, molting and limb regeneration, immunity and disease, reproduction and development, neurobiology, and ecology and evolution. This symposium occurred on the 10th anniversary of the symposium, "Genomic and Proteomic Approaches to Crustacean Biology", held at the Society for Integrative and Comparative Biology 2006 meeting. Two participants in the 2006 symposium, the late Paul Gross and David Towle, were recognized as leaders who pioneered the use of molecular techniques that would ultimately foster the transcriptomics research reviewed in this volume. RNA-seq is a powerful tool for hypothesis-driven research, as well as an engine for discovery. It has eclipsed the technologies available in 2006, such as microarrays, expressed sequence tags, and subtractive hybridization screening, as the millions of "reads" from NGS enable researchers to de novo assemble a comprehensive transcriptome without a complete genome sequence. The symposium series concludes with a policy paper that gives an overview of the resources available and makes recommendations for developing better tools for functional annotation and pathway and network analysis in organisms in which the genome is not available or is incomplete. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.

  5. Transcriptome and proteomic analysis of mango (Mangifera indica Linn) fruits.

    PubMed

    Wu, Hong-xia; Jia, Hui-min; Ma, Xiao-wei; Wang, Song-biao; Yao, Quan-sheng; Xu, Wen-tian; Zhou, Yi-gang; Gao, Zhong-shan; Zhan, Ru-lin

    2014-06-13

    Here we used Illumina RNA-seq technology for transcriptome sequencing of a mixed fruit sample from 'Zill' mango (Mangifera indica Linn) fruit pericarp and pulp during the development and ripening stages. RNA-seq generated 68,419,722 sequence reads that were assembled into 54,207 transcripts with a mean length of 858bp, including 26,413 clusters and 27,794 singletons. A total of 42,515(78.43%) transcripts were annotated using public protein databases, with a cut-off E-value above 10(-5), of which 35,198 and 14,619 transcripts were assigned to gene ontology terms and clusters of orthologous groups respectively. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes database identified 23,741(43.79%) transcripts which were mapped to 128 pathways. These pathways revealed many previously unknown transcripts. We also applied mass spectrometry-based transcriptome data to characterize the proteome of ripe fruit. LC-MS/MS analysis of the mango fruit proteome was using tandem mass spectrometry (MS/MS) in an LTQ Orbitrap Velos (Thermo) coupled online to the HPLC. This approach enabled the identification of 7536 peptides that matched 2754 proteins. Our study provides a comprehensive sequence for a systemic view of transcriptome during mango fruit development and the most comprehensive fruit proteome to date, which are useful for further genomics research and proteomic studies. Our study provides a comprehensive sequence for a systemic view of both the transcriptome and proteome of mango fruit, and a valuable reference for further research on gene expression and protein identification. This article is part of a Special Issue entitled: Proteomics of non-model organisms. Copyright © 2014 Elsevier B.V. All rights reserved.

  6. RNA-seq in kinetoplastids: A powerful tool for the understanding of the biology and host-pathogen interactions.

    PubMed

    Patino, Luz Helena; Ramírez, Juan David

    2017-04-01

    The kinetoplastids include a large number of parasites responsible for serious diseases in humans and animals (Leishmania and Trypanosoma brucei) considered endemic in several regions of the world. These parasites are characterized by digenetic life cycles that undergo morphological and genetic changes that allow them to adapt to different microenvironments on their vertebrates and invertebrates hosts. Recent advances in ´omics´ technology, specifically transcriptomics have allowed to reveal aspects associated with such molecular changes. So far, different techniques have been used to evaluate the gene expression profile during the various stages of the life cycle of these parasites and during the host-parasite interactions. However, some of them have serious drawbacks that limit the precise study and full understanding of their transcriptomes. Therefore, recently has been implemented the latest technology (RNA-seq), which overcomes the drawbacks of traditional methods. In this review, studies that so far have used RNA-seq are presented and allowed to expand our knowledge regarding the biology of these parasites and their interactions with their hosts. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. DiffSplice: the genome-wide detection of differential splicing events with RNA-seq

    PubMed Central

    Hu, Yin; Huang, Yan; Du, Ying; Orellana, Christian F.; Singh, Darshan; Johnson, Amy R.; Monroy, Anaïs; Kuan, Pei-Fen; Hammond, Scott M.; Makowski, Liza; Randell, Scott H.; Chiang, Derek Y.; Hayes, D. Neil; Jones, Corbin; Liu, Yufeng; Prins, Jan F.; Liu, Jinze

    2013-01-01

    The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied on experimental biological data sets. The software of DiffSplice is available at http://www.netlab.uky.edu/p/bioinfo/DiffSplice. PMID:23155066

  8. RNA-seq analysis of developing pecan (Carya illinoinensis) embryos reveals parallel expression patterns among allergen and lipid metabolism genes

    USDA-ARS?s Scientific Manuscript database

    Pecan nuts and other tree nuts can be a nutrient rich part of a healthy diet full of beneficial fatty acids and antioxidants, but can also cause allergic reactions in people suffering from food allergy to the nuts. We characterized the transcriptome of a developing pecan nut to identify the gene ex...

  9. Meta-analysis of RNA-Seq data across cohorts in a multi-season feed efficiency study of crossbred beef steers accounts for biological and technical variability within season

    USDA-ARS?s Scientific Manuscript database

    High-throughput sequencing is often used for studies of the transcriptome, particularly for comparisons between experimental conditions. Due to sequencing costs, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential ex...

  10. Transcriptome analysis using next generation sequencing reveals molecular signatures of diabetic retinopathy and efficacy of candidate drugs.

    PubMed

    Kandpal, Raj P; Rajasimha, Harsha K; Brooks, Matthew J; Nellissery, Jacob; Wan, Jun; Qian, Jiang; Kern, Timothy S; Swaroop, Anand

    2012-01-01

    To define gene expression changes associated with diabetic retinopathy in a mouse model using next generation sequencing, and to utilize transcriptome signatures to assess molecular pathways by which pharmacological agents inhibit diabetic retinopathy. We applied a high throughput RNA sequencing (RNA-seq) strategy using Illumina GAIIx to characterize the entire retinal transcriptome from nondiabetic and from streptozotocin-treated mice 32 weeks after induction of diabetes. Some of the diabetic mice were treated with inhibitors of receptor for advanced glycation endproducts (RAGE) and p38 mitogen activated protein (MAP) kinase, which have previously been shown to inhibit diabetic retinopathy in rodent models. The transcripts and alternatively spliced variants were determined in all experimental groups. Next generation sequencing-based RNA-seq profiles provided comprehensive signatures of transcripts that are altered in early stages of diabetic retinopathy. These transcripts encoded proteins involved in distinct yet physiologically relevant disease-associated pathways such as inflammation, microvasculature formation, apoptosis, glucose metabolism, Wnt signaling, xenobiotic metabolism, and photoreceptor biology. Significant upregulation of crystallin transcripts was observed in diabetic animals, and the diabetes-induced upregulation of these transcripts was inhibited in diabetic animals treated with inhibitors of either RAGE or p38 MAP kinase. These two therapies also showed dissimilar regulation of some subsets of transcripts that included alternatively spliced versions of arrestin, neutral sphingomyelinase activation associated factor (Nsmaf), SH3-domain GRB2-like interacting protein 1 (Sgip1), and axin. Diabetes alters many transcripts in the retina, and two therapies that inhibit the vascular pathology similarly inhibit a portion of these changes, pointing to possible molecular mechanisms for their beneficial effects. These therapies also changed the abundance of various alternatively spliced versions of signaling transcripts, suggesting a possible role of alternative splicing in disease etiology. Our studies clearly demonstrate RNA-seq as a comprehensive strategy for identifying disease-specific transcripts, and for determining comparative profiles of molecular changes mediated by candidate drugs.

  11. RNA-seq Analysis of Early Hepatic Response to Handling and Confinement Stress in Rainbow Trout

    PubMed Central

    Liu, Sixin; Gao, Guangtu; Palti, Yniv; Cleveland, Beth M.; Weber, Gregory M.; Rexroad, Caird E.

    2014-01-01

    Fish under intensive rearing conditions experience various stressors which have negative impacts on survival, growth, reproduction and fillet quality. Identifying and characterizing the molecular mechanisms underlying stress responses will facilitate the development of strategies that aim to improve animal welfare and aquaculture production efficiency. In this study, we used RNA-seq to identify transcripts which are differentially expressed in the rainbow trout liver in response to handling and confinement stress. These stressors were selected due to their relevance in aquaculture production. Total RNA was extracted from the livers of individual fish in five tanks having eight fish each, including three tanks of fish subjected to a 3 hour handling and confinement stress and two control tanks. Equal amount of total RNA of six individual fish was pooled by tank to create five RNA-seq libraries which were sequenced in one lane of Illumina HiSeq 2000. Three sequencing runs were conducted to obtain a total of 491,570,566 reads which were mapped onto the previously generated stress reference transcriptome to identify 316 differentially expressed transcripts (DETs). Twenty one DETs were selected for qPCR to validate the RNA-seq approach. The fold changes in gene expression identified by RNA-seq and qPCR were highly correlated (R2 = 0.88). Several gene ontology terms including transcription factor activity and biological process such as glucose metabolic process were enriched among these DETs. Pathways involved in response to handling and confinement stress were implicated by mapping the DETs to reference pathways in the KEGG database. Accession Numbers Raw RNA-seq reads have been submitted to the NCBI Short Read Archive under accession number SRP022881. Customized Perl Scripts All customized scripts described in this paper are available from Dr. Guangtu Gao or the corresponding author. PMID:24558395

  12. Identification of Human HK Genes and Gene Expression Regulation Study in Cancer from Transcriptomics Data Analysis

    PubMed Central

    Zhang, Zhang; Liu, Jingxing; Wu, Jiayan; Yu, Jun

    2013-01-01

    The regulation of gene expression is essential for eukaryotes, as it drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA-Sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterization and quantification of transcriptome. Many different human tissue/cell transcriptome datasets coming from RNA-Seq technology are available on public data resource. The fundamental issue here is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects gene expression on/off status, and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression genes, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. The cluster analysis indicates that gene expression pattern is higher related to physiological condition rather than tissue spatial distance. Two sets of human housekeeping (HK) genes are defined according to cell/tissue types, respectively. To characterize the gene expression pattern in gene expression level and variation, we firstly apply improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (a HK gene is specific in cancer group, while not in normal group) are expressed higher and more variable in cancer condition than in normal condition. Cancer-associated HK genes prefer to AT-rich genes, and they are enriched in cell cycle regulation related functions and constitute some cancer signatures. The expression of large genes is also avoided in cancer group. These studies will help us understand which cell type-specific patterns of gene expression differ among different cell types, and particularly for cancer. PMID:23382867

  13. Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences

    PubMed Central

    2013-01-01

    Background Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information. Results We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes. Conclusions The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes. PMID:23537002

  14. Selective amplification and sequencing of cyclic phosphate-containing RNAs by the cP-RNA-seq method

    PubMed Central

    Honda, Shozo; Morichika, Keisuke; Kirino, Yohei

    2016-01-01

    RNA digestions catalyzed by many ribonucleases generate RNA fragments containing a 2′,3′-cyclic phosphate (cP) at their 3′-termini. However, standard RNA-seq methods are unable to accurately capture cP-containing RNAs because the cP inhibits the adapter ligation reaction. We recently developed a method named “cP-RNA-seq” that is able to selectively amplify and sequence cP-containing RNAs. Here we describe the cP-RNA-seq protocol in which the 3′-termini of all RNAs, except those containing a cP, are cleaved through a periodate treatment after phosphatase treatment, hence subsequent adapter ligation and cDNA amplification steps are exclusively applied to cP-containing RNAs. cP-RNA-seq takes ~6 d, excluding the time required for sequencing and bioinformatics analyses, such downstream assays are not covered in detail in this protocol. Biochemical validation of the existence of cP in the identified RNAs takes ~3 d. Even though the cP-RNA-seq method was developed to identify angiogenin-generating 5′-tRNA halves as a proof of principle, the method should be applicable to global identification of cP-containing RNA repertoires in various transcriptomes. PMID:26866791

  15. RNASeq-based genome annotation and identification of long-noncoding RNAs in the grapevine cultivar 'Riesling'

    USDA-ARS?s Scientific Manuscript database

    The technological advances of RNA-seq and de novo transcriptome assembly have enabled genome annotation and transcriptome profiling in heterozygous species. This is a promising approach to improving the annotation of the reference genome sequence of grapevine (Vitis vinifera L.), a species of high-l...

  16. De Novo Characterization of the Mung Bean Transcriptome and Transcriptomic Analysis of Adventitious Rooting in Seedlings Using RNA-Seq

    PubMed Central

    Li, Shi-Weng; Shi, Rui-Fang; Leng, Yan

    2015-01-01

    Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change>2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stage, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of the 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. The GO enrichment, pathway mapping, and gene expression profiles reveal molecular traits for root induction and initiation. This study provides a platform for functional genomic research with this species. PMID:26177103

  17. De Novo Characterization of the Mung Bean Transcriptome and Transcriptomic Analysis of Adventitious Rooting in Seedlings Using RNA-Seq.

    PubMed

    Li, Shi-Weng; Shi, Rui-Fang; Leng, Yan

    2015-01-01

    Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change>2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stage, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of the 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. The GO enrichment, pathway mapping, and gene expression profiles reveal molecular traits for root induction and initiation. This study provides a platform for functional genomic research with this species.

  18. Mango (Mangifera indica L.) cv. Kent fruit mesocarp de novo transcriptome assembly identifies gene families important for ripening

    PubMed Central

    Dautt-Castro, Mitzuko; Ochoa-Leyva, Adrian; Contreras-Vergara, Carmen A.; Pacheco-Sanchez, Magda A.; Casas-Flores, Sergio; Sanchez-Flores, Alejandro; Kuhn, David N.; Islas-Osuna, Maria A.

    2015-01-01

    Fruit ripening is a physiological and biochemical process genetically programmed to regulate fruit quality parameters like firmness, flavor, odor and color, as well as production of ethylene in climacteric fruit. In this study, a transcriptomic analysis of mango (Mangifera indica L.) mesocarp cv. “Kent” was done to identify key genes associated with fruit ripening. Using the Illumina sequencing platform, 67,682,269 clean reads were obtained and a transcriptome of 4.8 Gb. A total of 33,142 coding sequences were predicted and after functional annotation, 25,154 protein sequences were assigned with a product according to Swiss-Prot database and 32,560 according to non-redundant database. Differential expression analysis identified 2,306 genes with significant differences in expression between mature-green and ripe mango [1,178 up-regulated and 1,128 down-regulated (FDR ≤ 0.05)]. The expression of 10 genes evaluated by both qRT-PCR and RNA-seq data was highly correlated (R = 0.97), validating the differential expression data from RNA-seq alone. Gene Ontology enrichment analysis, showed significantly represented terms associated to fruit ripening like “cell wall,” “carbohydrate catabolic process” and “starch and sucrose metabolic process” among others. Mango genes were assigned to 327 metabolic pathways according to Kyoto Encyclopedia of Genes and Genomes database, among them those involved in fruit ripening such as plant hormone signal transduction, starch and sucrose metabolism, galactose metabolism, terpenoid backbone, and carotenoid biosynthesis. This study provides a mango transcriptome that will be very helpful to identify genes for expression studies in early and late flowering mangos during fruit ripening. PMID:25741352

  19. Mango (Mangifera indica L.) cv. Kent fruit mesocarp de novo transcriptome assembly identifies gene families important for ripening.

    PubMed

    Dautt-Castro, Mitzuko; Ochoa-Leyva, Adrian; Contreras-Vergara, Carmen A; Pacheco-Sanchez, Magda A; Casas-Flores, Sergio; Sanchez-Flores, Alejandro; Kuhn, David N; Islas-Osuna, Maria A

    2015-01-01

    Fruit ripening is a physiological and biochemical process genetically programmed to regulate fruit quality parameters like firmness, flavor, odor and color, as well as production of ethylene in climacteric fruit. In this study, a transcriptomic analysis of mango (Mangifera indica L.) mesocarp cv. "Kent" was done to identify key genes associated with fruit ripening. Using the Illumina sequencing platform, 67,682,269 clean reads were obtained and a transcriptome of 4.8 Gb. A total of 33,142 coding sequences were predicted and after functional annotation, 25,154 protein sequences were assigned with a product according to Swiss-Prot database and 32,560 according to non-redundant database. Differential expression analysis identified 2,306 genes with significant differences in expression between mature-green and ripe mango [1,178 up-regulated and 1,128 down-regulated (FDR ≤ 0.05)]. The expression of 10 genes evaluated by both qRT-PCR and RNA-seq data was highly correlated (R = 0.97), validating the differential expression data from RNA-seq alone. Gene Ontology enrichment analysis, showed significantly represented terms associated to fruit ripening like "cell wall," "carbohydrate catabolic process" and "starch and sucrose metabolic process" among others. Mango genes were assigned to 327 metabolic pathways according to Kyoto Encyclopedia of Genes and Genomes database, among them those involved in fruit ripening such as plant hormone signal transduction, starch and sucrose metabolism, galactose metabolism, terpenoid backbone, and carotenoid biosynthesis. This study provides a mango transcriptome that will be very helpful to identify genes for expression studies in early and late flowering mangos during fruit ripening.

  20. Transcriptome-Wide Analysis of Hepatitis B Virus-Mediated Changes to Normal Hepatocyte Gene Expression.

    PubMed

    Lamontagne, Jason; Mell, Joshua C; Bouchard, Michael J

    2016-02-01

    Globally, a chronic hepatitis B virus (HBV) infection remains the leading cause of primary liver cancer. The mechanisms leading to the development of HBV-associated liver cancer remain incompletely understood. In part, this is because studies have been limited by the lack of effective model systems that are both readily available and mimic the cellular environment of a normal hepatocyte. Additionally, many studies have focused on single, specific factors or pathways that may be affected by HBV, without addressing cell physiology as a whole. Here, we apply RNA-seq technology to investigate transcriptome-wide, HBV-mediated changes in gene expression to identify single factors and pathways as well as networks of genes and pathways that are affected in the context of HBV replication. Importantly, these studies were conducted in an ex vivo model of cultured primary hepatocytes, allowing for the transcriptomic characterization of this model system and an investigation of early HBV-mediated effects in a biologically relevant context. We analyzed differential gene expression within the context of time-mediated gene-expression changes and show that in the context of HBV replication a number of genes and cellular pathways are altered, including those associated with metabolism, cell cycle regulation, and lipid biosynthesis. Multiple analysis pipelines, as well as qRT-PCR and an independent, replicate RNA-seq analysis, were used to identify and confirm differentially expressed genes. HBV-mediated alterations to the transcriptome that we identified likely represent early changes to hepatocytes following an HBV infection, suggesting potential targets for early therapeutic intervention. Overall, these studies have produced a valuable resource that can be used to expand our understanding of the complex network of host-virus interactions and the impact of HBV-mediated changes to normal hepatocyte physiology on viral replication.

  1. PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data

    PubMed Central

    Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A. C.; Ning, Zemin; Slagboom, P. Eline; Ye, Kai

    2012-01-01

    Motivation: RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establishing the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon–exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ∼ 137 000 and 173 000 splicing events, of which on average 82 are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. Availability: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion Contact: y.zhang@lumc.nl; k.ye@lumc.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22219203

  2. Transcriptome analysis of differentially expressed genes involved in selenium accumulation in tea plant (Camellia sinensis)

    PubMed Central

    Liu, Yanli; Ma, Linlong; Jin, Xiaofang; Guo, Guiyi; Tan, Rongrong; Liu, Zheng; Zheng, Lin; Ye, Fei; Liu, Wei

    2018-01-01

    Tea plant (Camellia sinensis) has strong enrichment ability for selenium (Se). Selenite is the main form of Se absorbed and utilized by tea plant. However, the mechanism of selenite absorption and accumulation in tea plant is still unknown. In this study, RNA sequencing (RNA-seq) was used to perform transcriptomic analysis on the molecular mechanism of selenite absorption and accumulation in tea plant. 397.98 million high-quality reads were obtained and assembled into 168,212 unigenes, 89,605 of which were extensively annotated. There were 60,582 and 1,362 differentially expressed genes (DEGs) in roots and leaves, respectively. RNA-seq results were further validated by quantitative RT-PCR. Based on GO terms, the unigenes were mainly involved in cell, binding and metabolic process. KEGG pathway enrichment analysis showed that predominant pathways included ribosome and protein processing in endoplasmic reticulum. Further analysis revealed that sulfur metabolism, glutathione metabolism, selenocompound metabolism and plant hormone signal transduction responded to selenite in tea plant. Additionally, a large number of genes of higher expressions associated with phosphate transporters, sulfur assimilation, antioxidant enzymes, antioxidant substances and responses to ethylene and jasmonic acid were identified. Stress-related plant hormones might play a signaling role in promoting sulfate/selenite uptake and assimilation in tea plant. Moreover, some other Se accumulation mechanisms of tea plant were found. Our study provides a possibility for controlling Se accumulation in tea plant through bio-technologies and will be helpful for breeding new tea cultivars. PMID:29856771

  3. Transcription start site associated RNAs (TSSaRNAs) are ubiquitous in all domains of life.

    PubMed

    Zaramela, Livia S; Vêncio, Ricardo Z N; ten-Caten, Felipe; Baliga, Nitin S; Koide, Tie

    2014-01-01

    A plethora of non-coding RNAs has been discovered using high-resolution transcriptomics tools, indicating that transcriptional and post-transcriptional regulation is much more complex than previously appreciated. Small RNAs associated with transcription start sites of annotated coding regions (TSSaRNAs) are pervasive in both eukaryotes and bacteria. Here, we provide evidence for existence of TSSaRNAs in several archaeal transcriptomes including: Halobacterium salinarum, Pyrococcus furiosus, Methanococcus maripaludis, and Sulfolobus solfataricus. We validated TSSaRNAs from the model archaeon Halobacterium salinarum NRC-1 by deep sequencing two independent small-RNA enriched (RNA-seq) and a primary-transcript enriched (dRNA-seq) strand-specific libraries. We identified 652 transcripts, of which 179 were shown to be primary transcripts (∼7% of the annotated genome). Distinct growth-associated expression patterns between TSSaRNAs and their cognate genes were observed, indicating a possible role in environmental responses that may result from RNA polymerase with varying pausing rhythms. This work shows that TSSaRNAs are ubiquitous across all domains of life.

  4. Targeted reconstruction of T cell receptor sequence from single cell RNA-seq links CDR3 length to T cell differentiation state

    PubMed Central

    Yates, Kathleen B.; Bi, Kevin; Darko, Samuel; Godec, Jernej; Gerdemann, Ulrike; Swadling, Leo; Douek, Daniel C.; Klenerman, Paul; Barnes, Eleanor J.; Sharpe, Arlene H.

    2017-01-01

    Abstract The T cell compartment must contain diversity in both T cell receptor (TCR) repertoire and cell state to provide effective immunity against pathogens. However, it remains unclear how differences in the TCR contribute to heterogeneity in T cell state. Single cell RNA-sequencing (scRNA-seq) can allow simultaneous measurement of TCR sequence and global transcriptional profile from single cells. However, current methods for TCR inference from scRNA-seq are limited in their sensitivity and require long sequencing reads, thus increasing the cost and decreasing the number of cells that can be feasibly analyzed. Here we present TRAPeS, a publicly available tool that can efficiently extract TCR sequence information from short-read scRNA-seq libraries. We apply it to investigate heterogeneity in the CD8+ T cell response in humans and mice, and show that it is accurate and more sensitive than existing approaches. Coupling TRAPeS with transcriptome analysis of CD8+ T cells specific for a single epitope from Yellow Fever Virus (YFV), we show that the recently described ‘naive-like’ memory population have significantly longer CDR3 regions and greater divergence from germline sequence than do effector-memory phenotype cells. This suggests that TCR usage is associated with the differentiation state of the CD8+ T cell response to YFV. PMID:28934479

  5. Obese rats supplemented with bitter melon display marked shifts in the expression of genes controlling inflammatory response and lipid metabolism by RNA-Seq analysis of colonic mucosa.

    PubMed

    Bai, Juan; Zhu, Ying; Dong, Ying

    2018-06-01

    Obesity is known to induce pathological changes in the gut and diets rich in complex carbohydrates that resist digestion in the small bowel can alter large bowel ecology. The purposes of this study were to identify the effects of bitter melon powder (BMP) on the global gene expression pattern in the colon mucosa of obese rats. Obese rats were fed a high-fat diet and treated without or with BMP for 8 weeks. Genome-wide expression profiles of the colon mucosa were determined by RNA sequencing (RNA-Seq) analysis at the end of experiment. A total of 87 genes were identified as differentially expressed (DE) between these two groups (fold change > 1.2). These results were further validated by quantitative RT-PCR, confirming the high reliability of the RNA-Seq. Interestingly, DE genes implicated in inflammation and lipid metabolism were found to be downregulated by BMP in the colon. Network between genes and the top 15 KEGG pathways showed that PRKCβ (protein kinase C beta) and Pla2g2a (phospholipase A2 group IIA) strongly interacted with surrounding pathways and genes. Results revealed that BMP supplement could remodel key colon functions by altering transcriptomic profile in obese rats.

  6. The Physcomitrella patens gene atlas project: large-scale RNA-seq based expression data.

    PubMed

    Perroud, Pierre-François; Haas, Fabian B; Hiss, Manuel; Ullrich, Kristian K; Alboresi, Alessandro; Amirebrahimi, Mojgan; Barry, Kerrie; Bassi, Roberto; Bonhomme, Sandrine; Chen, Haodong; Coates, Juliet C; Fujita, Tomomichi; Guyon-Debast, Anouchka; Lang, Daniel; Lin, Junyan; Lipzen, Anna; Nogué, Fabien; Oliver, Melvin J; Ponce de León, Inés; Quatrano, Ralph S; Rameau, Catherine; Reiss, Bernd; Reski, Ralf; Ricca, Mariana; Saidi, Younousse; Sun, Ning; Szövényi, Péter; Sreedasyam, Avinash; Grimwood, Jane; Stacey, Gary; Schmutz, Jeremy; Rensing, Stefan A

    2018-07-01

    High-throughput RNA sequencing (RNA-seq) has recently become the method of choice to define and analyze transcriptomes. For the model moss Physcomitrella patens, although this method has been used to help analyze specific perturbations, no overall reference dataset has yet been established. In the framework of the Gene Atlas project, the Joint Genome Institute selected P. patens as a flagship genome, opening the way to generate the first comprehensive transcriptome dataset for this moss. The first round of sequencing described here is composed of 99 independent libraries spanning 34 different developmental stages and conditions. Upon dataset quality control and processing through read mapping, 28 509 of the 34 361 v3.3 gene models (83%) were detected to be expressed across the samples. Differentially expressed genes (DEGs) were calculated across the dataset to permit perturbation comparisons between conditions. The analysis of the three most distinct and abundant P. patens growth stages - protonema, gametophore and sporophyte - allowed us to define both general transcriptional patterns and stage-specific transcripts. As an example of variation of physico-chemical growth conditions, we detail here the impact of ammonium supplementation under standard growth conditions on the protonemal transcriptome. Finally, the cooperative nature of this project allowed us to analyze inter-laboratory variation, as 13 different laboratories around the world provided samples. We compare differences in the replication of experiments in a single laboratory and between different laboratories. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.

  7. Global Genomic Analysis of Prostate, Breast and Pancreatic Cancer

    DTIC Science & Technology

    2012-10-01

    fever virus (Lauck et al. 2011). The success of transposon-based genomic library construction for genomic analyses suggests that it should be possible...2011. Novel, divergent simian hemorrhagic Fever viruses in a wild ugandan red colobus Gertz et al. 140 Genome Research www.genome.org Cold Spring...2009. A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella typhi. PLoS Genet 5: e1000569. doi: 10.1371

  8. De novo Assembly and Analysis of the Chilean Pencil Catfish Trichomycterus areolatus Transcriptome

    PubMed Central

    Schulze, Thomas T.; Ali, Jonathan M.; Bartlett, Maggie L.; McFarland, Madalyn M.; Clement, Emalie J.; Won, Harim I.; Sanford, Austin G.; Monzingo, Elyssa B.; Martens, Matthew C.; Hemsley, Ryan M.; Kumar, Sidharta; Gouin, Nicolas; Kolok, Alan S.; Davis, Paul H.

    2016-01-01

    Trichomycterus areolatus is an endemic species of pencil catfish that inhabits the riffles and rapids of many freshwater ecosystems of Chile. Despite its unique adaptation to Chile's high gradient watersheds and therefore potential application in the investigation of ecosystem integrity and environmental contamination, relatively little is known regarding the molecular biology of this environmental sentinel. Here, we detail the assembly of the Trichomycterus areolatus transcriptome, a molecular resource for the study of this organism and its molecular response to the environment. RNA-Seq reads were obtained by next-generation sequencing with an Illumina® platform and processed using PRINSEQ. The transcriptome assembly was performed using TRINITY assembler. Transcriptome validation was performed by functional characterization with KOG, KEGG, and GO analyses. Additionally, differential expression analysis highlights sex-specific expression patterns, and a list of endocrine and oxidative stress related transcripts are included. PMID:27672404

  9. The effect of Bacopa monnieri on gene expression levels in SH-SY5Y human neuroblastoma cells.

    PubMed

    Leung, How-Wing; Foo, Gabriel; Banumurthy, Gokulakrishna; Chai, Xiaoran; Ghosh, Sujoy; Mitra-Ganguli, Tora; VanDongen, Antonius M J

    2017-01-01

    Bacopa monnieri is a plant used as a nootropic in Ayurveda, a 5000-year-old system of traditional Indian medicine. Although both animal and clinical studies supported its role as a memory enhancer, the molecular and cellular mechanism underlying Bacopa's nootropic action are not understood. In this study, we used deep sequencing (RNA-Seq) to identify the transcriptome changes upon Bacopa treatment on SH-SY5Y human neuroblastoma cells. We identified several genes whose expression levels were regulated by Bacopa. Biostatistical analysis of the RNA-Seq data identified biological pathways and molecular functions that were regulated by Bacopa, including regulation of mRNA translation and transmembrane transport, responses to oxidative stress and protein misfolding. Pathway analysis using the Ingenuity platform suggested that Bacopa may protect against brain damage and improve brain development. These newly identified molecular and cellular determinants may contribute to the nootropic action of Bacopa and open up a new direction of investigation into its mechanism of action.

  10. The effect of Bacopa monnieri on gene expression levels in SH-SY5Y human neuroblastoma cells

    PubMed Central

    Foo, Gabriel; Banumurthy, Gokulakrishna; Chai, Xiaoran; Ghosh, Sujoy

    2017-01-01

    Bacopa monnieri is a plant used as a nootropic in Ayurveda, a 5000-year-old system of traditional Indian medicine. Although both animal and clinical studies supported its role as a memory enhancer, the molecular and cellular mechanism underlying Bacopa’s nootropic action are not understood. In this study, we used deep sequencing (RNA-Seq) to identify the transcriptome changes upon Bacopa treatment on SH-SY5Y human neuroblastoma cells. We identified several genes whose expression levels were regulated by Bacopa. Biostatistical analysis of the RNA-Seq data identified biological pathways and molecular functions that were regulated by Bacopa, including regulation of mRNA translation and transmembrane transport, responses to oxidative stress and protein misfolding. Pathway analysis using the Ingenuity platform suggested that Bacopa may protect against brain damage and improve brain development. These newly identified molecular and cellular determinants may contribute to the nootropic action of Bacopa and open up a new direction of investigation into its mechanism of action. PMID:28832626

  11. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs

    PubMed Central

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore

    2017-01-01

    Abstract Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659

  12. Detection of genetic variants between different Polish Landrace and Puławska pigs by means of RNA-seq analysis.

    PubMed

    Piórkowska, K; Żukowski, K; Ropka-Molik, K; Tyra, M

    2018-06-01

    Variant calling analysis based on RNA sequencing data provides information about gene variants. RNA-seq is cheaper and faster than is DNA sequencing. However, it requires individual hard filters during data processing due to post-transcriptional modifications such as splicing and RNA editing. In the present study, RNA-seq transcriptome data on two Polish pig breeds (Puławska, PUL, n = 8, and Polish Landrace, PL, n = 8) were included. The pig breeds are significantly different with regard to meat qualities such as texture, water exudation, growth traits and fat content in carcasses. A total of 2451 significant mutations were identified by a chi square tests, and functional analysis was carried out using Panther, KEGG and Kobas. Interesting missense gene variants and mutations located in regulatory regions were found in a few genes related to fatty acid metabolism and lipid storage such as ACSL5, ALDH3A2, FADS1, SCD, PLA2G12A and ATGL. A validation of mutational influences on pig traits was performed for ALDH3A2, ATGL, PLA2G12A and MYOM1 variants using association analysis including 215 pigs of the PL and PUL breeds. The ALDH3A2ENSSSCT00000019636.2:c.470T>C polymorphism was found to affect the weight of the ham and loin eye area. In turn, an ENSSSCT00000004091.2:c.2836G>A MYOM1 mutation, which could be implicated in myofibrillar network organisation, had an effect on meatiness and loin texture parameters. The study aimed to estimate the usefulness of RNA-seq results for a purpose other than differentially expressed gene analysis. The analysis performed indicated interesting gene variants that could be used in the future as markers during selection. © 2018 Stichting International Foundation for Animal Genetics.

  13. Comprehensive comparative analysis of 5'-end RNA-sequencing methods.

    PubMed

    Adiconis, Xian; Haber, Adam L; Simmons, Sean K; Levy Moonshine, Ami; Ji, Zhe; Busby, Michele A; Shi, Xi; Jacques, Justin; Lancaster, Madeline A; Pan, Jen Q; Regev, Aviv; Levin, Joshua Z

    2018-06-04

    Specialized RNA-seq methods are required to identify the 5' ends of transcripts, which are critical for studies of gene regulation, but these methods have not been systematically benchmarked. We directly compared six such methods, including the performance of five methods on a single human cellular RNA sample and a new spike-in RNA assay that helps circumvent challenges resulting from uncertainties in annotation and RNA processing. We found that the 'cap analysis of gene expression' (CAGE) method performed best for mRNA and that most of its unannotated peaks were supported by evidence from other genomic methods. We applied CAGE to eight brain-related samples and determined sample-specific transcription start site (TSS) usage, as well as a transcriptome-wide shift in TSS usage between fetal and adult brain.

  14. RIPiT-Seq: A high-throughput approach for footprinting RNA:protein complexes

    PubMed Central

    Singh, Guramrit; Ricci, Emiliano P.; Moore, Melissa J.

    2013-01-01

    Development of high-throughput approaches to map the RNA interaction sites of individual RNA binding proteins (RBPs) transcriptome-wide is rapidly transforming our understanding of post-transcriptional gene regulatory mechanisms. Here we describe a ribonucleoprotein (RNP) footprinting approach we recently developed for identifying occupancy sites of both individual RBPs and multi-subunit RNP complexes. RNA:protein immunoprecipitation in tandem (RIPiT) yields highly specific RNA footprints of cellular RNPs isolated via two sequential purifications; the resulting RNA footprints can then be identified by high-throughput sequencing (Seq). RIPiT-Seq is broadly applicable to all RBPs regardless of their RNA binding mode and thus provides a means to map the RNA binding sites of RBPs with poor inherent ultraviolet (UV) crosslinkability. Further, among current high-throughput approaches, RIPiT has the unique capacity to differentiate binding sites of RNPs with overlapping protein composition. It is therefore particularly suited for studying dynamic RNP assemblages whose composition evolves as gene expression proceeds. PMID:24096052

  15. Single-cell RNA sequencing identifies diverse roles of epithelial cells in idiopathic pulmonary fibrosis

    PubMed Central

    Mizuno, Takako; Sridharan, Anusha; Du, Yina; Guo, Minzhe; Wikenheiser-Brokamp, Kathryn A.; Perl, Anne-Karina T.; Funari, Vincent A.; Gokey, Jason J.; Stripp, Barry R.; Whitsett, Jeffrey A.

    2016-01-01

    Idiopathic pulmonary fibrosis (IPF) is a lethal interstitial lung disease characterized by airway remodeling, inflammation, alveolar destruction, and fibrosis. We utilized single-cell RNA sequencing (scRNA-seq) to identify epithelial cell types and associated biological processes involved in the pathogenesis of IPF. Transcriptomic analysis of normal human lung epithelial cells defined gene expression patterns associated with highly differentiated alveolar type 2 (AT2) cells, indicated by enrichment of RNAs critical for surfactant homeostasis. In contrast, scRNA-seq of IPF cells identified 3 distinct subsets of epithelial cell types with characteristics of conducting airway basal and goblet cells and an additional atypical transitional cell that contributes to pathological processes in IPF. Individual IPF cells frequently coexpressed alveolar type 1 (AT1), AT2, and conducting airway selective markers, demonstrating “indeterminate” states of differentiation not seen in normal lung development. Pathway analysis predicted aberrant activation of canonical signaling via TGF-β, HIPPO/YAP, P53, WNT, and AKT/PI3K. Immunofluorescence confocal microscopy identified the disruption of alveolar structure and loss of the normal proximal-peripheral differentiation of pulmonary epithelial cells. scRNA-seq analyses identified loss of normal epithelial cell identities and unique contributions of epithelial cells to the pathogenesis of IPF. The present study provides a rich data source to further explore lung health and disease. PMID:27942595

  16. De Novo RNA Sequencing and Transcriptome Analysis of Monascus purpureus and Analysis of Key Genes Involved in Monacolin K Biosynthesis

    PubMed Central

    Zhang, Chan; Liang, Jian; Yang, Le; Sun, Baoguo; Wang, Chengtao

    2017-01-01

    Monascus purpureus is an important medicinal and edible microbial resource. To facilitate biological, biochemical, and molecular research on medicinal components of M. purpureus, we investigated the M. purpureus transcriptome by RNA sequencing (RNA-seq). An RNA-seq library was created using RNA extracted from a mixed sample of M. purpureus expressing different levels of monacolin K output. In total 29,713 unigenes were assembled from more than 60 million high-quality short reads. A BLAST search revealed hits for 21,331 unigenes in at least one of the protein or nucleotide databases used in this study. The 22,365 unigenes were categorized into 48 functional groups based on Gene Ontology classification. Owing to the economic and medicinal importance of M. purpureus, most studies on this organism have focused on the pharmacological activity of chemical components and the molecular function of genes involved in their biogenesis. In this study, we performed quantitative real-time PCR to detect the expression of genes related to monacolin K (mokA-mokI) at different phases (2, 5, 8, and 12 days) of M. purpureus M1 and M1-36. Our study found that mokF modulates monacolin K biogenesis in M. purpureus. Nine genes were suggested to be associated with the monacolin K biosynthesis. Studies on these genes could provide useful information on secondary metabolic processes in M. purpureus. These results indicate a detailed resource through genetic engineering of monacolin K biosynthesis in M. purpureus and related species. PMID:28114365

  17. Transcriptome of the Antarctic brooding gastropod mollusc Margarella antarctica.

    PubMed

    Clark, Melody S; Thorne, Michael A S

    2015-12-01

    454 RNA-Seq transcriptome data were generated from foot tissue of the Antarctic brooding gastropod mollusc Margarella antarctica. A total of 6195 contigs were assembled de novo, providing a useful resource for researchers with an interest in Antarctic marine species, phylogenetics and mollusc biology, especially shell production. Copyright © 2015 Elsevier B.V. All rights reserved.

  18. TEcandidates: Prediction of genomic origin of expressed Transposable Elements using RNA-seq data.

    PubMed

    Valdebenito-Maturana, Braulio; Riadi, Gonzalo

    2018-06-01

    In recent years, Transposable Elements (TEs) have been related to gene regulation. However, estimating the origin of expression of TEs through RNA-seq is complicated by multimapping reads coming from their repetitive sequences. Current approaches that address multimapping reads are focused in expression quantification and not in finding the origin of expression. Addressing the genomic origin of expressed TEs could further aid in understanding the role that TEs might have in the cell. We have developed a new pipeline called TEcandidates, based on de novo transcriptome assembly to assess the instances of TEs being expressed, along with their location, to include in downstream DE analysis. TEcandidates takes as input the RNA-seq data, the genome sequence and the TE annotation file, and returns a list of coordinates of candidate TEs being expressed, the TEs that have been removed, and the genome sequence with removed TEs as masked. This masked genome is suited to include TEs in downstream expression analysis, as the ambiguity of reads coming from TEs is significantly reduced in the mapping step of the analysis. The script which runs the pipeline can be downloaded at http://www.mobilomics.org/tecandidates/downloads or http://github.com/TEcandidates/TEcandidates. griadi@utalca.cl. Supplementary data are available at Bioinformatics online.

  19. Integrating RNA sequencing into neuro-oncology practice.

    PubMed

    Rogawski, David S; Vitanza, Nicholas A; Gauthier, Angela C; Ramaswamy, Vijay; Koschmann, Carl

    2017-11-01

    Malignant tumors of the central nervous system (CNS) cause substantial morbidity and mortality, yet efforts to optimize chemo- and radiotherapy have largely failed to improve dismal prognoses. Over the past decade, RNA sequencing (RNA-seq) has emerged as a powerful tool to comprehensively characterize the transcriptome of CNS tumor cells in one high-throughput step, leading to improved understanding of CNS tumor biology and suggesting new routes for targeted therapies. RNA-seq has been instrumental in improving the diagnostic classification of brain tumors, characterizing oncogenic fusion genes, and shedding light on intratumor heterogeneity. Currently, RNA-seq is beginning to be incorporated into regular neuro-oncology practice in the form of precision neuro-oncology programs, which use information from tumor sequencing to guide implementation of personalized targeted therapies. These programs show great promise in improving patient outcomes for tumors where single agent trials have been ineffective. As RNA-seq is a relatively new technique, many further applications yielding new advances in CNS tumor research and management are expected in the coming years. Copyright © 2017 Elsevier Inc. All rights reserved.

  20. Comparative Transcriptomic Analyses of Vegetable and Grain Pea (Pisum sativum L.) Seed Development

    PubMed Central

    Liu, Na; Zhang, Guwen; Xu, Shengchun; Mao, Weihua; Hu, Qizan; Gong, Yaming

    2015-01-01

    Understanding the molecular mechanisms regulating pea seed developmental process is extremely important for pea breeding. In this study, we used high-throughput RNA-Seq and bioinformatics analyses to examine the changes in gene expression during seed development in vegetable pea and grain pea, and compare the gene expression profiles of these two pea types. RNA-Seq generated 18.7 G of raw data, which were then de novo assembled into 77,273 unigenes with a mean length of 930 bp. Our results illustrate that transcriptional control during pea seed development is a highly coordinated process. There were 459 and 801 genes differentially expressed at early and late seed maturation stages between vegetable pea and grain pea, respectively. Soluble sugar and starch metabolism related genes were significantly activated during the development of pea seeds coinciding with the onset of accumulation of sugar and starch in the seeds. A comparative analysis of genes involved in sugar and starch biosynthesis in vegetable pea (high seed soluble sugar and low starch) and grain pea (high seed starch and low soluble sugar) revealed that differential expression of related genes at late development stages results in a negative correlation between soluble sugar and starch biosynthetic flux in vegetable and grain pea seeds. RNA-Seq data was validated by using real-time quantitative RT-PCR analysis for 30 randomly selected genes. To our knowledge, this work represents the first report of seed development transcriptomics in pea. The obtained results provide a foundation to support future efforts to unravel the underlying mechanisms that control the developmental biology of pea seeds, and serve as a valuable resource for improving pea breeding. PMID:26635856

  1. The Impact of Normalization Methods on RNA-Seq Data Analysis

    PubMed Central

    Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.

    2015-01-01

    High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014

  2. Transcriptome analysis and gene expression profiling of abortive and developing ovules during fruit development in hazelnut.

    PubMed

    Cheng, Yunqing; Liu, Jianfeng; Zhang, Huidi; Wang, Ju; Zhao, Yixin; Geng, Wanting

    2015-01-01

    A high ratio of blank fruit in hazelnut (Corylus heterophylla Fisch) is a very common phenomenon that causes serious yield losses in northeast China. The development of blank fruit in the Corylus genus is known to be associated with embryo abortion. However, little is known about the molecular mechanisms responsible for embryo abortion during the nut development stage. Genomic information for C. heterophylla Fisch is not available; therefore, data related to transcriptome and gene expression profiling of developing and abortive ovules are needed. In this study, de novo transcriptome sequencing and RNA-seq analysis were conducted using short-read sequencing technology (Illumina HiSeq 2000). The results of the transcriptome assembly analysis revealed genetic information that was associated with the fruit development stage. Two digital gene expression libraries were constructed, one for a full (normally developing) ovule and one for an empty (abortive) ovule. Transcriptome sequencing and assembly results revealed 55,353 unigenes, including 18,751 clusters and 36,602 singletons. These results were annotated using the public databases NR, NT, Swiss-Prot, KEGG, COG, and GO. Using digital gene expression profiling, gene expression differences in developing and abortive ovules were identified. A total of 1,637 and 715 unigenes were significantly upregulated and downregulated, respectively, in abortive ovules, compared with developing ovules. Quantitative real-time polymerase chain reaction analysis was used in order to verify the differential expression of some genes. The transcriptome and digital gene expression profiling data of normally developing and abortive ovules in hazelnut provide exhaustive information that will improve our understanding of the molecular mechanisms of abortive ovule formation in hazelnut.

  3. Full-length single-cell RNA-seq applied to a viral human cancer: applications to HPV expression and splicing analysis in HeLa S3 cells.

    PubMed

    Wu, Liang; Zhang, Xiaolong; Zhao, Zhikun; Wang, Ling; Li, Bo; Li, Guibo; Dean, Michael; Yu, Qichao; Wang, Yanhui; Lin, Xinxin; Rao, Weijian; Mei, Zhanlong; Li, Yang; Jiang, Runze; Yang, Huan; Li, Fuqiang; Xie, Guoyun; Xu, Liqin; Wu, Kui; Zhang, Jie; Chen, Jianghao; Wang, Ting; Kristiansen, Karsten; Zhang, Xiuqing; Li, Yingrui; Yang, Huanming; Wang, Jian; Hou, Yong; Xu, Xun

    2015-01-01

    Viral infection causes multiple forms of human cancer, and HPV infection is the primary factor in cervical carcinomas. Recent single-cell RNA-seq studies highlight the tumor heterogeneity present in most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line. We developed a new high throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells and 40 of them were randomly selected to perform single-cell RNA sequencing. Based on these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in gene expression, alternative splicing and fusions. Furthermore, we identified a high diversity of HPV-18 expression and splicing at the single-cell level. By co-expression analysis we identified 283 E6, E7 co-regulated genes, including CDC25, PCNA, PLK4, BUB1B and IRF1 known to interact with HPV viral proteins. Our results reveal the heterogeneity of a virus-infected cell line. It not only provides a transcriptome characterization of HeLa S3 cells at the single cell level, but is a demonstration of the power of single cell RNA-seq analysis of virally infected cells and cancers.

  4. Transcriptomic features associated with energy production in the muscles of Pacific bluefin tuna and Pacific cod.

    PubMed

    Shibata, Mami; Mekuchi, Miyuki; Mori, Kazuki; Muta, Shigeru; Chowdhury, Vishwajit Sur; Nakamura, Yoji; Ojima, Nobuhiko; Saitoh, Kenji; Kobayashi, Takanori; Wada, Tokio; Inouye, Kiyoshi; Kuhara, Satoru; Tashiro, Kosuke

    2016-06-01

    Bluefin tuna are high-performance swimmers and top predators in the open ocean. Their swimming is grounded by unique features including an exceptional glycolytic potential in white muscle, which is supported by high enzymatic activities. Here we performed high-throughput RNA sequencing (RNA-Seq) in muscles of the Pacific bluefin tuna (Thunnus orientalis) and Pacific cod (Gadus macrocephalus) and conducted a comparative transcriptomic analysis of genes related to energy production. We found that the total expression of glycolytic genes was much higher in the white muscle of tuna than in the other muscles, and that the expression of only six genes for glycolytic enzymes accounted for 83.4% of the total. These expression patterns were in good agreement with the patterns of enzyme activity previously reported. The findings suggest that the mRNA expression of glycolytic genes may contribute directly to the enzymatic activities in the muscles of tuna.

  5. Transcriptomic Analysis Reveals Mechanisms of Sterile and Fertile Flower Differentiation and Development in Viburnum macrocephalum f. keteleeri

    PubMed Central

    Lu, Zhaogeng; Xu, Jing; Li, Weixing; Zhang, Li; Cui, Jiawen; He, Qingsong; Wang, Li; Jin, Biao

    2017-01-01

    Sterile and fertile flowers are an important evolutionary developmental (evo-devo) phenotype in angiosperm flowers, playing important roles in pollinator attraction and sexual reproductive success. However, the gene regulatory mechanisms underlying fertile and sterile flower differentiation and development remain largely unknown. Viburnum macrocephalum f. keteleeri, which possesses fertile and sterile flowers in a single inflorescence, is a useful candidate species for investigating the regulatory networks in differentiation and development. We developed a de novo-assembled flower reference transcriptome. Using RNA sequencing (RNA-seq), we compared the expression patterns of fertile and sterile flowers isolated from the same inflorescence over its rapid developmental stages. The flower reference transcriptome consisted of 105,683 non-redundant transcripts, of which 5,675 transcripts showed significant differential expression between fertile and sterile flowers. Combined with morphological and cytological changes between fertile and sterile flowers, we identified expression changes of many genes potentially involved in reproductive processes, phytohormone signaling, and cell proliferation and expansion using RNA-seq and qRT-PCR. In particular, many transcription factors (TFs), including MADS-box family members and ABCDE-class genes, were identified, and expression changes in TFs involved in multiple functions were analyzed and highlighted to determine their roles in regulating fertile and sterile flower differentiation and development. Our large-scale transcriptional analysis of fertile and sterile flowers revealed the dynamics of transcriptional networks and potentially key components in regulating differentiation and development of fertile and sterile flowers in Viburnum macrocephalum f. keteleeri. Our data provide a useful resource for Viburnum transcriptional research and offer insights into gene regulation of differentiation of diverse evo-devo processes in flowers. PMID:28298915

  6. Transcriptomes of Trypanosoma brucei rhodesiense from sleeping sickness patients, rodents and culture: Effects of strain, growth conditions and RNA preparation methods

    PubMed Central

    Mulindwa, Julius; Leiss, Kevin; Ibberson, David; Kamanyi Marucha, Kevin; Helbig, Claudia; Melo do Nascimento, Larissa; Silvester, Eleanor; Matthews, Keith; Matovu, Enock; Enyaru, John

    2018-01-01

    All of our current knowledge of African trypanosome metabolism is based on results from trypanosomes grown in culture or in rodents. Drugs against sleeping sickness must however treat trypanosomes in humans. We here compare the transcriptomes of Trypanosoma brucei rhodesiense from the blood and cerebrospinal fluid of human patients with those of trypanosomes from culture and rodents. The data were aligned and analysed using new user-friendly applications designed for Kinetoplastid RNA-Seq data. The transcriptomes of trypanosomes from human blood and cerebrospinal fluid did not predict major metabolic differences that might affect drug susceptibility. Usefully, there were relatively few differences between the transcriptomes of trypanosomes from patients and those of similar trypanosomes grown in rats. Transcriptomes of monomorphic laboratory-adapted parasites grown in in vitro culture closely resembled those of the human parasites, but some differences were seen. In poly(A)-selected mRNA transcriptomes, mRNAs encoding some protein kinases and RNA-binding proteins were under-represented relative to mRNA that had not been poly(A) selected; further investigation revealed that the selection tends to result in loss of longer mRNAs. PMID:29474390

  7. Transcriptomes of Trypanosoma brucei rhodesiense from sleeping sickness patients, rodents and culture: Effects of strain, growth conditions and RNA preparation methods.

    PubMed

    Mulindwa, Julius; Leiss, Kevin; Ibberson, David; Kamanyi Marucha, Kevin; Helbig, Claudia; Melo do Nascimento, Larissa; Silvester, Eleanor; Matthews, Keith; Matovu, Enock; Enyaru, John; Clayton, Christine

    2018-02-01

    All of our current knowledge of African trypanosome metabolism is based on results from trypanosomes grown in culture or in rodents. Drugs against sleeping sickness must however treat trypanosomes in humans. We here compare the transcriptomes of Trypanosoma brucei rhodesiense from the blood and cerebrospinal fluid of human patients with those of trypanosomes from culture and rodents. The data were aligned and analysed using new user-friendly applications designed for Kinetoplastid RNA-Seq data. The transcriptomes of trypanosomes from human blood and cerebrospinal fluid did not predict major metabolic differences that might affect drug susceptibility. Usefully, there were relatively few differences between the transcriptomes of trypanosomes from patients and those of similar trypanosomes grown in rats. Transcriptomes of monomorphic laboratory-adapted parasites grown in in vitro culture closely resembled those of the human parasites, but some differences were seen. In poly(A)-selected mRNA transcriptomes, mRNAs encoding some protein kinases and RNA-binding proteins were under-represented relative to mRNA that had not been poly(A) selected; further investigation revealed that the selection tends to result in loss of longer mRNAs.

  8. Analysis of clock-regulated genes in Neurospora reveals widespread posttranscriptional control of metabolic potential

    PubMed Central

    Hurley, Jennifer M.; Dasgupta, Arko; Emerson, Jillian M.; Zhou, Xiaoying; Ringelberg, Carol S.; Knabe, Nicole; Lipzen, Anna M.; Lindquist, Erika A.; Daum, Christopher G.; Barry, Kerrie W.; Grigoriev, Igor V.; Smith, Kristina M.; Galagan, James E.; Bell-Pedersen, Deborah; Freitag, Michael; Cheng, Chao; Loros, Jennifer J.; Dunlap, Jay C.

    2014-01-01

    Neurospora crassa has been for decades a principal model for filamentous fungal genetics and physiology as well as for understanding the mechanism of circadian clocks. Eukaryotic fungal and animal clocks comprise transcription-translation–based feedback loops that control rhythmic transcription of a substantial fraction of these transcriptomes, yielding the changes in protein abundance that mediate circadian regulation of physiology and metabolism: Understanding circadian control of gene expression is key to understanding eukaryotic, including fungal, physiology. Indeed, the isolation of clock-controlled genes (ccgs) was pioneered in Neurospora where circadian output begins with binding of the core circadian transcription factor WCC to a subset of ccg promoters, including those of many transcription factors. High temporal resolution (2-h) sampling over 48 h using RNA sequencing (RNA-Seq) identified circadianly expressed genes in Neurospora, revealing that from ∼10% to as much 40% of the transcriptome can be expressed under circadian control. Functional classifications of these genes revealed strong enrichment in pathways involving metabolism, protein synthesis, and stress responses; in broad terms, daytime metabolic potential favors catabolism, energy production, and precursor assembly, whereas night activities favor biosynthesis of cellular components and growth. Discriminative regular expression motif elicitation (DREME) identified key promoter motifs highly correlated with the temporal regulation of ccgs. Correlations between ccg abundance from RNA-Seq, the degree of ccg-promoter activation as reported by ccg-promoter–luciferase fusions, and binding of WCC as measured by ChIP-Seq, are not strong. Therefore, although circadian activation is critical to ccg rhythmicity, posttranscriptional regulation plays a major role in determining rhythmicity at the mRNA level. PMID:25362047

  9. Transcriptome-wide analysis supports environmental adaptations of two Pinus pinaster populations from contrasting habitats.

    PubMed

    Cañas, Rafael A; Feito, Isabel; Fuente-Maqueda, José Francisco; Ávila, Concepción; Majada, Juan; Cánovas, Francisco M

    2015-11-06

    Maritime pine (Pinus pinaster Aiton) grows in a range of different climates in the southwestern Mediterranean region and the existence of a variety of latitudinal ecotypes or provenances is well established. In this study, we have conducted a deep analysis of the transcriptome in needles from two P. pinaster provenances, Leiria (Portugal) and Tamrabta (Morocco), which were grown in northern Spain under the same conditions. An oligonucleotide microarray (PINARRAY3) and RNA-Seq were used for whole-transcriptome analyses, and we found that 90.95% of the data were concordant between the two platforms. Furthermore, the two methods identified very similar percentages of differentially expressed genes with values of 5.5% for PINARRAY3 and 5.7% for RNA-Seq. In total, 6,023 transcripts were shared and 88 differentially expressed genes overlapped in the two platforms. Among the differentially expressed genes, all transport related genes except aquaporins were expressed at higher levels in Tamrabta than in Leiria. In contrast, genes involved in secondary metabolism were expressed at higher levels in Tamrabta, and photosynthesis-related genes were expressed more highly in Leiria. The genes involved in light sensing in plants were well represented in the differentially expressed groups of genes. In addition, increased levels of hormones such as abscisic acid, gibberellins, jasmonic and salicylic acid were observed in Leiria. Both transcriptome platforms have proven to be useful resources, showing complementary and reliable results. The results presented here highlight the different abilities of the two maritime pine populations to sense environmental conditions and reveal one type of regulation that can be ascribed to different genetic and epigenetic backgrounds.

  10. Rapid Recovery Gene Downregulation during Excess-Light Stress and Recovery in Arabidopsis.

    PubMed

    Crisp, Peter A; Ganguly, Diep R; Smith, Aaron B; Murray, Kevin D; Estavillo, Gonzalo M; Searle, Iain; Ford, Ethan; Bogdanović, Ozren; Lister, Ryan; Borevitz, Justin O; Eichten, Steven R; Pogson, Barry J

    2017-08-01

    Stress recovery may prove to be a promising approach to increase plant performance and, theoretically, mRNA instability may facilitate faster recovery. Transcriptome (RNA-seq, qPCR, sRNA-seq, and PARE) and methylome profiling during repeated excess-light stress and recovery was performed at intervals as short as 3 min. We demonstrate that 87% of the stress-upregulated mRNAs analyzed exhibit very rapid recovery. For instance, HSP101 abundance declined 2-fold every 5.1 min. We term this phenomenon rapid recovery gene downregulation (RRGD), whereby mRNA abundance rapidly decreases promoting transcriptome resetting. Decay constants ( k ) were modeled using two strategies, linear and nonlinear least squares regressions, with the latter accounting for both transcription and degradation. This revealed extremely short half-lives ranging from 2.7 to 60.0 min for 222 genes. Ribosome footprinting using degradome data demonstrated RRGD loci undergo cotranslational decay and identified changes in the ribosome stalling index during stress and recovery. However, small RNAs and 5'-3' RNA decay were not essential for recovery of the transcripts examined, nor were any of the six excess light-associated methylome changes. We observed recovery-specific gene expression networks upon return to favorable conditions and six transcriptional memory types. In summary, rapid transcriptome resetting is reported in the context of active recovery and cellular memory. © 2017 American Society of Plant Biologists. All rights reserved.

  11. Rapid Recovery Gene Downregulation during Excess-Light Stress and Recovery in Arabidopsis[OPEN

    PubMed Central

    Estavillo, Gonzalo M.

    2017-01-01

    Stress recovery may prove to be a promising approach to increase plant performance and, theoretically, mRNA instability may facilitate faster recovery. Transcriptome (RNA-seq, qPCR, sRNA-seq, and PARE) and methylome profiling during repeated excess-light stress and recovery was performed at intervals as short as 3 min. We demonstrate that 87% of the stress-upregulated mRNAs analyzed exhibit very rapid recovery. For instance, HSP101 abundance declined 2-fold every 5.1 min. We term this phenomenon rapid recovery gene downregulation (RRGD), whereby mRNA abundance rapidly decreases promoting transcriptome resetting. Decay constants (k) were modeled using two strategies, linear and nonlinear least squares regressions, with the latter accounting for both transcription and degradation. This revealed extremely short half-lives ranging from 2.7 to 60.0 min for 222 genes. Ribosome footprinting using degradome data demonstrated RRGD loci undergo cotranslational decay and identified changes in the ribosome stalling index during stress and recovery. However, small RNAs and 5ʹ-3ʹ RNA decay were not essential for recovery of the transcripts examined, nor were any of the six excess light-associated methylome changes. We observed recovery-specific gene expression networks upon return to favorable conditions and six transcriptional memory types. In summary, rapid transcriptome resetting is reported in the context of active recovery and cellular memory. PMID:28705956

  12. Novel transcriptome assembly and improved annotation of the whiteleg shrimp (Litopenaeus vannamei), a dominant crustacean in global seafood mariculture.

    PubMed

    Ghaffari, Noushin; Sanchez-Flores, Alejandro; Doan, Ryan; Garcia-Orozco, Karina D; Chen, Patricia L; Ochoa-Leyva, Adrian; Lopez-Zavala, Alonso A; Carrasco, J Salvador; Hong, Chris; Brieba, Luis G; Rudiño-Piñera, Enrique; Blood, Philip D; Sawyer, Jason E; Johnson, Charles D; Dindot, Scott V; Sotelo-Mundo, Rogerio R; Criscitiello, Michael F

    2014-11-25

    We present a new transcriptome assembly of the Pacific whiteleg shrimp (Litopenaeus vannamei), the species most farmed for human consumption. Its functional annotation, a substantial improvement over previous ones, is provided freely. RNA-Seq with Illumina HiSeq technology was used to analyze samples extracted from shrimp abdominal muscle, hepatopancreas, gills and pleopods. We used the Trinity and Trinotate software suites for transcriptome assembly and annotation, respectively. The quality of this assembly and the affiliated targeted homology searches greatly enrich the curated transcripts currently available in public databases for this species. Comparison with the model arthropod Daphnia allows some insights into defining characteristics of decapod crustaceans. This large-scale gene discovery gives the broadest depth yet to the annotated transcriptome of this important species and should be of value to ongoing genomics and immunogenetic resistance studies in this shrimp of paramount global economic importance.

  13. An atlas of gene expression and gene co-regulation in the human retina.

    PubMed

    Pinelli, Michele; Carissimo, Annamaria; Cutillo, Luisa; Lai, Ching-Hung; Mutarelli, Margherita; Moretti, Maria Nicoletta; Singh, Marwah Veer; Karali, Marianthi; Carrella, Diego; Pizzo, Mariateresa; Russo, Francesco; Ferrari, Stefano; Ponzin, Diego; Angelini, Claudia; Banfi, Sandro; di Bernardo, Diego

    2016-07-08

    The human retina is a specialized tissue involved in light stimulus transduction. Despite its unique biology, an accurate reference transcriptome is still missing. Here, we performed gene expression analysis (RNA-seq) of 50 retinal samples from non-visually impaired post-mortem donors. We identified novel transcripts with high confidence (Observed Transcriptome (ObsT)) and quantified the expression level of known transcripts (Reference Transcriptome (RefT)). The ObsT included 77 623 transcripts (23 960 genes) covering 137 Mb (35 Mb new transcribed genome). Most of the transcripts (92%) were multi-exonic: 81% with known isoforms, 16% with new isoforms and 3% belonging to new genes. The RefT included 13 792 genes across 94 521 known transcripts. Mitochondrial genes were among the most highly expressed, accounting for about 10% of the reads. Of all the protein-coding genes in Gencode, 65% are expressed in the retina. We exploited inter-individual variability in gene expression to infer a gene co-expression network and to identify genes specifically expressed in photoreceptor cells. We experimentally validated the photoreceptors localization of three genes in human retina that had not been previously reported. RNA-seq data and the gene co-expression network are available online (http://retina.tigem.it). © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. A Pipeline for High-Throughput Concentration Response Modeling of Gene Expression for Toxicogenomics

    PubMed Central

    House, John S.; Grimm, Fabian A.; Jima, Dereje D.; Zhou, Yi-Hui; Rusyn, Ivan; Wright, Fred A.

    2017-01-01

    Cell-based assays are an attractive option to measure gene expression response to exposure, but the cost of whole-transcriptome RNA sequencing has been a barrier to the use of gene expression profiling for in vitro toxicity screening. In addition, standard RNA sequencing adds variability due to variable transcript length and amplification. Targeted probe-sequencing technologies such as TempO-Seq, with transcriptomic representation that can vary from hundreds of genes to the entire transcriptome, may reduce some components of variation. Analyses of high-throughput toxicogenomics data require renewed attention to read-calling algorithms and simplified dose–response modeling for datasets with relatively few samples. Using data from induced pluripotent stem cell-derived cardiomyocytes treated with chemicals at varying concentrations, we describe here and make available a pipeline for handling expression data generated by TempO-Seq to align reads, clean and normalize raw count data, identify differentially expressed genes, and calculate transcriptomic concentration–response points of departure. The methods are extensible to other forms of concentration–response gene-expression data, and we discuss the utility of the methods for assessing variation in susceptibility and the diseased cellular state. PMID:29163636

  15. A Hierarchical Poisson Log-Normal Model for Network Inference from RNA Sequencing Data

    PubMed Central

    Gallopin, Mélina; Rau, Andrea; Jaffrézic, Florence

    2013-01-01

    Gene network inference from transcriptomic data is an important methodological challenge and a key aspect of systems biology. Although several methods have been proposed to infer networks from microarray data, there is a need for inference methods able to model RNA-seq data, which are count-based and highly variable. In this work we propose a hierarchical Poisson log-normal model with a Lasso penalty to infer gene networks from RNA-seq data; this model has the advantage of directly modelling discrete data and accounting for inter-sample variance larger than the sample mean. Using real microRNA-seq data from breast cancer tumors and simulations, we compare this method to a regularized Gaussian graphical model on log-transformed data, and a Poisson log-linear graphical model with a Lasso penalty on power-transformed data. For data simulated with large inter-sample dispersion, the proposed model performs better than the other methods in terms of sensitivity, specificity and area under the ROC curve. These results show the necessity of methods specifically designed for gene network inference from RNA-seq data. PMID:24147011

  16. Next generation sequencing analysis reveals that the ribonucleases RNase II, RNase R and PNPase affect bacterial motility and biofilm formation in E. coli.

    PubMed

    Pobre, Vânia; Arraiano, Cecília M

    2015-02-14

    The RNA steady-state levels in the cell are a balance between synthesis and degradation rates. Although transcription is important, RNA processing and turnover are also key factors in the regulation of gene expression. In Escherichia coli there are three main exoribonucleases (RNase II, RNase R and PNPase) involved in RNA degradation. Although there are many studies about these exoribonucleases not much is known about their global effect in the transcriptome. In order to study the effects of the exoribonucleases on the transcriptome, we sequenced the total RNA (RNA-Seq) from wild-type cells and from mutants for each of the exoribonucleases (∆rnb, ∆rnr and ∆pnp). We compared each of the mutant transcriptome with the wild-type to determine the global effects of the deletion of each exoribonucleases in exponential phase. We determined that the deletion of RNase II significantly affected 187 transcripts, while deletion of RNase R affects 202 transcripts and deletion of PNPase affected 226 transcripts. Surprisingly, many of the transcripts are actually down-regulated in the exoribonuclease mutants when compared to the wild-type control. The results obtained from the transcriptomic analysis pointed to the fact that these enzymes were changing the expression of genes related with flagellum assembly, motility and biofilm formation. The three exoribonucleases affected some stable RNAs, but PNPase was the main exoribonuclease affecting this class of RNAs. We confirmed by qPCR some fold-change values obtained from the RNA-Seq data, we also observed that all the exoribonuclease mutants were significantly less motile than the wild-type cells. Additionally, RNase II and RNase R mutants were shown to produce more biofilm than the wild-type control while the PNPase mutant did not form biofilms. In this work we demonstrate how deep sequencing can be used to discover new and relevant functions of the exoribonucleases. We were able to obtain valuable information about the transcripts affected by each of the exoribonucleases and compare the roles of the three enzymes. Our results show that the three exoribonucleases affect cell motility and biofilm formation that are two very important factors for cell survival, especially for pathogenic cells.

  17. Comprehensive gene expression analysis of canine invasive urothelial bladder carcinoma by RNA-Seq.

    PubMed

    Maeda, Shingo; Tomiyasu, Hirotaka; Tsuboi, Masaya; Inoue, Akiko; Ishihara, Genki; Uchikai, Takao; Chambers, James K; Uchida, Kazuyuki; Yonezawa, Tomohiro; Matsuki, Naoaki

    2018-04-27

    Invasive urothelial carcinoma (iUC) is a major cause of death in humans, and approximately 165,000 individuals succumb to this cancer annually worldwide. Comparative oncology using relevant animal models is necessary to improve our understanding of progression, diagnosis, and treatment of iUC. Companion canines are a preferred animal model of iUC due to spontaneous tumor development and similarity to human disease in terms of histopathology, metastatic behavior, and treatment response. However, the comprehensive molecular characterization of canine iUC is not well documented. In this study, we performed transcriptome analysis of tissue samples from canine iUC and normal bladders using an RNA sequencing (RNA-Seq) approach to identify key molecular pathways in canine iUC. Total RNA was extracted from bladder tissues of 11 dogs with iUC and five healthy dogs, and RNA-Seq was conducted. Ingenuity Pathway Analysis (IPA) was used to assign differentially expressed genes to known upstream regulators and functional networks. Differential gene expression analysis of the RNA-Seq data revealed 2531 differentially expressed genes, comprising 1007 upregulated and 1524 downregulated genes, in canine iUC. IPA revealed that the most activated upstream regulator was PTGER2 (encoding the prostaglandin E 2 receptor EP2), which is consistent with the therapeutic efficiency of cyclooxygenase inhibitors in canine iUC. Similar to human iUC, canine iUC exhibited upregulated ERBB2 and downregulated TP53 pathways. Biological functions associated with cancer, cell proliferation, and leukocyte migration were predicted to be activated, while muscle functions were predicted to be inhibited, indicating muscle-invasive tumor property. Our data confirmed similarities in gene expression patterns between canine and human iUC and identified potential therapeutic targets (PTGER2, ERBB2, CCND1, Vegf, and EGFR), suggesting the value of naturally occurring canine iUC as a relevant animal model for human iUC.

  18. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.

    PubMed

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei

    2017-05-19

    Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

    PubMed

    Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

    2017-06-26

    The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB ( http://paramecium.i2bc.paris-saclay.fr ). TrUC software is freely distributed under a GNU GPL v3 licence ( https://github.com/oarnaiz/TrUC ).

  20. RNA-Seq analysis of salinity stress-responsive transcriptome in the liver of spotted sea bass (Lateolabrax maculatus).

    PubMed

    Zhang, Xiaoyan; Wen, Haishen; Wang, Hailiang; Ren, Yuanyuan; Zhao, Ji; Li, Yun

    2017-01-01

    Salinity is one of the most prominent abiotic factors, which greatly influence reproduction, development, growth, physiological and metabolic activities of fishes. Spotted sea bass (Lateolabrax maculatus), as a euryhaline marine teleost, has extraordinary ability to deal with a wide range of salinity changes. However, this species is devoid of genomic resources, and no study has been conducted at the transcriptomic level to determine genes responsible for salinity regulation, which impedes the understanding of the fundamental mechanism conferring tolerance to salinity fluctuations. Liver, as the major metabolic organ, is the key source supplying energy for iono- and osmoregulation in fish, however, little attention has been paid to its salinity-related functions but which should not be ignored. In this study, we perform RNA-Seq analysis to identify genes involved in salinity adaptation and osmoregulation in liver of spotted sea bass, generating from the fishes exposed to low and high salinity water (5 vs 30ppt). After de novo assembly, annotation and differential gene expression analysis, a total of 455 genes were differentially expressed, including 184 up-regulated and 271 down-regulated transcripts in low salinity-acclimated fish group compared with that in high salinity-acclimated group. A number of genes with a potential role in salinity adaptation for spotted sea bass were classified into five functional categories based on the gene ontology (GO) and enrichment analysis, which include genes involved in metabolites and ion transporters, energy metabolism, signal transduction, immune response and structure reorganization. The candidate genes identified in L. maculates liver provide valuable information to explore new pathways related to fish salinity and osmotic regulation. Besides, the transcriptomic sequencing data supplies significant resources for identification of novel genes and further studying biological questions in spotted sea bass.

  1. RNA-Seq analysis of salinity stress–responsive transcriptome in the liver of spotted sea bass (Lateolabrax maculatus)

    PubMed Central

    Zhang, Xiaoyan; Wen, Haishen; Wang, Hailiang; Ren, Yuanyuan; Zhao, Ji; Li, Yun

    2017-01-01

    Salinity is one of the most prominent abiotic factors, which greatly influence reproduction, development, growth, physiological and metabolic activities of fishes. Spotted sea bass (Lateolabrax maculatus), as a euryhaline marine teleost, has extraordinary ability to deal with a wide range of salinity changes. However, this species is devoid of genomic resources, and no study has been conducted at the transcriptomic level to determine genes responsible for salinity regulation, which impedes the understanding of the fundamental mechanism conferring tolerance to salinity fluctuations. Liver, as the major metabolic organ, is the key source supplying energy for iono- and osmoregulation in fish, however, little attention has been paid to its salinity-related functions but which should not be ignored. In this study, we perform RNA-Seq analysis to identify genes involved in salinity adaptation and osmoregulation in liver of spotted sea bass, generating from the fishes exposed to low and high salinity water (5 vs 30ppt). After de novo assembly, annotation and differential gene expression analysis, a total of 455 genes were differentially expressed, including 184 up-regulated and 271 down-regulated transcripts in low salinity-acclimated fish group compared with that in high salinity-acclimated group. A number of genes with a potential role in salinity adaptation for spotted sea bass were classified into five functional categories based on the gene ontology (GO) and enrichment analysis, which include genes involved in metabolites and ion transporters, energy metabolism, signal transduction, immune response and structure reorganization. The candidate genes identified in L. maculates liver provide valuable information to explore new pathways related to fish salinity and osmotic regulation. Besides, the transcriptomic sequencing data supplies significant resources for identification of novel genes and further studying biological questions in spotted sea bass. PMID:28253338

  2. Cross-disease transcriptomics: Unique IL-17A signaling in psoriasis lesions and an autoimmune PBMC signature

    PubMed Central

    Sarkar, Mrinal K.; Liang, Yun; Xing, Xianying; Gudjonsson, Johann E.

    2016-01-01

    Transcriptome studies of psoriasis have identified robust changes in mRNA expression through large-scale analysis of patient cohorts. These studies, however, have analyzed all mRNA changes in aggregate, without distinguishing between disease-specific and non-specific differentially expressed genes (DEGs). In this study, RNA-seq meta-analysis was used to identify (1) psoriasis-specific DEGs altered in few diseases besides psoriasis and (2) non-specific DEGs similarly altered in many other skin conditions. We show that few cutaneous DEGs are psoriasis-specific and that the two DEG classes differ in their cell type and cytokine associations. Psoriasis-specific DEGs are expressed by keratinocytes and induced by IL-17A, whereas non-specific DEGs are expressed by inflammatory cells and induced by IFN-gamma and TNF. PBMC-derived DEGs were more psoriasis-specific than cutaneous DEGs. Nonetheless, PBMC DEGs associated with MHC class I and NK cells were commonly downregulated in psoriasis and other autoimmune diseases (e.g., multiple sclerosis, sarcoidosis and juvenile rheumatoid arthritis). These findings demonstrate “cross-disease” transcriptomics as an approach to gain insights into the cutaneous and non-cutaneous psoriasis transcriptomes. This highlighted unique contributions of IL-17A to the cytokine network and uncovered a blood-based gene signature that links psoriasis to other diseases of autoimmunity. PMID:27206706

  3. Transcriptome profiling analysis reveals the role of silique in controlling seed oil content in Brassica napus.

    PubMed

    Huang, Ke-Lin; Zhang, Mei-Li; Ma, Guang-Jing; Wu, Huan; Wu, Xiao-Ming; Ren, Feng; Li, Xue-Bao

    2017-01-01

    Seed oil content is an important agronomic trait in oilseed rape. However, the molecular mechanism of oil accumulation in rapeseeds is unclear so far. In this report, RNA sequencing technique (RNA-Seq) was performed to explore differentially expressed genes in siliques of two Brassica napus lines (HFA and LFA which contain high and low oil contents in seeds, respectively) at 15 and 25 days after pollination (DAP). The RNA-Seq results showed that 65746 and 66033 genes were detected in siliques of low oil content line at 15 and 25 DAP, and 65236 and 65211 genes were detected in siliques of high oil content line at 15 and 25 DAP, respectively. By comparative analysis, the differentially expressed genes (DEGs) were identified in siliques of these lines. The DEGs were involved in multiple pathways, including metabolic pathways, biosynthesis of secondary metabolic, photosynthesis, pyruvate metabolism, fatty metabolism, glycophospholipid metabolism, and DNA binding. Also, DEGs were related to photosynthesis, starch and sugar metabolism, pyruvate metabolism, and lipid metabolism at different developmental stage, resulting in the differential oil accumulation in seeds. Furthermore, RNA-Seq and qRT-PCR data revealed that some transcription factors positively regulate seed oil content. Thus, our data provide the valuable information for further exploring the molecular mechanism of lipid biosynthesis and oil accumulation in B. nupus.

  4. Transcriptome profiling analysis reveals the role of silique in controlling seed oil content in Brassica napus

    PubMed Central

    Huang, Ke-Lin; Zhang, Mei-Li; Ma, Guang-Jing; Wu, Huan; Wu, Xiao-Ming; Ren, Feng

    2017-01-01

    Seed oil content is an important agronomic trait in oilseed rape. However, the molecular mechanism of oil accumulation in rapeseeds is unclear so far. In this report, RNA sequencing technique (RNA-Seq) was performed to explore differentially expressed genes in siliques of two Brassica napus lines (HFA and LFA which contain high and low oil contents in seeds, respectively) at 15 and 25 days after pollination (DAP). The RNA-Seq results showed that 65746 and 66033 genes were detected in siliques of low oil content line at 15 and 25 DAP, and 65236 and 65211 genes were detected in siliques of high oil content line at 15 and 25 DAP, respectively. By comparative analysis, the differentially expressed genes (DEGs) were identified in siliques of these lines. The DEGs were involved in multiple pathways, including metabolic pathways, biosynthesis of secondary metabolic, photosynthesis, pyruvate metabolism, fatty metabolism, glycophospholipid metabolism, and DNA binding. Also, DEGs were related to photosynthesis, starch and sugar metabolism, pyruvate metabolism, and lipid metabolism at different developmental stage, resulting in the differential oil accumulation in seeds. Furthermore, RNA-Seq and qRT-PCR data revealed that some transcription factors positively regulate seed oil content. Thus, our data provide the valuable information for further exploring the molecular mechanism of lipid biosynthesis and oil accumulation in B. nupus. PMID:28594951

  5. RNA-seq Analysis Reveals Unique Transcriptome Signatures in Systemic Lupus Erythematosus Patients with Distinct Autoantibody Specificities

    PubMed Central

    Rai, Richa; Chauhan, Sudhir Kumar; Singh, Vikas Vikram; Rai, Madhukar; Rai, Geeta

    2016-01-01

    Systemic lupus erythematosus (SLE) patients exhibit immense heterogeneity which is challenging from the diagnostic perspective. Emerging high throughput sequencing technologies have been proved to be a useful platform to understand the complex and dynamic disease processes. SLE patients categorised based on autoantibody specificities are reported to have differential immuno-regulatory mechanisms. Therefore, we performed RNA-seq analysis to identify transcriptomics of SLE patients with distinguished autoantibody specificities. The SLE patients were segregated into three subsets based on the type of autoantibodies present in their sera (anti-dsDNA+ group with anti-dsDNA autoantibody alone; anti-ENA+ group having autoantibodies against extractable nuclear antigens (ENA) only, and anti-dsDNA+ENA+ group having autoantibodies to both dsDNA and ENA). Global transcriptome profiling for each SLE patients subsets was performed using Illumina® Hiseq-2000 platform. The biological relevance of dysregulated transcripts in each SLE subsets was assessed by ingenuity pathway analysis (IPA) software. We observed that dysregulation in the transcriptome expression pattern was clearly distinct in each SLE patients subsets. IPA analysis of transcripts uniquely expressed in different SLE groups revealed specific biological pathways to be affected in each SLE subsets. Multiple cytokine signaling pathways were specifically dysregulated in anti-dsDNA+ patients whereas Interferon signaling was predominantly dysregulated in anti-ENA+ patients. In anti-dsDNA+ENA+ patients regulation of actin based motility by Rho pathway was significantly affected. The granulocyte gene signature was a common feature to all SLE subsets; however, anti-dsDNA+ group showed relatively predominant expression of these genes. Dysregulation of Plasma cell related transcripts were higher in anti-dsDNA+ and anti-ENA+ patients as compared to anti-dsDNA+ ENA+. Association of specific canonical pathways with the uniquely expressed transcripts in each SLE subgroup indicates that specific immunological disease mechanisms are operative in distinct SLE patients’ subsets. This ‘sub-grouping’ approach could further be useful for clinical evaluation of SLE patients and devising targeted therapeutics. PMID:27835693

  6. RNA-Seq-based transcriptome profiling of early nitrogen deficiency response in cucumber seedlings provides new insight into the putative nitrogen regulatory network.

    PubMed

    Zhao, Wenchao; Yang, Xueyong; Yu, Hongjun; Jiang, Weijie; Sun, Na; Liu, Xiaoran; Liu, Xiaolin; Zhang, Xiaomeng; Wang, Yan; Gu, Xingfang

    2015-03-01

    Nitrogen (N) is both an important macronutrient and a signal for plant growth and development. However, the early regulatory mechanism of plants in response to N starvation is not well understood, especially in cucumber, an economically important crop that normally consumes excessive N during production. In this study, the early time-course transcriptome response of cucumber leaves under N deficiency was monitored using RNA sequencing (RNA-Seq). More than 23,000 transcripts were examined in cucumber leaves, of which 364 genes were differentially expressed in response to N deficiency. Based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, gene ontology (GO) and protein-protein interaction analysis, 64 signaling-related N-deficiency-responsive genes were identified. Furthermore, the potential regulatory mechanisms of anthocyanin accumulation, Chl decline and cell wall remodeling were assessed at the transcription level. Increased ascorbic acid synthesis was identified in cucumber seedlings and fruit under N-deficient conditions, and a new corresponding regulatory hypothesis has been proposed. A data cross-comparison between model plants and cucumber was made, and some common and specific N-deficient response mechanisms were found in the present study. Our study provides novel insights into the responses of cucumber to nitrogen starvation at the global transcriptome level, which are expected to be highly useful for dissecting the N response pathways in this major vegetable and for improving N fertilization practices. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  7. Deep Sequencing Reveals Uncharted Isoform Heterogeneity of the Protein-Coding Transcriptome in Cerebral Ischemia.

    PubMed

    Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh

    2018-06-03

    Gene expression in cerebral ischemia has been a subject of intense investigations for several years. Studies utilizing probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA-sequencing (RNA-seq) enables comprehensive examinations of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites, and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we conducted an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion using RNA-seq. We identified a total of 1007 transcripts at 6 h, 1878 transcripts at 12 h, and 1618 transcripts at 24 h of reperfusion that were significantly altered as compared to sham controls. With isoform-level resolution, we identified 23 splice variants arising from 23 genes that were novel mRNA isoforms. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time-points, we discovered multiple, distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth to provide several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.

  8. ChimerDB 3.0: an enhanced database for fusion genes from cancer transcriptome and literature data mining.

    PubMed

    Lee, Myunggyo; Lee, Kyubum; Yu, Namhee; Jang, Insu; Choi, Ikjung; Kim, Pora; Jang, Ye Eun; Kim, Byounggun; Kim, Sunkyu; Lee, Byungwook; Kang, Jaewoo; Lee, Sanghyuk

    2017-01-04

    Fusion gene is an important class of therapeutic targets and prognostic markers in cancer. ChimerDB is a comprehensive database of fusion genes encompassing analysis of deep sequencing data and manual curations. In this update, the database coverage was enhanced considerably by adding two new modules of The Cancer Genome Atlas (TCGA) RNA-Seq analysis and PubMed abstract mining. ChimerDB 3.0 is composed of three modules of ChimerKB, ChimerPub and ChimerSeq. ChimerKB represents a knowledgebase including 1066 fusion genes with manual curation that were compiled from public resources of fusion genes with experimental evidences. ChimerPub includes 2767 fusion genes obtained from text mining of PubMed abstracts. ChimerSeq module is designed to archive the fusion candidates from deep sequencing data. Importantly, we have analyzed RNA-Seq data of the TCGA project covering 4569 patients in 23 cancer types using two reliable programs of FusionScan and TopHat-Fusion. The new user interface supports diverse search options and graphic representation of fusion gene structure. ChimerDB 3.0 is available at http://ercsb.ewha.ac.kr/fusiongene/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. The aquatic animals' transcriptome resource for comparative functional analysis.

    PubMed

    Chou, Chih-Hung; Huang, Hsi-Yuan; Huang, Wei-Chih; Hsu, Sheng-Da; Hsiao, Chung-Der; Liu, Chia-Yu; Chen, Yu-Hung; Liu, Yu-Chen; Huang, Wei-Yun; Lee, Meng-Lin; Chen, Yi-Chang; Huang, Hsien-Da

    2018-05-09

    Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, currently there is no comprehensive resource exist for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. This aquatic animal transcriptome map database dbATM provides de novo assembly of transcriptome, gene annotation and comparative analysis of more than twenty aquatic organisms without draft genome. To improve the assembly quality, three computational tools (Trinity, Oases and SOAPdenovo-Trans) were employed to enhance individual transcriptome assembly, and CAP3 and CD-HIT-EST software were then used to merge these three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology and KEGG pathways. Furthermore, all aquatic animal genes are essential for comparative genomics tasks such as constructing homologous gene groups and blast databases and phylogenetic analysis. In conclusion, we establish a resource for non model organism aquatic animals, which is great economic and ecological importance and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is now publically accessible through the URL http://dbATM.mbc.nctu.edu.tw/ .

  10. Identification of mycoparasitism-related genes against the phytopathogen Sclerotinia sclerotiorum through transcriptome and expression profile analysis in Trichoderma harzianum.

    PubMed

    Steindorff, Andrei Stecca; Ramada, Marcelo Henrique Soller; Coelho, Alexandre Siqueira Guedes; Miller, Robert Neil Gerard; Pappas, Georgios Joannis; Ulhoa, Cirano José; Noronha, Eliane Ferreira

    2014-03-18

    The species of T. harzianum are well known for their biocontrol activity against plant pathogens. However, few studies have been conducted to further our understanding of its role as a biological control agent against S. sclerotiorum, a pathogen involved in several crop diseases around the world. In this study, we have used RNA-seq and quantitative real-time PCR (RT-qPCR) techniques in order to explore changes in T. harzianum gene expression during growth on cell wall of S. sclerotiorum (SSCW) or glucose. RT-qPCR was also used to examine genes potentially involved in biocontrol, during confrontation between T. harzianum and S. sclerotiorum. Data obtained from six RNA-seq libraries were aligned onto the T. harzianum CBS 226.95 reference genome and compared after annotation using the Blast2GO suite. A total of 297 differentially expressed genes were found in mycelia grown for 12, 24 and 36 h under the two different conditions: supplemented with glucose or SSCW. Functional annotation of these genes identified diverse biological processes and molecular functions required during T. harzianum growth on SSCW or glucose. We identified various genes of biotechnological value encoding proteins with functions such as transporters, hydrolytic activity, adherence, appressorium development and pathogenesis. To validate the expression profile, RT-qPCR was performed using 20 randomly chosen genes. RT-qPCR expression profiles were in complete agreement with the RNA-Seq data for 17 of the genes evaluated. The other three showed differences at one or two growth times. During the confrontation assay, some genes were up-regulated during and after contact, as shown in the presence of SSCW which is commonly used as a model to mimic this interaction. The present study is the first initiative to use RNA-seq for identification of differentially expressed genes in T. harzianum strain TR274, in response to the phytopathogenic fungus S. sclerotiorum. It provides insights into the mechanisms of gene expression involved in mycoparasitism of T. harzianum against S.sclerotiorum. The RNA-seq data presented will facilitate improvement of the annotation of gene models in the draft T. harzianum genome and provide important information regarding the transcriptome during this interaction.

  11. RNA-Seq of Arabidopsis Pollen Uncovers Novel Transcription and Alternative Splicing1[C][W][OA

    PubMed Central

    Loraine, Ann E.; McCormick, Sheila; Estrada, April; Patel, Ketan; Qin, Peng

    2013-01-01

    Pollen grains of Arabidopsis (Arabidopsis thaliana) contain two haploid sperm cells enclosed in a haploid vegetative cell. Upon germination, the vegetative cell extrudes a pollen tube that carries the sperm to an ovule for fertilization. Knowing the identity, relative abundance, and splicing patterns of pollen transcripts will improve our understanding of pollen and allow investigation of tissue-specific splicing in plants. Most Arabidopsis pollen transcriptome studies have used the ATH1 microarray, which does not assay splice variants and lacks specific probe sets for many genes. To investigate the pollen transcriptome, we performed high-throughput sequencing (RNA-Seq) of Arabidopsis pollen and seedlings for comparison. Gene expression was more diverse in seedling, and genes involved in cell wall biogenesis were highly expressed in pollen. RNA-Seq detected at least 4,172 protein-coding genes expressed in pollen, including 289 assayed only by nonspecific probe sets. Additional exons and previously unannotated 5′ and 3′ untranslated regions for pollen-expressed genes were revealed. We detected regions in the genome not previously annotated as expressed; 14 were tested and 12 were confirmed by polymerase chain reaction. Gapped read alignments revealed 1,908 high-confidence new splicing events supported by 10 or more spliced read alignments. Alternative splicing patterns in pollen and seedling were highly correlated. For most alternatively spliced genes, the ratio of variants in pollen and seedling was similar, except for some encoding proteins involved in RNA splicing. This study highlights the robustness of splicing patterns in plants and the importance of ongoing annotation and visualization of RNA-Seq data using interactive tools such as Integrated Genome Browser. PMID:23590974

  12. Transcriptomic Profiles of Brain Provide Insights into Molecular Mechanism of Feed Conversion Efficiency in Crucian Carp (Carassius auratus)

    PubMed Central

    Pang, Meixia; Luo, Weiwei; Yu, Xiaomu; Zhou, Ying; Tong, Jingou

    2018-01-01

    Feed efficiency is an economically crucial trait for cultured animals, however, progress has been scarcely made in the genetic analyses of feed conversion efficiency (FCE) in fish because of the difficulties in measurement of trait phenotypes. In the present investigation, we present the first application of RNA sequencing (RNA-Seq) combined with differentially expressed genes (DEGs) analysis for identification of functional determinants related to FCE at the gene level in an aquaculture fish, crucian carp (Carassius auratus). Brain tissues of six crucian carp with extreme FCE performances were subjected to transcriptome analysis. A total of 544,612 unigenes with a mean size of 644.38 bp were obtained from Low- and High-FCE groups, and 246 DEGs that may be involved in FCE traits were identified in these two groups. qPCR confirmed that genes previously identified as up- or down-regulated by RNA-Seq were effectively up- or down-regulated under the studied conditions. Thirteen key genes, whose functions are associated with metabolism (Dgkk, Mgst3 and Guk1b), signal transduction (Vdnccsa1b, Tgfα, Nr4a1 and Tacr2) and growth (Endog, Crebrtc2, Myh7, Myh1, Myh14 and Igfbp7) were identified according to GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) annotations. Our novel findings provide useful pathway information and candidate genes for future studies of genetic mechanisms underlying FCE in crucian carp. PMID:29538345

  13. Identification of Immunity-Related Genes in Dialeurodes citri against Entomopathogenic Fungus Lecanicillium attenuatum by RNA-Seq Analysis.

    PubMed

    Yu, Shijiang; Ding, Lili; Luo, Ren; Li, Xiaojiao; Yang, Juan; Liu, Haoqiang; Cong, Lin; Ran, Chun

    2016-01-01

    Dialeurodes citri is a major pest in citrus producing areas, and large-scale outbreaks have occurred increasingly often in recent years. Lecanicillium attenuatum is an important entomopathogenic fungus that can parasitize and kill D. citri. We separated the fungus from corpses of D. citri larvae. However, the sound immune defense system of pests makes infection by an entomopathogenic fungus difficult. Here we used RNA sequencing technology (RNA-Seq) to build a transcriptome database for D. citri and performed digital gene expression profiling to screen genes that act in the immune defense of D. citri larvae infected with a pathogenic fungus. De novo assembly generated 84,733 unigenes with mean length of 772 nt. All unigenes were searched against GO, Nr, Swiss-Prot, COG, and KEGG databases and a total of 28,190 (33.3%) unigenes were annotated. We identified 129 immunity-related unigenes in transcriptome database that were related to pattern recognition receptors, information transduction factors and response factors. From the digital gene expression profile, we identified 441 unigenes that were differentially expressed in D. citri infected with L. attenuatum. Through calculated Log2Ratio values, we identified genes for which fold changes in expression were obvious, including cuticle protein, vitellogenin, cathepsin, prophenoloxidase, clip-domain serine protease, lysozyme, and others. Subsequent quantitative real-time polymerase chain reaction analysis verified the results. The identified genes may serve as target genes for microbial control of D. citri.

  14. Identification of Immunity-Related Genes in Dialeurodes citri against Entomopathogenic Fungus Lecanicillium attenuatum by RNA-Seq Analysis

    PubMed Central

    Yu, Shijiang; Ding, Lili; Luo, Ren; Li, Xiaojiao; Yang, Juan; Liu, Haoqiang; Cong, Lin; Ran, Chun

    2016-01-01

    Dialeurodes citri is a major pest in citrus producing areas, and large-scale outbreaks have occurred increasingly often in recent years. Lecanicillium attenuatum is an important entomopathogenic fungus that can parasitize and kill D. citri. We separated the fungus from corpses of D. citri larvae. However, the sound immune defense system of pests makes infection by an entomopathogenic fungus difficult. Here we used RNA sequencing technology (RNA-Seq) to build a transcriptome database for D. citri and performed digital gene expression profiling to screen genes that act in the immune defense of D. citri larvae infected with a pathogenic fungus. De novo assembly generated 84,733 unigenes with mean length of 772 nt. All unigenes were searched against GO, Nr, Swiss-Prot, COG, and KEGG databases and a total of 28,190 (33.3%) unigenes were annotated. We identified 129 immunity-related unigenes in transcriptome database that were related to pattern recognition receptors, information transduction factors and response factors. From the digital gene expression profile, we identified 441 unigenes that were differentially expressed in D. citri infected with L. attenuatum. Through calculated Log2Ratio values, we identified genes for which fold changes in expression were obvious, including cuticle protein, vitellogenin, cathepsin, prophenoloxidase, clip-domain serine protease, lysozyme, and others. Subsequent quantitative real-time polymerase chain reaction analysis verified the results. The identified genes may serve as target genes for microbial control of D. citri. PMID:27644092

  15. Transcription profile analysis of Lycopersicum esculentum leaves, unravels volatile emissions and gene expression under salinity stress.

    PubMed

    Zhang, Jihong; Zeng, Li; Chen, Shaoyang; Sun, Helong; Ma, Shuang

    2018-05-01

    Salinity stress can impede development and plant growth adversely. However, there is very little molecular information on NaCl resistance and volatile emissions in Lycopersicum esculentum. In order to investigate the effects of salt stress on the release of volatile compounds, we quantified and compared transcriptome changes by RNA-Seq analysis and volatile constituents with gas chromatography/mass spectrometry (GC/MS) coupled with solid-phase microextraction (SPME) after exposure to continuous salt stress. Chemical analysis by GC-MS analysis revealed that NaCl stress had changed species and quantity of volatile compounds released. In this research, 21,578 unigenes that represented 44,714 assembled unique transcripts were separated from tomato leaves exposed to NaCl stress based on de novo transcriptome assembly. The total number of differentially expressed genes was 7210 after exposure to NaCl, including 6200 down-regulated and 1208 up-regulated genes. Among these differentially expressed genes (DEGs), there were eighteen differentially expressed genes associated with volatile biosynthesis. Of the unigenes, 3454 were mapped to 131 KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, mainly those are involved in RNA transport, plant-pathogen interactions, and plant hormone signal transduction. qRT-PCR analysis showed that NaCl exposure affected the expression profiles of the biosynthesis genes for eight volatile compounds (IPI, GPS, and TPS, etc.), which corresponded well with the RNA-Seq analysis and GC-MS results. Our results suggest that NaCl stress affects the emission of volatile substances from L. esculentum leaves by regulating the expression of genes that are involved in volatile organic compounds' biosynthesis. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  16. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas

    PubMed Central

    Hou, Yu; Guo, Huahu; Cao, Chen; Li, Xianlong; Hu, Boqiang; Zhu, Ping; Wu, Xinglong; Wen, Lu; Tang, Fuchou; Huang, Yanyi; Peng, Jirun

    2016-01-01

    Single-cell genome, DNA methylome, and transcriptome sequencing methods have been separately developed. However, to accurately analyze the mechanism by which transcriptome, genome and DNA methylome regulate each other, these omic methods need to be performed in the same single cell. Here we demonstrate a single-cell triple omics sequencing technique, scTrio-seq, that can be used to simultaneously analyze the genomic copy-number variations (CNVs), DNA methylome, and transcriptome of an individual mammalian cell. We show that large-scale CNVs cause proportional changes in RNA expression of genes within the gained or lost genomic regions, whereas these CNVs generally do not affect DNA methylation in these regions. Furthermore, we applied scTrio-seq to 25 single cancer cells derived from a human hepatocellular carcinoma tissue sample. We identified two subpopulations within these cells based on CNVs, DNA methylome, or transcriptome of individual cells. Our work offers a new avenue of dissecting the complex contribution of genomic and epigenomic heterogeneities to the transcriptomic heterogeneity within a population of cells. PMID:26902283

  17. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages

    PubMed Central

    Yu, Ying; Fuscoe, James C.; Zhao, Chen; Guo, Chao; Jia, Meiwen; Qing, Tao; Bannon, Desmond I.; Lancashire, Lee; Bao, Wenjun; Du, Tingting; Luo, Heng; Su, Zhenqiang; Jones, Wendell D.; Moland, Carrie L.; Branham, William S.; Qian, Feng; Ning, Baitang; Li, Yan; Hong, Huixiao; Guo, Lei; Mei, Nan; Shi, Tieliu; Wang, Kevin Y.; Wolfinger, Russell D.; Nikolsky, Yuri; Walker, Stephen J.; Duerksen-Hughes, Penelope; Mason, Christopher E.; Tong, Weida; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Shi, Leming; Wang, Charles

    2014-01-01

    The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model. PMID:24510058

  18. Cross-Laboratory Analysis of Brain Cell Type Transcriptomes with Applications to Interpretation of Bulk Tissue Data

    PubMed Central

    Toker, Lilah; Rocco, Brad; Sibille, Etienne

    2017-01-01

    Establishing the molecular diversity of cell types is crucial for the study of the nervous system. We compiled a cross-laboratory database of mouse brain cell type-specific transcriptomes from 36 major cell types from across the mammalian brain using rigorously curated published data from pooled cell type microarray and single-cell RNA-sequencing (RNA-seq) studies. We used these data to identify cell type-specific marker genes, discovering a substantial number of novel markers, many of which we validated using computational and experimental approaches. We further demonstrate that summarized expression of marker gene sets (MGSs) in bulk tissue data can be used to estimate the relative cell type abundance across samples. To facilitate use of this expanding resource, we provide a user-friendly web interface at www.neuroexpresso.org. PMID:29204516

  19. A Rapid, Extensive, and Transient Transcriptional Response to Estrogen Signaling in Breast Cancer Cells

    PubMed Central

    Hah, Nasun; Danko, Charles G.; Core, Leighton; Waterfall, Joshua J.; Siepel, Adam; Lis, John T.; Kraus, W. Lee

    2011-01-01

    Summary We report the immediate effects of estrogen signaling on the transcriptome of breast cancer cells using Global Run-On and sequencing (GRO-seq). The data were analyzed using a new bioinformatic approach that allowed us to identify transcripts directly from the GRO-seq data. We found that estrogen signaling directly regulates a strikingly large fraction of the transcriptome in a rapid, robust, and unexpectedly transient manner. In addition to protein coding genes, estrogen regulates the distribution and activity of all three RNA polymerases, and virtually every class of non-coding RNA that has been described to date. We also identified a large number of previously undetected estrogen-regulated intergenic transcripts, many of which are found proximal to estrogen receptor binding sites. Collectively, our results provide the most comprehensive measurement of the primary and immediate estrogen effects to date and a resource for understanding rapid signal-dependent transcription in other systems. PMID:21549415

  20. Single-cell RNA-seq of rheumatoid arthritis synovial tissue using low-cost microfluidic instrumentation.

    PubMed

    Stephenson, William; Donlin, Laura T; Butler, Andrew; Rozo, Cristina; Bracken, Bernadette; Rashidfarrokhi, Ali; Goodman, Susan M; Ivashkiv, Lionel B; Bykerk, Vivian P; Orange, Dana E; Darnell, Robert B; Swerdlow, Harold P; Satija, Rahul

    2018-02-23

    Droplet-based single-cell RNA-seq has emerged as a powerful technique for massively parallel cellular profiling. While this approach offers the exciting promise to deconvolute cellular heterogeneity in diseased tissues, the lack of cost-effective and user-friendly instrumentation has hindered widespread adoption of droplet microfluidic techniques. To address this, we developed a 3D-printed, low-cost droplet microfluidic control instrument and deploy it in a clinical environment to perform single-cell transcriptome profiling of disaggregated synovial tissue from five rheumatoid arthritis patients. We sequence 20,387 single cells revealing 13 transcriptomically distinct clusters. These encompass an unsupervised draft atlas of the autoimmune infiltrate that contribute to disease biology. Additionally, we identify previously uncharacterized fibroblast subpopulations and discern their spatial location within the synovium. We envision that this instrument will have broad utility in both research and clinical settings, enabling low-cost and routine application of microfluidic techniques.

  1. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis

    PubMed Central

    Ji, Zhicheng; Ji, Hongkai

    2016-01-01

    When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. PMID:27179027

  2. TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.

    PubMed

    Ji, Zhicheng; Ji, Hongkai

    2016-07-27

    When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. RNA-sequencing of the sturgeon Acipenser baeri provides insights into expression dynamics of morphogenic differentiation and developmental regulatory genes in early versus late developmental stages.

    PubMed

    Song, Wei; Jiang, Keji; Zhang, Fengying; Lin, Yu; Ma, Lingbo

    2016-08-08

    Acipenser baeri, one of the critically endangered animals on the verge of extinction, is a key species for evolutionary, developmental, physiology and conservation studies and a standout amongst the most important food products worldwide. Though the transcriptome of the early development of A. baeri has been published recently, the transcriptome changes occurring in the transition from embryonic to late stages are still unknown. The aim of this work was to analyze the transcriptomes of embryonic and post-embryonic stages of A. baeri and identify differentially expressed genes (DEGs) and their expression patterns using mRNA collected from specimens at big yolk plug, wide neural plate and 64 day old sturgeon developmental stages for RNA-Seq. The paired-end sequencing of the transcriptome of samples of A. baeri collected at two early (big yolk plug (T1, 32 h after fertilization) and wide neural plate formation (T2, 45 h after fertilization)) and one late (T22, 64 day old sturgeon) developmental stages using Illumina Hiseq2000 platform generated 64039846, 64635214 and 75293762 clean paired-end reads for T1, T2 and T22, respectively. After quality control, the sequencing reads were de novo assembled to generate a set of 149,265 unigenes with N50 value of 1277 bp. Functional annotation indicated that a substantial number of these unigenes had significant similarity with proteins in public databases. Differential expression profiling allowed the identification of 2789, 12,819 and 10,824 DEGs from the respective T1 vs. T2, T1 vs. T22 and T2 vs. T22 comparisons. High correlation of DEGs' features was recorded among early stages while significant divergences were observed when comparing the late stage with early stages. GO and KEGG enrichment analyses revealed the biological processes, cellular component, molecular functions and metabolic pathways associated with identified DEGs. The qRT-PCR performed for candidate genes in specimens confirmed the validity of the RNA-seq data. This study presents, for the first time, an extensive overview of RNA-Seq based characterization of the early and post-embryonic developmental transcriptomes of A. baeri and provided 149,265 gene sequences that will be potentially valuable for future molecular and genetic studies in A. baeri.

  4. voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data.

    PubMed

    Zararsiz, Gokmen; Goksuluk, Dincer; Klaus, Bernd; Korkmaz, Selcuk; Eldem, Vahap; Karabulut, Erdem; Ozturk, Ahmet

    2017-01-01

    RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers particularly for cancer diseases. Microarray based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of mean and variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. VoomNSC is one of these classifiers and brings voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. The VoomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets are used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.

  5. Transcriptome analysis of zebrafish embryos exposed to deltamethrin.

    PubMed

    Chueh, Tsung-Cheng; Hsu, Li-Sung; Kao, Chin-Ming; Hsu, Tung-Wei; Liao, Hung-Yu; Wang, Kuan-Yi; Chen, Ssu Ching

    2017-05-01

    Deltamethrin (DTM), a type II pyrethroid, is one of the most commonly used insecticides. The increased use of pyrethroid leads to potential adverse effects, particularly in sensitive populations such as children and pregnant women. None of the related studies was focused on the transcriptome responses in zebrafish embryos after treatment with DTM; therefore, RNA-seq, a high-throughput method, was performed to analyze the global expression of differential expressed genes (DEGs) in zebrafish embryos treated with DTM (40 and 80 μg/L) from fertilization to 48 h postfertilization (hpf) as compared with that in the control group (without DTM treatment). Two cDNA libraries were generated from treated embryos and one cDNA library from nontreated embryos, respectively. Over 92% of reads mapped to the reference in these three libraries. It was observed that many differential genes were expressed in comparison with embryos before and after DTM. The 20 most differentially expressed upregulated or downregulated genes were majorly involved in the signaling transduction. Validation of selected nine genes expression using qRT-PCR confirmed RNA-seq results. The transcriptome sequences were further subjected to gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis, showing G-protein-coupled receptor signaling pathway and neuroactive ligand-receptor interaction, respectively, were most enriched. The data from this study contributed to a better understanding of the potential consequences of fish exposed to DTM, to an evaluation of the potential threat of DTM to fish populations in aquatic environments. © 2016 Wiley Periodicals, Inc. Environ Toxicol 32: 1548-1557, 2017. © 2016 Wiley Periodicals, Inc.

  6. Transcriptome inference and systems approaches to polypharmacology and drug discovery in herbal medicine.

    PubMed

    Li, Peng; Chen, Jianxin; Zhang, Wuxia; Fu, Bangze; Wang, Wei

    2017-01-04

    Herbal medicine is a concoction of numerous chemical ingredients, and it exhibits polypharmacological effects to act on multiple pharmacological targets, regulating different biological mechanisms and treating a variety of diseases. Thus, this complexity is impossible to deconvolute by the reductionist method of extracting one active ingredient acting on one biological target. To dissect the polypharmacological effects of herbal medicines and their underling pharmacological targets as well as their corresponding active ingredients. We propose a system-biology strategy that combines omics and bioinformatical methodologies for exploring the polypharmacology of herbal mixtures. The myocardial ischemia model was induced by Ameroid constriction of the left anterior descending coronary in Ba-Ma miniature pigs. RNA-seq analysis was utilized to find the differential genes induced by myocardial ischemia in pigs treated with formula QSKL. A transcriptome-based inference method was used to find the landmark drugs with similar mechanisms to QSKL. Gene-level analysis of RNA-seq data in QSKL-treated cases versus control animals yields 279 differential genes. Transcriptome-based inference methods identified 80 landmark drugs that covered nearly all drug classes. Then, based on the landmark drugs, 155 potential pharmacological targets and 57 indications were identified for QSKL. Our results demonstrate the power of a combined approach for exploring the pharmacological target and chemical space of herbal medicines. We hope that our method could enhance our understanding of the molecular mechanisms of herbal systems and further accelerate the exploration of the value of traditional herbal medicine systems. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  7. De novo Assembly of the Burying Beetle Nicrophorus orbicollis (Coleoptera: Silphidae) Transcriptome Across Developmental Stages with Identification of Key Immune Transcripts

    PubMed Central

    Won, Harim I.; Schulze, Thomas T.; Clement, Emalie J.; Watson, Gabrielle F.; Watson, Sean M.; Warner, Rosalie C.; Ramler, Elizabeth A. M.; Witte, Elias J.; Schoenbeck, Mark A.; Rauter, Claudia M.; Davis, Paul H.

    2018-01-01

    Burying beetles (Nicrophorus spp.) are among the relatively few insects that provide parental care while not belonging to the eusocial insects such as ants or bees. This behavior incurs energy costs as evidenced by immune deficits and shorter life-spans in reproducing beetles. In the absence of an assembled transcriptome, relatively little is known concerning the molecular biology of these beetles. This work details the assembly and analysis of the Nicrophorus orbicollis transcriptome at multiple developmental stages. RNA-Seq reads were obtained by next-generation sequencing and the transcriptome was assembled using the Trinity assembler. Validation of the assembly was performed by functional characterization using Gene Ontology (GO), Eukaryotic Orthologous Groups (KOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. Differential expression analysis highlights developmental stage-specific expression patterns, and immunity-related transcripts are discussed. The data presented provides a valuable molecular resource to aid further investigation into immunocompetence throughout this organism's sexual development. PMID:29707046

  8. Global Transcriptomic Effects of Environmentally Relevant Concentrations of the Neonicotinoids Clothianidin, Imidacloprid, and Thiamethoxam in the Brain of Honey Bees ( Apis mellifera).

    PubMed

    Christen, Verena; Schirrmann, Melanie; Frey, Juerg E; Fent, Karl

    2018-06-14

    Neonicotinoids are implicated in the decline of honey bees, but the molecular basis underlying adverse effects is poorly known. Here we describe global transcriptomic profiles in the brain of honey bee workers exposed for 48 h at one environmentally realistic and one sublethal concentration of 0.3 and 3.0 ng/bee clothianidin and imidacloprid, respectively, and 0.1 and 1.0 ng/bee thiamethoxam (1-30 ng/mL sucrose solution) by high-throughput RNA-sequencing (RNA-seq). All neonicotinoids led to significant alteration (mainly down-regulation) of gene expression, generally with a concentration-dependent effect. Among many others, genes related to metabolism and detoxification were differently expressed. Gene ontology (GO) enrichment analysis of biological processes revealed catabolic carbohydrate metabolism (regulation of enzyme activities such as amylase), lipid metabolism, and transport mechanisms as shared terms between all neonicotinoids at high concentrations. KEGG pathway analysis indicated that at least two neonicotinoids induced changes in expression of various metabolic pathways: pentose phosphate pathways, starch and sucrose metabolism, and sulfur metabolism, in which glucose 1-dehydrogenase and alpha-amylase were down-regulated and 3'(2'), 5'-bisphosphate nucleotidase was up-regulated. RT-qPCR analysis confirmed the down-regulation of major royal jelly proteins, hbg3, and cyp9e2 found by RNA-seq. Our study highlights the comparative molecular effects of neonicotinoid exposure to bees. Further studies should link these effects with physiological outcomes for a better understanding of effects of neonicotinoids.

  9. Phenotype classification of single cells using SRS microscopy, RNA sequencing, and microfluidics (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi

    2016-03-01

    Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.

  10. A Systems Biology Methodology Combining Transcriptome and Interactome Datasets to Assess the Implications of Cytokinin Signaling for Plant Immune Networks.

    PubMed

    Kunz, Meik; Dandekar, Thomas; Naseem, Muhammad

    2017-01-01

    Cytokinins (CKs) play an important role in plant growth and development. Also, several studies highlight the modulatory implications of CKs for plant-pathogen interaction. However, the underlying mechanisms of CK mediating immune networks in plants are still not fully understood. A detailed analysis of high-throughput transcriptome (RNA-Seq and microarrays) datasets under modulated conditions of plant CKs and its mergence with cellular interactome (large-scale protein-protein interaction data) has the potential to unlock the contribution of CKs to plant defense. Here, we specifically describe a detailed systems biology methodology pertinent to the acquisition and analysis of various omics datasets that delineate the role of plant CKs in impacting immune pathways in Arabidopsis.

  11. The destiny of the resistance/susceptibility against GCRV is controlled by epigenetic mechanisms in CIK cells.

    PubMed

    Shang, Xueying; Yang, Chunrong; Wan, Quanyuan; Rao, Youliang; Su, Jianguo

    2017-07-03

    Hemorrhagic disease caused by grass carp reovirus (GCRV) has severely threatened the grass carp (Ctenopharyngodon idella) cultivation industry. It is noteworthy that the resistance against GCRV infection was reported to be inheritable, and identified at both individual and cellular levels. Therefore, this work was inspired and dedicated to unravel the molecular mechanisms of fate decision post GCRV infection in related immune cells. Foremost, the resistant and susceptible CIK (C. idella kidney) monoclonal cells were established by single cell sorting, subculturing and infection screening successively. RNA-Seq, MeDIP-Seq and small RNA-Seq were carried out with C1 (CIK cells), R2 (resistant cells) and S3 (susceptible cells) groups. It was demonstrated that genome-wide DNA methylation, mRNA and microRNA expression levels in S3 were the highest among three groups. Transcriptome analysis elucidated that pathways associated with antioxidant activity, cell proliferation regulation, apoptosis activity and energy consuming might contribute to the decision of cell fates post infection. And a series of immune-related genes were identified differentially expressed across resistant and susceptible groups, which were negatively modulated by DNA methylation or microRNAs. To conclude, this study systematically uncovered the regulatory mechanism on the resistance from epigenetic perspective and provided potential biomarkers for future studies on resistance breeding.

  12. De novo transcriptome profiling of highly purified human lymphocytes primary cells

    PubMed Central

    Bonnal, Raoul J.P.; Ranzani, Valeria; Arrigoni, Alberto; Curti, Serena; Panzeri, Ilaria; Gruarin, Paola; Abrignani, Sergio; Rossetti, Grazisa; Pagani, Massimiliano

    2015-01-01

    To help better understand the role of long noncoding RNAs in the human immune system, we recently generated a comprehensive RNA-seq data set using 63 RNA samples from 13 subsets of T (CD4+ naive, CD4+ TH1, CD4+ TH2, CD4+ TH17, CD4+ Treg, CD4+ TCM, CD4+ TEM, CD8+ TCM, CD8+ TEM, CD8+ naive) and B (B naive, B memory, B CD5+) lymphocytes. There were five biological replicates for each subset except for CD8+ TCM and B CD5+ populations that included 4 replicates. RNA-Seq data were generated by an Illumina HiScanSQ sequencer using the TruSeq v3 Cluster kit. 2.192 billion of paired-ends reads, 2×100 bp, were sequenced and after filtering a total of about 1.7 billion reads were mapped. Using different de novo transcriptome reconstruction techniques over 500 previously unknown lincRNAs were identified. The current data set could be exploited to drive the functional characterization of lincRNAs, identify novel genes and regulatory networks associated with specific cells subsets of the human immune system. PMID:26451251

  13. Dissecting Immune Circuits by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq.

    PubMed

    Jaitin, Diego Adhemar; Weiner, Assaf; Yofe, Ido; Lara-Astiaso, David; Keren-Shaul, Hadas; David, Eyal; Salame, Tomer Meir; Tanay, Amos; van Oudenaarden, Alexander; Amit, Ido

    2016-12-15

    In multicellular organisms, dedicated regulatory circuits control cell type diversity and responses. The crosstalk and redundancies within these circuits and substantial cellular heterogeneity pose a major research challenge. Here, we present CRISP-seq, an integrated method for massively parallel single-cell RNA sequencing (RNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-pooled screens. We show that profiling the genomic perturbation and transcriptome in the same cell enables us to simultaneously elucidate the function of multiple factors and their interactions. We applied CRISP-seq to probe regulatory circuits of innate immunity. By sampling tens of thousands of perturbed cells in vitro and in mice, we identified interactions and redundancies between developmental and signaling-dependent factors. These include opposing effects of Cebpb and Irf8 in regulating the monocyte/macrophage versus dendritic cell lineages and differential functions for Rela and Stat1/2 in monocyte versus dendritic cell responses to pathogens. This study establishes CRISP-seq as a broadly applicable, comprehensive, and unbiased approach for elucidating mammalian regulatory circuits. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Transcriptome profile of Trichoderma harzianum IOC-3844 induced by sugarcane bagasse.

    PubMed

    Horta, Maria Augusta Crivelente; Vicentini, Renato; Delabona, Priscila da Silva; Laborda, Prianda; Crucello, Aline; Freitas, Sindélia; Kuroshu, Reginaldo Massanobu; Polikarpov, Igor; Pradella, José Geraldo da Cruz; Souza, Anete Pereira

    2014-01-01

    Profiling the transcriptome that underlies biomass degradation by the fungus Trichoderma harzianum allows the identification of gene sequences with potential application in enzymatic hydrolysis processing. In the present study, the transcriptome of T. harzianum IOC-3844 was analyzed using RNA-seq technology. The sequencing generated 14.7 Gbp for downstream analyses. De novo assembly resulted in 32,396 contigs, which were submitted for identification and classified according to their identities. This analysis allowed us to define a principal set of T. harzianum genes that are involved in the degradation of cellulose and hemicellulose and the accessory genes that are involved in the depolymerization of biomass. An additional analysis of expression levels identified a set of carbohydrate-active enzymes that are upregulated under different conditions. The present study provides valuable information for future studies on biomass degradation and contributes to a better understanding of the role of the genes that are involved in this process.

  15. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, CY; Yang, H; Wei, CL

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly (A){sup +} RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled intomore » 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.« less

  16. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    PubMed Central

    2011-01-01

    Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). Conclusions An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis. PMID:21356090

  17. A nonlethal sampling method to obtain, generate and assemble whole blood transcriptomes from small, wild mammals.

    PubMed

    Huang, Zixia; Gallot, Aurore; Lao, Nga T; Puechmaille, Sébastien J; Foley, Nicole M; Jebb, David; Bekaert, Michaël; Teeling, Emma C

    2016-01-01

    The acquisition of tissue samples from wild populations is a constant challenge in conservation biology, especially for endangered species and protected species where nonlethal sampling is the only option. Whole blood has been suggested as a nonlethal sample type that contains a high percentage of bodywide and genomewide transcripts and therefore can be used to assess the transcriptional status of an individual, and to infer a high percentage of the genome. However, only limited quantities of blood can be nonlethally sampled from small species and it is not known if enough genetic material is contained in only a few drops of blood, which represents the upper limit of sample collection for some small species. In this study, we developed a nonlethal sampling method, the laboratory protocols and a bioinformatic pipeline to sequence and assemble the whole blood transcriptome, using Illumina RNA-Seq, from wild greater mouse-eared bats (Myotis myotis). For optimal results, both ribosomal and globin RNAs must be removed before library construction. Treatment of DNase is recommended but not required enabling the use of smaller amounts of starting RNA. A large proportion of protein-coding genes (61%) in the genome were expressed in the blood transcriptome, comparable to brain (65%), kidney (63%) and liver (58%) transcriptomes, and up to 99% of the mitogenome (excluding D-loop) was recovered in the RNA-Seq data. In conclusion, this nonlethal blood sampling method provides an opportunity for a genomewide transcriptomic study of small, endangered or critically protected species, without sacrificing any individuals. © 2015 John Wiley & Sons Ltd.

  18. High-Throughput Sequencing and De Novo Assembly of the Isatis indigotica Transcriptome

    PubMed Central

    Tang, Xiaoqing; Xiao, Yunhua; Lv, Tingting; Wang, Fangquan; Zhu, QianHao; Zheng, Tianqing; Yang, Jie

    2014-01-01

    Background Isatis indigotica, the source of the traditional Chinese medicine Radix isatidis (Ban-Lan-Gen), is an extremely important economical crop in China. To facilitate biological, biochemical and molecular research on the medicinal chemicals in I. indigotica, here we report the first I. indigotica transcriptome generated by RNA sequencing (RNA-seq). Results RNA-seq library was created using RNA extracted from a mixed sample including leaf and root. A total of 33,238 unigenes were assembled from more than 28 million of high quality short reads. The quality of the assembly was experimentally examined by cDNA sequencing of seven randomly selected unigenes. Based on blast search 28,184 unigenes had a hit in at least one of the protein and nucleotide databases used in this study, and 8 unigenes were found to be associated with biosynthesis of indole and its derivatives. According to Gene Ontology classification, 22,365 unigenes were categorized into 48 functional groups. Furthermore, Clusters of Orthologous Group and Swiss-Port annotation were assigned for 7,707 and 18,679 unigenes, respectively. Analysis of repeat motifs identified 6,400 simple sequence repeat markers in 4,509 unigenes. Conclusion Our data provide a comprehensive sequence resource for molecular study of I. indigotica. Our results will facilitate studies on the functions of genes involved in the indole alkaloid biosynthesis pathway and on metabolism of nitrogen and indole alkaloids in I. indigotica and its related species. PMID:25259890

  19. Marker-Assisted Molecular Profiling, Deletion Mutant Analysis, and RNA-Seq Reveal a Disease Resistance Cluster Associated with Uromyces appendiculatus Infection in Common Bean Phaseolus vulgaris L.

    PubMed

    Todd, Antonette R; Donofrio, Nicole; Sripathi, Venkateswara R; McClean, Phillip E; Lee, Rian K; Pastor-Corrales, Marcial; Kalavacharla, Venu Kal

    2017-05-23

    Common bean ( Phaseolus vulgaris L.) is an important legume, useful for its high protein and dietary fiber. The fungal pathogen Uromyces appendiculatus (Pers.) Unger can cause major loss in susceptible varieties of the common bean. The Ur-3 locus provides race specific resistance to virulent strains or races of the bean rust pathogen along with Crg , (Complements resistance gene), which is required for Ur-3 -mediated rust resistance. In this study, we inoculated two common bean genotypes (resistant "Sierra" and susceptible crg) with rust race 53 of U. appendiculatus , isolated leaf RNA at specific time points, and sequenced their transcriptomes. First, molecular markers were used to locate and identify a 250 kb deletion on chromosome 10 in mutant crg (which carries a deletion at the Crg locus). Next, we identified differential expression of several disease resistance genes between Mock Inoculated (MI) and Inoculated (I) samples of "Sierra" leaf RNA within the 250 kb delineated region. Both marker assisted molecular profiling and RNA-seq were used to identify possible transcriptomic locations of interest regarding the resistance in the common bean to race 53. Identification of differential expression among samples in disease resistance clusters in the bean genome may elucidate significant genes underlying rust resistance. Along with preserving favorable traits in the crop, the current research may also aid in global sustainability of food stocks necessary for many populations.

  20. Marker-Assisted Molecular Profiling, Deletion Mutant Analysis, and RNA-Seq Reveal a Disease Resistance Cluster Associated with Uromyces appendiculatus Infection in Common Bean Phaseolus vulgaris L.

    PubMed Central

    Todd, Antonette R.; Donofrio, Nicole; Sripathi, Venkateswara R.; McClean, Phillip E.; Lee, Rian K.; Pastor-Corrales, Marcial; Kalavacharla, Venu (Kal)

    2017-01-01

    Common bean (Phaseolus vulgaris L.) is an important legume, useful for its high protein and dietary fiber. The fungal pathogen Uromyces appendiculatus (Pers.) Unger can cause major loss in susceptible varieties of the common bean. The Ur-3 locus provides race specific resistance to virulent strains or races of the bean rust pathogen along with Crg, (Complements resistance gene), which is required for Ur-3-mediated rust resistance. In this study, we inoculated two common bean genotypes (resistant “Sierra” and susceptible crg) with rust race 53 of U. appendiculatus, isolated leaf RNA at specific time points, and sequenced their transcriptomes. First, molecular markers were used to locate and identify a 250 kb deletion on chromosome 10 in mutant crg (which carries a deletion at the Crg locus). Next, we identified differential expression of several disease resistance genes between Mock Inoculated (MI) and Inoculated (I) samples of “Sierra” leaf RNA within the 250 kb delineated region. Both marker assisted molecular profiling and RNA-seq were used to identify possible transcriptomic locations of interest regarding the resistance in the common bean to race 53. Identification of differential expression among samples in disease resistance clusters in the bean genome may elucidate significant genes underlying rust resistance. Along with preserving favorable traits in the crop, the current research may also aid in global sustainability of food stocks necessary for many populations. PMID:28545258

  1. Transcriptomic Analysis of Vulvovaginal Candidiasis Identifies a Role for the NLRP3 Inflammasome

    PubMed Central

    Shetty, Amol C.; Yano, Junko; Fidel, Paul L.; Noverr, Mairi C.

    2015-01-01

    ABSTRACT Treatment of vulvovaginal candidiasis (VVC), caused most frequently by Candida albicans, represents a significant unmet clinical need. C. albicans, as both a commensal and a pathogenic organism, has a complex and poorly understood interaction with the vaginal environment. Understanding the complex nature of this relationship is necessary for the development of desperately needed therapies to treat symptomatic infection. Using transcriptome sequencing (RNA-seq), we characterized the early murine vaginal and fungal transcriptomes of the organism during VVC. Network analysis of host genes that were differentially expressed between infected and naive mice predicted the activation or repression of several signaling pathways that have not been previously associated with VVC, including NLRP3 inflammasome activation. Intravaginal challenge of Nlrp3−/− mice with C. albicans demonstrated severely reduced levels of polymorphonuclear leukocytes (PMNs), alarmins, and inflammatory cytokines, including interleukin-1β (IL-1β) (the hallmarks of VVC immunopathogenesis) in vaginal lavage fluid. Intravaginal administration of wild-type (WT) mice with glyburide, a potent inhibitor of the NLRP3 inflammasome, reduced PMN infiltration and IL-1β to levels comparable to those observed in Nlrp3−/− mice. Furthermore, RNA-seq analysis of C. albicans genes indicated robust expression of hypha-associated secreted aspartyl proteinases 4, 5, and 6 (SAP4–6), which are known inflammasome activators. Despite colonization similar to that of the WT strain, ΔSAP4–6 triple and ΔSAP5 single mutants induced significantly less PMN influx and IL-1β during intravaginal challenge. Our findings demonstrate a novel role for the inflammasome in the immunopathogenesis of VVC and implicate the hypha-associated SAPs as major C. albicans virulence determinants during vulvovaginal candidiasis. PMID:25900651

  2. EUCANEXT: an integrated database for the exploration of genomic and transcriptomic data from Eucalyptus species

    PubMed Central

    Nascimento, Leandro Costa; Salazar, Marcela Mendes; Lepikson-Neto, Jorge; Camargo, Eduardo Leal Oliveira; Parreiras, Lucas Salera; Carazzolle, Marcelo Falsarella

    2017-01-01

    Abstract Tree species of the genus Eucalyptus are the most valuable and widely planted hardwoods in the world. Given the economic importance of Eucalyptus trees, much effort has been made towards the generation of specimens with superior forestry properties that can deliver high-quality feedstocks, customized to the industrýs needs for both cellulosic (paper) and lignocellulosic biomass production. In line with these efforts, large sets of molecular data have been generated by several scientific groups, providing invaluable information that can be applied in the development of improved specimens. In order to fully explore the potential of available datasets, the development of a public database that provides integrated access to genomic and transcriptomic data from Eucalyptus is needed. EUCANEXT is a database that analyses and integrates publicly available Eucalyptus molecular data, such as the E. grandis genome assembly and predicted genes, ESTs from several species and digital gene expression from 26 RNA-Seq libraries. The database has been implemented in a Fedora Linux machine running MySQL and Apache, while Perl CGI was used for the web interfaces. EUCANEXT provides a user-friendly web interface for easy access and analysis of publicly available molecular data from Eucalyptus species. This integrated database allows for complex searches by gene name, keyword or sequence similarity and is publicly accessible at http://www.lge.ibi.unicamp.br/eucalyptusdb. Through EUCANEXT, users can perform complex analysis to identify genes related traits of interest using RNA-Seq libraries and tools for differential expression analysis. Moreover, all the bioinformatics pipeline here described, including the database schema and PERL scripts, are readily available and can be applied to any genomic and transcriptomic project, regardless of the organism. Database URL: http://www.lge.ibi.unicamp.br/eucalyptusdb PMID:29220468

  3. The transcriptome of the developing grain: a resource for understanding seed development and the molecular control of the functional and nutritional properties of wheat.

    PubMed

    Rangan, Parimalan; Furtado, Agnelo; Henry, Robert J

    2017-10-11

    Wheat is one of the three major cereals that have been domesticated to feed human populations. The composition of the wheat grain determines the functional properties of wheat including milling efficiency, bread making, and nutritional value. Transcriptome analysis of the developing wheat grain provides key insights into the molecular basis for grain development and quality. The transcriptome of 35 genotypes was analysed by RNA-Seq at two development stages (14 and 30 days-post-anthesis, dpa) corresponding to the mid stage of development (stage Z75) and the almost mature seed (stage Z85). At 14dpa, most of the transcripts were associated with the synthesis of the major seed components including storage proteins and starch. At 30dpa, a diverse range of genes were expressed at low levels with a predominance of genes associated with seed defence and stress tolerance. RNA-Seq analysis of changes in expression between 14dpa and 30dpa stages revealed 26,477 transcripts that were significantly differentially expressed at a FDR corrected p-value cut-off at ≤0.01. Functional annotation and gene ontology mapping was performed and KEGG pathway mapping allowed grouping based upon biochemical linkages. This analysis demonstrated that photosynthesis associated with the pericarp was very active at 14dpa but had ceased by 30dpa. Recently reported genes for flour yield in milling and bread quality were found to influence wheat quality largely due to expression patterns at the earlier seed development stage. This study serves as a resource providing an overview of gene expression during wheat grain development at the early (14dpa) and late (30dpa) grain filling stages for use in studies of grain quality and nutritional value and in understanding seed biology.

  4. RNAbrowse: RNA-Seq de novo assembly results browser.

    PubMed

    Mariette, Jérôme; Noirot, Céline; Nabihoudine, Ibounyamine; Bardou, Philippe; Hoede, Claire; Djari, Anis; Cabau, Cédric; Klopp, Christophe

    2014-01-01

    Transcriptome analysis based on a de novo assembly of next generation RNA sequences is now performed routinely in many laboratories. The generated results, including contig sequences, quantification figures, functional annotations and variation discovery outputs are usually bulky and quite diverse. This article presents a user oriented storage and visualisation environment permitting to explore the data in a top-down manner, going from general graphical views to all possible details. The software package is based on biomart, easy to install and populate with local data. The software package is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/RNAbrowse.

  5. Time-series RNA-seq analysis package (TRAP) and its application to the analysis of rice, Oryza sativa L. ssp. Japonica, upon drought stress.

    PubMed

    Jo, Kyuri; Kwon, Hawk-Bin; Kim, Sun

    2014-06-01

    Measuring expression levels of genes at the whole genome level can be useful for many purposes, especially for revealing biological pathways underlying specific phenotype conditions. When gene expression is measured over a time period, we have opportunities to understand how organisms react to stress conditions over time. Thus many biologists routinely measure whole genome level gene expressions at multiple time points. However, there are several technical difficulties for analyzing such whole genome expression data. In addition, these days gene expression data is often measured by using RNA-sequencing rather than microarray technologies and then analysis of expression data is much more complicated since the analysis process should start with mapping short reads and produce differentially activated pathways and also possibly interactions among pathways. In addition, many useful tools for analyzing microarray gene expression data are not applicable for the RNA-seq data. Thus a comprehensive package for analyzing time series transcriptome data is much needed. In this article, we present a comprehensive package, Time-series RNA-seq Analysis Package (TRAP), integrating all necessary tasks such as mapping short reads, measuring gene expression levels, finding differentially expressed genes (DEGs), clustering and pathway analysis for time-series data in a single environment. In addition to implementing useful algorithms that are not available for RNA-seq data, we extended existing pathway analysis methods, ORA and SPIA, for time series analysis and estimates statistical values for combined dataset by an advanced metric. TRAP also produces visual summary of pathway interactions. Gene expression change labeling, a practical clustering method used in TRAP, enables more accurate interpretation of the data when combined with pathway analysis. We applied our methods on a real dataset for the analysis of rice (Oryza sativa L. Japonica nipponbare) upon drought stress. The result showed that TRAP was able to detect pathways more accurately than several existing methods. TRAP is available at http://biohealth.snu.ac.kr/software/TRAP/. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Comparative analysis of response to selection with three insecticides in the dengue mosquito Aedes aegypti using mRNA sequencing

    PubMed Central

    2014-01-01

    Background Mosquito control programmes using chemical insecticides are increasingly threatened by the development of resistance. Such resistance can be the consequence of changes in proteins targeted by insecticides (target site mediated resistance), increased insecticide biodegradation (metabolic resistance), altered transport, sequestration or other mechanisms. As opposed to target site resistance, other mechanisms are far from being fully understood. Indeed, insecticide selection often affects a large number of genes and various biological processes can hypothetically confer resistance. In this context, the aim of the present study was to use RNA sequencing (RNA-seq) for comparing transcription level and polymorphism variations associated with adaptation to chemical insecticides in the mosquito Aedes aegypti. Biological materials consisted of a parental susceptible strain together with three child strains selected across multiple generations with three insecticides from different classes: the pyrethroid permethrin, the neonicotinoid imidacloprid and the carbamate propoxur. Results After ten generations, insecticide-selected strains showed elevated resistance levels to the insecticides used for selection. RNA-seq data allowed detecting over 13,000 transcripts, of which 413 were differentially transcribed in insecticide-selected strains as compared to the susceptible strain. Among them, a significant enrichment of transcripts encoding cuticle proteins, transporters and enzymes was observed. Polymorphism analysis revealed over 2500 SNPs showing > 50% allele frequency variations in insecticide-selected strains as compared to the susceptible strain, affecting over 1000 transcripts. Comparing gene transcription and polymorphism patterns revealed marked differences among strains. While imidacloprid selection was linked to the over transcription of many genes, permethrin selection was rather linked to polymorphism variations. Focusing on detoxification enzymes revealed that permethrin selection strongly affected the polymorphism of several transcripts encoding cytochrome P450 monooxygenases likely involved in insecticide biodegradation. Conclusions The present study confirmed the power of RNA-seq for identifying concomitantly quantitative and qualitative transcriptome changes associated with insecticide resistance in mosquitoes. Our results suggest that transcriptome modifications can be selected rapidly by insecticides and affect multiple biological functions. Previously neglected by molecular screenings, polymorphism variations of detoxification enzymes may play an important role in the adaptive response of mosquitoes to insecticides. PMID:24593293

  7. Comparative analysis of response to selection with three insecticides in the dengue mosquito Aedes aegypti using mRNA sequencing.

    PubMed

    David, Jean-Philippe; Faucon, Frédéric; Chandor-Proust, Alexia; Poupardin, Rodolphe; Riaz, Muhammad Asam; Bonin, Aurélie; Navratil, Vincent; Reynaud, Stéphane

    2014-03-05

    Mosquito control programmes using chemical insecticides are increasingly threatened by the development of resistance. Such resistance can be the consequence of changes in proteins targeted by insecticides (target site mediated resistance), increased insecticide biodegradation (metabolic resistance), altered transport, sequestration or other mechanisms. As opposed to target site resistance, other mechanisms are far from being fully understood. Indeed, insecticide selection often affects a large number of genes and various biological processes can hypothetically confer resistance. In this context, the aim of the present study was to use RNA sequencing (RNA-seq) for comparing transcription level and polymorphism variations associated with adaptation to chemical insecticides in the mosquito Aedes aegypti. Biological materials consisted of a parental susceptible strain together with three child strains selected across multiple generations with three insecticides from different classes: the pyrethroid permethrin, the neonicotinoid imidacloprid and the carbamate propoxur. After ten generations, insecticide-selected strains showed elevated resistance levels to the insecticides used for selection. RNA-seq data allowed detecting over 13,000 transcripts, of which 413 were differentially transcribed in insecticide-selected strains as compared to the susceptible strain. Among them, a significant enrichment of transcripts encoding cuticle proteins, transporters and enzymes was observed. Polymorphism analysis revealed over 2500 SNPs showing > 50% allele frequency variations in insecticide-selected strains as compared to the susceptible strain, affecting over 1000 transcripts. Comparing gene transcription and polymorphism patterns revealed marked differences among strains. While imidacloprid selection was linked to the over transcription of many genes, permethrin selection was rather linked to polymorphism variations. Focusing on detoxification enzymes revealed that permethrin selection strongly affected the polymorphism of several transcripts encoding cytochrome P450 monooxygenases likely involved in insecticide biodegradation. The present study confirmed the power of RNA-seq for identifying concomitantly quantitative and qualitative transcriptome changes associated with insecticide resistance in mosquitoes. Our results suggest that transcriptome modifications can be selected rapidly by insecticides and affect multiple biological functions. Previously neglected by molecular screenings, polymorphism variations of detoxification enzymes may play an important role in the adaptive response of mosquitoes to insecticides.

  8. Global Transcriptional Start Site Mapping Using Differential RNA Sequencing Reveals Novel Antisense RNAs in Escherichia coli

    PubMed Central

    Thomason, Maureen K.; Bischler, Thorsten; Eisenbart, Sara K.; Förstner, Konrad U.; Zhang, Aixia; Herbig, Alexander; Nieselt, Kay

    2014-01-01

    While the model organism Escherichia coli has been the subject of intense study for decades, the full complement of its RNAs is only now being examined. Here we describe a survey of the E. coli transcriptome carried out using a differential RNA sequencing (dRNA-seq) approach, which can distinguish between primary and processed transcripts, and an automated prediction algorithm for transcriptional start sites (TSS). With the criterion of expression under at least one of three growth conditions examined, we predicted 14,868 TSS candidates, including 5,574 internal to annotated genes (iTSS) and 5,495 TSS corresponding to potential antisense RNAs (asRNAs). We examined expression of 14 candidate asRNAs by Northern analysis using RNA from wild-type E. coli and from strains defective for RNases III and E, two RNases reported to be involved in asRNA processing. Interestingly, nine asRNAs detected as distinct bands by Northern analysis were differentially affected by the rnc and rne mutations. We also compared our asRNA candidates with previously published asRNA annotations from RNA-seq data and discuss the challenges associated with these cross-comparisons. Our global transcriptional start site map represents a valuable resource for identification of transcription start sites, promoters, and novel transcripts in E. coli and is easily accessible, together with the cDNA coverage plots, in an online genome browser. PMID:25266388

  9. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq

    PubMed Central

    Ramsköld, Daniel; Deng, Qiaolin; Johnsson, Per; Michaëlsson, Jakob; Frisén, Jonas; Sandberg, Rickard

    2016-01-01

    Cellular heterogeneity can emerge from the expression of only one parental allele. However, it has remained controversial whether, or to what degree, random monoallelic expression of autosomal genes (aRME) is mitotically inherited (clonal) or stochastic (dynamic) in somatic cells, particularly in vivo. Here, we used allele-sensitive single-cell RNA-seq on clonal primary mouse fibroblasts and in vivo human CD8+ T-cells to dissect clonal and dynamic monoallelic expression patterns. Dynamic aRME affected a considerable portion of the cells’ transcriptomes, with levels dependent on the cells’ transcriptional activity. Importantly, clonal aRME was detected but was surprisingly scarce (<1% of genes) and affected mainly the most low-expressed genes. Consequently, the overwhelming portion of aRME occurs transiently within individual cells and patterns of aRME are thus primarily scattered throughout somatic cell populations rather than, as previously hypothesized, confined to patches of clonally related cells. PMID:27668657

  10. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq.

    PubMed

    Reinius, Björn; Mold, Jeff E; Ramsköld, Daniel; Deng, Qiaolin; Johnsson, Per; Michaëlsson, Jakob; Frisén, Jonas; Sandberg, Rickard

    2016-11-01

    Cellular heterogeneity can emerge from the expression of only one parental allele. However, it has remained controversial whether, or to what degree, random monoallelic expression of autosomal genes (aRME) is mitotically inherited (clonal) or stochastic (dynamic) in somatic cells, particularly in vivo. Here we used allele-sensitive single-cell RNA-seq on clonal primary mouse fibroblasts and freshly isolated human CD8 + T cells to dissect clonal and dynamic monoallelic expression patterns. Dynamic aRME affected a considerable portion of the cells' transcriptomes, with levels dependent on the cells' transcriptional activity. Notably, clonal aRME was detected, but it was surprisingly scarce (<1% of genes) and mainly affected the most weakly expressed genes. Consequently, the overwhelming majority of aRME occurs transiently within individual cells, and patterns of aRME are thus primarily scattered throughout somatic cell populations rather than, as previously hypothesized, confined to patches of clonally related cells.

  11. De Novo Transcriptome Sequence Assembly from Coconut Leaves and Seeds with a Focus on Factors Involved in RNA-Directed DNA Methylation

    PubMed Central

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L.; Chang, Bill Chia-Han; Matzke, Antonius J. M.; Matzke, Marjori

    2014-01-01

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. PMID:25193496

  12. De novo transcriptome sequence assembly from coconut leaves and seeds with a focus on factors involved in RNA-directed DNA methylation.

    PubMed

    Huang, Ya-Yi; Lee, Chueh-Pai; Fu, Jason L; Chang, Bill Chia-Han; Matzke, Antonius J M; Matzke, Marjori

    2014-09-04

    Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop. Copyright © 2014 Huang et al.

  13. Liver Transcriptome Analysis of the Large Yellow Croaker (Larimichthys crocea) during Fasting by Using RNA-Seq

    PubMed Central

    Qian, Baoying; Xue, Liangyi; Huang, Hongli

    2016-01-01

    The large yellow croaker (Larimichthys crocea) is an economically important fish species in Chinese mariculture industry. To understand the molecular basis underlying the response to fasting, Illumina HiSeqTM 2000 was used to analyze the liver transcriptome of fasting large yellow croakers. A total of 54,933,550 clean reads were obtained and assembled into 110,364 contigs. Annotation to the NCBI database identified a total of 38,728 unigenes, of which 19,654 were classified into Gene Ontology and 22,683 were found in Kyoto Encyclopedia of Genes and Genomes (KEGG). Comparative analysis of the expression profiles between fasting fish and normal-feeding fish identified a total of 7,623 differentially expressed genes (P < 0.05), including 2,500 upregulated genes and 5,123 downregulated genes. Dramatic differences were observed in the genes involved in metabolic pathways such as fat digestion and absorption, citrate cycle, and glycolysis/gluconeogenesis, and the similar results were also found in the transcriptome of skeletal muscle. Further qPCR analysis confirmed that the genes encoding the factors involved in those pathways significantly changed in terms of expression levels. The results of the present study provide insights into the molecular mechanisms underlying the metabolic response of the large yellow croaker to fasting as well as identified areas that require further investigation. PMID:26967898

  14. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.

    PubMed

    Kang, Hyun Min; Subramaniam, Meena; Targ, Sasha; Nguyen, Michelle; Maliskova, Lenka; McCarthy, Elizabeth; Wan, Eunice; Wong, Simon; Byrnes, Lauren; Lanata, Cristina M; Gate, Rachel E; Mostafavi, Sara; Marson, Alexander; Zaitlen, Noah; Criswell, Lindsey A; Ye, Chun Jimmie

    2018-01-01

    Droplet single-cell RNA-sequencing (dscRNA-seq) has enabled rapid, massively parallel profiling of transcriptomes. However, assessing differential expression across multiple individuals has been hampered by inefficient sample processing and technical batch effects. Here we describe a computational tool, demuxlet, that harnesses natural genetic variation to determine the sample identity of each droplet containing a single cell (singlet) and detect droplets containing two cells (doublets). These capabilities enable multiplexed dscRNA-seq experiments in which cells from unrelated individuals are pooled and captured at higher throughput than in standard workflows. Using simulated data, we show that 50 single-nucleotide polymorphisms (SNPs) per cell are sufficient to assign 97% of singlets and identify 92% of doublets in pools of up to 64 individuals. Given genotyping data for each of eight pooled samples, demuxlet correctly recovers the sample identity of >99% of singlets and identifies doublets at rates consistent with previous estimates. We apply demuxlet to assess cell-type-specific changes in gene expression in 8 pooled lupus patient samples treated with interferon (IFN)-β and perform eQTL analysis on 23 pooled samples.

  15. Integrative analysis of Arabidopsis thaliana transcriptomics reveals intuitive splicing mechanism for circular RNA.

    PubMed

    Sun, Xiaoyong; Wang, Lin; Ding, Jiechao; Wang, Yanru; Wang, Jiansheng; Zhang, Xiaoyang; Che, Yulei; Liu, Ziwei; Zhang, Xinran; Ye, Jiazhen; Wang, Jie; Sablok, Gaurav; Deng, Zhiping; Zhao, Hongwei

    2016-10-01

    A new regulatory class of small endogenous RNAs called circular RNAs (circRNAs) has been described as miRNA sponges in animals. Using 16 Arabidopsis thaliana RNA-Seq data sets, we identified 803 circRNAs in RNase R-/non-RNase R-treated samples. The results revealed the following features: Canonical and noncanonical splicing can generate circRNAs; chloroplasts are a hotspot for circRNA generation; furthermore, limited complementary sequences exist not only in introns, but also in the sequences flanking splice sites. The latter finding suggests that multiple combinations between complementary sequences may facilitate the formation of the circular structure. Our results contribute to a better understanding of this novel class of plant circRNAs. © 2016 Federation of European Biochemical Societies.

  16. Transcriptomic analysis of differentially expressed genes in flower-buds of genetic male sterile and wild type cucumber by RNA sequencing.

    PubMed

    Han, Yike; Wang, Xianyun; Zhao, Fengyue; Gao, Shang; Wei, Aimin; Chen, Zhengwu; Liu, Nan; Zhang, Zhenxian; Du, Shengli

    2018-05-01

    Cucumber ( Cucumis sativus L. ) pollen development involves a diverse range of gene interactions between sporophytic and gametophytic tissues. Previous studies in our laboratory showed that male sterility was controlled by a single recessive nuclear gene, and occurred in pollen mother cell meiophase. To fully explore the global gene expression and identify genes related to male sterility, a RNA-seq analysis was adopted in this study. Young male flower-buds (1-2 mm in length) from genetic male sterility (GMS) mutant and homozygous fertile cucumber (WT) were collected for two sequencing libraries. Total 545 differentially expressed genes (DEGs), including 142 up-regulated DEGs and 403 down-regulated DEGs, were detected in two libraries (Fold Change ≥ 2, FDR < 0.01). These genes were involved in a variety of metabolic pathways, like ethylene-activated signaling pathway, sporopollenin biosynthetic pathway, cell cycle and DNA damage repair pathway. qRT-PCR analysis was performed and showed that the correlation between RNA-Seq and qRT-PCR was 0.876. These findings contribute to a better understanding of the mechanism that leads to GMS in cucumber.

  17. Transcriptomic analysis on the formation of the viable putative non-culturable state of beer-spoilage Lactobacillus acetotolerans

    PubMed Central

    Liu, Junyan; Deng, Yang; Peters, Brian M.; Li, Lin; Li, Bing; Chen, Lequn; Xu, Zhenbo; Shirtliff, Mark E.

    2016-01-01

    Lactic acid bacteria (LAB) are the most common beer-spoilage bacteria regardless of beer type, and thus pose significant problems for the brewery industry. The aim of this study was to investigate the genetic mechanisms involved in the ability of the hard-to-culture beer-spoilage bacterium Lactobacillus acetotolerans to enter into the viable putative non-culturable (VPNC) state. A genome-wide transcriptional analysis of beer-spoilage L. acetotolerans strains BM-LA14526, BM-LA14527, and BM-LA14528 under normal, mid-term and VPNC states were performed using RNA-sequencing (RNA-seq) and further bioinformatics analyses. GO function, COG category, and KEGG pathway enrichment analysis were conducted to investigate functional and related metabolic pathways of the differentially expressed genes. Functional and pathway enrichment analysis indicated that heightened stress response and reduction in genes associated with transport, metabolic process, and enzyme activity might play important roles in the formation of the VPNC state. This is the first transcriptomic analysis on the formation of the VPNC state of beer spoilage L. acetotolerans. PMID:27819317

  18. Transcriptomic analysis on the formation of the viable putative non-culturable state of beer-spoilage Lactobacillus acetotolerans.

    PubMed

    Liu, Junyan; Deng, Yang; Peters, Brian M; Li, Lin; Li, Bing; Chen, Lequn; Xu, Zhenbo; Shirtliff, Mark E

    2016-11-07

    Lactic acid bacteria (LAB) are the most common beer-spoilage bacteria regardless of beer type, and thus pose significant problems for the brewery industry. The aim of this study was to investigate the genetic mechanisms involved in the ability of the hard-to-culture beer-spoilage bacterium Lactobacillus acetotolerans to enter into the viable putative non-culturable (VPNC) state. A genome-wide transcriptional analysis of beer-spoilage L. acetotolerans strains BM-LA14526, BM-LA14527, and BM-LA14528 under normal, mid-term and VPNC states were performed using RNA-sequencing (RNA-seq) and further bioinformatics analyses. GO function, COG category, and KEGG pathway enrichment analysis were conducted to investigate functional and related metabolic pathways of the differentially expressed genes. Functional and pathway enrichment analysis indicated that heightened stress response and reduction in genes associated with transport, metabolic process, and enzyme activity might play important roles in the formation of the VPNC state. This is the first transcriptomic analysis on the formation of the VPNC state of beer spoilage L. acetotolerans.

  19. De novo RNA-Seq based transcriptome analysis of Papiliotrema laurentii strain RY1 under nitrogen starvation.

    PubMed

    Sarkar, Soumyadev; Chakravorty, Somnath; Mukherjee, Avishek; Bhattacharya, Debanjana; Bhattacharya, Semantee; Gachhui, Ratan

    2018-03-01

    Nitrogen is a key nutrient for all cell forms. Most organisms respond to nitrogen scarcity by slowing down their growth rate. On the contrary, our previous studies have shown that Papiliotrema laurentii strain RY1 has a robust growth under nitrogen starvation. To understand the global regulation that leads to such an extraordinary response, we undertook a de novo approach for transcriptome analysis of the yeast. Close to 33 million sequence reads of high quality for nitrogen limited and enriched condition were generated using Illumina NextSeq500. Trinity analysis and clustered transcripts annotation of the reads produced 17,611 unigenes, out of which 14,157 could be annotated. Gene Ontology term analysis generated 44.92% cellular component terms, 39.81% molecular function terms and 15.24% biological process terms. The most over represented pathways in general were translation, carbohydrate metabolism, amino acid metabolism, general metabolism, folding, sorting, degradation followed by transport and catabolism, nucleotide metabolism, replication and repair, transcription and lipid metabolism. A total of 4256 Single Sequence Repeats were identified. Differential gene expression analysis detected 996 P-significant transcripts to reveal transmembrane transport, lipid homeostasis, fatty acid catabolism and translation as the enriched terms which could be essential for Papiliotrema laurentii strain RY1 to adapt during nitrogen deprivation. Transcriptome data was validated by quantitative real-time PCR analysis of twelve transcripts. To the best of our knowledge, this is the first report of Papiliotrema laurentii strain RY1 transcriptome which would play a pivotal role in understanding the biochemistry of the yeast under acute nitrogen stress and this study would be encouraging to initiate extensive investigations into this Papiliotrema system. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Integrative analyses of RNA editing, alternative splicing, and expression of young genes in human brain transcriptome by deep RNA sequencing.

    PubMed

    Wu, Dong-Dong; Ye, Ling-Qun; Li, Yan; Sun, Yan-Bo; Shao, Yi; Chen, Chunyan; Zhu, Zhu; Zhong, Li; Wang, Lu; Irwin, David M; Zhang, Yong E; Zhang, Ya-Ping

    2015-08-01

    Next-generation RNA sequencing has been successfully used for identification of transcript assembly, evaluation of gene expression levels, and detection of post-transcriptional modifications. Despite these large-scale studies, additional comprehensive RNA-seq data from different subregions of the human brain are required to fully evaluate the evolutionary patterns experienced by the human brain transcriptome. Here, we provide a total of 6.5 billion RNA-seq reads from different subregions of the human brain. A significant correlation was observed between the levels of alternative splicing and RNA editing, which might be explained by a competition between the molecular machineries responsible for the splicing and editing of RNA. Young human protein-coding genes demonstrate biased expression to the neocortical and non-neocortical regions during evolution on the lineage leading to humans. We also found that a significantly greater number of young human protein-coding genes are expressed in the putamen, a tissue that was also observed to have the highest level of RNA-editing activity. The putamen, which previously received little attention, plays an important role in cognitive ability, and our data suggest a potential contribution of the putamen to human evolution. © The Author (2015). Published by Oxford University Press on behalf of Journal of Molecular Cell Biology, IBCB, SIBS, CAS. All rights reserved.

  1. Getting the most out of RNA-seq data analysis.

    PubMed

    Khang, Tsung Fei; Lau, Ching Yee

    2015-01-01

    Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets-one representing strong, and another mild, biological effect size-we simulated different replicate size scenarios, and tested the performance of several commonly-used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30 to 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes at systems level analysis, as their PPV and sensitivity trade-off were superior to the other methods'. Conclusion. When biological effect size is weak, systems level investigation is not possible using RNAseq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential, when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high confidence differentially expressed genes for targeted qPCR validation; for downstream systems level analysis, combined results from DESeq2 and edgeR are useful.

  2. Transcriptomic analysis reveals vacuolar Na+ (K+)/H+ antiporter gene contributing to growth, development, and defense in switchgrass (Panicum virgatum L.).

    PubMed

    Huang, Yanhua; Cui, Xin; Cen, Huifang; Wang, Kehua; Zhang, Yunwei

    2018-04-10

    Intracellular Na + (K + )/H + antiporters (NHXs) have pivotal functions in regulating plant growth, development, and resistance to a range of stresses. To gain insight into the molecular events underlying their actions in switchgrass (Panicum virgatum L.), we analyzed transcriptomic changes between PvNHX1-overexpression transgenic lines and wild-type (WT) plants using RNA sequencing (RNA-seq) technology. The comparison of transcriptomic data from the WT and transgenic plants revealed a large number of differentially expressed genes (DEGs) in the latter. Gene ontology (GO) and KEGG pathway analyses showed that these DEGs were associated with a wide range of functions, and participated in many biological processes. For example, we found that PvNHX1 had an important role in plant growth through its regulation of photosynthetic activity and cell expansion. In addition, PvNHX1 regulated K + homeostasis, cell expansion and pollen development, indicating that it has unique and specific roles in flower development. We also found that transgenic switchgrass exhibited a higher level of transcription of defense-related genes, especially those involved in disease resistance. We showed that PvNHX1 had an important role in plant growth and development through its regulation of photosynthetic activity, cell expansion, K + homeostasis, and pollen development. Additionally, PvNHX1 overexpression activated a complex signal transduction network in response to various biotic and abiotic stresses. In relation to plant growth, development, and defense responses, PvNHX1 also had a vital regulatory role in the formation of a series of plant hormones and transcription factors (TFs). The reliability of the RNA-seq data was confirmed by quantitative real-time PCR. Our data provide a valuable foundation for further research into the molecular mechanisms and physiological roles of NHXs in plants.

  3. Whole-Transcriptome Shotgun Sequencing (RNA-seq) Screen Reveals Upregulation of Cellobiose and Motility Operons of Lactobacillus ruminis L5 during Growth on Tetrasaccharides Derived from Barley β-Glucan

    PubMed Central

    Lawley, Blair; Sims, Ian M.

    2013-01-01

    Lactobacillus ruminis is an inhabitant of human bowels and bovine rumens. None of 10 isolates (three from bovine rumen, seven from human feces) of L. ruminis that were tested could utilize barley β-glucan for growth. Seven of the strains of L. ruminis were, however, able to utilize tetrasaccharides (3-O-β-cellotriosyl-d-glucose [LDP4] or 4-O-β-laminaribiosyl-d-cellobiose [CDP4]) present in β-glucan hydrolysates for growth. The tetrasaccharides were generated by the use of lichenase or cellulase, respectively. To learn more about the utilization of tetrasaccharides by L. ruminis, whole-transcriptome shotgun sequencing (RNA-seq) was tested as a transcriptional screen to detect altered gene expression when an autochthonous human strain (L5) was grown in medium containing CDP4. RNA-seq results were confirmed and extended by reverse transcription-quantitative PCR assays of selected genes in two upregulated operons when cells were grown as batch cultures in medium containing either CDP4 or LDP4. The cellobiose utilization operon had increased transcription, particularly in early growth phase, whereas the chemotaxis/motility operon was upregulated in late growth phase. Phenotypic changes were seen in relation to upregulation of chemotaxis/flagellar operons: flagella were rarely seen by electron microscopy on glucose-grown cells but cells cultured in tetrasaccharide medium were commonly flagellated. Chemotactic movement toward tetrasaccharides was demonstrated in capillary cultures. L. ruminis utilized 3-O-β-cellotriosyl-d-glucose released by β-glucan hydrolysis due to bowel commensal Coprococcus sp., indicating that cross feeding of tetrasaccharide between bacteria could occur. Therefore, the RNA-seq screen and subsequent experiments had utility in revealing foraging attributes of gut commensal Lactobacillus ruminis. PMID:23851085

  4. Transcriptome analysis reveals potential mechanisms underlying differential heart development in fast- and slow-growing broilers under heat stress.

    PubMed

    Zhang, Jibin; Schmidt, Carl J; Lamont, Susan J

    2017-04-13

    Modern fast-growing broilers are susceptible to heart failure under heat stress because their relatively small hearts cannot meet increased need of blood pumping. To improve the cardiac tolerance to heat stress in modern broilers through breeding, we need to find the important genes and pathways that contribute to imbalanced cardiac development and frequent occurrence of heat-related heart dysfunction. Two broiler lines - Ross 708 and Illinois - were included in this study as a fast-growing model and a slow-growing model respectively. Each broiler line was separated to two groups at 21 days posthatch. One group was subjected to heat stress treatment in the range of 35-37 °C for 8 h per day, and the other was kept in thermoneutral condition. Body and heart weights were measured at 42 days posthatch, and gene expression in left ventricles were compared between treatments and broiler lines through RNA-seq analysis. Body weight and normalized heart weight were significantly reduced by heat stress only in Ross broilers. RNA-seq results of 44 genes were validated using Biomark assay. A total of 325 differentially expressed (DE) genes were detected between heat stress and thermoneutral in Ross 708 birds, but only 3 in Illinois broilers. Ingenuity pathway analysis (IPA) predicted dramatic changes in multiple cellular activities especially downregulation of cell cycle. Comparison between two lines showed that cell cycle activity is higher in Ross than Illinois in thermoneutral condition but is decreased under heat stress. Among the significant pathways (P < 0.01) listed for different comparisons, "Mitotic Roles of Polo-like Kinases" is always ranked first. The increased susceptibility of modern broilers to cardiac dysfunction under heat stress compared to slow-growing broilers could be due to diminished heart capacity related to reduction in relative heart size. The smaller relative heart size in Ross heat stress group than in Ross thermoneutral group is suggested by the transcriptome analysis to be caused by decreased cell cycle activity and increased apoptosis. The DE genes in RNA-seq analysis and significant pathways in IPA provides potential targets for breeding of heat-tolerant broilers with optimized heart function.

  5. GENE-Counter: A Computational Pipeline for the Analysis of RNA-Seq Data for Gene Expression Differences

    PubMed Central

    Di, Yanming; Schafer, Daniel W.; Wilhelm, Larry J.; Fox, Samuel E.; Sullivan, Christopher M.; Curzon, Aron D.; Carrington, James C.; Mockler, Todd C.; Chang, Jeff H.

    2011-01-01

    GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts. PMID:21998647

  6. De novo assembly and annotation of the Antarctic copepod (Tigriopus kingsejongensis) transcriptome.

    PubMed

    Kim, Hui-Su; Lee, Bo-Young; Han, Jeonghoon; Lee, Young Hwan; Min, Gi-Sik; Kim, Sanghee; Lee, Jae-Seong

    2016-08-01

    The whole transcriptome of the Antarctic copepod (Tigriopus kingsejongensis) was sequenced using Illumina RNA-seq. De novo assembly was performed with 64,785,098 raw reads using Trinity, which assembled into 81,653 contigs. TransDecoder found 38,250 candidate coding contigs which showed homology to other species by BLAST analysis. Functional gene annotation was performed by Gene Ontology (GO), InterProScan, and KEGG pathway analyses. Finally, we identified a number of expressed gene catalog for T. kingsejongensis that is a useful model animal for gene information-based polar research to uncover molecular mechanisms of environmental adaptation on harsh environments. In particular, we observed highly developing lipid metabolism in T. kingsejongensis directly compared to those of the Far East Pacific coast copepod Tigriopus japonicus at the transcriptome level. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. RNA-Seq analysis reveals new evidence for inflammation-related changes in aged kidney

    PubMed Central

    Park, Daeui; Kim, Byoung-Chul; Kim, Chul-Hong; Choi, Yeon Ja; Jeong, Hyoung Oh; Kim, Mi Eun; Lee, Jun Sik; Park, Min Hi; Chung, Ki Wung; Kim, Dae Hyun; Lee, Jaewon; Im, Dong-Soon; Yoon, Seokjoo; Lee, Sunghoon; Yu, Byung Pal; Bhak, Jong; Chung, Hae Young

    2016-01-01

    Age-related dysregulated inflammation plays an essential role as a major risk factor underlying the pathophysiological aging process. To better understand how inflammatory processes are related to aging at the molecular level, we sequenced the transcriptome of young and aged rat kidney using RNA-Seq to detect known genes, novel genes, and alternative splicing events that are differentially expressed. By comparing young (6 months of age) and old (25 months of age) rats, we detected 722 up-regulated genes and 111 down-regulated genes. In the aged rats, we found 32 novel genes and 107 alternatively spliced genes. Notably, 6.6% of the up-regulated genes were related to inflammation (P < 2.2 × 10−16, Fisher exact t-test); 15.6% were novel genes with functional protein domains (P = 1.4 × 10−5); and 6.5% were genes showing alternative splicing events (P = 3.3 × 10−4). Based on the results of pathway analysis, we detected the involvement of inflammation-related pathways such as cytokines (P = 4.4 × 10−16), which were found up-regulated in the aged rats. Furthermore, an up-regulated inflammatory gene analysis identified the involvement of transcription factors, such as STAT4, EGR1, and FOSL1, which regulate cancer as well as inflammation in aging processes. Thus, RNA changes in these pathways support their involvement in the pro-inflammatory status during aging. We propose that whole RNA-Seq is a useful tool to identify novel genes and alternative splicing events by documenting broadly implicated inflammation-related genes involved in aging processes. PMID:27153548

  8. Comparative Transcriptome Analysis of Climacteric Fruit of Chinese Pear (Pyrus ussuriensis) Reveals New Insights into Fruit Ripening

    PubMed Central

    Tan, Dongmei; Jiang, Zhongyu; Wei, Yun; Li, Juncai; Wang, Aide

    2014-01-01

    The fruit of Pyrus ussuriensis is typically climacteric. During ripening, the fruits produce a large amount of ethylene, and their firmness drops rapidly. Although the molecular basis of climacteric fruit ripening has been studied in depth, some aspects remain unclear. Here, we compared the transcriptomes of pre- and post-climacteric fruits of Chinese pear (P. ussuriensis c.v. Nanguo) using RNA-seq. In total, 3,279 unigenes were differentially expressed between the pre- and post-climacteric fruits. Differentially expressed genes (DEGs) were subjected to Gene Ontology analysis, and 31 categories were significantly enriched in the groups ‘biological process’, ‘molecular function’ and ‘cellular component’. The DEGs included genes related to plant hormones, such as ethylene, ABA, auxin, GA and brassinosteroid, and transcription factors, such as MADS, NAC, WRKY and HSF. Moreover, genes encoding enzymes related to DNA methylation, cytoskeletal proteins and heat shock proteins (HSPs) showed differential expression between the pre- and post-climacteric fruits. Select DEGs were subjected to further analysis using quantitative RT-PCR (qRT-PCR), and the results were consistent with those of RNA-seq. Our data suggest that in addition to ethylene, other hormones play important roles in regulating fruit ripening and may interact with ethylene signaling during this process. DNA methylation-related methyltransferase and cytoskeletal protein genes are also involved in fruit ripening. Our results provide useful information for future research on pear fruit ripening. PMID:25215597

  9. Regulatory complexity revealed by integrated cytological and RNA-seq analyses of meiotic substages in mouse spermatocytes.

    PubMed

    Ball, Robyn L; Fujiwara, Yasuhiro; Sun, Fengyun; Hu, Jianjun; Hibbs, Matthew A; Handel, Mary Ann; Carter, Gregory W

    2016-08-12

    The continuous and non-synchronous nature of postnatal male germ-cell development has impeded stage-specific resolution of molecular events of mammalian meiotic prophase in the testis. Here the juvenile onset of spermatogenesis in mice is analyzed by combining cytological and transcriptomic data in a novel computational analysis that allows decomposition of the transcriptional programs of spermatogonia and meiotic prophase substages. Germ cells from testes of individual mice were obtained at two-day intervals from 8 to 18 days post-partum (dpp), prepared as surface-spread chromatin and immunolabeled for meiotic stage-specific protein markers (STRA8, SYCP3, phosphorylated H2AFX, and HISTH1T). Eight stages were discriminated cytologically by combinatorial antibody labeling, and RNA-seq was performed on the same samples. Independent principal component analyses of cytological and transcriptomic data yielded similar patterns for both data types, providing strong evidence for substage-specific gene expression signatures. A novel permutation-based maximum covariance analysis (PMCA) was developed to map co-expressed transcripts to one or more of the eight meiotic prophase substages, thereby linking distinct molecular programs to cytologically defined cell states. Expression of meiosis-specific genes is not substage-limited, suggesting regulation of substage transitions at other levels. This integrated analysis provides a general method for resolving complex cell populations. Here it revealed not only features of meiotic substage-specific gene expression, but also a network of substage-specific transcription factors and relationships to potential target genes.

  10. Sugarcane transcriptome analysis in response to infection caused by Acidovorax avenae subsp. avenae

    PubMed Central

    Grativol, Clícia; de Armas, Elvismary M.; Entenza, Júlio O. P.; Thiebaut, Flávia; Lima, Marcelo de F.; Farrinelli, Laurent; Hemerly, Adriana S.; Lifschitz, Sérgio; Ferreira, Paulo C. G.

    2016-01-01

    Sugarcane is an important tropical crop mainly cultivated to produce ethanol and sugar. Crop productivity is negatively affected by Acidovorax avenae subsp avenae (Aaa), which causes the red stripe disease. Little is known about the molecular mechanisms triggered in response to the infection. We have investigated the molecular mechanism activated in sugarcane using a RNA-seq approach. We have produced a de novo transcriptome assembly (TR7) from sugarcane RNA-seq libraries submitted to drought and infection with Aaa. Together, these libraries present 247 million of raw reads and resulted in 168,767 reference transcripts. Mapping in TR7 of reads obtained from infected libraries, revealed 798 differentially expressed transcripts, of which 723 were annotated, corresponding to 467 genes. GO and KEGG enrichment analysis showed that several metabolic pathways, such as code for proteins response to stress, metabolism of carbohydrates, processes of transcription and translation of proteins, amino acid metabolism and biosynthesis of secondary metabolites were significantly regulated in sugarcane. Differential analysis revealed that genes in the biosynthetic pathways of ET and JA PRRs, oxidative burst genes, NBS-LRR genes, cell wall fortification genes, SAR induced genes and pathogenesis-related genes (PR) were upregulated. In addition, 20 genes were validated by RT-qPCR. Together, these data contribute to a better understanding of the molecular mechanisms triggered by the Aaa in sugarcane and opens the opportunity for the development of molecular markers associated with disease tolerance in breeding programs. PMID:27936012

  11. Sugarcane transcriptome analysis in response to infection caused by Acidovorax avenae subsp. avenae.

    PubMed

    Santa Brigida, Ailton B; Rojas, Cristian A; Grativol, Clícia; de Armas, Elvismary M; Entenza, Júlio O P; Thiebaut, Flávia; Lima, Marcelo de F; Farrinelli, Laurent; Hemerly, Adriana S; Lifschitz, Sérgio; Ferreira, Paulo C G

    2016-01-01

    Sugarcane is an important tropical crop mainly cultivated to produce ethanol and sugar. Crop productivity is negatively affected by Acidovorax avenae subsp avenae (Aaa), which causes the red stripe disease. Little is known about the molecular mechanisms triggered in response to the infection. We have investigated the molecular mechanism activated in sugarcane using a RNA-seq approach. We have produced a de novo transcriptome assembly (TR7) from sugarcane RNA-seq libraries submitted to drought and infection with Aaa. Together, these libraries present 247 million of raw reads and resulted in 168,767 reference transcripts. Mapping in TR7 of reads obtained from infected libraries, revealed 798 differentially expressed transcripts, of which 723 were annotated, corresponding to 467 genes. GO and KEGG enrichment analysis showed that several metabolic pathways, such as code for proteins response to stress, metabolism of carbohydrates, processes of transcription and translation of proteins, amino acid metabolism and biosynthesis of secondary metabolites were significantly regulated in sugarcane. Differential analysis revealed that genes in the biosynthetic pathways of ET and JA PRRs, oxidative burst genes, NBS-LRR genes, cell wall fortification genes, SAR induced genes and pathogenesis-related genes (PR) were upregulated. In addition, 20 genes were validated by RT-qPCR. Together, these data contribute to a better understanding of the molecular mechanisms triggered by the Aaa in sugarcane and opens the opportunity for the development of molecular markers associated with disease tolerance in breeding programs.

  12. RNA-Seq and Gene Network Analysis Uncover Activation of an ABA-Dependent Signalosome During the Cork Oak Root Response to Drought

    PubMed Central

    Magalhães, Alexandre P.; Verde, Nuno; Reis, Francisca; Martins, Inês; Costa, Daniela; Lino-Neto, Teresa; Castro, Pedro H.; Tavares, Rui M.; Azevedo, Herlânder

    2016-01-01

    Quercus suber (cork oak) is a West Mediterranean species of key economic interest, being extensively explored for its ability to generate cork. Like other Mediterranean plants, Q. suber is significantly threatened by climatic changes, imposing the need to quickly understand its physiological and molecular adaptability to drought stress imposition. In the present report, we uncovered the differential transcriptome of Q. suber roots exposed to long-term drought, using an RNA-Seq approach. 454-sequencing reads were used to de novo assemble a reference transcriptome, and mapping of reads allowed the identification of 546 differentially expressed unigenes. These were enriched in both effector genes (e.g., LEA, chaperones, transporters) as well as regulatory genes, including transcription factors (TFs) belonging to various different classes, and genes associated with protein turnover. To further extend functional characterization, we identified the orthologs of differentially expressed unigenes in the model species Arabidopsis thaliana, which then allowed us to perform in silico functional inference, including gene network analysis for protein function, protein subcellular localization and gene co-expression, and in silico enrichment analysis for TFs and cis-elements. Results indicated the existence of extensive transcriptional regulatory events, including activation of ABA-responsive genes and ABF-dependent signaling. We were then able to establish that a core ABA-signaling pathway involving PP2C-SnRK2-ABF components was induced in stressed Q. suber roots, identifying a key mechanism in this species’ response to drought. PMID:26793200

  13. Application of the Gini correlation coefficient to infer regulatory relationships in transcriptome analysis.

    PubMed

    Ma, Chuang; Wang, Xiangfeng

    2012-09-01

    One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey's biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses.

  14. Application of the Gini Correlation Coefficient to Infer Regulatory Relationships in Transcriptome Analysis[W][OA

    PubMed Central

    Ma, Chuang; Wang, Xiangfeng

    2012-01-01

    One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey’s biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses. PMID:22797655

  15. Integrated Genomics Reveals Convergent Transcriptomic Networks Underlying Chronic Obstructive Pulmonary Disease and Idiopathic Pulmonary Fibrosis.

    PubMed

    Kusko, Rebecca L; Brothers, John F; Tedrow, John; Pandit, Kusum; Huleihel, Luai; Perdomo, Catalina; Liu, Gang; Juan-Guardela, Brenda; Kass, Daniel; Zhang, Sherry; Lenburg, Marc; Martinez, Fernando; Quackenbush, John; Sciurba, Frank; Limper, Andrew; Geraci, Mark; Yang, Ivana; Schwartz, David A; Beane, Jennifer; Spira, Avrum; Kaminski, Naftali

    2016-10-15

    Despite shared environmental exposures, idiopathic pulmonary fibrosis (IPF) and chronic obstructive pulmonary disease are usually studied in isolation, and the presence of shared molecular mechanisms is unknown. We applied an integrative genomic approach to identify convergent transcriptomic pathways in emphysema and IPF. We defined the transcriptional repertoire of chronic obstructive pulmonary disease, IPF, or normal histology lungs using RNA-seq (n = 87). Genes increased in both emphysema and IPF relative to control were enriched for the p53/hypoxia pathway, a finding confirmed in an independent cohort using both gene expression arrays and the nCounter Analysis System (n = 193). Immunohistochemistry confirmed overexpression of HIF1A, MDM2, and NFKBIB members of this pathway in tissues from patients with emphysema or IPF. Using reads aligned across splice junctions, we determined that alternative splicing of p53/hypoxia pathway-associated molecules NUMB and PDGFA occurred more frequently in IPF or emphysema compared with control and validated these findings by quantitative polymerase chain reaction and the nCounter Analysis System on an independent sample set (n = 193). Finally, by integrating parallel microRNA and mRNA-Seq data on the same samples, we identified MIR96 as a key novel regulatory hub in the p53/hypoxia gene-expression network and confirmed that modulation of MIR96 in vitro recapitulates the disease-associated gene-expression network. Our results suggest convergent transcriptional regulatory hubs in diseases as varied phenotypically as chronic obstructive pulmonary disease and IPF and suggest that these hubs may represent shared key responses of the lung to environmental stresses.

  16. A Deeper Examination of Thorellius atrox Scorpion Venom Components with Omic Techonologies.

    PubMed

    Romero-Gutierrez, Teresa; Peguero-Sanchez, Esteban; Cevallos, Miguel A; Batista, Cesar V F; Ortiz, Ernesto; Possani, Lourival D

    2017-12-12

    This communication reports a further examination of venom gland transcripts and venom composition of the Mexican scorpion Thorellius atrox using RNA-seq and tandem mass spectrometry. The RNA-seq, which was performed with the Illumina protocol, yielded more than 20,000 assembled transcripts. Following a database search and annotation strategy, 160 transcripts were identified, potentially coding for venom components. A novel sequence was identified that potentially codes for a peptide with similarity to spider ω-agatoxins, which act on voltage-gated calcium channels, not known before to exist in scorpion venoms. Analogous transcripts were found in other scorpion species. They could represent members of a new scorpion toxin family, here named omegascorpins. The mass fingerprint by LC-MS identified 135 individual venom components, five of which matched with the theoretical masses of putative peptides translated from the transcriptome. The LC-MS/MS de novo sequencing allowed to reconstruct and identify 42 proteins encoded by assembled transcripts, thus validating the transcriptome analysis. Earlier studies conducted with this scorpion venom permitted the identification of only twenty putative venom components. The present work performed with more powerful and modern omic technologies demonstrates the capacity of accomplishing a deeper characterization of scorpion venom components and the identification of novel molecules with potential applications in biomedicine and the study of ion channel physiology.

  17. Detecting Antigen-Specific T Cell Responses: From Bulk Populations to Single Cells.

    PubMed

    Phetsouphanh, Chansavath; Zaunders, John James; Kelleher, Anthony Dominic

    2015-08-12

    A new generation of sensitive T cell-based assays facilitates the direct quantitation and characterization of antigen-specific T cell responses. Single-cell analyses have focused on measuring the quality and breadth of a response. Accumulating data from these studies demonstrate that there is considerable, previously-unrecognized, heterogeneity. Standard assays, such as the ICS, are often insufficient for characterization of rare subsets of cells. Enhanced flow cytometry with imaging capabilities enables the determination of cell morphology, as well as the spatial localization of the protein molecules within a single cell. Advances in both microfluidics and digital PCR have improved the efficiency of single-cell sorting and allowed multiplexed gene detection at the single-cell level. Delving further into the transcriptome of single-cells using RNA-seq is likely to reveal the fine-specificity of cellular events such as alternative splicing (i.e., splice variants) and allele-specific expression, and will also define the roles of new genes. Finally, detailed analysis of clonally related antigen-specific T cells using single-cell TCR RNA-seq will provide information on pathways of differentiation of memory T cells. With these state of the art technologies the transcriptomics and genomics of Ag-specific T cells can be more definitively elucidated.

  18. Detecting Antigen-Specific T Cell Responses: From Bulk Populations to Single Cells

    PubMed Central

    Phetsouphanh, Chansavath; Zaunders, John James; Kelleher, Anthony Dominic

    2015-01-01

    A new generation of sensitive T cell-based assays facilitates the direct quantitation and characterization of antigen-specific T cell responses. Single-cell analyses have focused on measuring the quality and breadth of a response. Accumulating data from these studies demonstrate that there is considerable, previously-unrecognized, heterogeneity. Standard assays, such as the ICS, are often insufficient for characterization of rare subsets of cells. Enhanced flow cytometry with imaging capabilities enables the determination of cell morphology, as well as the spatial localization of the protein molecules within a single cell. Advances in both microfluidics and digital PCR have improved the efficiency of single-cell sorting and allowed multiplexed gene detection at the single-cell level. Delving further into the transcriptome of single-cells using RNA-seq is likely to reveal the fine-specificity of cellular events such as alternative splicing (i.e., splice variants) and allele-specific expression, and will also define the roles of new genes. Finally, detailed analysis of clonally related antigen-specific T cells using single-cell TCR RNA-seq will provide information on pathways of differentiation of memory T cells. With these state of the art technologies the transcriptomics and genomics of Ag-specific T cells can be more definitively elucidated. PMID:26274954

  19. Sublethal effects of imidacloprid on targeting muscle and ribosomal protein related genes in the honey bee Apis mellifera L.

    PubMed

    Wu, Yan-Yan; Luo, Qi-Hua; Hou, Chun-Sheng; Wang, Qiang; Dai, Ping-Li; Gao, Jing; Liu, Yong-Jun; Diao, Qing-Yun

    2017-11-21

    A sublethal concentration of imidacloprid can cause chronic toxicity in bees and can impact the behavior of honey bees. The nectar- and water-collecting, and climbing abilities of bees are crucial to the survival of the bees and the execution of responsibilities in bee colonies. Besides behavioral impact, data on the molecular mechanisms underlying the toxicity of imidacloprid, especially by the way of RNA-seq at the transcriptomic level, are limited. We treated Apis mellifera L. with sublethal concentrations of imidacloprid (0.1, 1 and 10 ppb) and determined the effect on behaviors and the transcriptomic changes. The sublethal concentrations of imidacloprid had a limited impact on the survival and syrup consumption of bees, but caused a significant increase in water consumption. Moreover, the climbing ability was significantly impaired by 10 ppb imidacloprid at 8 d. In the RNA-seq analysis, gene ontology (GO) term enrichment indicated a significant down-regulation of muscle-related genes, which might contribute to the impairment in climbing ability of bees. The enriched GO terms were attributed to the up-regulated ribosomal protein genes. Considering the ribosomal and extra-ribosomal functions of the ribosomal proteins, we hypothesized that imidacloprid also causes cell dysfunction. Our findings further enhance the understanding of imidacloprid sublethal toxicity.

  20. Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database.

    PubMed

    Zappia, Luke; Phipson, Belinda; Oshlack, Alicia

    2018-06-25

    As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source and open-science approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records the growth of the field over time.

  1. The planarian regeneration transcriptome reveals a shared but temporally shifted regulatory program between opposing head and tail scenarios.

    PubMed

    Kao, Damian; Felix, Daniel; Aboobaker, Aziz

    2013-11-16

    Planarians can regenerate entire animals from a small fragment of the body. The regenerating fragment is able to create new tissues and remodel existing tissues to form a complete animal. Thus different fragments with very different starting components eventually converge on the same solution. In this study, we performed an extensive RNA-seq time-course on regenerating head and tail fragments to observe the differences and similarities of the transcriptional landscape between head and tail fragments during regeneration. We have consolidated existing transcriptomic data for S. mediterranea to generate a high confidence set of transcripts for use in genome wide expression studies. We performed a RNA-seq time-course on regenerating head and tail fragments from 0 hours to 3 days. We found that the transcriptome profiles of head and tail regeneration were very different at the start of regeneration; however, an unexpected convergence of transcriptional profiles occurred at 48 hours when head and tail fragments are still morphologically distinct. By comparing differentially expressed transcripts at various time-points, we revealed that this divergence/convergence pattern is caused by a shared regulatory program that runs early in heads and later in tails.Additionally, we also performed RNA-seq on smed-prep(RNAi) tail fragments which ultimately fail to regenerate anterior structures. We find the gene regulation program in response to smed-prep(RNAi) to display the opposite regulatory trend compared to the previously mentioned share regulatory program during regeneration. Using annotation data and comparative approaches, we also identified a set of approximately 4,800 triclad specific transcripts that were enriched amongst the genes displaying differential expression during the regeneration time-course. The regeneration transcriptome of head and tail regeneration provides us with a rich resource for investigating the global expression changes that occurs during regeneration. We show that very different regenerative scenarios utilize a shared core regenerative program. Furthermore, our consolidated transcriptome and annotations allowed us to identity triclad specific transcripts that are enriched within this core regulatory program. Our data support the hypothesis that both conserved aspects of animal developmental programs and recent evolutionarily innovations work in concert to control regeneration.

  2. De novo transcriptome analysis and microsatellite marker development for population genetic study of a serious insect pest, Rhopalosiphum padi (L.) (Hemiptera: Aphididae).

    PubMed

    Duan, Xinle; Wang, Kang; Su, Sha; Tian, Ruizheng; Li, Yuting; Chen, Maohua

    2017-01-01

    The bird cherry-oat aphid, Rhopalosiphum padi (L.), is one of the most abundant aphid pests of cereals and has a global distribution. Next-generation sequencing (NGS) is a rapid and efficient method for developing molecular markers. However, transcriptomic and genomic resources of R. padi have not been investigated. In this study, we used transcriptome information obtained by RNA-Seq to develop polymorphic microsatellites for investigating population genetics in this species. The transcriptome of R. padi was sequenced on an Illumina HiSeq 2000 platform. A total of 114.4 million raw reads with a GC content of 40.03% was generated. The raw reads were cleaned and assembled into 29,467 unigenes with an N50 length of 1,580 bp. Using several public databases, 82.47% of these unigenes were annotated. Of the annotated unigenes, 8,022 were assigned to COG pathways, 9,895 were assigned to GO pathways, and 14,586 were mapped to 257 KEGG pathways. A total of 7,936 potential microsatellites were identified in 5,564 unigenes, 60 of which were selected randomly and amplified using specific primer pairs. Fourteen loci were found to be polymorphic in the four R. padi populations. The transcriptomic data presented herein will facilitate gene discovery, gene analyses, and development of molecular markers for future studies of R. padi and other closely related aphid species.

  3. Improved annotation with de novo transcriptome assembly in four social amoeba species.

    PubMed

    Singh, Reema; Lawal, Hajara M; Schilde, Christina; Glöckner, Gernot; Barton, Geoffrey J; Schaap, Pauline; Cole, Christian

    2017-01-31

    Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.

  4. Comparative Transcriptomics Highlights the Role of the Activator Protein 1 Transcription Factor in the Host Response to Ebolavirus

    PubMed Central

    Todd, Shawn; Boyd, Victoria; Tachedjian, Mary; Klein, Reuben; Shiell, Brian; Dearnley, Megan; McAuley, Alexander J.; Woon, Amanda P.; Purcell, Anthony W.; Marsh, Glenn A.; Baker, Michelle L.

    2017-01-01

    ABSTRACT Ebolavirus and Marburgvirus comprise two genera of negative-sense single-stranded RNA viruses that cause severe hemorrhagic fevers in humans. Despite considerable research efforts, the molecular events following Ebola virus (EBOV) infection are poorly understood. With the view of identifying host factors that underpin EBOV pathogenesis, we compared the transcriptomes of EBOV-infected human, pig, and bat kidney cells using a transcriptome sequencing (RNA-seq) approach. Despite a significant difference in viral transcription/replication between the cell lines, all cells responded to EBOV infection through a robust induction of extracellular growth factors. Furthermore, a significant upregulation of activator protein 1 (AP1) transcription factor complex members FOS and JUN was observed in permissive cell lines. Functional studies focusing on human cells showed that EBOV infection induces protein expression, phosphorylation, and nuclear accumulation of JUN and, to a lesser degree, FOS. Using a luciferase-based reporter, we show that EBOV infection induces AP1 transactivation activity within human cells at 48 and 72 h postinfection. Finally, we show that JUN knockdown decreases the expression of EBOV-induced host gene expression. Taken together, our study highlights the role of AP1 in promoting the host gene expression profile that defines EBOV pathogenesis. IMPORTANCE Many questions remain about the molecular events that underpin filovirus pathophysiology. The rational design of new intervention strategies, such as postexposure therapeutics, will be significantly enhanced through an in-depth understanding of these molecular events. We believe that new insights into the molecular pathogenesis of EBOV may be possible by examining the transcriptomic response of taxonomically diverse cell lines (derived from human, pig, and bat). We first identified the responsive pathways using an RNA-seq-based transcriptomics approach. Further functional and computational analysis focusing on human cells highlighted an important role for the AP1 transcription factor in mediating the transcriptional response to EBOV infection. Our study sheds new light on how host transcription factors respond to and promote the transcriptional landscape that follows viral infection. PMID:28931675

  5. Comprehensive analysis of single molecule sequencing-derived complete genome and whole transcriptome of Hyposidra talaca nuclear polyhedrosis virus.

    PubMed

    Nguyen, Thong T; Suryamohan, Kushal; Kuriakose, Boney; Janakiraman, Vasantharajan; Reichelt, Mike; Chaudhuri, Subhra; Guillory, Joseph; Divakaran, Neethu; Rabins, P E; Goel, Ridhi; Deka, Bhabesh; Sarkar, Suman; Ekka, Preety; Tsai, Yu-Chih; Vargas, Derek; Santhosh, Sam; Mohan, Sangeetha; Chin, Chen-Shan; Korlach, Jonas; Thomas, George; Babu, Azariah; Seshagiri, Somasekar

    2018-06-12

    We sequenced the Hyposidra talaca NPV (HytaNPV) double stranded circular DNA genome using PacBio single molecule sequencing technology. We found that the HytaNPV genome is 139,089 bp long with a GC content of 39.6%. It encodes 141 open reading frames (ORFs) including the 37 baculovirus core genes, 25 genes conserved among lepidopteran baculoviruses, 72 genes known in baculovirus, and 7 genes unique to the HytaNPV genome. It is a group II alphabaculovirus that codes for the F protein and lacks the gp64 gene found in group I alphabaculovirus viruses. Using RNA-seq, we confirmed the expression of the ORFs identified in the HytaNPV genome. Phylogenetic analysis showed HytaNPV to be closest to BusuNPV, SujuNPV and EcobNPV that infect other tea pests, Buzura suppressaria, Sucra jujuba, and Ectropis oblique, respectively. We identified repeat elements and a conserved non-coding baculovirus element in the genome. Analysis of the putative promoter sequences identified motif consistent with the temporal expression of the genes observed in the RNA-seq data.

  6. Differentiated transcriptional signatures in the maize landraces of Chiapas, Mexico.

    PubMed

    Kost, Matthew A; Perales, Hugo R; Wijeratne, Saranga; Wijeratne, Asela J; Stockinger, Eric; Mercer, Kristin L

    2017-09-08

    Landrace farmers are the keepers of crops locally adapted to the environments where they are cultivated. Patterns of diversity across the genome can provide signals of past evolution in the face of abiotic and biotic change. Understanding this rich genetic resource is imperative especially since diversity can provide agricultural security as climate continues to shift. Here we employ RNA sequencing (RNA-seq) to understand the role that conditions that vary across a landscape may have played in shaping genetic diversity in the maize landraces of Chiapas, Mexico. We collected landraces from three distinct elevational zones and planted them in a midland common garden. Early season leaf tissue was collected for RNA-seq and we performed weighted gene co-expression network analysis (WGCNA). We then used association analysis between landrace co-expression module expression values and environmental parameters of landrace origin to elucidate genes and gene networks potentially shaped by environmental factors along our study gradient. Elevation of landrace origin affected the transcriptome profiles. Two co-expression modules were highly correlated with temperature parameters of landrace origin and queries into their 'hub' genes suggested that temperature may have led to differentiation among landraces in hormone biosynthesis/signaling and abiotic and biotic stress responses. We identified several 'hub' transcription factors and kinases as candidates for the regulation of these responses. These findings indicate that natural selection may influence the transcriptomes of crop landraces along an elevational gradient in a major diversity center, and provide a foundation for exploring the genetic basis of local adaptation. While we cannot rule out the role of neutral evolutionary forces in the patterns we have identified, combining whole transcriptome sequencing technologies, established bioinformatics techniques, and common garden experimentation can powerfully elucidate structure of adaptive diversity across a varied landscape. Ultimately, gaining such understanding can facilitate the conservation and strategic utilization of crop genetic diversity in a time of climate change.

  7. Identification and validation of differentially expressed transcripts by RNA-sequencing of formalin-fixed, paraffin-embedded (FFPE) lung tissue from patients with Idiopathic Pulmonary Fibrosis.

    PubMed

    Vukmirovic, Milica; Herazo-Maya, Jose D; Blackmon, John; Skodric-Trifunovic, Vesna; Jovanovic, Dragana; Pavlovic, Sonja; Stojsic, Jelena; Zeljkovic, Vesna; Yan, Xiting; Homer, Robert; Stefanovic, Branko; Kaminski, Naftali

    2017-01-12

    Idiopathic Pulmonary Fibrosis (IPF) is a lethal lung disease of unknown etiology. A major limitation in transcriptomic profiling of lung tissue in IPF has been a dependence on snap-frozen fresh tissues (FF). In this project we sought to determine whether genome scale transcript profiling using RNA Sequencing (RNA-Seq) could be applied to archived Formalin-Fixed Paraffin-Embedded (FFPE) IPF tissues. We isolated total RNA from 7 IPF and 5 control FFPE lung tissues and performed 50 base pair paired-end sequencing on Illumina 2000 HiSeq. TopHat2 was used to map sequencing reads to the human genome. On average ~62 million reads (53.4% of ~116 million reads) were mapped per sample. 4,131 genes were differentially expressed between IPF and controls (1,920 increased and 2,211 decreased (FDR < 0.05). We compared our results to differentially expressed genes calculated from a previously published dataset generated from FF tissues analyzed on Agilent microarrays (GSE47460). The overlap of differentially expressed genes was very high (760 increased and 1,413 decreased, FDR < 0.05). Only 92 differentially expressed genes changed in opposite directions. Pathway enrichment analysis performed using MetaCore confirmed numerous IPF relevant genes and pathways including extracellular remodeling, TGF-beta, and WNT. Gene network analysis of MMP7, a highly differentially expressed gene in both datasets, revealed the same canonical pathways and gene network candidates in RNA-Seq and microarray data. For validation by NanoString nCounter® we selected 35 genes that had a fold change of 2 in at least one dataset (10 discordant, 10 significantly differentially expressed in one dataset only and 15 concordant genes). High concordance of fold change and FDR was observed for each type of the samples (FF vs FFPE) with both microarrays (r = 0.92) and RNA-Seq (r = 0.90) and the number of discordant genes was reduced to four. Our results demonstrate that RNA sequencing of RNA obtained from archived FFPE lung tissues is feasible. The results obtained from FFPE tissue are highly comparable to FF tissues. The ability to perform RNA-Seq on archived FFPE IPF tissues should greatly enhance the availability of tissue biopsies for research in IPF.

  8. GFFview: A Web Server for Parsing and Visualizing Annotation Information of Eukaryotic Genome.

    PubMed

    Deng, Feilong; Chen, Shi-Yi; Wu, Zhou-Lin; Hu, Yongsong; Jia, Xianbo; Lai, Song-Jia

    2017-10-01

    Owing to wide application of RNA sequencing (RNA-seq) technology, more and more eukaryotic genomes have been extensively annotated, such as the gene structure, alternative splicing, and noncoding loci. Annotation information of genome is prevalently stored as plain text in General Feature Format (GFF), which could be hundreds or thousands Mb in size. Therefore, it is a challenge for manipulating GFF file for biologists who have no bioinformatic skill. In this study, we provide a web server (GFFview) for parsing the annotation information of eukaryotic genome and then generating statistical description of six indices for visualization. GFFview is very useful for investigating quality and difference of the de novo assembled transcriptome in RNA-seq studies.

  9. Developing cDNA Libraries of Receptors Involved in the Recruitment of the Biofouling Tubeworm Hydroides elegans

    DTIC Science & Technology

    2014-06-12

    Transcriptome, Hydroides elegans, Next Generation Sequencing, Illumina HiSeq, PacBio SMRT, Biofilm , Metamorphosis 16. SECURITY CLASSIFICATION OF: a...to a bacterial cue from a bacterial biofilm . Recently, this cue has been identified to be a phage-tail like bacteriocin produced by the bacterium...submitted to the Huntsman Cancer Institute at the University of Utah and the subsequent isolation of mRNA was used for Illumina HiSeq 101 paired end

  10. Identification of molecular tumor markers in renal cell carcinomas with TFE3 protein expression by RNA sequencing.

    PubMed

    Pflueger, Dorothee; Sboner, Andrea; Storz, Martina; Roth, Jasmine; Compérat, Eva; Bruder, Elisabeth; Rubin, Mark A; Schraml, Peter; Moch, Holger

    2013-11-01

    TFE3 translocation renal cell carcinoma (tRCC) is defined by chromosomal translocations involving the TFE3 transcription factor at chromosome Xp11.2. Genetically proven TFE3 tRCCs have a broad histologic spectrum with overlapping features to other renal tumor subtypes. In this study, we aimed for characterizing RCC with TFE3 protein expression. Using next-generation whole transcriptome sequencing (RNA-Seq) as a discovery tool, we analyzed fusion transcripts, gene expression profile, and somatic mutations in frozen tissue of one TFE3 tRCC. By applying a computational analysis developed to call chimeric RNA molecules from paired-end RNA-Seq data, we confirmed the known TFE3 translocation. Its fusion partner SFPQ has already been described as fusion partner in tRCCs. In addition, an RNA read-through chimera between TMED6 and COG8 as well as MET and KDR (VEGFR2) point mutations were identified. An EGFR mutation, but no chromosomal rearrangements, was identified in a control group of five clear cell RCCs (ccRCCs). The TFE3 tRCC could be clearly distinguished from the ccRCCs by RNA-Seq gene expression measurements using a previously reported tRCC gene signature. In validation experiments using reverse transcription-PCR, TMED6-COG8 chimera expression was significantly higher in nine TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in 24 ccRCCs (P < .001) and 22 papillary RCCs (P < .05-.07). Immunohistochemical analysis of selected genes from the tRCC gene signature showed significantly higher eukaryotic translation elongation factor 1 alpha 2 (EEF1A2) and Contactin 3 (CNTN3) expression in 16 TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in over 200 ccRCCs (P < .0001, both).

  11. Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation.

    PubMed

    Chana-Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene; Kristiansen, Rune; Jensen, Jan K; Andreasen, Peter A; Bendixen, Christian; Panitz, Frank

    2017-01-01

    The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism is scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly was performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads were assembled with the Trinity de novo assembler both within each tissue and across all tissues combined resulting in 362,690 transcripts in the combined assembly which represent 289,515 Trinity genes. BUSCO analysis determined a level of 87% completeness for the combined transcriptome. In total, 123,110 proteins were predicted of which 78,679 and 83,164 had significant hits against the SwissProt and Uniref90 protein databases, respectively. Additionally, 61,215 proteins aligned to known protein domains, 7,208 carried a signal peptide and 15,971 possessed at least one transmembrane region. Based on the annotation, 81,582 transcripts were assigned to gene ontology terms and 42,078 belong to known clusters of orthologous groups (eggNOG). To demonstrate the value of our molecular resource, we show that the improved transcriptome data enhances the current possibilities of osmoregulation research in spiny dogfish by utilizing the novel gene and protein annotations to investigate a set of genes involved in urea synthesis and urea, ammonia and water transport, all of them crucial in osmoregulation. We describe the presence of different gene copies and isoforms of key enzymes involved in this process, including arginases and transporters of urea and ammonia, for which sequence information is currently absent in the databases for this model species. The transcriptome assemblies and the derived annotations generated in this study will support the ongoing research for this particular animal model and provides a new molecular tool to assist biological research in cartilaginous fishes.

  12. Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation

    PubMed Central

    Chana-Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene; Kristiansen, Rune; Jensen, Jan K.; Bendixen, Christian

    2017-01-01

    The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism is scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly was performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads were assembled with the Trinity de novo assembler both within each tissue and across all tissues combined resulting in 362,690 transcripts in the combined assembly which represent 289,515 Trinity genes. BUSCO analysis determined a level of 87% completeness for the combined transcriptome. In total, 123,110 proteins were predicted of which 78,679 and 83,164 had significant hits against the SwissProt and Uniref90 protein databases, respectively. Additionally, 61,215 proteins aligned to known protein domains, 7,208 carried a signal peptide and 15,971 possessed at least one transmembrane region. Based on the annotation, 81,582 transcripts were assigned to gene ontology terms and 42,078 belong to known clusters of orthologous groups (eggNOG). To demonstrate the value of our molecular resource, we show that the improved transcriptome data enhances the current possibilities of osmoregulation research in spiny dogfish by utilizing the novel gene and protein annotations to investigate a set of genes involved in urea synthesis and urea, ammonia and water transport, all of them crucial in osmoregulation. We describe the presence of different gene copies and isoforms of key enzymes involved in this process, including arginases and transporters of urea and ammonia, for which sequence information is currently absent in the databases for this model species. The transcriptome assemblies and the derived annotations generated in this study will support the ongoing research for this particular animal model and provides a new molecular tool to assist biological research in cartilaginous fishes. PMID:28832628

  13. RNAbrowse: RNA-Seq De Novo Assembly Results Browser

    PubMed Central

    Mariette, Jérôme; Noirot, Céline; Nabihoudine, Ibounyamine; Bardou, Philippe; Hoede, Claire; Djari, Anis; Cabau, Cédric; Klopp, Christophe

    2014-01-01

    Transcriptome analysis based on a de novo assembly of next generation RNA sequences is now performed routinely in many laboratories. The generated results, including contig sequences, quantification figures, functional annotations and variation discovery outputs are usually bulky and quite diverse. This article presents a user oriented storage and visualisation environment permitting to explore the data in a top-down manner, going from general graphical views to all possible details. The software package is based on biomart, easy to install and populate with local data. The software package is available under the GNU General Public License (GPL) at http://bioinfo.genotoul.fr/RNAbrowse. PMID:24823498

  14. RISE: a database of RNA interactome from sequencing experiments

    PubMed Central

    Gong, Jing; Shao, Di; Xu, Kui

    2018-01-01

    Abstract We present RISE (http://rise.zhanglab.net), a database of RNA Interactome from Sequencing Experiments. RNA-RNA interactions (RRIs) are essential for RNA regulation and function. RISE provides a comprehensive collection of RRIs that mainly come from recent transcriptome-wide sequencing-based experiments like PARIS, SPLASH, LIGR-seq, and MARIO, as well as targeted studies like RIA-seq, RAP-RNA and CLASH. It also includes interactions aggregated from other primary databases and publications. The RISE database currently contains 328,811 RNA-RNA interactions mainly in human, mouse and yeast. While most existing RNA databases mainly contain interactions of miRNA targeting, notably, more than half of the RRIs in RISE are among mRNA and long non-coding RNAs. We compared different RRI datasets in RISE and found limited overlaps in interactions resolved by different techniques and in different cell lines. It may suggest technology preference and also dynamic natures of RRIs. We also analyzed the basic features of the human and mouse RRI networks and found that they tend to be scale-free, small-world, hierarchical and modular. The analysis may nominate important RNAs or RRIs for further investigation. Finally, RISE provides a Circos plot and several table views for integrative visualization, with extensive molecular and functional annotations to facilitate exploration of biological functions for any RRI of interest. PMID:29040625

  15. RiboGalaxy: A browser based platform for the alignment, analysis and visualization of ribosome profiling data

    PubMed Central

    Michel, Audrey M.; Mullan, James P. A.; Velayudhan, Vimalkumar; O'Connor, Patrick B. F.; Donohue, Claire A.; Baranov, Pavel V.

    2016-01-01

    ABSTRACT Ribosome profiling (ribo-seq) is a technique that uses high-throughput sequencing to reveal the exact locations and densities of translating ribosomes at the entire transcriptome level. The technique has become very popular since its inception in 2009. Yet experimentalists who generate ribo-seq data often have to rely on bioinformaticians to process and analyze their data. We present RiboGalaxy (http://ribogalaxy.ucc.ie), a freely available Galaxy-based web server for processing and analyzing ribosome profiling data with the visualization functionality provided by GWIPS-viz (http://gwips.ucc.ie). RiboGalaxy offers researchers a suite of tools specifically tailored for processing ribo-seq and corresponding mRNA-seq data. Researchers can take advantage of the published workflows which reduce the multi-step alignment process to a minimum of inputs from the user. Users can then explore their own aligned data as custom tracks in GWIPS-viz and compare their ribosome profiles to existing ribo-seq tracks from published studies. In addition, users can assess the quality of their ribo-seq data, determine the strength of the triplet periodicity signal, generate meta-gene ribosome profiles as well as analyze the relative impact of mRNA sequence features on local read density. RiboGalaxy is accompanied by extensive documentation and tips for helping users. In addition we provide a forum (http://gwips.ucc.ie/Forum) where we encourage users to post their questions and feedback to improve the overall RiboGalaxy service. PMID:26821742

  16. Comparative transcriptomics of early dipteran development

    PubMed Central

    2013-01-01

    Background Modern sequencing technologies have massively increased the amount of data available for comparative genomics. Whole-transcriptome shotgun sequencing (RNA-seq) provides a powerful basis for comparative studies. In particular, this approach holds great promise for emerging model species in fields such as evolutionary developmental biology (evo-devo). Results We have sequenced early embryonic transcriptomes of two non-drosophilid dipteran species: the moth midge Clogmia albipunctata, and the scuttle fly Megaselia abdita. Our analysis includes a third, published, transcriptome for the hoverfly Episyrphus balteatus. These emerging models for comparative developmental studies close an important phylogenetic gap between Drosophila melanogaster and other insect model systems. In this paper, we provide a comparative analysis of early embryonic transcriptomes across species, and use our data for a phylogenomic re-evaluation of dipteran phylogenetic relationships. Conclusions We show how comparative transcriptomics can be used to create useful resources for evo-devo, and to investigate phylogenetic relationships. Our results demonstrate that de novo assembly of short (Illumina) reads yields high-quality, high-coverage transcriptomic data sets. We use these data to investigate deep dipteran phylogenetic relationships. Our results, based on a concatenation of 160 orthologous genes, provide support for the traditional view of Clogmia being the sister group of Brachycera (Megaselia, Episyrphus, Drosophila), rather than that of Culicomorpha (which includes mosquitoes and blackflies). PMID:23432914

  17. Transcriptome profiling of the floral buds and discovery of genes related to sex-differentiation in the dioecious cucurbit Coccinia grandis (L.) Voigt.

    PubMed

    Mohanty, Jatindra Nath; Nayak, Sanghamitra; Jha, Sumita; Joshi, Raj Kumar

    2017-08-30

    Dioecious species offer an inclusive structure to study the molecular basis of sexual dimorphism in angiosperms. Despite having a small genome and heteromorphic sex chromosomes, Coccinia grandis is a highly neglected dioecious species with little information available on its physical state, genetic orientation and key sex-defining elements. In the present study, we performed RNA-Seq and DGE analysis of male (MB) and female (FB) buds in C. grandis to gain insights into the molecular basis of sex determination in this plant. De novo assembly of 75 million clean reads resulted in 72,479 unigenes for male library and 63,308 unigenes for female library with a mean length of 736bp. 61,458 (85.57%) unigenes displayed significant similarity with protein sequences from publicly available databases. Comparative transcriptome analyses revealed 1410 unigenes as differentially expressed (DEGs) between MB and FB samples. A consistent correlation between the expression levels of DEGs was observed for the RNA-Seq pattern and qRT-PCR validation. Functional annotation showed high enrichment of DEGs involved in phytohormone biosynthesis, hormone signaling and transduction, transcriptional regulation and methyltransferase activity. High induction of hormone responsive genes such as ARF6, ACC synthase1, SNRK2 and BRI1-associated receptor kinase 1 (BAK1) suggest that multiple phytohormones and their signaling crosstalk play crucial role in sex determination in this species. Beside, the transcription factors such as zinc fingers, homeodomain leucine zippers and MYBs were identified as major determinants of male specific expression. Moreover, the detection of multiple DEGs as the miRNA target site implies that a small RNA mediated gene silencing cascade may also be regulating gender differentiation in C. grandis. Overall, the present transcriptome resources provide us a large number of DEGs involved in sex expression and could form the groundwork for unravelling the molecular mechanism of sex determination in C. grandis. Copyright © 2017 Elsevier B.V. All rights reserved.

  18. Mechanisms of colitis-accelerated colon carcinogenesis and its prevention with the combination of aspirin and curcumin: Transcriptomic analysis using RNA-seq.

    PubMed

    Guo, Yue; Su, Zheng-Yuan; Zhang, Chengyue; Gaspar, John M; Wang, Rui; Hart, Ronald P; Verzi, Michael P; Kong, Ah-Ng Tony

    2017-07-01

    Colorectal cancer (CRC) remains the leading cause of cancer-related death in the world. Aspirin (ASA) and curcumin (CUR) are widely investigated chemopreventive candidates for CRC. However, the precise mechanisms of their action and their combinatorial effects have not been evaluated. The purpose of the present study was to determine the effect of ASA, CUR, and their combination in azoxymethane/dextran sulfate sodium (AOM/DSS)-induced colitis-accelerated colorectal cancer (CAC). We also aimed to characterize the differential gene expression profiles in AOM/DSS-induced tumors as well as in tumors modulated by ASA and CUR using RNA-seq. Diets supplemented with 0.02% ASA, 2% CUR or 0.01% ASA+1% CUR were given to mice from 1week prior to the AOM injection until the experiment was terminated 22weeks after AOM initiation. Our results showed that CUR had a superior inhibitory effect in colon tumorigenesis compared to that of ASA. The combination of ASA and CUR at a lower dose exhibited similar efficacy to that of a higher dose of CUR at 2%. RNA isolated from colonic tissue from the control group and from tumor samples from the experimental groups was subjected to RNA-seq. Transcriptomic analysis suggested that the low-dose combination of ASA and CUR modulated larger gene sets than the single treatment. These differentially expressed genes were situated in several canonical pathways important in the inflammatory network and liver metastasis in CAC. We identified a small subset of genes as potential molecular targets involved in the preventive action of the combination of ASA and CUR. Taken together, the current results provide the first evidence in support of the chemopreventive effect of a low-dose combination of ASA and CUR in CAC. Moreover, the transcriptional profile obtained in our study may provide a framework for identifying the mechanisms underlying the carcinogenesis process from normal colonic tissue to tumor development as well as the cancer inhibitory effects and potential molecular targets of ASA and CUR. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. The grapevine expression atlas reveals a deep transcriptome shift driving the entire plant into a maturation program.

    PubMed

    Fasoli, Marianna; Dal Santo, Silvia; Zenoni, Sara; Tornielli, Giovanni Battista; Farina, Lorenzo; Zamboni, Anita; Porceddu, Andrea; Venturini, Luca; Bicego, Manuele; Murino, Vittorio; Ferrarini, Alberto; Delledonne, Massimo; Pezzotti, Mario

    2012-09-01

    We developed a genome-wide transcriptomic atlas of grapevine (Vitis vinifera) based on 54 samples representing green and woody tissues and organs at different developmental stages as well as specialized tissues such as pollen and senescent leaves. Together, these samples expressed ∼91% of the predicted grapevine genes. Pollen and senescent leaves had unique transcriptomes reflecting their specialized functions and physiological status. However, microarray and RNA-seq analysis grouped all the other samples into two major classes based on maturity rather than organ identity, namely, the vegetative/green and mature/woody categories. This division represents a fundamental transcriptomic reprogramming during the maturation process and was highlighted by three statistical approaches identifying the transcriptional relationships among samples (correlation analysis), putative biomarkers (O2PLS-DA approach), and sets of strongly and consistently expressed genes that define groups (topics) of similar samples (biclustering analysis). Gene coexpression analysis indicated that the mature/woody developmental program results from the reiterative coactivation of pathways that are largely inactive in vegetative/green tissues, often involving the coregulation of clusters of neighboring genes and global regulation based on codon preference. This global transcriptomic reprogramming during maturation has not been observed in herbaceous annual species and may be a defining characteristic of perennial woody plants.

  20. Altered Exosomal RNA Profiles in Bronchoalveolar Lavage from Lung Transplants with Acute Rejection.

    PubMed

    Gregson, Aric L; Hoji, Aki; Injean, Patil; Poynter, Steven T; Briones, Claudia; Palchevskiy, Vyacheslav; Weigt, S Sam; Shino, Michael Y; Derhovanessian, Ariss; Sayah, David; Saggar, Rajan; Ross, David; Ardehali, Abbas; Lynch, Joseph P; Belperio, John A

    2015-12-15

    The mechanism by which acute allograft rejection leads to chronic rejection remains poorly understood despite its common occurrence. Exosomes, membrane vesicles released from cells within the lung allograft, contain a diverse array of biomolecules that closely reflect the biologic state of the cell and tissue from which they are released. Exosome transcriptomes may provide a better understanding of the rejection process. Furthermore, biomarkers originating from this transcriptome could provide timely and sensitive detection of acute cellular rejection (AR), reducing the incidence of severe AR and chronic lung allograft dysfunction and improving outcomes. To provide an in-depth analysis of the bronchoalveolar lavage fluid exosomal shuttle RNA population after lung transplantation and evaluate for differential expression between acute AR and quiescence. Serial bronchoalveolar lavage specimens were ultracentrifuged to obtain the exosomal pellet for RNA extraction, on which RNA-Seq was performed. AR demonstrates an intense inflammatory environment, skewed toward both innate and adaptive immune responses. Novel, potential upstream regulators identified offer potential therapeutic targets. Our findings validate bronchoalveolar lavage fluid exosomal shuttle RNA as a source for understanding the pathophysiology of AR and for biomarker discovery in lung transplantation.

  1. Altered Exosomal RNA Profiles in Bronchoalveolar Lavage from Lung Transplants with Acute Rejection

    PubMed Central

    Hoji, Aki; Injean, Patil; Poynter, Steven T.; Briones, Claudia; Palchevskiy, Vyacheslav; Sam Weigt, S.; Shino, Michael Y.; Derhovanessian, Ariss; Saggar, Rajan; Ross, David; Ardehali, Abbas; Lynch, Joseph P.; Belperio, John A.

    2015-01-01

    Rationale: The mechanism by which acute allograft rejection leads to chronic rejection remains poorly understood despite its common occurrence. Exosomes, membrane vesicles released from cells within the lung allograft, contain a diverse array of biomolecules that closely reflect the biologic state of the cell and tissue from which they are released. Exosome transcriptomes may provide a better understanding of the rejection process. Furthermore, biomarkers originating from this transcriptome could provide timely and sensitive detection of acute cellular rejection (AR), reducing the incidence of severe AR and chronic lung allograft dysfunction and improving outcomes. Objectives: To provide an in-depth analysis of the bronchoalveolar lavage fluid exosomal shuttle RNA population after lung transplantation and evaluate for differential expression between acute AR and quiescence. Methods: Serial bronchoalveolar lavage specimens were ultracentrifuged to obtain the exosomal pellet for RNA extraction, on which RNA-Seq was performed. Measurements and Main Results: AR demonstrates an intense inflammatory environment, skewed toward both innate and adaptive immune responses. Novel, potential upstream regulators identified offer potential therapeutic targets. Conclusions: Our findings validate bronchoalveolar lavage fluid exosomal shuttle RNA as a source for understanding the pathophysiology of AR and for biomarker discovery in lung transplantation. PMID:26308930

  2. Response of the hepatic transcriptome to aflatoxin B1 in domestic turkey (Meleagris gallopavo).

    PubMed

    Monson, Melissa S; Settlage, Robert E; McMahon, Kevin W; Mendoza, Kristelle M; Rawal, Sumit; El-Nezami, Hani S; Coulombe, Roger A; Reed, Kent M

    2014-01-01

    Dietary exposure to aflatoxin B1 (AFB1) is detrimental to avian health and leads to major economic losses for the poultry industry. AFB1 is especially hepatotoxic in domestic turkeys (Meleagris gallopavo), since these birds are unable to detoxify AFB1 by glutathione-conjugation. The impacts of AFB1 on the turkey hepatic transcriptome and the potential protection from pretreatment with a Lactobacillus-based probiotic mixture were investigated through RNA-sequencing. Animals were divided into four treatment groups and RNA was subsequently recovered from liver samples. Four pooled RNA-seq libraries were sequenced to produce over 322 M reads totaling 13.8 Gb of sequence. Approximately 170,000 predicted transcripts were de novo assembled, of which 803 had significant differential expression in at least one pair-wise comparison between treatment groups. Functional analysis linked many of the transcripts significantly affected by AFB1 exposure to cancer, apoptosis, the cell cycle or lipid regulation. Most notable were transcripts from the genes encoding E3 ubiquitin-protein ligase Mdm2, osteopontin, S-adenosylmethionine synthase isoform type-2, and lipoprotein lipase. Expression was modulated by the probiotics, but treatment did not completely mitigate the effects of AFB1. Genes identified through transcriptome analysis provide candidates for further study of AFB1 toxicity and targets for efforts to improve the health of domestic turkeys exposed to AFB1.

  3. Mapping the CgrA regulon of Rhodospirillum centenum reveals a hierarchal network controlling Gram-negative cyst development.

    PubMed

    Dong, Qian; Fang, Mingxu; Roychowdhury, Sugata; Bauer, Carl E

    2015-12-16

    Several Gram-negative species undergo development leading to the formation of metabolically dormant desiccation resistant cysts. Recent analysis of cyst development has revealed that ~20 % of the Rhodospirillum centenum transcriptome undergo temporal changes in expression as cells transition from vegetative to cyst forms. It has also been established that one trigger for cyst formation is the synthesis of the signaling nucleotide 3', 5'- cyclic guanosine monophosphate (cGMP) that is sensed by a homolog of the catabolite repressor protein called CgrA. CgrA in the presence of cGMP initiate a cascade of gene expression leading to the development of cysts. In this study, we have used RNA-seq and chromatin immunoprecipitation (ChIP-Seq) techniques to define the CgrA-cGMP regulon. Our results indicate that disruption of CgrA leads to altered expression of 258 genes, 131 of which have been previously reported to be involved in cyst development. ChIP-seq analysis combined with transcriptome data also demonstrates that CgrA directly regulates the expression of numerous sigma factors and transcription factors several of which are known to be involved in cyst cell development. This analysis reveals the presence of CgrA binding sites upstream of many developmentally regulated genes including many transcription factors and signal transduction components. CgrA thus functions as master controller of the cyst development by initiating a hierarchal cascade of downstream transcription factors that induces temporal expression of encystment genes.

  4. Temporal transcriptome changes induced by methyl jasmonate in Salvia sclarea.

    PubMed

    Hao, Da Cheng; Chen, Shi Lin; Osbourn, Anne; Kontogianni, Vassiliki G; Liu, Li Wei; Jordán, Maria J

    2015-03-01

    Salvia sclarea is a traditional medicinal and aromatic plant that grows in Europe and produces various economically important compounds, including phenylpropanoid derivatives and terpenoids. Methyl jasmonate (MeJA) is commonly used to elicit plant stress responses. However, how MeJA enhances production of secondary metabolites in S. sclarea is not well understood. We performed a genome-wide analysis of temporal gene expression in S. sclarea leaves and roots. The transcriptome profiles 0, 10 and 26 h after MeJA treatment were analyzed by Illumina RNA-Seq. A total of 16,142 isogenes (average length 866bp; N50 1035bp) were obtained by de novo assembly of 35,757,567 raw sequencing reads. When these sequencing reads were mapped onto the assembled Unigenes, 3236, 2792 and 798 Unigenes were found to be expressed differentially between 0 and 10h, 0 and 26 h, and 10 and 26h, respectively. These included many secondary metabolite biosynthesis, stress and defense-related genes. A qRT-PCR analysis confirmed the expression profiles of selected differentially expressed genes (DEGs) revealed by RNA-Seq data, and also extended our analysis of differential gene expression to 73 h. Our investigations revealed temporal differences in the responses of S. sclarea to MeJA treatment. MeJA treatment induced the expression of a large number of genes involved in phenylpropanoid biosynthesis, especially between 0 and 10h, and 0 and 26 h. Additionally, many genes encoding transcription factors, cytochrome P450s, glycosyltransferases, methyltransferases and transporters were shown to respond to MeJA elicitation. DEGs related to structural molecule activity and cell death showed a significant temporal variation. A chromatographic analysis of metabolites at 26h, 73h and six days after MeJA treatment indicated that these transcriptomic changes precede MeJA-induced changes in secondary metabolite content. This study sheds light on the molecular mechanisms of MeJA elicitation and is helpful in understanding how exogenous MeJA treatment mediates extensive plant transcriptome reprogramming/remodeling. Our results can be utilized to characterize genes related to secondary metabolism and their regulation, and in breeding S. sclarea for desirable chemotypes. Copyright © 2014 Elsevier B.V. All rights reserved.

  5. Combined Analysis of the Chloroplast Genome and Transcriptome of the Antarctic Vascular Plant Deschampsia antarctica Desv

    PubMed Central

    Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok

    2014-01-01

    Background Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. Results The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5′- or 3′-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. Conclusions We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the characteristics of the chloroplast transcriptome. PMID:24647560

  6. Combined analysis of the chloroplast genome and transcriptome of the Antarctic vascular plant Deschampsia antarctica Desv.

    PubMed

    Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok

    2014-01-01

    Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5'- or 3'-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the characteristics of the chloroplast transcriptome.

  7. MicroRNA biogenesis pathway from the salmon louse (Caligus rogercresseyi): emerging role in delousing drug response.

    PubMed

    Valenzuela-Miranda, Diego; Nuñez-Acuña, Gustavo; Valenzuela-Muñoz, Valentina; Asgari, Sassan; Gallardo-Escárate, Cristian

    2015-01-25

    Despite the increasing evidence of the importance of microRNAs (miRNAs) in the regulation of multiple biological processes, the molecular bases supporting this regulation are still barely understood in crustaceans. Therefore, the molecular characterization and transcriptome modulation of the miRNA biogenesis pathway were evaluated in the salmon louse Caligus rogercresseyi, an ectoparasite that constitutes one of the biggest concerns for salmonid aquaculture industry. Hence, RNA-Seq analysis was conducted from six different developmental stages, and also after bioassays with delousing drugs Deltamethrin and Azamethiphos using adult individuals. In silico analysis evidenced 24 putative genes involved in the miRNA pathway such as biogenesis, transport, maturation and miRNA-target interaction. Moreover, 243 putative single nucleotide polymorphisms (SNPs) were identified, 15 of which showed non-synonym mutations. RNA-Seq analysis revealed that CCR4-Not complex subunit 3 (CNOT3) was upregulated at earlier developmental stages (nauplius I-II and copepodid), and also after the exposure to Azamethiphos, but not to Deltamethrin. In contrast, the subunit 7 (CNOT7) showed an inverse expression pattern. Different Argonaute transcripts were associated to chalimus and adult stages, revealing specific expression patterns in response to antiparasitic drugs. Our results suggest novel insights into the regulatory network of the post-transcriptional gene regulation in C. rogercresseyi mediated by miRNAs, evidencing a putative role during the ontogeny and drug response. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. CANEapp: a user-friendly application for automated next generation transcriptomic data analysis.

    PubMed

    Velmeshev, Dmitry; Lally, Patrick; Magistri, Marco; Faghihi, Mohammad Ali

    2016-01-13

    Next generation sequencing (NGS) technologies are indispensable for molecular biology research, but data analysis represents the bottleneck in their application. Users need to be familiar with computer terminal commands, the Linux environment, and various software tools and scripts. Analysis workflows have to be optimized and experimentally validated to extract biologically meaningful data. Moreover, as larger datasets are being generated, their analysis requires use of high-performance servers. To address these needs, we developed CANEapp (application for Comprehensive automated Analysis of Next-generation sequencing Experiments), a unique suite that combines a Graphical User Interface (GUI) and an automated server-side analysis pipeline that is platform-independent, making it suitable for any server architecture. The GUI runs on a PC or Mac and seamlessly connects to the server to provide full GUI control of RNA-sequencing (RNA-seq) project analysis. The server-side analysis pipeline contains a framework that is implemented on a Linux server through completely automated installation of software components and reference files. Analysis with CANEapp is also fully automated and performs differential gene expression analysis and novel noncoding RNA discovery through alternative workflows (Cuffdiff and R packages edgeR and DESeq2). We compared CANEapp to other similar tools, and it significantly improves on previous developments. We experimentally validated CANEapp's performance by applying it to data derived from different experimental paradigms and confirming the results with quantitative real-time PCR (qRT-PCR). CANEapp adapts to any server architecture by effectively using available resources and thus handles large amounts of data efficiently. CANEapp performance has been experimentally validated on various biological datasets. CANEapp is available free of charge at http://psychiatry.med.miami.edu/research/laboratory-of-translational-rna-genomics/CANE-app . We believe that CANEapp will serve both biologists with no computational experience and bioinformaticians as a simple, timesaving but accurate and powerful tool to analyze large RNA-seq datasets and will provide foundations for future development of integrated and automated high-throughput genomics data analysis tools. Due to its inherently standardized pipeline and combination of automated analysis and platform-independence, CANEapp is an ideal for large-scale collaborative RNA-seq projects between different institutions and research groups.

  9. Global transcriptional start site mapping using differential RNA sequencing reveals novel antisense RNAs in Escherichia coli.

    PubMed

    Thomason, Maureen K; Bischler, Thorsten; Eisenbart, Sara K; Förstner, Konrad U; Zhang, Aixia; Herbig, Alexander; Nieselt, Kay; Sharma, Cynthia M; Storz, Gisela

    2015-01-01

    While the model organism Escherichia coli has been the subject of intense study for decades, the full complement of its RNAs is only now being examined. Here we describe a survey of the E. coli transcriptome carried out using a differential RNA sequencing (dRNA-seq) approach, which can distinguish between primary and processed transcripts, and an automated prediction algorithm for transcriptional start sites (TSS). With the criterion of expression under at least one of three growth conditions examined, we predicted 14,868 TSS candidates, including 5,574 internal to annotated genes (iTSS) and 5,495 TSS corresponding to potential antisense RNAs (asRNAs). We examined expression of 14 candidate asRNAs by Northern analysis using RNA from wild-type E. coli and from strains defective for RNases III and E, two RNases reported to be involved in asRNA processing. Interestingly, nine asRNAs detected as distinct bands by Northern analysis were differentially affected by the rnc and rne mutations. We also compared our asRNA candidates with previously published asRNA annotations from RNA-seq data and discuss the challenges associated with these cross-comparisons. Our global transcriptional start site map represents a valuable resource for identification of transcription start sites, promoters, and novel transcripts in E. coli and is easily accessible, together with the cDNA coverage plots, in an online genome browser. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  10. ReadXplorer—visualization and analysis of mapped sequences

    PubMed Central

    Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander

    2014-01-01

    Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157

  11. RNA-seq Transcriptome Analysis of Panax japonicus, and Its Comparison with Other Panax Species to Identify Potential Genes Involved in the Saponins Biosynthesis

    PubMed Central

    Rai, Amit; Yamazaki, Mami; Takahashi, Hiroki; Nakamura, Michimi; Kojoma, Mareshige; Suzuki, Hideyuki; Saito, Kazuki

    2016-01-01

    The Panax genus has been a source of natural medicine, benefitting human health over the ages, among which the Panax japonicus represents an important species. Our understanding of several key pathways and enzymes involved in the biosynthesis of ginsenosides, a pharmacologically active class of metabolites and a major chemical constituents of the rhizome extracts from the Panax species, are limited. Limited genomic information, and lack of studies on comparative transcriptomics across the Panax species have restricted our understanding of the biosynthetic mechanisms of these and many other important classes of phytochemicals. Herein, we describe Illumina based RNA sequencing analysis to characterize the transcriptome and expression profiles of genes expressed in the five tissues of P. japonicus, and its comparison with other Panax species. RNA sequencing and de novo transcriptome assembly for P. japonicus resulted in a total of 135,235 unigenes with 78,794 (58.24%) unigenes being annotated using NCBI-nr database. Transcriptome profiling, and gene ontology enrichment analysis for five tissues of P. japonicus showed that although overall processes were evenly conserved across all tissues. However, each tissue was characterized by several unique unigenes with the leaves showing the most unique unigenes among the tissues studied. A comparative analysis of the P. japonicus transcriptome assembly with publically available transcripts from other Panax species, namely, P. ginseng, P. notoginseng, and P. quinquefolius also displayed high sequence similarity across all Panax species, with P. japonicus showing highest similarity with P. ginseng. Annotation of P. japonicus transcriptome resulted in the identification of putative genes encoding all enzymes from the triterpene backbone biosynthetic pathways, and identified 24 and 48 unigenes annotated as cytochrome P450 (CYP) and glycosyltransferases (GT), respectively. These CYPs and GTs annotated unigenes were conserved across all Panax species and co-expressed with other the transcripts involved in the triterpenoid backbone biosynthesis pathways. Unigenes identified in this study represent strong candidates for being involved in the triterpenoid saponins biosynthesis, and can serve as a basis for future validation studies. PMID:27148308

  12. Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.

    PubMed

    Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X

    2017-12-05

    Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.

  13. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data.

    PubMed

    Peterson, Elena S; McCue, Lee Ann; Schrimpe-Rutledge, Alexandra C; Jensen, Jeffrey L; Walker, Hyunjoo; Kobold, Markus A; Webb, Samantha R; Payne, Samuel H; Ansong, Charles; Adkins, Joshua N; Cannon, William R; Webb-Robertson, Bobbie-Jo M

    2012-04-05

    The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates. VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php.

  14. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

    PubMed Central

    2012-01-01

    Background The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates. Results VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. Conclusions VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via visual analysis across multiple levels of genomic resolution, linked searches and interaction with existing bioinformatics tools. We highlight the novel functionality of VESPA and core programming requirements for visualization of these large heterogeneous datasets for a client-side application. The software is freely available at https://www.biopilot.org/docs/Software/Vespa.php. PMID:22480257

  15. Spatial reconstruction of single-cell gene expression

    PubMed Central

    Satija, Rahul; Farrell, Jeffrey A.; Gennert, David; Schier, Alexander F.; Regev, Aviv

    2015-01-01

    Spatial localization is a key determinant of cellular fate and behavior, but spatial RNA assays traditionally rely on staining for a limited number of RNA species. In contrast, single-cell RNA-seq allows for deep profiling of cellular gene expression, but established methods separate cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos, inferring a transcriptome-wide map of spatial patterning. We confirmed Seurat’s accuracy using several experimental approaches, and used it to identify a set of archetypal expression patterns and spatial markers. Additionally, Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems. PMID:25867923

  16. An RNA-seq transcriptome analysis of floral buds of an interspecific Brassica hybrid between B. carinata and B. napus.

    PubMed

    Chu, Pu; Liu, Huijuan; Yang, Qing; Wang, Yankun; Yan, Guixia; Guan, Rongzhan

    2014-12-01

    Interspecific hybridizations promote gene transfer between species and play an important role in plant speciation and crop improvement. However, hybrid sterility that commonly found in the first generation of hybrids hinders the utilization of interspecific hybridization. The combination of divergent parental genomes can create extensive transcriptome variations, and to determine these gene expression alterations and their effects on hybrids, an interspecific Brassica hybrid of B. carinata × B. napus was generated. Scanning electron microscopy analysis indicated that some of the hybrid pollen grains were irregular in shape and exhibited abnormal exine patterns compared with those from the parents. Using the Illumina HiSeq 2000 platform, 39,598, 32,403 and 42,208 genes were identified in flower buds of B. carinata cv. W29, B. napus cv. Zhongshuang 11 and their hybrids, respectively. The differentially expressed genes were significantly enriched in pollen wall assembly, pollen exine formation, pollen development, pollen tube growth, pollination, gene transcription, macromolecule methylation and translation, which might be associated with impaired fertility in the F1 hybrid. These results will shed light on the mechanisms underlying the low fertility of the interspecific hybrids and expand our knowledge of interspecific hybridization.

  17. Gene expression profiling of human breast tissue samples using SAGE-Seq.

    PubMed

    Wu, Zhenhua Jeremy; Meyer, Clifford A; Choudhury, Sibgat; Shipitsin, Michail; Maruyama, Reo; Bessarabova, Marina; Nikolskaya, Tatiana; Sukumar, Saraswati; Schwartzman, Armin; Liu, Jun S; Polyak, Kornelia; Liu, X Shirley

    2010-12-01

    We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization between libraries, and the identification of differentially expressed genes. We find that the diversity of cancer transcriptomes is significantly higher than that of normal cells. Our analysis indicates that transcript discovery plateaus at 10 million reads/sample, and suggests a minimum desired sequencing depth around five million reads. Comparison of SAGE-Seq and traditional SAGE on normal and cancerous breast tissues reveals higher sensitivity of SAGE-Seq to detect less-abundant genes, including those encoding for known breast cancer-related transcription factors and G protein-coupled receptors (GPCRs). SAGE-Seq is able to identify genes and pathways abnormally activated in breast cancer that traditional SAGE failed to call. SAGE-Seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease.

  18. De Novo transcriptome assembly of Zingiber officinale cv. Suruchi of Odisha.

    PubMed

    Gaur, Mahendra; Das, Aradhana; Sahoo, Rajesh Kumar; Kar, Basudeba; Nayak, Sanghamitra; Subudhi, Enketeswara

    2016-09-01

    Zingiber officinale Rosc., known as ginger, is an Asian crop, popularly used in every household kitchen and commercially used in bakery, beverage, food and pharmaceutical industries. The present study deals with de novo transcriptome assembly of an elite ginger cultivar Suruchi by next generation sequencing methodology. From the analysis 10.9 GB raw data was obtained which can be available in NCBI accession number SAMN03761185. We identified 41,969 transcripts using Trinity RNA-Seq from ginger rhizome of Suruchi variety from Odisha. The transcript length varied from 300 bp to 8404 bp with a total length of 3,96,40,526 bp and N50 of 1251 bp. To the best of our knowledge, this is the first transcriptome data of an elite ginger cultivar Suruchi released for Odisha state of India which will help molecular biologists to develop genetic markers for identification of cultivars.

  19. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    PubMed

    Villarino, Gonzalo H; Bombarely, Aureliano; Giovannoni, James J; Scanlon, Michael J; Mattson, Neil S

    2014-01-01

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  20. Transcriptomic Analysis of Petunia hybrida in Response to Salt Stress Using High Throughput RNA Sequencing

    PubMed Central

    Villarino, Gonzalo H.; Bombarely, Aureliano; Giovannoni, James J.; Scanlon, Michael J.; Mattson, Neil S.

    2014-01-01

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments. PMID:24722556

Top