Technical variations in low-input RNA-seq methodologies.
Bhargava, Vipul; Head, Steven R; Ordoukhanian, Phillip; Mercola, Mark; Subramaniam, Shankar
2014-01-14
Recent advances in RNA-seq methodologies from limiting amounts of mRNA have facilitated the characterization of rare cell-types in various biological systems. So far, however, technical variations in these methods have not been adequately characterized, particularly with respect to sensitivity when starting from reduced levels of mRNA. Here, we generated sequencing libraries from limiting amounts of mRNA using three amplification-based methods, viz. Smart-seq, DP-seq and CEL-seq, and demonstrated significant technical variations in these libraries. Reduction in mRNA levels led to inefficient amplification of the majority of low to moderately expressed transcripts. Furthermore, noise in primer hybridization and/or enzyme incorporation was magnified during the amplification step, resulting in significant distortions in fold changes of the transcripts. Consequently, the majority of the differentially expressed transcripts identified were highly expressed and/or exhibited high fold changes. High technical variations ultimately masked subtle biological differences, mandating the development of improved amplification-based strategies for quantitative transcriptomics from limiting amounts of mRNA.
Castandet, Benoît; Hotto, Amber M.; Strickler, Susan R.; ...
2016-07-06
Although RNA-Seq has revolutionized transcript analysis, organellar transcriptomes are rarely assessed even when present in published datasets. Here, we describe the development and application of a rapid and convenient method, ChloroSeq, to delineate qualitative and quantitative features of chloroplast RNA metabolism from strand-specific RNA-Seq datasets, including processing, editing, splicing, and relative transcript abundance. The use of a single experiment to analyze systematically chloroplast transcript maturation and abundance is of particular interest due to frequent pleiotropic effects observed in mutants that affect chloroplast gene expression and/or photosynthesis. To illustrate its utility, ChloroSeq was applied to published RNA-Seq datasets derived from Arabidopsis thaliana grown under control and abiotic stress conditions, where the organellar transcriptome had not been examined. The most appreciable effects were found for heat stress, which induces a global reduction in splicing and editing efficiency, and leads to increased abundance of chloroplast transcripts, including genic, intergenic, and antisense transcripts. Moreover, by concomitantly analyzing nuclear transcripts that encode chloroplast gene expression regulators from the same libraries, we demonstrate the possibility of achieving a holistic understanding of the nucleus-organelle system. In conclusion, ChloroSeq thus represents a unique method for streamlining RNA-Seq data interpretation of the chloroplast transcriptome and its regulators.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.
2011-01-01
Purpose: Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods: Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results: Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R²) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling.
Conclusions: Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
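The differential-expression criterion quoted in the abstract above (fold change ≥1.5 and p value <0.05) can be illustrated with a minimal Python sketch. This is not the authors' workflow: the function name `is_differential`, the toy transcript names, and the expression values are hypothetical, and the sketch assumes the fold change is taken in either direction from already-normalized mean expression values.

```python
def is_differential(wt_mean, ko_mean, p_value, min_fold=1.5, max_p=0.05):
    """Return True when a transcript passes both thresholds.

    Assumption: fold change is symmetric (up or down), computed from
    normalized mean expression in the two genotypes.
    """
    if wt_mean <= 0 or ko_mean <= 0:
        # Fold change is undefined here; a real pipeline would handle
        # zero expression separately (for example with pseudocounts).
        return False
    fold = max(wt_mean / ko_mean, ko_mean / wt_mean)
    return fold >= min_fold and p_value < max_p


# Hypothetical records: (name, WT mean, Nrl-/- mean, p value)
transcripts = [
    ("tx_a", 10.0, 30.0, 0.001),  # 3-fold change, significant
    ("tx_b", 10.0, 12.0, 0.001),  # 1.2-fold change, below fold threshold
    ("tx_c", 10.0, 30.0, 0.20),   # 3-fold change, not significant
]
selected = [name for name, wt, ko, p in transcripts if is_differential(wt, ko, p)]
print(selected)
```

Only `tx_a` satisfies both conditions in this toy input; the other two each fail one threshold.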
Langevin, Stanley A.; Bent, Zachary W.; Solberg, Owen D.; Curtis, Deanna J.; Lane, Pamela D.; Williams, Kelly P.; Schoeniger, Joseph S.; Sinha, Anupama; Lane, Todd W.; Branda, Steven S.
2013-01-01
Use of second generation sequencing (SGS) technologies for transcriptional profiling (RNA-Seq) has revolutionized transcriptomics, enabling measurement of RNA abundances with unprecedented specificity and sensitivity and the discovery of novel RNA species. Preparation of RNA-Seq libraries requires conversion of the RNA starting material into cDNA flanked by platform-specific adaptor sequences. Each of the published methods and commercial kits currently available for RNA-Seq library preparation suffers from at least one major drawback, including long processing times, large starting material requirements, uneven coverage, loss of strand information and high cost. We report the development of a new RNA-Seq library preparation technique that produces representative, strand-specific RNA-Seq libraries from small amounts of starting material in a fast, simple and cost-effective manner. Additionally, we have developed a new quantitative PCR-based assay for precisely determining the number of PCR cycles to perform for optimal enrichment of the final library, a key step in all SGS library preparation workflows. PMID:23558773
RNA-Seq-Based Transcript Structure Analysis with TrBorderExt.
Wang, Yejun; Sun, Ming-An; White, Aaron P
2018-01-01
RNA-Seq has become a routine strategy for genome-wide gene expression comparisons in bacteria. Despite lower resolution in transcript border parsing compared with dRNA-Seq, TSS-EMOTE, Cappable-seq, Term-seq, and others, directional RNA-Seq still illustrates its advantages: low cost, quantification and transcript border analysis with a medium resolution (±10-20 nt). To facilitate mining of directional RNA-Seq datasets especially with respect to transcript structure analysis, we developed a tool, TrBorderExt, which can parse transcript start sites and termination sites accurately in bacteria. A detailed protocol is described in this chapter for how to use the software package step by step to identify bacterial transcript borders from raw RNA-Seq data. The package was developed with Perl and R programming languages, and is accessible freely through the website: http://www.szu-bioinf.org/TrBorderExt .
Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.
Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin
2013-09-22
High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.
Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J
2015-09-03
RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
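The first step described above, grouping reads according to the set of transcripts to which they map, can be sketched in Python as follows. This is only an illustration of the grouping idea, not EMSAR's implementation: the function name `group_reads_by_transcript_set` and the toy alignments are hypothetical, and the segmentation, reclustering, and joint Poisson maximum likelihood steps are not reproduced.

```python
from collections import Counter


def group_reads_by_transcript_set(read_alignments):
    """Count reads per distinct set of compatible transcripts.

    read_alignments maps a read ID to the transcripts it aligns to.
    Reads sharing the same transcript set carry the same information
    about abundance, so they can be collapsed into a single count.
    """
    counts = Counter()
    for transcripts in read_alignments.values():
        counts[frozenset(transcripts)] += 1
    return counts


# Hypothetical alignments: two isoforms, T1 and T2, with a shared region.
alignments = {
    "read1": ["T1"],
    "read2": ["T1", "T2"],
    "read3": ["T2", "T1"],  # same set as read2, order does not matter
    "read4": ["T2"],
}
groups = group_reads_by_transcript_set(alignments)
for transcript_set, n in groups.items():
    print(sorted(transcript_set), n)
```

Collapsing reads this way means a likelihood model only has to consider one count per transcript set rather than one term per read, which is one plausible source of the computational savings the abstract mentions.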
Quantifying circular RNA expression from RNA-seq data using model-based framework.
Li, Musheng; Xie, Xueying; Zhou, Jing; Sheng, Mengying; Yin, Xiaofeng; Ko, Eun-A; Zhou, Tong; Gu, Wanjun
2017-07-15
Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs is largely unknown, the cell type- and tissue-specific circRNA expression has implicated their crucial functions in many biological processes. Hence, quantifying circRNA expression from high-throughput RNA-seq data is becoming increasingly important. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification. Here, we proposed a novel strategy that transforms circular transcripts to pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate transcript expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, amount of expression and the ratio of circular to linear transcripts, had impacts on quantification performance of circular transcripts. In comparison to count-based tools, the new computational framework had superior performance in estimating the amount of circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. On the other hand, the consideration of circular transcripts in expression quantification from rRNA-depleted RNA-seq data showed substantially increased accuracy of linear transcript expression. Our proposed strategy was implemented in a program named Sailfish-cir. Sailfish-cir is freely available at https://github.com/zerodel/Sailfish-cir . Contact: tongz@medicine.nevada.edu or wanjun.gu@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
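The idea of turning a circular transcript into a pseudo-linear one can be sketched in Python. The abstract does not say how Sailfish-cir performs the transformation, so the approach below is an assumption made purely for illustration: the first `read_length - 1` bases of the circular sequence are appended to its end so that reads spanning the back-splice junction find a contiguous match. The function name `pseudo_linearize` and the toy sequence are hypothetical.

```python
def pseudo_linearize(circ_seq, read_length):
    """Build a linear stand-in for a circular transcript.

    Assumption (not taken from the source): append the first
    read_length - 1 bases to the end, so any read that crosses the
    back-splice junction is contained in the resulting sequence.
    """
    if read_length < 1:
        raise ValueError("read_length must be at least 1")
    overlap = min(read_length - 1, len(circ_seq))
    return circ_seq + circ_seq[:overlap]


circ = "ACGTTGCA"  # toy circular sequence, written from an arbitrary start
linear = pseudo_linearize(circ, read_length=4)

# A read crossing the junction: last two bases followed by first two bases.
junction_read = circ[-2:] + circ[:2]
print(linear)
print(junction_read in circ, junction_read in linear)
```

In this toy case the junction-spanning read is absent from the plain linear spelling of the circle but present in the extended sequence, which is the property a linear-transcript quantifier would need.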
eQTL Mapping Using RNA-seq Data
Hu, Yijuan
2012-01-01
As RNA-seq is replacing gene expression microarrays to assess genome-wide transcription abundance, gene expression Quantitative Trait Locus (eQTL) studies using RNA-seq have emerged. RNA-seq delivers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it generates unprecedentedly rich data to study RNA-isoform expression. In this paper, we review current methods for eQTL mapping using ASE and discuss some future directions. We also review existing works that use RNA-seq data to study RNA-isoform expression and we discuss the gaps between these works and isoform-specific eQTL mapping. PMID:23667399
Hong, Yoonki; Kim, Woo Jin; Bang, Chi Young; Lee, Jae Cheol; Oh, Yeon-Mok
2016-04-01
Lung cancer is the most common cause of cancer-related death. Alterations in gene sequence, structure, and expression have an important role in the pathogenesis of lung cancer. Fusion genes and alternative splicing of cancer-related genes have the potential to be oncogenic. In the current study, we performed RNA-sequencing (RNA-seq) to investigate potential fusion genes and alternative splicing in non-small cell lung cancer. RNA was isolated from lung tissues obtained from 86 subjects with lung cancer. The RNA samples from lung cancer and normal tissues were processed with RNA-seq using the HiSeq 2000 system. Fusion genes were evaluated using DeFuse and ChimeraScan. Candidate fusion transcripts were validated by Sanger sequencing. Alternative splicing was analyzed using multivariate analysis of transcript sequencing and validated using quantitative real-time polymerase chain reaction. RNA-seq data identified oncogenic fusion genes EML4-ALK and SLC34A2-ROS1 in three of 86 normal-cancer paired samples. Nine distinct fusion transcripts were selected using DeFuse and ChimeraScan, of which four were validated by Sanger sequencing. In 33 squamous cell carcinomas, 29 tumor-specific skipped exon events and six mutually exclusive exon events were identified. ITGB4 and PYCR1 were the top genes that showed significant tumor-specific splice variants. In conclusion, RNA-seq data identified novel potential fusion transcripts and splice variants. Further evaluation of their functional significance in the pathogenesis of lung cancer is required.
Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.
Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias
2015-06-25
Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
Position-specific binding of FUS to nascent RNA regulates mRNA length
Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen
2015-01-01
More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189
Sturgill, David; Malone, John H; Sun, Xia; Smith, Harold E; Rabinow, Leonard; Samson, Marie-Laure; Oliver, Brian
2013-11-09
The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools. PMID:24209455
Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq
Shepard, Peter J.; Choi, Eun-A; Lu, Jente; Flanagan, Lisa A.; Hertel, Klemens J.; Shi, Yongsheng
2011-01-01
Alternative polyadenylation (APA) of mRNAs has emerged as an important mechanism for post-transcriptional gene regulation in higher eukaryotes. Although microarrays have recently been used to characterize APA globally, they have a number of serious limitations that prevents comprehensive and highly quantitative analysis. To better characterize APA and its regulation, we have developed a deep sequencing-based method called Poly(A) Site Sequencing (PAS-Seq) for quantitatively profiling RNA polyadenylation at the transcriptome level. PAS-Seq not only accurately and comprehensively identifies poly(A) junctions in mRNAs and noncoding RNAs, but also provides quantitative information on the relative abundance of polyadenylated RNAs. PAS-Seq analyses of human and mouse transcriptomes showed that 40%–50% of all expressed genes produce alternatively polyadenylated mRNAs. Furthermore, our study detected evolutionarily conserved polyadenylation of histone mRNAs and revealed novel features of mitochondrial RNA polyadenylation. Finally, PAS-Seq analyses of mouse embryonic stem (ES) cells, neural stem/progenitor (NSP) cells, and neurons not only identified more poly(A) sites than what was found in the entire mouse EST database, but also detected significant changes in the global APA profile that lead to lengthening of 3′ untranslated regions (UTR) in many mRNAs during stem cell differentiation. Together, our PAS-Seq analyses revealed a complex landscape of RNA polyadenylation in mammalian cells and the dynamic regulation of APA during stem cell differentiation. PMID:21343387
Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...
2015-03-12
The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
Rai, Muhammad Farooq; Tycksen, Eric D; Sandell, Linda J; Brophy, Robert H
2018-01-01
Microarrays and RNA-seq are at the forefront of high throughput transcriptome analyses. Since these methodologies are based on different principles, there are concerns about the concordance of data between the two techniques. The concordance of RNA-seq and microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed in clinically derived ligament tissues. To demonstrate the concordance between RNA-seq and microarrays and to assess potential benefits of RNA-seq over microarrays, we assessed differences in transcript expression in anterior cruciate ligament (ACL) tissues based on time-from-injury. ACL remnants were collected from patients with an ACL tear at the time of ACL reconstruction. RNA prepared from torn ACL remnants was subjected to Agilent microarrays (N = 24) and RNA-seq (N = 8). The correlation of biological replicates in RNA-seq and microarrays data was similar (0.98 vs. 0.97), demonstrating that each platform has high internal reproducibility. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarrays values were moderate. The cross-platform concordance for differentially expressed transcripts or enriched pathways was linearly correlated (r = 0.64). RNA-Seq was superior in detecting low abundance transcripts and differentiating biologically critical isoforms. Additional independent validation of transcript expression was undertaken using microfluidic PCR for selected genes. PCR data showed 100% concordance (in expression pattern) with RNA-seq and microarrays data. These findings demonstrate that RNA-seq has advantages over microarrays for transcriptome profiling of ligament tissues when available and affordable. Furthermore, these findings are likely transferable to other musculoskeletal tissues where tissue collection is challenging and cells are in low abundance. © 2017 Orthopaedic Research Society. 
Published by Wiley Periodicals, Inc. J Orthop Res 36:484-497, 2018.
Reyes, Juan M; Chitwood, James L; Ross, Pablo J
2015-02-01
Molecular changes occurring during mammalian oocyte maturation are partly regulated by cytoplasmic polyadenylation (CP) and affect oocyte quality, yet the extent of CP activity during oocyte maturation remains unknown. Single bovine oocyte RNA sequencing (RNA-Seq) was performed to examine changes in transcript abundance during in vitro oocyte maturation in cattle. Polyadenylated RNA from individual germinal-vesicle and metaphase-II oocytes was amplified and processed for Illumina sequencing, producing approximately 30 million reads per replicate for each sample type. A total of 10,494 genes were found to be expressed, of which 2,455 were differentially expressed (adjusted P < 0.05 and fold change >2) between stages, with 503 and 1,952 genes respectively increasing and decreasing in abundance. Differentially expressed genes with complete 3'-untranslated-region sequence (279 increasing and 918 decreasing in polyadenylated transcript abundance) were examined for the presence, position, and distribution of motifs mediating CP, revealing enrichment (85%) and lack thereof (18%) in up- and down-regulated genes, respectively. Examination of total and polyadenylated RNA abundance by quantitative PCR validated these RNA-Seq findings. The observed increases in polyadenylated transcript abundance within the RNA-Seq data are likely due to CP, providing novel insight into targeted transcripts and resultant differential gene expression profiles that contribute to oocyte maturation. © 2015 Wiley Periodicals, Inc.
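The differential-expression criterion quoted in the abstract above (adjusted P < 0.05 and fold change > 2) can be sketched as a simple filter. This is a minimal illustration only: the function names, the tuple layout, and the toy values are hypothetical and are not taken from the study, and fold change is assumed to be supplied on the log2 scale.

```python
import math


def is_differentially_expressed(adj_p, log2_fc, alpha=0.05, min_fold=2.0):
    """Return True when a gene passes both cut-offs (adjusted P and fold change)."""
    return adj_p < alpha and abs(log2_fc) > math.log2(min_fold)


def split_by_direction(genes):
    """Split passing genes into increasing and decreasing lists.

    genes: iterable of (name, adjusted_p, log2_fold_change) tuples.
    """
    up, down = [], []
    for name, adj_p, log2_fc in genes:
        if is_differentially_expressed(adj_p, log2_fc):
            (up if log2_fc > 0 else down).append(name)
    return up, down


# Hypothetical toy values, not data from the study.
toy = [
    ("geneA", 0.001, 2.3),   # passes both cut-offs, increasing
    ("geneB", 0.030, -1.7),  # passes both cut-offs, decreasing
    ("geneC", 0.200, 3.0),   # fails the adjusted P cut-off
    ("geneD", 0.010, 0.4),   # fails the fold-change cut-off
]
up, down = split_by_direction(toy)
print(up, down)
```

Applying the two thresholds jointly is what restricts the reported gene lists to changes that are both statistically supported and large in magnitude.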
Optimizing exosomal RNA isolation for RNA-Seq analyses of archival sera specimens.
Prendergast, Emily N; de Souza Fonseca, Marcos Abraão; Dezem, Felipe Segato; Lester, Jenny; Karlan, Beth Y; Noushmehr, Houtan; Lin, Xianzhi; Lawrenson, Kate
2018-01-01
Exosomes are endosome-derived membrane vesicles that contain proteins, lipids, and nucleic acids. The exosomal transcriptome mediates intercellular communication, and represents an understudied reservoir of novel biomarkers for human diseases. Next-generation sequencing enables complex quantitative characterization of exosomal RNAs from diverse sources. However, detailed protocols describing exosome purification for preparation of exosomal RNA-sequence (RNA-Seq) libraries are lacking. Here we compared methods for isolation of exosomes and extraction of exosomal RNA from human cell-free serum, as well as strategies for attaining equal representation of samples within pooled RNA-Seq libraries. We compared commercial precipitation with ultracentrifugation for exosome purification and confirmed the presence of exosomes via both transmission electron microscopy and immunoblotting. Exosomal RNA extraction was compared using four different RNA purification methods. We determined the minimal starting volume of serum required for exosome preparation and showed that high quality exosomal RNA can be isolated from sera stored for over a decade. Finally, RNA-Seq libraries were successfully prepared with exosomal RNAs extracted from human cell-free serum, cataloguing both coding and non-coding exosomal transcripts. This method provides researchers with strategic options to prepare RNA-Seq libraries and compare RNA-Seq data quantitatively from minimal volumes of fresh and archival human cell-free serum for disease biomarker discovery.
Tn5Prime, a Tn5 based 5' capture method for single cell RNA-seq.
Cole, Charles; Byrne, Ashley; Beaudin, Anna E; Forsberg, E Camilla; Vollmers, Christopher
2018-06-01
RNA-sequencing (RNA-seq) is a powerful technique to investigate and quantify entire transcriptomes. Recent advances in the field have made it possible to explore the transcriptomes of single cells. However, most widely used RNA-seq protocols fail to provide crucial information regarding transcription start sites. Here we present a protocol, Tn5Prime, that takes advantage of the Tn5 transposase-based Smart-seq2 protocol to create RNA-seq libraries that capture the 5' end of transcripts. The Tn5Prime method dramatically streamlines the 5' capture process and is both cost effective and reliable. By applying Tn5Prime to bulk RNA and single cell samples, we were able to define transcription start sites as well as quantify transcriptomes at high accuracy and reproducibility. Additionally, similar to 3' end-based high-throughput methods like Drop-seq and 10× Genomics Chromium, the 5' capture Tn5Prime method allows the introduction of cellular identifiers during reverse transcription, simplifying the analysis of large numbers of single cells. In contrast to 3' end-based methods, Tn5Prime also enables the assembly of the variable 5' ends of the antibody sequences present in single B-cell data. Therefore, Tn5Prime presents a robust tool for both basic and applied research into the adaptive immune system and beyond.
MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.
Behr, Jonas; Kahles, André; Zhong, Yi; Sreedharan, Vipin T; Drewe, Philipp; Rätsch, Gunnar
2013-10-15
High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques lead to significant improvements over the state-of-the-art in transcriptome reconstruction. MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.
Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.
Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S
2012-01-01
RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary information: Supplementary data are available at Bioinformatics online.
ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.
Dao, Phuong; Numanagić, Ibrahim; Lin, Yen-Yi; Hach, Faraz; Karakoc, Emre; Donmez, Nilgun; Collins, Colin; Eichler, Evan E; Sahinalp, S Cenk
2014-03-01
RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For an accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN (Optimal Resolution of Multimapping Ambiguity of RNA-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. ORMAN is available at http://orman.sf.net
Han, Kook; Tjaden, Brian; Lory, Stephen
2016-12-22
The first step in the post-transcriptional regulatory function of most bacterial small non-coding RNAs (sRNAs) is base pairing with partially complementary sequences of targeted transcripts. We present a simple method for identifying sRNA targets in vivo and defining processing sites of the regulated transcripts. The technique, referred to as global small non-coding RNA target identification by ligation and sequencing (GRIL-seq), is based on preferential ligation of sRNAs to the ends of base-paired targets in bacteria co-expressing T4 RNA ligase, followed by sequencing to identify the chimaeras. In addition to the RNA chaperone Hfq, the GRIL-seq method depends on the activity of the pyrophosphorylase RppH. Using PrrF1, an iron-regulated sRNA in Pseudomonas aeruginosa, we demonstrated that direct regulatory targets of this sRNA can readily be identified. Therefore, GRIL-seq represents a powerful tool not only for identifying direct targets of sRNAs in a variety of environments, but also for uncovering novel roles for sRNAs and their targets in complex regulatory networks.
IAOseq: inferring abundance of overlapping genes using RNA-seq data.
Sun, Hong; Yang, Shuang; Tun, Liangliang; Li, Yixue
2015-01-01
Overlapping transcription constitutes a common mechanism for regulating gene expression. A major limitation of the overlapping transcription assays is the lack of high throughput expression data. We developed a new tool (IAOseq) that is based on reads distributions along the transcribed regions to identify the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq showed better performance in the estimation accuracy of overlapping transcription levels. For the same strand overlapping transcription, currently existing high-throughput methods are rarely available to distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods gave an average of 1.6 fold overestimation of the expression levels of same strand overlapping genes. This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries. IAOseq could be used to help us understand the complex regulatory mechanism mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc.
Han, Kook; Tjaden, Brian; Lory, Stephen
2017-01-01
The first step in the post-transcriptional regulatory function of most bacterial small non-coding RNAs (sRNAs) is base-pairing with partially complementary sequences of targeted transcripts. We present a simple method for identifying sRNA targets in vivo and defining processing sites of the regulated transcripts. The technique (referred to as GRIL-Seq) is based on preferential ligation of sRNAs to ends of base-paired targets in bacteria co-expressing T4 RNA ligase, followed by sequencing to identify the chimeras. In addition to the RNA chaperone Hfq, the GRIL-Seq method depends on the activity of the pyrophosphorylase RppH. Using PrrF1, an iron-regulated sRNA in Pseudomonas aeruginosa, we demonstrate that direct regulatory targets of this sRNA can be readily identified. Therefore, GRIL-Seq represents a powerful tool not only for identifying direct targets of sRNAs in a variety of environments, but also for uncovering novel roles for sRNAs and their targets in complex regulatory networks. PMID:28005055
RNA-Skim: a rapid method for RNA-Seq quantification at transcript level
Zhang, Zhaojun; Wang, Wei
2014-01-01
Motivation: RNA-Seq technique has been demonstrated as a revolutionary means for exploring transcriptome because it provides deep coverage and base pair-level resolution. RNA-Seq quantification is proven to be an efficient alternative to Microarray technique in gene expression study, and it is a critical component in RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require the alignments of fragments to either a genome or a transcriptome, entailing a time-consuming and intricate alignment step. To improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, has been recently proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of designing an efficient alignment-free method for transcriptome quantification. Even though Sailfish is substantially faster than alternative alignment-dependent methods such as Cufflinks, using all k-mers in the transcriptome quantification impedes the scalability of the method. Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity, and introduces the notion of sig-mers, which are a special type of k-mers uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable with any state-of-the-art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem into smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses <4% of the k-mers and <10% of the CPU time required by Sailfish. 
It is able to finish transcriptome quantification in <10 min per sample by using just a single thread on a commodity computer, which represents >100× speedup over the state-of-the-art alignment-based methods, while delivering comparable or higher accuracy. Availability and implementation: The software is available at http://www.csbio.unc.edu/rs. Contact: weiwang@cs.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24931995
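The sig-mer idea described in the RNA-Skim abstract above (k-mers uniquely associated with one transcript cluster) can be illustrated with a small sketch. This is not the RNA-Skim implementation: the function names and toy sequences are hypothetical, clusters are assumed to be given, reverse complements are ignored, and the abundance-estimation step is not shown.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find_sig_mers(clusters, k):
    """Map each cluster id to the k-mers that occur in that cluster only.

    clusters: dict of cluster id -> list of transcript sequences.
    """
    owners = defaultdict(set)
    for cid, seqs in clusters.items():
        for seq in seqs:
            for km in kmers(seq, k):
                owners[km].add(cid)
    sig = {cid: set() for cid in clusters}
    for km, cids in owners.items():
        if len(cids) == 1:  # k-mer is unique to a single cluster
            sig[next(iter(cids))].add(km)
    return sig


def count_sig_mers(reads, sig, k):
    """Count, per cluster, how many sig-mer occurrences the reads contain."""
    lookup = {km: cid for cid, kms in sig.items() for km in kms}
    counts = {cid: 0 for cid in sig}
    for read in reads:
        for i in range(len(read) - k + 1):
            cid = lookup.get(read[i:i + k])
            if cid is not None:
                counts[cid] += 1
    return counts


# Hypothetical toy clusters; GTA and TAC are shared, so they are not sig-mers.
clusters = {"c1": ["ACGTAC", "ACGTTC"], "c2": ["GTACAT"]}
sig = find_sig_mers(clusters, k=3)
counts = count_sig_mers(["ACGTT", "TACAT"], sig, k=3)
print(sig, counts)
```

Because every sig-mer points to exactly one cluster, the per-cluster counts can be processed independently, which is the property the abstract credits for parallel, per-cluster quantification.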
Network embedding-based representation learning for single cell RNA-seq data.
Li, Xiangyu; Chen, Weizheng; Chen, Yang; Zhang, Xuegong; Gu, Jin; Zhang, Michael Q
2017-11-02
Single cell RNA-seq (scRNA-seq) techniques can reveal valuable insights into cell-to-cell heterogeneities. Projection of high-dimensional data into a low-dimensional subspace is a powerful strategy in general for mining such big data. However, scRNA-seq suffers from higher noise and lower coverage than traditional bulk RNA-seq, hence bringing in new computational difficulties. One major challenge is how to deal with the frequent drop-out events. The events, usually caused by the stochastic burst effect in gene transcription and the technical failure of RNA transcript capture, often cause traditional dimension reduction methods to work inefficiently. To overcome this problem, we have developed a novel Single Cell Representation Learning (SCRL) method based on network embedding. This method can efficiently implement data-driven non-linear projection and incorporate prior biological knowledge (such as pathway information) to learn more meaningful low-dimensional representations for both cells and genes. Benchmark results show that SCRL outperforms other dimensional reduction methods on several recent scRNA-seq datasets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs
LeGault, Laura H.; Dewey, Colin N.
2013-01-01
Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues. Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate. Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23846746
Cell fixation and preservation for droplet-based single-cell transcriptomics.
Alles, Jonathan; Karaiskos, Nikos; Praktiknjo, Samantha D; Grosswendt, Stefanie; Wahle, Philipp; Ruffault, Pierre-Louis; Ayoub, Salah; Schreyer, Luisa; Boltengagen, Anastasiya; Birchmeier, Carmen; Zinzen, Robert; Kocks, Christine; Rajewsky, Nikolaus
2017-05-19
Recent developments in droplet-based microfluidics allow the transcriptional profiling of thousands of individual cells in a quantitative, highly parallel and cost-effective way. A critical, often limiting step is the preparation of cells in an unperturbed state, not altered by stress or ageing. Other challenges are rare cells that need to be collected over several days or samples prepared at different times or locations. Here, we used chemical fixation to address these problems. Methanol fixation allowed us to stabilise and preserve dissociated cells for weeks without compromising single-cell RNA sequencing data. By using mixtures of fixed, cultured human and mouse cells, we first showed that individual transcriptomes could be confidently assigned to one of the two species. Single-cell gene expression from live and fixed samples correlated well with bulk mRNA-seq data. We then applied methanol fixation to transcriptionally profile primary cells from dissociated, complex tissues. Low RNA content cells from Drosophila embryos, as well as mouse hindbrain and cerebellum cells prepared by fluorescence-activated cell sorting, were successfully analysed after fixation, storage and single-cell droplet RNA-seq. We were able to identify diverse cell populations, including neuronal subtypes. As an additional resource, we provide 'dropbead', an R package for exploratory data analysis, visualization and filtering of Drop-seq data. We expect that the availability of a simple cell fixation method will open up many new opportunities in diverse biological contexts to analyse transcriptional dynamics at single-cell resolution.
ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.
Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J; Intarapanich, Apichart; Tongsima, Sissades; Piriyapongsa, Jittima
2017-01-01
Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5'ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. 
The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER).
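The ToNER abstract above describes transforming enrichment scores toward normality with the Box-Cox procedure and then calling significantly enriched sites. The following is a rough sketch of that general idea, not the ToNER code: the grid search over lambda, the one-sided normal tail test, the 0.05 cut-off, and all names and toy ratios are assumptions made for illustration, and the input ratios are assumed to be strictly positive.

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of a positive value x for a given lambda."""
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped)."""
    n = len(values)
    y = [boxcox(v, lam) for v in values]
    mean = sum(y) / n
    var = sum((t - mean) ** 2 for t in y) / n
    return (lam - 1.0) * sum(math.log(v) for v in values) - 0.5 * n * math.log(var)


def fit_lambda(values, grid=None):
    """Pick the lambda on a coarse grid (assumed range -2..2) with the best likelihood."""
    grid = grid or [i / 20.0 for i in range(-40, 41)]
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


def enriched_sites(ratios, alpha=0.05):
    """Return (lambda, indices whose upper-tail normal p-value is below alpha)."""
    lam = fit_lambda(ratios)
    y = [boxcox(r, lam) for r in ratios]
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in y) / (n - 1))
    hits = []
    for i, t in enumerate(y):
        p = 0.5 * math.erfc((t - mean) / (sd * math.sqrt(2.0)))
        if p < alpha:
            hits.append(i)
    return lam, hits


# Hypothetical enrichment ratios (enriched / unenriched coverage), not real data.
toy = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.95, 1.05, 40.0]
lam, hits = enriched_sites(toy)
print(lam, hits)
```

A single global transformation keeps the test simple: once the scores are approximately normal, one mean and one standard deviation describe the background for every site.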
Vignali, Marissa; Armour, Christopher D; Chen, Jingyang; Morrison, Robert; Castle, John C; Biery, Matthew C; Bouzek, Heather; Moon, Wonjong; Babak, Tomas; Fried, Michal; Raymond, Christopher K; Duffy, Patrick E
2011-03-01
Malaria caused by Plasmodium falciparum results in approximately 1 million annual deaths worldwide, with young children and pregnant mothers at highest risk. Disease severity might be related to parasite virulence factors, but expression profiling studies of parasites to test this hypothesis have been hindered by extensive sequence variation in putative virulence genes and a preponderance of host RNA in clinical samples. We report here the application of RNA sequencing to clinical isolates of P. falciparum, using not-so-random (NSR) primers to successfully exclude human ribosomal RNA and globin transcripts and enrich for parasite transcripts. Using NSR-seq, we confirmed earlier microarray studies showing upregulation of a distinct subset of genes in parasites infecting pregnant women, including that encoding the well-established pregnancy malaria vaccine candidate var2csa. We also describe a subset of parasite transcripts that distinguished parasites infecting children from those infecting pregnant women and confirmed this observation using quantitative real-time PCR and mass spectrometry proteomic analyses. Based on their putative functional properties, we propose that these proteins could have a role in childhood malaria pathogenesis. Our study provides proof of principle that NSR-seq represents an approach that can be used to study clinical isolates of parasites causing severe malaria syndromes as well as other blood-borne pathogens and blood-related diseases.
Vignali, Marissa; Armour, Christopher D.; Chen, Jingyang; Morrison, Robert; Castle, John C.; Biery, Matthew C.; Bouzek, Heather; Moon, Wonjong; Babak, Tomas; Fried, Michal; Raymond, Christopher K.; Duffy, Patrick E.
2011-01-01
Malaria caused by Plasmodium falciparum results in approximately 1 million annual deaths worldwide, with young children and pregnant mothers at highest risk. Disease severity might be related to parasite virulence factors, but expression profiling studies of parasites to test this hypothesis have been hindered by extensive sequence variation in putative virulence genes and a preponderance of host RNA in clinical samples. We report here the application of RNA sequencing to clinical isolates of P. falciparum, using not-so-random (NSR) primers to successfully exclude human ribosomal RNA and globin transcripts and enrich for parasite transcripts. Using NSR-seq, we confirmed earlier microarray studies showing upregulation of a distinct subset of genes in parasites infecting pregnant women, including that encoding the well-established pregnancy malaria vaccine candidate var2csa. We also describe a subset of parasite transcripts that distinguished parasites infecting children from those infecting pregnant women and confirmed this observation using quantitative real-time PCR and mass spectrometry proteomic analyses. Based on their putative functional properties, we propose that these proteins could have a role in childhood malaria pathogenesis. Our study provides proof of principle that NSR-seq represents an approach that can be used to study clinical isolates of parasites causing severe malaria syndromes as well as other blood-borne pathogens and blood-related diseases. PMID:21317536
Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications.
Van den Berge, Koen; Perraudeau, Fanny; Soneson, Charlotte; Love, Michael I; Risso, Davide; Vert, Jean-Philippe; Robinson, Mark D; Dudoit, Sandrine; Clement, Lieven
2018-02-26
Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
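The weighting strategy in the abstract above can be illustrated by one common way of turning a zero-inflated negative binomial (ZINB) fit into observation weights: the posterior probability that a count was generated by the negative binomial component rather than by the excess-zero component. The sketch below assumes that form; the parameter names (`mu`, `theta`, `pi`) and toy values are hypothetical, and fitting the ZINB model itself is not shown.

```python
def nb_zero_prob(mu, theta):
    """P(Y = 0) under a negative binomial with mean mu and size (dispersion) theta."""
    return (theta / (theta + mu)) ** theta


def zinb_weight(count, mu, theta, pi):
    """Posterior probability that a count comes from the NB component.

    pi is the assumed zero-inflation probability for this gene and cell.
    Non-zero counts can only come from the NB component, so their weight is 1.
    """
    if count > 0:
        return 1.0
    nb0 = (1.0 - pi) * nb_zero_prob(mu, theta)
    return nb0 / (pi + nb0)


# Hypothetical parameters for one gene in one cell.
w_zero = zinb_weight(0, mu=10.0, theta=1.0, pi=0.5)
w_pos = zinb_weight(5, mu=10.0, theta=1.0, pi=0.5)
print(w_zero, w_pos)
```

Zeros that are implausible under the count model (high expected expression, high zero-inflation probability) receive low weights, so a bulk RNA-seq pipeline that accepts observation weights can down-weight them instead of treating them as true low expression.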
SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.
Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B
2016-02-04
Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. 
This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.
Gluck, Christian; Min, Sangwon; Oyelakin, Akinsola; Smalley, Kirsten; Sinha, Satrajit; Romano, Rose-Anne
2016-11-16
Mouse models have served a valuable role in deciphering various facets of Salivary Gland (SG) biology, from normal developmental programs to diseased states. To facilitate such studies, gene expression profiling maps have been generated for various stages of SG organogenesis. However, these prior studies fall short of capturing the transcriptional complexity due to the limited scope of gene-centric microarray-based technology. Compared to microarray, RNA-sequencing (RNA-seq) offers unbiased detection of novel transcripts, broader dynamic range and high specificity and sensitivity for detection of genes, transcripts, and differential gene expression. Although RNA-seq data, particularly under the auspices of the ENCODE project, have covered a large number of biological specimens, studies on the SG have been lacking. To better appreciate the wide spectrum of gene expression profiles, we isolated RNA from mouse submandibular salivary glands at different embryonic and adult stages. In parallel, we processed RNA-seq data for 24 organs and tissues obtained from the mouse ENCODE consortium and calculated the average gene expression values. To identify molecular players and pathways likely to be relevant for SG biology, we performed functional gene enrichment analysis, network construction and hierarchical clustering of the RNA-seq datasets obtained from different stages of SG development and maturation, and other mouse organs and tissues. Our bioinformatics-based data analysis not only reaffirmed known modulators of SG morphogenesis but revealed novel transcription factors and signaling pathways unique to mouse SG biology and function. Finally, we demonstrated that the unique SG gene signature obtained from our mouse studies is also well conserved and can demarcate features of the human SG transcriptome that are different from other tissues. 
Our RNA-seq based Atlas has revealed a high-resolution cartographic view of the dynamic transcriptomic landscape of the mouse SG at various stages. These RNA-seq datasets will complement pre-existing microarray based datasets, including the Salivary Gland Molecular Anatomy Project, by offering a broader systems-biology based perspective rather than the classical gene-centric view. Ultimately, such resources will be valuable in providing a useful toolkit to better understand how the diverse cell populations of the SG are organized and controlled during development and differentiation.
SeqTU: A web server for identification of bacterial transcription units
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Xin; Chou, Wen-Chi; Ma, Qin; ...
2017-03-07
A transcription unit (TU) consists of K ≥ 1 consecutive genes on the same strand of a bacterial genome that are transcribed into a single mRNA molecule under certain conditions. Their identification is an essential step in elucidation of transcriptional regulatory networks. We have recently developed a machine-learning method to accurately identify TUs from RNA-seq data, based on two features of the assembled RNA reads: the continuity and stability of RNA-seq coverage across a genomic region. While good performance was achieved by the method on Escherichia coli and Clostridium thermocellum, substantial work is needed to make the program generally applicable to all bacteria, knowing that the program requires organism specific information. A web server, named SeqTU, was developed to automatically identify TUs with given RNA-seq data of any bacterium using a machine-learning approach. The server consists of a number of utility tools, in addition to TU identification, such as data preparation, data quality check and RNA-read mapping. SeqTU provides a user-friendly interface and automated prediction of TUs from given RNA-seq data. Furthermore, the predicted TUs are displayed intuitively using HTML format along with a graphic visualization of the prediction.
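The TU definition above (consecutive same-strand genes transcribed together) can be illustrated with a toy grouping rule. This is a minimal sketch and not SeqTU's machine-learning classifier: it simply joins neighbouring same-strand genes when read coverage across the intergenic gap never drops below a threshold, as a crude stand-in for the "continuity" feature. The names `Gene`, `group_into_tus` and `min_gap_cov` are hypothetical, and coverage is assumed to be a single strand-agnostic per-base array.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gene:
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def group_into_tus(genes: Sequence[Gene], coverage: Sequence[int],
                   min_gap_cov: int = 5) -> List[List[str]]:
    """Group genes into candidate TUs (toy rule, hypothetical threshold).

    Two neighbouring genes are joined when they share a strand and every
    intergenic position between them has coverage >= min_gap_cov.
    """
    tus: List[List[str]] = []
    prev = None
    for gene in sorted(genes, key=lambda g: g.start):
        joined = False
        if prev is not None and prev.strand == gene.strand:
            gap = coverage[prev.end:gene.start]
            # An empty gap (adjacent or overlapping genes) counts as continuous.
            joined = all(c >= min_gap_cov for c in gap)
        if joined:
            tus[-1].append(gene.name)
        else:
            tus.append([gene.name])
        prev = gene
    return tus
```

A real predictor would also need the "stability" feature and organism-specific training, which this sketch deliberately leaves out.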
Hsu, Ju-Chun; Lin, Yu-Yu; Chang, Chia-Che; Hua, Kuo-Hsun; Chen, Mei-Ju May; Huang, Li-Hsin; Chen, Chien-Yu
2016-04-22
Pesticide resistance poses many challenges for pest control, particularly for destructive pests such as diamondback moths (Plutella xylostella). Organophosphates have been used in the field since the 1950s, leading to selection for resistance-related gene variants and the development of resistance to new insecticides in the diamondback moth. Identifying actual and potential genes involved in resistance could offer solutions for control. This study established resistant diamondback moth strains from two different collections using mevinphos. Two sets of transcriptome sequencing (RNA-Seq) data were generated for pairs of mevinphos-resistant versus susceptible (wild-type) strains. One susceptible strain containing 14 giga base pairs was assembled into a reference-based assembly using published scaffold sequences as reference. Differential expression data between resistant and susceptible strains revealed 944 transcripts (803 with annotations) showing upregulation and 427 transcripts (150 with annotations) showing downregulation. Around 6.8% of the differential expression transcripts (65) could be categorized as associated with well-known resistance mechanisms such as penetration, detoxification, and behavior response; of these 65 transcripts, 38 showed upregulation, and 12 relating to penetration were upregulated when the transcripts of 19 cytochrome P450s, 2 zeta-class glutathione S-transferases, and 4 ATP-binding cassette transporters showed upregulation. In addition, 11 groups of transcripts related to olfactory perception appeared to be downregulated in trade-off situations. Quantitative polymerase chain reaction expression results were consistent with RNA-Seq data. Possible roles of these differentially expressed genes in resistance mechanisms are discussed in this study. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
RNA-Seq and UHPLC-Q-TOF/MS Based Lipidomics Study in Lysiphlebia japonica.
Gao, Xueke; Luo, Junyu; Lü, Limin; Zhang, LiJuan; Zhang, Shuai; Cui, Jinjie
2018-05-17
Lipids play an important role in energy storage, membrane structure stabilization and signaling. Parasitoids are excellent models to study lipidomics because a majority of them do not accumulate lipids during their free-living life-stage. Studies on parasitoids have mostly focused on the changes in the lipids and gene transcripts in hosts, and little attention has been devoted to lipidomic and transcriptomic changes in parasitoids. In this study, a relative quantitative analysis of lipids and their gene transcripts in 3-day-old Lysiphlebia japonica larvae (3 days after spawning) and pupae was performed using liquid chromatography, mass spectrometry and RNA-seq. Thirty-three glycerolipids and 250 glycerophospholipids were identified in this study; all triglycerides and the vast majority of phospholipids accumulated in the pupal stage. This was accompanied by differentially regulated lipid uptake and remodeling. Furthermore, our data showed that gene transcription was up-regulated in key nutrient metabolic pathways involved in lipid synthesis in 3-day-old larvae. Finally, our data suggest that larvae and pupae of L. japonica may lack the ability for fatty acid synthesis. A comprehensive, quantitative, and expandable resource was provided for further studies of metabolic regulation and molecular mechanisms underlying parasitic response to host defense.
An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.
Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D
2017-01-01
We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.
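The combination of logistic regression and a hidden Markov model described above can be sketched in a few lines. This is an illustrative toy, not FStitch itself: a logistic function with hand-set (hypothetical) weights `w` and `b` turns each bin's read count into a probability of being transcribed, and a two-state Viterbi pass with a "sticky" transition probability `stay` stitches neighbouring bins into contiguous calls. Using the logistic output directly as an emission score is a simplification assumed here for brevity; in practice the weights would be learned from the user-defined training set.

```python
import math
from typing import List, Sequence


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def viterbi_transcribed(counts: Sequence[int], w: float = 1.5,
                        b: float = -2.0, stay: float = 0.9) -> List[int]:
    """Label each bin 0 (inactive) or 1 (transcribed) with a 2-state Viterbi."""
    if not counts:
        return []
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Per-bin emission log-scores for states 0 and 1 from the logistic model.
    emis = []
    for c in counts:
        p1 = logistic(w * math.log1p(c) + b)
        p1 = min(max(p1, 1e-9), 1.0 - 1e-9)  # keep the logs finite
        emis.append((math.log(1.0 - p1), math.log(p1)))
    # Uniform prior over the starting state.
    score = [emis[0][0] + math.log(0.5), emis[0][1] + math.log(0.5)]
    back = []
    for e in emis[1:]:
        new, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch)
                     for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new.append(cands[best] + e[s])
        score = new
        back.append(ptr)
    # Trace the best path backwards from the final bin.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

With these default settings a single empty bin inside a covered region is bridged rather than splitting the call, which is the "stitching" behaviour the transition penalty is meant to provide.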
Wang, Yejun; MacKenzie, Keith D; White, Aaron P
2015-05-07
As sequencing costs are being lowered continuously, RNA-seq has gradually been adopted as the first choice for comparative transcriptome studies with bacteria. Unlike microarrays, RNA-seq can directly detect cDNA derived from mRNA transcripts at a single nucleotide resolution. Not only does this allow researchers to determine the absolute expression level of genes, but it also conveys information about transcript structure. Few automatic software tools have yet been established to investigate large-scale RNA-seq data for bacterial transcript structure analysis. In this study, 54 directional RNA-seq libraries from Salmonella serovar Typhimurium (S. Typhimurium) 14028s were examined for potential relationships between read mapping patterns and transcript structure. We developed an empirical method, combined with statistical tests, to automatically detect key transcript features, including transcriptional start sites (TSSs), transcriptional termination sites (TTSs) and operon organization. Using our method, we obtained 2,764 TSSs and 1,467 TTSs for 1,331 and 844 different genes, respectively. Identification of TSSs facilitated further discrimination of 215 putative sigma 38 regulons and 863 potential sigma 70 regulons. Combining the TSSs and TTSs with intergenic distance and co-expression information, we comprehensively annotated the operon organization in S. Typhimurium 14028s. Our results show that directional RNA-seq can be used to detect transcriptional borders at an acceptable resolution of ±10-20 nucleotides. Technical limitations of the RNA-seq procedure may prevent single nucleotide resolution. The automatic transcript border detection methods, statistical models and operon organization pipeline that we have described could be widely applied to RNA-seq studies in other bacteria. Furthermore, the TSSs, TTSs, operons, promoters and untranslated regions that we have defined for S. Typhimurium 14028s may constitute valuable resources that can be used for comparative analyses with other Salmonella serotypes.
Comparison of alternative approaches for analysing multi-level RNA-seq data
Mohorianu, Irina; Bretman, Amanda; Smith, Damian T.; Fowler, Emily K.; Dalmay, Tamas
2017-01-01
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples. The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments. PMID:28792517
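The idea of normalizing by subsampling without replacement can be shown with a small sketch: every library is reduced to the same total by drawing reads at random, without replacement, from its pool of counted reads. This is only an illustration of the general technique, not the authors' exact procedure; the function name `subsample_counts` and the per-gene dictionary layout are assumptions, and expanding the counts into an explicit read pool is practical only for toy-sized inputs.

```python
import random
from collections import Counter
from typing import Dict, Optional


def subsample_counts(counts: Dict[str, int], target: int,
                     seed: Optional[int] = None) -> Dict[str, int]:
    """Draw `target` reads without replacement from a per-gene count table."""
    # One pool entry per read, labelled by the gene it was assigned to.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    if target > len(pool):
        raise ValueError("target exceeds the library size")
    drawn = Counter(random.Random(seed).sample(pool, target))
    # Keep every original gene in the output, including those drawn 0 times.
    return {gene: drawn.get(gene, 0) for gene in counts}
```

Because sampling is without replacement, no gene can end up with more reads than it started with, and subsampling a library to its own size returns it unchanged.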
iSeq: Web-Based RNA-seq Data Analysis and Visualization.
Zhang, Chao; Fan, Caoqi; Gan, Jingbo; Zhu, Ping; Kong, Lei; Li, Cheng
2018-01-01
Transcriptome sequencing (RNA-seq) is becoming a standard experimental methodology for genome-wide characterization and quantification of transcripts at single base-pair resolution. However, downstream analysis of massive amounts of sequencing data can be prohibitively technical for wet-lab researchers. A functionally integrated and user-friendly platform is required to meet this demand. Here, we present iSeq, an R-based Web server, for RNA-seq data analysis and visualization. iSeq is a streamlined Web-based R application under the Shiny framework, featuring a simple user interface and multiple data analysis modules. Users without programming and statistical skills can analyze their RNA-seq data and construct publication-level graphs through a standardized yet customizable analytical pipeline. iSeq is accessible via Web browsers on any operating system at http://iseq.cbi.pku.edu.cn.
RNA-Rocket: an RNA-Seq analysis resource for infectious disease research
Warren, Andrew S.; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I.; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B.; Wattam, Alice R.; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno
2015-01-01
Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:25573919
2012-01-01
Background During sexual development, filamentous ascomycetes form complex, three-dimensional fruiting bodies for the protection and dispersal of sexual spores. Fruiting bodies contain a number of cell types not found in vegetative mycelium, and these morphological differences are thought to be mediated by changes in gene expression. However, little is known about the spatial distribution of gene expression in fungal development. Here, we used laser microdissection (LM) and RNA-seq to determine gene expression patterns in young fruiting bodies (protoperithecia) and non-reproductive mycelia of the ascomycete Sordaria macrospora. Results Quantitative analysis showed major differences in the gene expression patterns between protoperithecia and total mycelium. Among the genes strongly up-regulated in protoperithecia were the pheromone precursor genes ppg1 and ppg2. The up-regulation was confirmed by fluorescence microscopy of egfp expression under the control of ppg1 regulatory sequences. RNA-seq analysis of protoperithecia from the sterile mutant pro1 showed that many genes that are differentially regulated in these structures are under the genetic control of transcription factor PRO1. Conclusions We have generated transcriptional profiles of young fungal sexual structures using a combination of LM and RNA-seq. This allowed a high spatial resolution and sensitivity, and yielded a detailed picture of gene expression during development. Our data revealed significant differences in gene expression between protoperithecia and non-reproductive mycelia, and showed that the transcription factor PRO1 is involved in the regulation of many genes expressed specifically in sexual structures. The LM/RNA-seq approach will also be relevant to other eukaryotic systems in which multicellular development is investigated. PMID:23016559
ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data
Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J.; Intarapanich, Apichart; Tongsima, Sissades
2017-01-01
Background Biochemical methods are available for enriching 5′ ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5′ ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. Results We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5′ ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5′ ends than TSSAR. In general, the transcript 5′ ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. Conclusion ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5′ ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at the ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER). PMID:28542466
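The workflow summarized in this abstract (enrichment ratio per position, a global Box-Cox transformation toward normality, then a test on the transformed values) can be sketched as follows. This is a rough illustration under stated assumptions and not the ToNER implementation: the pseudocount, the grid search over lambda in [-2, 2], the one-sided normal tail test and all function names are choices made here for the example.

```python
import math
from typing import List, Sequence, Tuple


def boxcox(x: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform of strictly positive values."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def boxcox_loglik(x: Sequence[float], lam: float) -> float:
    """Profile log-likelihood of lambda under a normal model."""
    y = boxcox(x, lam)
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return (lam - 1.0) * sum(math.log(v) for v in x) - 0.5 * n * math.log(var)


def enrichment_pvalues(enriched: Sequence[int], unenriched: Sequence[int],
                       pseudo: float = 1.0) -> Tuple[float, List[float]]:
    """Return the chosen lambda and an upper-tail p-value per position."""
    ratios = [(e + pseudo) / (u + pseudo) for e, u in zip(enriched, unenriched)]
    if len(set(ratios)) < 2:
        raise ValueError("need at least two distinct enrichment ratios")
    grid = [i / 20.0 for i in range(-40, 41)]  # lambda from -2 to 2, step 0.05
    lam = max(grid, key=lambda l: boxcox_loglik(ratios, l))
    y = boxcox(ratios, lam)
    mean = sum(y) / len(y)
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / len(y))
    # P(Z >= z) for a standard normal, via the complementary error function.
    pvals = [0.5 * math.erfc(((v - mean) / sd) / math.sqrt(2.0)) for v in y]
    return lam, pvals
```

Since the Box-Cox transform is monotonically increasing, the position with the largest enrichment ratio always receives the smallest p-value; the transformation only changes how extreme that value looks relative to the background.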
Nascent-Seq reveals novel features of mouse circadian transcriptional regulation
Menet, Jerome S; Rodriguez, Joseph; Abruzzi, Katharine C; Rosbash, Michael
2012-01-01
A substantial fraction of the metazoan transcriptome undergoes circadian oscillations in many cells and tissues. Based on the transcription feedback loops important for circadian timekeeping, it is commonly assumed that this mRNA cycling reflects widespread transcriptional regulation. To address this issue, we directly measured the circadian dynamics of mouse liver transcription using Nascent-Seq (genome-wide sequencing of nascent RNA). Although many genes are rhythmically transcribed, many rhythmic mRNAs manifest poor transcriptional rhythms, indicating a prominent contribution of post-transcriptional regulation to circadian mRNA expression. This analysis of rhythmic transcription also showed that the rhythmic DNA binding profile of the transcription factors CLOCK and BMAL1 does not determine the transcriptional phase of most target genes. This likely reflects gene-specific collaborations of CLK:BMAL1 with other transcription factors. These insights from Nascent-Seq indicate that it should have broad applicability to many other gene expression regulatory issues. DOI: http://dx.doi.org/10.7554/eLife.00011.001 PMID:23150795
Molinie, Benoit; Wang, Jinkai; Lim, Kok-Seong; Hillebrand, Roman; Lu, Zhi-xiang; Van Wittenberghe, Nicholas; Howard, Benjamin D.; Daneshvar, Kaveh; Mullen, Alan C.; Dedon, Peter
2017-01-01
N6-Methyladenosine (m6A) is a widespread, reversible chemical modification of RNA molecules, implicated in many aspects of RNA metabolism. Little quantitative information exists as to either how many transcript copies of particular genes are m6A modified (‘m6A levels’) or the relationship of m6A modification(s) to alternative RNA isoforms. To deconvolute the m6A epitranscriptome, we developed m6A-level and isoform-characterization sequencing (m6A-LAIC-seq). We found that cells exhibit a broad range of nonstoichiometric m6A levels with cell-type specificity. At the level of isoform characterization, we discovered widespread differences in the use of tandem alternative polyadenylation (APA) sites by methylated and nonmethylated transcript isoforms of individual genes. Strikingly, there is a strong bias for methylated transcripts to be coupled with proximal APA sites, resulting in shortened 3′ untranslated regions, while nonmethylated transcript isoforms tend to use distal APA sites. m6A-LAIC-seq yields a new perspective on transcriptome complexity and links APA usage to m6A modifications. PMID:27376769
RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.
D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano
2015-01-01
The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user-friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user's needs.
Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi
2018-02-12
Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Li, Dan; Gaedigk, Roger; Hart, Steven N.; Leeder, J. Steven
2012-01-01
Cytochrome P450 3A4 (CYP3A4) metabolizes more than 50% of prescribed drugs. The expression of CYP3A4 changes during liver development and may be affected by the administration of some drugs. Alternative mRNA transcripts occur in more than 90% of human genes and are frequently observed in cells responding to developmental and environmental signals. Different mRNA transcripts may encode functionally distinct proteins or contribute to variability of mRNA stability or protein translation efficiency. The purpose of this study was to examine expression of alternative CYP3A4 mRNA transcripts in hepatocytes in response to developmental signals and drugs. cDNA cloning and RNA sequencing (RNA-Seq) were used to identify CYP3A4 mRNA transcripts. Three transcripts were found in HepaRG cells and liver tissues: one represented a canonical mRNA with full-length 3′-untranslated region (UTR), one had a shorter 3′-UTR, and one contained partial intron-6 retention. The alternative mRNA transcripts were validated by either rapid amplification of cDNA 3′-end or endpoint polymerase chain reaction (PCR). Quantification of the transcripts by RNA-Seq and real time quantitative PCR revealed that the CYP3A4 transcript with shorter 3′-UTR was preferentially expressed in developed livers, differentiated hepatocytes, and in rifampicin- and phenobarbital-induced hepatocytes. The CYP3A4 transcript with shorter 3′-UTR was more stable and produced more protein compared with the CYP3A4 transcript with canonical 3′-UTR. We conclude that the 3′-end processing of CYP3A4 contributes to the quantitative regulation of CYP3A4 gene expression through alternative polyadenylation, which may serve as a regulatory mechanism explaining changes of CYP3A4 expression and activity during hepatocyte differentiation and liver development and in response to drug induction. PMID:21998292
2014-01-01
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed, for these and qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. PMID:25150838
Tripathi, Kumar Parijat; Evangelista, Daniela; Zuccaro, Antonio; Guarracino, Mario Rosario
2015-01-01
RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, translating these extremely large datasets into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and protein-protein interaction related information. It clusters the transcripts based on functional annotations and generates a tabular report for functional and gene ontology annotations for each transcript submitted to the web server. The implementation of QuickGO web services in our pipeline enables users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA) by ab initio methods) helps to identify the non coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is useful software for both NGS and array data. It helps users to characterize the de novo assembled reads obtained from NGS experiments for non-referenced organisms, while it also performs the functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and microarray experiments. It generates easy-to-read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. The web application is freely available at: http://www-labgtp.na.icar.cnr.it/Transcriptator.
Nascent Transcription Affected by RNA Polymerase IV in Zea mays
Erhard, Karl F.; Talbot, Joy-El R. B.; Deans, Natalie C.; McClish, Allison E.; Hollick, Jay B.
2015-01-01
All eukaryotes use three DNA-dependent RNA polymerases (RNAPs) to create cellular RNAs from DNA templates. Plants have additional RNAPs related to Pol II, but their evolutionary role(s) remain largely unknown. Zea mays (maize) RNA polymerase D1 (RPD1), the largest subunit of RNA polymerase IV (Pol IV), is required for normal plant development, paramutation, transcriptional repression of certain transposable elements (TEs), and transcriptional regulation of specific alleles. Here, we define the nascent transcriptomes of rpd1 mutant and wild-type (WT) seedlings using global run-on sequencing (GRO-seq) to identify the broader targets of RPD1-based regulation. Comparisons of WT and rpd1 mutant GRO-seq profiles indicate that Pol IV globally affects transcription at both transcriptional start sites and immediately downstream of polyadenylation addition sites. We found no evidence of divergent transcription from gene promoters as seen in mammalian GRO-seq profiles. Statistical comparisons identify genes and TEs whose transcription is affected by RPD1. Most examples of significant increases in genic antisense transcription appear to be initiated by 3ʹ-proximal long terminal repeat retrotransposons. These results indicate that maize Pol IV specifies Pol II-based transcriptional regulation for specific regions of the maize genome including genes having developmental significance. PMID:25653306
Chakraborty, Sutirtha
2018-05-26
RNA-Seq technology has revolutionized the face of gene expression profiling by generating read count data measuring the transcript abundances for each queried gene on multiple experimental subjects. But on the downside, the underlying technical artefacts and hidden biological profiles of the samples generate a wide variety of latent effects that may potentially distort the actual transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables and lead to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (built as an R package 'SVAPLSseq') to correct for the traces of extraneous variability in RNA-Seq data. A novel and thorough comparative analysis of the PLS based method is presented along with some of the other popularly used approaches for latent variable correction in RNA-Seq. Overall, the method is found to achieve a substantially improved estimation of the hidden effect signatures in the RNA-Seq transcriptome expression landscape compared to other available techniques. Copyright © 2017. Published by Elsevier Inc.
Wiegand, Sandra; Dietrich, Sascha; Hertel, Robert; Bongaerts, Johannes; Evers, Stefan; Volland, Sonja; Daniel, Rolf; Liesegang, Heiko
2013-10-01
The production of enzymes by an industrial strain requires a complex adaption of the bacterial metabolism to the conditions within the fermenter. Regulatory events within the process result in a dynamic change of the transcriptional activity of the genome. This complex network of genes is orchestrated by proteins as well as regulatory RNA elements. Here we present an RNA-Seq based study considering selected phases of an industry-oriented fermentation of Bacillus licheniformis. A detailed analysis of 20 strand-specific RNA-Seq datasets revealed a multitude of transcriptionally active genomic regions. 3314 RNA features encoded by such active loci have been identified and sorted into ten functional classes. The identified sequences include the expected RNA features like housekeeping sRNAs, metabolic riboswitches and RNA switches well known from studies on Bacillus subtilis as well as a multitude of completely new candidates for regulatory RNAs. An unexpectedly high number of 855 RNA features are encoded antisense to annotated protein and RNA genes, in addition to 461 independently transcribed small RNAs. These antisense transcripts contain molecules with a remarkable size range variation from 38 to 6348 base pairs in length. The genome of the type strain B. licheniformis DSM13 was completely reannotated using data obtained from RNA-Seq analyses and from public databases. The hereby generated data-sets represent a solid amount of knowledge on the dynamic transcriptional activities during the investigated fermentation stages. The identified regulatory elements enable research on the understanding and the optimization of crucial metabolic activities during a productive fermentation of Bacillus licheniformis strains.
Petrova, Olga E.; Garcia-Alcalde, Fernando; Zampaloni, Claudia; Sauer, Karin
2017-01-01
Global transcriptomic analysis via RNA-seq is often hampered by the high abundance of ribosomal (r)RNA in bacterial cells. To remove rRNA and enrich coding sequences, subtractive hybridization procedures have become the approach of choice prior to RNA-seq, with their efficiency varying in a manner dependent on sample type and composition. Yet, despite an increasing number of RNA-seq studies, comparative evaluation of bacterial rRNA depletion methods has remained limited. Moreover, no such study has utilized RNA derived from bacterial biofilms, which have potentially higher rRNA:mRNA ratios and higher rRNA carryover during RNA-seq analysis. Presently, we evaluated the efficiency of three subtractive hybridization-based kits in depleting rRNA from samples derived from biofilm, as well as planktonic cells of the opportunistic human pathogen Pseudomonas aeruginosa. Our results indicated different rRNA removal efficiency for the three procedures, with the Ribo-Zero kit yielding the highest degree of rRNA depletion, which translated into enhanced enrichment of non-rRNA transcripts and increased depth of RNA-seq coverage. The results indicated that, in addition to improving RNA-seq sensitivity, efficient rRNA removal enhanced detection of low abundance transcripts via qPCR. Finally, we demonstrate that the Ribo-Zero kit also exhibited the highest efficiency when P. aeruginosa/Staphylococcus aureus co-culture RNA samples were tested. PMID:28117413
Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun
2016-01-01
Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures).
The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591
McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.
2013-01-01
Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. 
ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943
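The nearest-neighbor scoring idea behind Relief-F, as described in the abstract above, can be sketched in a few lines. This is a minimal illustration with a fixed number of neighbors `k` for every feature; it is not the ReliefSeq implementation and does not include the gene-wise adaptive-k (gwak) step. The function name `relief_scores`, the Manhattan distance, and the range scaling are assumptions chosen for the example.

```python
def relief_scores(X, y, k=1):
    """Relief-F-style importance scores for a two-class problem (illustrative sketch).

    X is a list of samples (each a list of numeric feature values) and y the class labels.
    A feature gains weight when it differs between a sample and its nearest
    other-class neighbors (misses) and loses weight when it differs between
    a sample and its nearest same-class neighbors (hits).
    """
    n, p = len(X), len(X[0])

    # Range of each feature, used to scale per-feature differences into [0, 1].
    spans = []
    for j in range(p):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        # Manhattan distance over scaled features (an assumption of this sketch).
        return sum(diff(j, a, b) for j in range(p))

    weights = [0.0] * p
    for i in range(n):
        others = [t for t in range(n) if t != i]
        by_distance = sorted(others, key=lambda t: distance(X[i], X[t]))
        hits = [t for t in by_distance if y[t] == y[i]][:k]
        misses = [t for t in by_distance if y[t] != y[i]][:k]
        for j in range(p):
            for t in hits:
                weights[j] -= diff(j, X[i], X[t]) / (n * k)
            for t in misses:
                weights[j] += diff(j, X[i], X[t]) / (n * k)
    return weights
```

In a toy matrix where the first feature separates the two classes and the second does not, the first feature receives the higher score. An adaptive-k variant would repeat the scoring over a range of `k` values and keep, per gene, the value that optimizes the score.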
Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A
2017-01-01
RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.
Liao, Wei; Jordaan, Gwen; Nham, Phillipp; Phan, Ryan T; Pelegrini, Matteo; Sharma, Sanjai
2015-10-16
To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia specimens, a high-throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Ten CLL specimens and five normal peripheral blood CD19+ B cells were analyzed by HTS RNA-seq. The library preparation was performed with the Illumina TruSeq RNA kit and analyzed by the Illumina HiSeq 2000 sequencing system. An average of 48.5 million reads for B cells and 50.6 million reads for CLL specimens were obtained, with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens, respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEGs) between B cells and CLL specimens were identified based on FPKM (fragments per kilobase of transcript per million reads) and false discovery rate (FDR q < 0.05, fold change >2). Expression of selected DEGs (n = 32) with up-regulated and down-regulated expression in CLL from RNA-seq data was also analyzed by qRT-PCR in a test cohort of CLL specimens. Even though there was variation in fold expression of DEGs between RNA-seq and qRT-PCR, more than 90% of analyzed genes were validated by qRT-PCR analysis. Analysis of RNA-seq data for splicing alterations in CLL and B cells was performed by Multivariate Analysis of Transcript Splicing (MATS analysis). Skipped exon was the most frequent splicing alteration in CLL specimens, with 128 significant events (P-value <0.05, minimum inclusion level difference >0.1). The RNA-seq analysis of CLL specimens identifies novel DEGs and alternatively spliced genes that are potential prognostic markers and therapeutic targets. The high level of validation by qRT-PCR for a number of DEGs supports the accuracy of this analysis.
Global comparison of transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL specimens (M-CLL) with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes but the M-CLL and U-CLL transcriptomes were indistinguishable. The analysis of HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome wide analysis.
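The FPKM measure and the thresholds quoted in the abstract above (FDR q < 0.05, fold change >2) can be illustrated with a short sketch. It only shows the arithmetic and is not how Cuffdiff estimates abundance or significance; the function names, the symmetric treatment of up- and down-regulation, and the handling of zero values are assumptions made for the example.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


def passes_deg_thresholds(fpkm_a, fpkm_b, q_value, q_cutoff=0.05, min_fold=2.0):
    """Apply cutoffs of the kind quoted above: q below q_cutoff and fold change above min_fold.

    The fold change is taken in whichever direction is larger, so that both
    up- and down-regulated genes can pass (an assumption of this sketch).
    """
    if fpkm_a <= 0 or fpkm_b <= 0:
        # A ratio is undefined here; this sketch simply skips such genes.
        return False
    fold = max(fpkm_a / fpkm_b, fpkm_b / fpkm_a)
    return q_value < q_cutoff and fold > min_fold


# Example: 500 fragments on a 2,000 bp transcript in a library of 50 million
# mapped fragments gives 500 / (2 * 50) = 5.0 FPKM.
example_fpkm = fpkm(500, 2_000, 50_000_000)
```

Normalizing by both transcript length and library size is what makes values comparable across genes and across samples of different sequencing depth.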
RNA-seq Data: Challenges in and Recommendations for Experimental Design and Analysis.
Williams, Alexander G; Thomas, Sean; Wyman, Stacia K; Holloway, Alisha K
2014-10-01
RNA-seq is widely used to determine differential expression of genes or transcripts as well as identify novel transcripts, identify allele-specific expression, and precisely measure translation of transcripts. Thoughtful experimental design and choice of analysis tools are critical to ensure high-quality data and interpretable results. Important considerations for experimental design include number of replicates, whether to collect paired-end or single-end reads, sequence length, and sequencing depth. Common analysis steps in all RNA-seq experiments include quality control, read alignment, assigning reads to genes or transcripts, and estimating gene or transcript abundance. Our aims are two-fold: to make recommendations for common components of experimental design and assess tool capabilities for each of these steps. We also test tools designed to detect differential expression, since this is the most widespread application of RNA-seq. We hope that these analyses will help guide those who are new to RNA-seq and will generate discussion about remaining needs for tool improvement and development. Copyright © 2014 John Wiley & Sons, Inc.
Transcription profile of boar spermatozoa as revealed by RNA-sequencing
USDA-ARS's Scientific Manuscript database
High-throughput RNA sequencing (RNA-Seq) overcomes the limitations of the current hybridization-based techniques to detect the actual pool of RNA transcripts in spermatozoa. The application of this technology in livestock can speed the discovery of potential predictors of male fertility. As a first ...
High-throughput detection of RNA processing in bacteria.
Gill, Erin E; Chan, Luisa S; Winsor, Geoffrey L; Dobson, Neil; Lo, Raymond; Ho Sui, Shannan J; Dhillon, Bhavjinder K; Taylor, Patrick K; Shrestha, Raunak; Spencer, Cory; Hancock, Robert E W; Unrau, Peter J; Brinkman, Fiona S L
2018-03-27
Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely -10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions.
This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using Jbrowse on www.pseudomonas.com .
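The TAT(A/T)ATA motif mentioned in the abstract above can be located in a sequence with a regular expression. The sketch below is only an illustration of that pattern search and is unrelated to the authors' correlation analysis; the helper names and the test sequence are hypothetical.

```python
import re

# TAT(A/T)ATA as a regular expression. Wrapping it in a lookahead lets
# overlapping occurrences be reported as separate hits.
MOTIF = re.compile(r"(?=(TAT[AT]ATA))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written with A, C, G, T."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def find_motif_starts(sequence):
    """Return the 0-based start positions of every TAT(A/T)ATA match on the given strand."""
    return [match.start() for match in MOTIF.finditer(sequence.upper())]


# Hypothetical test sequence containing one TATAATA and one TATTATA.
starts = find_motif_starts("GGTATAATACCTATTATAGG")
```

The two sequences allowed by the pattern, TATAATA and TATTATA, are reverse complements of each other, so a site that matches on one strand also matches on the opposite strand. This is the sense in which the motif is palindromic.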
Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome
Chaudhuri, Roy R.; Yu, Lu; Kanji, Alpa; Perkins, Timothy T.; Gardner, Paul P.; Choudhary, Jyoti; Maskell, Duncan J.
2011-01-01
Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community. PMID:21816880
Testa, Alison C; Hane, James K; Ellwood, Simon R; Oliver, Richard P
2015-03-11
The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. 
Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available ( https://sourceforge.net/projects/codingquarry/ ), and suitable for incorporation into genome annotation pipelines.
Yang, Jian-Hua; Li, Jun-Hao; Jiang, Shan; Zhou, Hui; Qu, Liang-Hu
2013-01-01
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) represent two classes of important non-coding RNAs in eukaryotes. Although these non-coding RNAs have been implicated in organismal development and in various human diseases, surprisingly little is known about their transcriptional regulation. Recent advances in chromatin immunoprecipitation with next-generation DNA sequencing (ChIP-Seq) have provided methods of detecting transcription factor binding sites (TFBSs) with unprecedented sensitivity. In this study, we describe ChIPBase (http://deepbase.sysu.edu.cn/chipbase/), a novel database that we have developed to facilitate the comprehensive annotation and discovery of transcription factor binding maps and transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. The current release of ChIPBase includes high-throughput sequencing data that were generated by 543 ChIP-Seq experiments in diverse tissues and cell lines from six organisms. By analysing millions of TFBSs, we identified tens of thousands of TF-lncRNA and TF-miRNA regulatory relationships. Furthermore, two web-based servers were developed to annotate and discover transcriptional regulatory relationships of lncRNAs and miRNAs from ChIP-Seq data. In addition, we developed two genome browsers, deepView and genomeView, to provide integrated views of multidimensional data. Moreover, our web implementation supports diverse query types and the exploration of TFs, lncRNAs, miRNAs, gene ontologies and pathways.
RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application
2015-01-01
Background The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges to efficiently analyze and store sequence data and results. Methods In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call for statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Results Through a user-friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment.
This strategy allows access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user's needs. PMID:26046471
Chen, Ziyi; Quan, Lijun; Huang, Anfei; Zhao, Qiang; Yuan, Yao; Yuan, Xuye; Shen, Qin; Shang, Jingzhe; Ben, Yinyin; Qin, F Xiao-Feng; Wu, Aiping
2018-01-01
The RNA sequencing approach has been broadly used to provide gene-, pathway-, and network-centric analyses for various cell and tissue samples. However, thus far, rich cellular information carried in tissue samples has not been thoroughly characterized from RNA-Seq data. Therefore, it would expand our horizons to better understand the biological processes of the body by incorporating a cell-centric view of tissue transcriptome. Here, a computational model named seq-ImmuCC was developed to infer the relative proportions of 10 major immune cells in mouse tissues from RNA-Seq data. The performance of seq-ImmuCC was evaluated among multiple computational algorithms, transcriptional platforms, and simulated and experimental datasets. The test results showed its stable performance and superb consistency with experimental observations under different conditions. With seq-ImmuCC, we generated the comprehensive landscape of immune cell compositions in 27 normal mouse tissues and extracted the distinct signatures of immune cell proportion among various tissue types. Furthermore, we quantitatively characterized and compared 18 different types of mouse tumor tissues of distinct cell origins with their immune cell compositions, which provided a comprehensive and informative measurement for the immune microenvironment inside tumor tissues. The online server of seq-ImmuCC is freely available at http://wap-lab.org:3200/immune/.
Chromatin-associated RNA sequencing (ChAR-seq) maps genome-wide RNA-to-DNA contacts
Jukam, David; Teran, Nicole A; Risca, Viviana I; Smith, Owen K; Johnson, Whitney L; Skotheim, Jan M; Greenleaf, William James
2018-01-01
RNA is a critical component of chromatin in eukaryotes, both as a product of transcription, and as an essential constituent of ribonucleoprotein complexes that regulate both local and global chromatin states. Here, we present a proximity ligation and sequencing method called Chromatin-Associated RNA sequencing (ChAR-seq) that maps all RNA-to-DNA contacts across the genome. Using Drosophila cells, we show that ChAR-seq provides unbiased, de novo identification of targets of chromatin-bound RNAs including nascent transcripts, chromosome-specific dosage compensation ncRNAs, and genome-wide trans-associated RNAs involved in co-transcriptional RNA processing. PMID:29648534
Introduction to Single-Cell RNA Sequencing.
Olsen, Thale Kristin; Baryawno, Ninib
2018-04-01
During the last decade, high-throughput sequencing methods have revolutionized the entire field of biology. The opportunity to study entire transcriptomes in great detail using RNA sequencing (RNA-seq) has fueled many important discoveries and is now a routine method in biomedical research. However, RNA-seq is typically performed in "bulk," and the data represent an average of gene expression patterns across thousands to millions of cells; this might obscure biologically relevant differences between cells. Single-cell RNA-seq (scRNA-seq) represents an approach to overcome this problem. By isolating single cells, capturing their transcripts, and generating sequencing libraries in which the transcripts are mapped to individual cells, scRNA-seq allows assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. Here, we present the most common scRNA-seq protocols in use today and the basics of data analysis and discuss factors that are important to consider before planning and designing an scRNA-seq project. Copyright © 2018 John Wiley & Sons, Inc.
Wang, Wenlan; Xue, Li; Li, Ya; Li, Rong; Xie, Xiaoping; Bao, Junxiang; Hai, Chunxu; Li, Jinsheng
2016-01-01
To elucidate the altered gene network in the brains of carbon monoxide (CO) poisoned rats after treatment with hyperbaric oxygen (HBO₂). RNA sequencing (RNA-seq) analysis was performed to examine differentially expressed genes (DEGs) in brain tissue samples from nine male rats: a normal control group; a CO poisoning group; and an HBO₂ treatment group (three rats/group). Reverse transcription polymerase chain reaction (RT-PCR) and real-time quantitative PCR were used for validation of the DEGs in another 18 male rats (six rats/group). RNA-seq revealed that two genes were upregulated (4.18 and 8.76 log to the base 2 fold change) (p<0.05) in the CO-poisoned rats relative to the control rats; two genes were upregulated (3.88 and 7.69 log to the base 2 fold change); and 23 genes were downregulated (3.49-15.12 log to the base 2 fold change) (p<0.05) in the brains of the HBO₂-treated rats relative to the CO-poisoned rats. Target prediction of DEGs by gene network analysis and analysis of pathways affected suggested that regulation of gene expression in dopamine metabolism and nitric oxide (NO) synthesis was significantly affected by CO poisoning and HBO₂ treatment. Results of RT-PCR and real-time quantitative PCR indicated that four genes (Pomc, GH-1, Pr1 and Fshβ) associated with hormone secretion in the hypothalamic-pituitary system have potential as markers for prognosis of CO poisoning. This study is the first RNA-seq analysis profile of HBO₂ treatment on rats with acute CO poisoning. It concludes that changes in hormone secretion in the hypothalamic-pituitary system, dopamine metabolism and NO synthesis are involved in brain damage and behavior abnormalities after CO poisoning, and that HBO₂ therapy may regulate these changes.
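The abstract above reports changes as log to the base 2 fold changes. A short sketch of the conversion between log2 and linear fold change may help when reading those numbers. The function names are hypothetical and the expression values in the comments are illustrative, not data from the study.

```python
import math


def log2_fold_change(expression_treated, expression_control):
    """log2 of the ratio of two positive expression values."""
    return math.log2(expression_treated / expression_control)


def linear_fold(log2_fc):
    """Convert a log2 fold change back to a linear ratio."""
    return 2.0 ** log2_fc


# An 8-fold increase corresponds to a log2 fold change of 3.
three = log2_fold_change(8.0, 1.0)

# A log2 fold change of 4.18, as quoted above, is roughly an 18-fold change
# on the linear scale, since 2**4 = 16 and 2**0.18 is a little above 1.13.
roughly_eighteen = linear_fold(4.18)
```

On this scale each unit is a doubling, so values such as 8.76 or 15.12 represent changes of several hundred-fold or more. Ratios that large are often produced when one condition has counts near zero.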
Zheng, Chao; Zhao, Lei; Wang, Yu; Shen, Jiazhi; Zhang, Yinfei; Jia, Sisi; Li, Yusheng; Ding, Zhaotang
2015-01-01
Tea [Camellia sinensis (L.) O. Kuntze, Theaceae] is one of the most popular non-alcoholic beverages worldwide. Cold stress is one of the most severe abiotic stresses that limit tea plants’ growth, survival and geographical distribution. However, the genetic regulatory network and signaling pathways involved in cold stress responses in tea plants remain to be unearthed. Using RNA-Seq, DGE and sRNA-Seq technologies, we performed an integrative analysis of miRNA and mRNA expression profiling and their regulatory network in tea plants under chilling (4℃) and freezing (-5℃) stress. Differentially expressed (DE) miRNA and mRNA profiles were obtained based on fold change analysis; miRNAs and target mRNAs were found to show both coherent and incoherent relationships in the regulatory network. Furthermore, we compared several key pathways (e.g., ‘Photosynthesis’), GO terms (e.g., ‘response to karrikin’) and transcription factors (TFs, e.g., DREB1b/CBF1) which were identified as involved in the early chilling and/or freezing response of tea plants. Intriguingly, we found that karrikins, a new group of plant growth regulators, and β-primeverosidase (BPR), a key enzyme functionally relevant to the formation of tea aroma, might play an important role in both the early chilling and freezing responses of tea plants. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-Seq and sRNA-Seq analysis. This is the first study to simultaneously profile the expression patterns of both miRNAs and mRNAs on a genome-wide scale to elucidate the molecular mechanisms of early responses of tea plants to cold stress. In addition to gaining a deeper insight into the cold resistance characteristics of tea plants, we provide a good case study for analysing mRNA/miRNA expression and profiling of non-model plant species using next-generation sequencing technology. PMID:25901577
Pervasive, Genome-Wide Transcription in the Organelle Genomes of Diverse Plastid-Bearing Protists.
Sanitá Lima, Matheus; Smith, David Roy
2017-11-06
Organelle genomes are among the most sequenced kinds of chromosome. This is largely because they are small and widely used in molecular studies, but also because next-generation sequencing technologies made sequencing easier, faster, and cheaper. However, studies of organelle RNA have not kept pace with those of DNA, despite huge amounts of freely available eukaryotic RNA-sequencing (RNA-seq) data. Little is known about organelle transcription in nonmodel species, and most of the available eukaryotic RNA-seq data have not been mined for organelle transcripts. Here, we use publicly available RNA-seq experiments to investigate organelle transcription in 30 diverse plastid-bearing protists with varying organelle genomic architectures. Mapping RNA-seq data to organelle genomes revealed pervasive, genome-wide transcription, regardless of the taxonomic grouping, gene organization, or noncoding content. For every species analyzed, transcripts covered ≥85% of the mitochondrial and/or plastid genomes (all of which were ≤105 kb), indicating that most of the organelle DNA (coding and noncoding) is transcriptionally active. These results follow earlier studies of model species showing that organellar transcription is coupled and ubiquitous across the genome, requiring significant downstream processing of polycistronic transcripts. Our findings suggest that noncoding organelle DNA can be transcriptionally active, raising questions about the underlying function of these transcripts and underscoring the utility of publicly available RNA-seq data for recovering complete genome sequences. If pervasive transcription is also found in bigger organelle genomes (>105 kb) and across a broader range of eukaryotes, this could indicate that noncoding organelle RNAs are regulating fundamental processes within eukaryotic cells. Copyright © 2017 Sanitá Lima and Smith.
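The "transcripts covered ≥85% of the genome" figure above is a breadth-of-coverage measurement. The following is a minimal sketch of how such a fraction could be computed from mapped read intervals; the function name, the 0-based half-open interval convention, and the toy numbers are our assumptions, not details of the study's pipeline.

```python
def breadth_of_coverage(intervals, genome_length):
    """Fraction of genome positions covered by at least one interval.

    intervals: iterable of (start, end) pairs, 0-based, end-exclusive (assumed).
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        # Clip each interval to the genome boundaries.
        start, end = max(0, start), min(genome_length, end)
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


if __name__ == "__main__":
    # Toy example: two overlapping alignments on a 100-bp "genome".
    print(breadth_of_coverage([(0, 50), (40, 90)], 100))
```

Merging overlapping intervals before summing avoids counting a position twice when many reads stack on the same region.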
SC3 - consensus clustering of single-cell RNA-Seq data
Kiselev, Vladimir Yu.; Kirschner, Kristina; Schaub, Michael T.; Andrews, Tallulah; Yiu, Andrew; Chandra, Tamir; Natarajan, Kedar N; Reik, Wolf; Barahona, Mauricio; Green, Anthony R; Hemberg, Martin
2017-01-01
Single-cell RNA-seq (scRNA-seq) enables a quantitative cell-type characterisation based on global transcriptome profiles. We present Single-Cell Consensus Clustering (SC3), a user-friendly tool for unsupervised clustering which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach. We demonstrate that SC3 is capable of identifying subclones based on the transcriptomes from neoplastic cells collected from patients. PMID:28346451
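To make "combining multiple clustering solutions through a consensus approach" concrete, here is a minimal sketch of a co-association (consensus) matrix, one common way to combine clusterings. It is an illustration under our own assumptions and is not claimed to be SC3's actual implementation; `consensus_matrix` and the toy labelings are hypothetical.

```python
def consensus_matrix(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    clustering solutions in which cells i and j share a cluster label."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same cells")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    k = len(labelings)
    return [[c / k for c in row] for row in counts]


if __name__ == "__main__":
    # Three hypothetical clustering solutions for the same three cells.
    solutions = [[0, 0, 1], [1, 1, 0], [0, 1, 1]]
    for row in consensus_matrix(solutions):
        print(row)
```

Because only co-membership is counted, the matrix is unaffected by how each solution happens to number its clusters; a final clustering can then be derived from this matrix, for example by hierarchical clustering.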
Prakash, Celine; Haeseler, Arndt Von
2017-03-01
RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
PMID:27661099
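The claim that expected coverage is not uniform across a transcript can be illustrated with a deliberately simplified calculation. The sketch below does not reproduce the paper's enumeration of maximal fragment placements; it assumes instead that a single fragment of length F starts with equal probability at any of the T − F + 1 valid positions on a transcript of length T. The function name is ours.

```python
def expected_coverage(T, F):
    """Per-position probability of being covered by one fragment of length F
    whose start is uniform over the T - F + 1 valid positions (simplified model)."""
    if not 0 < F <= T:
        raise ValueError("require 0 < F <= T")
    n_starts = T - F + 1
    profile = []
    for pos in range(T):
        first = max(0, pos - F + 1)   # earliest start that still covers pos
        last = min(pos, T - F)        # latest valid start that covers pos
        profile.append((last - first + 1) / n_starts)
    return profile


if __name__ == "__main__":
    for pos, p in enumerate(expected_coverage(T=10, F=4)):
        print(pos, round(p, 3))
```

Even in this toy setting, positions near the transcript ends are covered less often than interior positions, and the profile depends on the ratio of F to T, which is in line with the qualitative conclusion stated in the abstract.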
RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats.
Merrick, B Alex; Phadke, Dhiral P; Auerbach, Scott S; Mav, Deepak; Stiegelmeyer, Suzy M; Shah, Ruchir R; Tice, Raymond R
2013-01-01
Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1's carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT's) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. 
We find the rat transcriptome contains many previously unidentified, AFB1-responsive exons and transcripts supporting RNA-Seq's capabilities to provide new insights into AFB1-mediated gene expression leading to hepatocellular carcinoma.
Bayesian estimation of differential transcript usage from RNA-seq data.
Papastamoulis, Panagiotis; Rattray, Magnus
2017-11-27
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is to: (a) extend to the DTU context the use of cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance to DRIMSeq in terms of precision/recall but offer better calibration of the False Discovery Rate.
Bowman, Megan J.; Park, Wonkeun; Bauer, Philip J.; Udall, Joshua A.; Page, Justin T.; Raney, Joshua; Scheffler, Brian E.; Jones, Don. C.; Campbell, B. Todd
2013-01-01
An RNA-Seq experiment was performed using field grown well-watered and naturally rain fed cotton plants to identify differentially expressed transcripts under water-deficit stress. Our work constitutes the first application of the newly published diploid D5 Gossypium raimondii sequence in the study of tetraploid AD1 upland cotton RNA-seq transcriptome analysis. A total of 1,530 transcripts were differentially expressed between well-watered and water-deficit stressed root tissues, in patterns that confirm the accuracy of this technique for future studies in cotton genomics. Additionally, putative sequence based genome localization of differentially expressed transcripts detected A2 genome specific gene expression under water-deficit stress. These data will facilitate efforts to understand the complex responses governing transcriptomic regulatory mechanisms and to identify candidate genes that may benefit applied plant breeding programs. PMID:24324815
Li, You; Heavican, Tayla B.; Vellichirammal, Neetha N.; Iqbal, Javeed
2017-01-01
The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression, but also by the identification of novel transcripts like chimeric fusion transcripts. The ‘fusion’ or ‘chimeric’ transcripts have improved the diagnosis and prognosis of several tumors, and have led to the development of novel therapeutic regimens. Fusion transcript detection is currently accomplished by several software packages, primarily relying on sequence alignment algorithms. The alignment of sequencing reads from fusion transcript loci in cancer genomes can be highly challenging due to the incorrect mapping induced by genomic alterations, thereby limiting the performance of alignment-based fusion transcript detection methods. Here, we developed a novel alignment-free method, ChimeRScope, that accurately predicts fusion transcripts based on the gene fingerprint (as k-mers) profiles of the RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets followed by experimental validations demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of the read lengths and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is a better tool that is capable of identifying novel fusion transcripts with potential oncogenic functions. ChimeRScope is accessible as standalone software at (https://github.com/ChimeRScope/ChimeRScope/wiki) or via the Galaxy web-interface at (https://galaxy.unmc.edu/). PMID:28472320
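The alignment-free approach described above rests on k-mer profiles. As a generic illustration only (this is not ChimeRScope's code, and `kmer_profile` is a hypothetical helper), the k-mers of a read can be counted as follows:

```python
from collections import Counter


def kmer_profile(sequence, k):
    """Count every overlapping substring of length k in a nucleotide sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    print(kmer_profile("ACGTAC", 2))
```

A fingerprint-based method would compare such read-derived k-mers against k-mers characteristic of individual genes; how those fingerprints are built and scored is specific to the tool and is not shown here.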
Serin, Elise A. R.; Snoek, L. B.; Nijveen, Harm; Willems, Leo A. J.; Jiménez-Gómez, Jose M.; Hilhorst, Henk W. M.; Ligterink, Wilco
2017-01-01
High-density genetic maps are essential for high resolution mapping of quantitative traits. Here, we present a new genetic map for an Arabidopsis Bayreuth × Shahdara recombinant inbred line (RIL) population, built on RNA-seq data. RNA-seq analysis on 160 RILs of this population identified 30,049 single-nucleotide polymorphisms (SNPs) covering the whole genome. Based on a 100-kbp window SNP binning method, 1059 bin-markers were identified, physically anchored on the genome. The total length of the RNA-seq genetic map spans 471.70 centimorgans (cM) with an average marker distance of 0.45 cM and a maximum marker distance of 4.81 cM. This high resolution genotyping revealed new recombination breakpoints in the population. To highlight the advantages of such high-density map, we compared it to two publicly available genetic maps for the same population, comprising 69 PCR-based markers and 497 gene expression markers derived from microarray data, respectively. In this study, we show that SNP markers can effectively be derived from RNA-seq data. The new RNA-seq map closes many existing gaps in marker coverage, saturating the previously available genetic maps. Quantitative trait locus (QTL) analysis for published phenotypes using the available genetic maps showed increased QTL mapping resolution and reduced QTL confidence interval using the RNA-seq map. The new high-density map is a valuable resource that facilitates the identification of candidate genes and map-based cloning approaches. PMID:29259624
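The 100-kbp window binning mentioned above can be sketched as a simple grouping of SNP positions by chromosome and window index. This is an assumption-laden illustration: the function name, the 0-based coordinates, and the example positions are ours, and the step that turns each bin into a genotype marker is not shown.

```python
from collections import defaultdict

WINDOW_BP = 100_000  # 100-kbp window, as stated in the abstract


def bin_snps(snps, window=WINDOW_BP):
    """Group (chromosome, position) SNPs into fixed-size physical windows.

    Positions are assumed to be 0-based; returns {(chromosome, bin_index): [positions]}.
    """
    bins = defaultdict(list)
    for chrom, pos in snps:
        bins[(chrom, pos // window)].append(pos)
    return dict(bins)


if __name__ == "__main__":
    example = [("Chr1", 5), ("Chr1", 99_999), ("Chr1", 100_000), ("Chr2", 250_000)]
    for key, positions in sorted(bin_snps(example).items()):
        print(key, positions)
```

As a consistency check on the reported figures, 471.70 cM spread over 1059 bin-markers gives roughly 0.45 cM per marker, matching the stated average marker distance.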
Li, Shan; Dong, Xia; Su, Zhengchang
2013-07-30
Although prokaryotic gene transcription has been studied over decades, many aspects of the process remain poorly understood. Particularly, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. Ironically, similar studies have not been conducted in the model bacterium E. coli K12, thus it is unknown whether or not the bacterium possesses similarly complex transcriptomes. Furthermore, although RNA-seq has become the major method for analyzing the complexity of prokaryotic transcriptomes, it is still a challenging task to accurately assemble full-length transcripts using short RNA-seq reads. To fill these gaps, we have profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in the bacterial cells, combined with a highly accurate and robust algorithm and tool, TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/), for assembling full-length transcripts. We found that 46.9-63.4% of expressed operons were utilized in their putative alternative forms, 72.23-89.54% of genes had putative asRNA transcripts and 51.37-72.74% of intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases. As has been demonstrated in many other prokaryotes, E. coli K12 also has highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads.
PMID:23899370
Zhang, Yunzeng; Barthe, Gary; Grosser, Jude W; Wang, Nian
2016-07-08
Citrus blight is a disease of overall citrus tree decline that causes serious losses in the citrus industry worldwide. Although it was described more than one hundred years ago, its causal agent remains unknown and its pathophysiology is not well determined, which hampers our understanding of the disease and the design of suitable disease management. In this study, we sequenced and assembled the draft genome of Swingle citrumelo, an important citrus rootstock. The draft genome is approximately 280 Mb, which covers 74% of the estimated Swingle citrumelo genome, and the average coverage is around 15X. The draft genome of Swingle citrumelo enabled us to conduct transcriptome analysis of roots of blight and healthy Swingle citrumelo using RNA-seq. The RNA-seq was reliable, as evidenced by the high consistency between RNA-seq analysis and quantitative reverse transcription PCR results (R² = 0.966). Comparison of the gene expression profiles between blight and healthy root samples revealed the molecular mechanism underlying the characteristic blight phenotypes, including decline, starch accumulation, and drought stress. The JA and ET biosynthesis and signaling pathways showed decreased transcript abundance, whereas SA-mediated defense-related genes showed increased transcript abundance in blight trees, suggesting that an unclassified biotrophic pathogen was involved in this disease. Overall, the Swingle citrumelo draft genome generated in this study will advance our understanding of plant biology and contribute to citrus breeding. Transcriptome analysis of blight and healthy trees deepened our understanding of the pathophysiology of citrus blight.
Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai
2012-02-15
RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. In the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.
Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea
Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai; ...
2015-10-28
We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.
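The RPKM values used for validation above follow the standard definition: reads per kilobase of transcript per million mapped reads. A minimal sketch of that formula is given below; the function name and the example numbers are illustrative and are not taken from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (total_mapped_reads * transcript_length_bp)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (total_mapped_reads * transcript_length_bp)


if __name__ == "__main__":
    # Hypothetical transcript: 1,000 reads on a 2,000-bp transcript in a library of 10 million reads.
    print(rpkm(1_000, 2_000, 10_000_000))
```

Dividing by transcript length and library size makes values comparable between long and short transcripts and between libraries sequenced to different depths.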
Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.
Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin
2011-03-24
The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. 
This information will provide a basis for future molecular research involving the sheep as a model organism.
PMID:21435219
Tannir, Nizar M.; Williams, Michelle D.; Chen, Yunxin; Yao, Hui; Zhang, Jianping; Thompson, Erika J.; Meric-Bernstam, Funda; Medeiros, L. Jeffrey; Weinstein, John N.
2013-01-01
Elucidation of tumor-DNA virus associations in many cancer types has enhanced our knowledge of fundamental oncogenesis mechanisms and provided a basis for cancer prevention initiatives. RNA-Seq is a novel tool to comprehensively assess such associations. We interrogated RNA-Seq data from 3,775 malignant neoplasms in The Cancer Genome Atlas database for the presence of viral sequences. Viral integration sites were also detected in expressed transcripts using a novel approach. The detection capacity of RNA-Seq was compared to available clinical laboratory data. Human papillomavirus (HPV) transcripts were detected using RNA-Seq analysis in head-and-neck squamous cell carcinoma, uterine endometrioid carcinoma, and squamous cell carcinoma of the lung. Detection of HPV by RNA-Seq correlated with detection by in situ hybridization and immunohistochemistry in squamous cell carcinoma tumors of the head and neck. Hepatitis B virus and Epstein-Barr virus (EBV) were detected using RNA-Seq in hepatocellular carcinoma and gastric carcinoma tumors, respectively. Integration sites of viral genes and oncogenes were detected in cancers harboring HPV or hepatitis B virus but not in EBV-positive gastric carcinoma. Integration sites of expressed viral transcripts frequently involved known coding areas of the host genome. No DNA virus transcripts were detected in acute myeloid leukemia, cutaneous melanoma, low- and high-grade gliomas of the brain, and adenocarcinomas of the breast, colon and rectum, lung, prostate, ovary, kidney, and thyroid. In conclusion, this study provides a large-scale overview of the landscape of DNA viruses in human malignant cancers. While further validation is necessary for specific cancer types, our findings highlight the utility of RNA-Seq in detecting tumor-associated DNA viruses and identifying viral integration sites that may unravel novel mechanisms of cancer pathogenesis. PMID:23740984
Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.
2011-01-01
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452
Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq
Liu, Peng; Sanalkumar, Rajendran; Bresnick, Emery H.; Keleş, Sündüz; Dewey, Colin N.
2016-01-01
RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of unique sequence. To improve quantification accuracy in such difficult cases, we developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data. We found that ChIP-seq data of RNA polymerase II and histone modifications were particularly informative in this approach. In qRT-PCR validations, pRSEM was shown to be superior to competing methods in estimating relative isoform abundances within or across conditions. Data-driven simulations suggested that pRSEM has a greatly decreased false-positive rate at the expense of a small increase in false-negative rate. In aggregate, our study demonstrates that pRSEM transforms existing capacity to precisely estimate transcript abundances, especially at the isoform level. PMID:27405803
ChIP-seq and RNA-seq methods to study circadian control of transcription in mammals
Takahashi, Joseph S.; Kumar, Vivek; Nakashe, Prachi; Koike, Nobuya; Huang, Hung-Chung; Green, Carla B.; Kim, Tae-Kyung
2015-01-01
Genome-wide analyses have revolutionized our ability to study the transcriptional regulation of circadian rhythms. The advent of next-generation sequencing methods has facilitated the use of two such technologies, ChIP-seq and RNA-seq. In this chapter, we describe detailed methods and protocols for these two techniques, with emphasis on their usage in circadian rhythm experiments in the mouse liver, a major target organ of the circadian clock system. Critical factors for these methods are highlighted and issues arising with time series samples for ChIP-seq and RNA-seq are discussed. Finally detailed protocols for library preparation suitable for Illumina sequencing platforms are presented. PMID:25662462
GWIPS-viz: development of a ribo-seq genome browser
Michel, Audrey M.; Fox, Gearoid; M. Kiran, Anmol; De Bo, Christof; O’Connor, Patrick B. F.; Heaphy, Stephen M.; Mullan, James P. A.; Donohue, Claire A.; Higgins, Desmond G.; Baranov, Pavel V.
2014-01-01
We describe the development of GWIPS-viz (http://gwips.ucc.ie), an online genome browser for viewing ribosome profiling data. Ribosome profiling (ribo-seq) is a recently developed technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. It is based on the deep sequencing of ribosome-protected messenger RNA (mRNA) fragments, which allows the ribosome density along all mRNA transcripts present in the cell to be quantified. Since its inception, ribo-seq has been carried out in a number of eukaryotic and prokaryotic organisms. Owing to the increasing interest in ribo-seq, there is a pertinent demand for a dedicated ribo-seq genome browser. GWIPS-viz is based on The University of California Santa Cruz (UCSC) Genome Browser. Ribo-seq tracks, coupled with mRNA-seq tracks, are currently available for several genomes: human, mouse, zebrafish, nematode, yeast, bacteria (Escherichia coli K12, Bacillus subtilis), human cytomegalovirus and bacteriophage lambda. Our objective is to continue incorporating published ribo-seq data sets so that the wider community can readily view ribosome profiling information from multiple studies without the need to carry out computational processing. PMID:24185699
Determination of in vivo RNA kinetics using RATE-seq.
Neymotin, Benjamin; Athanasiadou, Rodoniki; Gresham, David
2014-10-01
The abundance of a transcript is determined by its rate of synthesis and its rate of degradation; however, global methods for quantifying RNA abundance cannot distinguish variation in these two processes. Here, we introduce RNA approach to equilibrium sequencing (RATE-seq), which uses in vivo metabolic labeling of RNA and approach to equilibrium kinetics, to determine absolute RNA degradation and synthesis rates. RATE-seq does not disturb cellular physiology, uses straightforward normalization with exogenous spike-ins, and can be readily adapted for studies in most organisms. We demonstrate the use of RATE-seq to estimate genome-wide kinetic parameters for coding and noncoding transcripts in Saccharomyces cerevisiae. © 2014 Neymotin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
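As a rough illustration of the approach-to-equilibrium idea described in the RATE-seq abstract above, the sketch below assumes the simplest possible model: under continuous labeling, the labeled fraction of a transcript rises as f(t) = 1 - exp(-k t), where k is a single first-order rate constant. The function names, the log-linear fit through the origin, and the neglect of growth dilution and spike-in normalization are all assumptions made for illustration; this is not the authors' pipeline.

```python
import math


def labeled_fraction(t, k):
    """Labeled fraction at time t under the assumed model f(t) = 1 - exp(-k t)."""
    return 1.0 - math.exp(-k * t)


def estimate_rate(times, fractions):
    """Estimate k by least squares through the origin on -ln(1 - f) = k * t.

    times     -- labeling times (any consistent unit)
    fractions -- labeled fraction at each time, each in [0, 1)
    """
    num = 0.0
    den = 0.0
    for t, f in zip(times, fractions):
        if not 0.0 <= f < 1.0:
            raise ValueError("labeled fractions must lie in [0, 1)")
        num += t * -math.log(1.0 - f)
        den += t * t
    if den == 0.0:
        raise ValueError("need at least one non-zero time point")
    return num / den


def half_life(k):
    """Half-life implied by a first-order rate constant k."""
    return math.log(2.0) / k


if __name__ == "__main__":
    # Synthetic, noise-free example with a hypothetical k of 0.05 per minute.
    times = [5.0, 10.0, 20.0, 40.0]
    fractions = [labeled_fraction(t, 0.05) for t in times]
    k_hat = estimate_rate(times, fractions)
    print(k_hat, half_life(k_hat))
```

Under the same assumed model, a steady-state synthesis rate would follow as k multiplied by the transcript's steady-state abundance, which is why an absolute abundance scale (for example from spike-ins) matters for absolute rates.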
Lee, Bradford W.; Kumar, Virender B.; Biswas, Pooja; Ko, Audrey C.; Alameddine, Ramzi M.; Granet, David B.; Ayyagari, Radha; Kikkawa, Don O.; Korn, Bobby S.
2018-01-01
Objective: This study utilized Next Generation Sequencing (NGS) to identify differentially expressed transcripts in orbital adipose tissue from patients with active Thyroid Eye Disease (TED) versus healthy controls. Method: This prospective, case-control study enrolled three patients with severe, active thyroid eye disease undergoing orbital decompression, and three healthy controls undergoing routine eyelid surgery with removal of orbital fat. RNA Sequencing (RNA-Seq) was performed on freshly obtained orbital adipose tissue from study patients to analyze the transcriptome. Bioinformatics analysis was performed to determine pathways and processes enriched for the differential expression profile. Quantitative Reverse Transcriptase-Polymerase Chain Reaction (qRT-PCR) was performed to validate the differential expression of selected genes identified by RNA-Seq. Results: RNA-Seq identified 328 differentially expressed genes associated with active thyroid eye disease, many of which were responsible for mediating inflammation, cytokine signaling, adipogenesis, IGF-1 signaling, and glycosaminoglycan binding. The IL-5 and chemokine signaling pathways were highly enriched, and very-low-density-lipoprotein receptor activity and statin medications were implicated as having a potential role in TED. Conclusion: This study is the first to use RNA-Seq technology to elucidate differential gene expression associated with active, severe TED. This study suggests a transcriptional basis for the role of statins in modulating differentially expressed genes that mediate the pathogenesis of thyroid eye disease. Furthermore, the identification of genes with altered levels of expression in active, severe TED may inform the molecular pathways central to this clinical phenotype and guide the development of novel therapeutic agents. PMID:29760827
Distributed biotin–streptavidin transcription roadblocks for mapping cotranscriptional RNA folding
Strobel, Eric J.; Nedialkov, Yuri; Artsimovitch, Irina
2017-01-01
RNA folding during transcription directs an order of folding that can determine RNA structure and function. However, the experimental study of cotranscriptional RNA folding has been limited by the lack of easily approachable methods that can interrogate nascent RNA structure at nucleotide resolution. To address this, we previously developed cotranscriptional selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) to simultaneously probe all intermediate RNA transcripts during transcription by stalling elongation complexes at catalytically dead EcoRI E111Q roadblocks. While effective, the distribution of elongation complexes using EcoRI E111Q requires laborious PCR using many different oligonucleotides for each sequence analyzed. Here, we improve the broad applicability of cotranscriptional SHAPE-Seq by developing a sequence-independent biotin–streptavidin (SAv) roadblocking strategy that simplifies the preparation of roadblocking DNA templates. We first determine the properties of biotin–SAv roadblocks. We then show that randomly distributed biotin–SAv roadblocks can be used in cotranscriptional SHAPE-Seq experiments to identify the same RNA structural transitions related to a riboswitch decision-making process that we previously identified using EcoRI E111Q. Lastly, we find that EcoRI E111Q maps nascent RNA structure to specific transcript lengths more precisely than biotin–SAv and propose guidelines to leverage the complementary strengths of each transcription roadblock in cotranscriptional SHAPE-Seq. PMID:28398514
Distributed biotin-streptavidin transcription roadblocks for mapping cotranscriptional RNA folding.
Strobel, Eric J; Watters, Kyle E; Nedialkov, Yuri; Artsimovitch, Irina; Lucks, Julius B
2017-07-07
RNA folding during transcription directs an order of folding that can determine RNA structure and function. However, the experimental study of cotranscriptional RNA folding has been limited by the lack of easily approachable methods that can interrogate nascent RNA structure at nucleotide resolution. To address this, we previously developed cotranscriptional selective 2′-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq) to simultaneously probe all intermediate RNA transcripts during transcription by stalling elongation complexes at catalytically dead EcoRI E111Q roadblocks. While effective, the distribution of elongation complexes using EcoRI E111Q requires laborious PCR using many different oligonucleotides for each sequence analyzed. Here, we improve the broad applicability of cotranscriptional SHAPE-Seq by developing a sequence-independent biotin-streptavidin (SAv) roadblocking strategy that simplifies the preparation of roadblocking DNA templates. We first determine the properties of biotin-SAv roadblocks. We then show that randomly distributed biotin-SAv roadblocks can be used in cotranscriptional SHAPE-Seq experiments to identify the same RNA structural transitions related to a riboswitch decision-making process that we previously identified using EcoRI E111Q. Lastly, we find that EcoRI E111Q maps nascent RNA structure to specific transcript lengths more precisely than biotin-SAv and propose guidelines to leverage the complementary strengths of each transcription roadblock in cotranscriptional SHAPE-Seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Defining the Status of RNA Polymerase at Promoters
Core, Leighton J.; Waterfall, Joshua J.; Gilchrist, Daniel A.; Fargo, David C.; Kwak, Hojoong; Adelman, Karen; Lis, John T.
2012-01-01
Recent genome-wide studies in metazoans have shown that RNA Polymerase II (Pol II) accumulates to high densities on many promoters at a rate-limited step in transcription. However, the status of this Pol II remains an area of debate. Here, we compare quantitative outputs of GRO-seq and ChIP-seq assays and demonstrate that the majority of the Pol II on Drosophila promoters is transcriptionally engaged; very little exists in a preinitiation or arrested complex. These promoter-proximal polymerases are inhibited from further elongation by detergent-sensitive factors, and knockdown of negative elongation factor, NELF, reduces their levels. These results not only solidify that pausing occurs at most promoters, but demonstrate that it is the major rate-limiting step in early transcription at these promoters. Finally, the divergent elongation complexes seen at mammalian promoters are far less prevalent in Drosophila, and this specificity in orientation correlates with directional core promoter elements, which are abundant in Drosophila. PMID:23062713
Wolf, Timo; Schneiker-Bekel, Susanne; Neshat, Armin; Ortseifen, Vera; Wibberg, Daniel; Zemke, Till; Pühler, Alfred; Kalinowski, Jörn
2017-06-10
Actinoplanes sp. SE50/110 is the natural producer of acarbose, which is used in the treatment of diabetes mellitus type II. However, until now the transcriptional organization and regulation of the acarbose biosynthesis are only understood rudimentarily. The genome sequence of Actinoplanes sp. SE50/110 was known before, but was resequenced in this study to remove assembly artifacts and incorrect base callings. The annotation of the genome was refined in a multi-step approach, including modern bioinformatic pipelines, transcriptome and proteome data. A whole transcriptome RNA-seq library as well as an RNA-seq library enriched for primary 5'-ends were used for the detection of transcription start sites, to correct tRNA predictions, to identify novel transcripts like small RNAs and to improve the annotation through the correction of falsely annotated translation start sites. The transcriptome data sets were also applied to identify 31 cis-regulatory RNA structures, such as riboswitches or RNA thermometers as well as three leaderless transcribed short peptides found in putative attenuators upstream of genes for amino acid biosynthesis. The transcriptional organization of the acarbose biosynthetic gene cluster was elucidated in detail and fourteen novel biosynthetic gene clusters were suggested. The accurate genome sequence and precise annotation of the Actinoplanes sp. SE50/110 genome will be the foundation for future genetic engineering and systems biology studies. Copyright © 2017 Elsevier B.V. All rights reserved.
Integrative Analysis of Many RNA-Seq Datasets to Study Alternative Splicing
Li, Wenyuan; Dai, Chao; Kang, Shuli; Zhou, Xianghong Jasmine
2014-01-01
Alternative splicing is an important gene regulatory mechanism that dramatically increases the complexity of the proteome. However, how alternative splicing is regulated and how transcription and splicing are coordinated are still poorly understood, and functions of transcript isoforms have been studied only in a few limited cases. Nowadays, RNA-seq technology provides an exceptional opportunity to study alternative splicing on genome-wide scales and in an unbiased manner. With the rapid accumulation of data in public repositories, new challenges arise from the urgent need to effectively integrate many different RNA-seq datasets to study alternative splicing. This paper discusses a set of advanced computational methods that can integrate and analyze many RNA-seq datasets to systematically identify splicing modules, unravel the coupling of transcription and splicing, and predict the functions of splicing isoforms on a genome-wide scale. PMID:24583115
Spliced synthetic genes as internal controls in RNA sequencing experiments.
Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R
2016-09-01
RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
Reid-Bayliss, Kate S; Loeb, Lawrence A
2017-08-29
Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
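The consensus idea in the ARC-seq abstract above (several cDNA copies of one barcoded RNA molecule, with adjustable stringency) can be sketched as a per-position vote. Everything in this example is a hypothetical simplification: the function name, the `min_copies` and `min_agreement` parameters, and the assumption that the copies are already grouped by barcode and aligned to equal length. It is not the published ARC-seq implementation.

```python
from collections import Counter


def consensus(copies, min_copies=3, min_agreement=1.0):
    """Call a consensus sequence from aligned cDNA copies of one RNA molecule.

    copies        -- equal-length sequences sharing one molecular barcode
    min_copies    -- minimum number of copies required to call anything
    min_agreement -- fraction of copies that must agree at a position;
                     positions below this threshold are reported as 'N'
    Returns the consensus string, or None if too few copies are available.
    """
    if len(copies) < min_copies:
        return None
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to the same length")
    called = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(copies) >= min_agreement else "N")
    return "".join(called)


if __name__ == "__main__":
    family = ["ACGT", "ACGT", "ACTT"]  # third copy carries a likely error
    print(consensus(family, min_agreement=0.6))
    print(consensus(family, min_agreement=1.0))
```

Raising `min_agreement` or `min_copies` trades yield for confidence: a variant present in every copy of a molecule is more plausibly a true epimutation than one seen in a single copy, which may instead reflect a cDNA synthesis, PCR, or sequencing error.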
Lawley, Blair; Sims, Ian M.
2013-01-01
Lactobacillus ruminis is an inhabitant of human bowels and bovine rumens. None of 10 isolates (three from bovine rumen, seven from human feces) of L. ruminis that were tested could utilize barley β-glucan for growth. Seven of the strains of L. ruminis were, however, able to utilize tetrasaccharides (3-O-β-cellotriosyl-d-glucose [LDP4] or 4-O-β-laminaribiosyl-d-cellobiose [CDP4]) present in β-glucan hydrolysates for growth. The tetrasaccharides were generated by the use of lichenase or cellulase, respectively. To learn more about the utilization of tetrasaccharides by L. ruminis, whole-transcriptome shotgun sequencing (RNA-seq) was tested as a transcriptional screen to detect altered gene expression when an autochthonous human strain (L5) was grown in medium containing CDP4. RNA-seq results were confirmed and extended by reverse transcription-quantitative PCR assays of selected genes in two upregulated operons when cells were grown as batch cultures in medium containing either CDP4 or LDP4. The cellobiose utilization operon had increased transcription, particularly in early growth phase, whereas the chemotaxis/motility operon was upregulated in late growth phase. Phenotypic changes were seen in relation to upregulation of chemotaxis/flagellar operons: flagella were rarely seen by electron microscopy on glucose-grown cells but cells cultured in tetrasaccharide medium were commonly flagellated. Chemotactic movement toward tetrasaccharides was demonstrated in capillary cultures. L. ruminis utilized 3-O-β-cellotriosyl-d-glucose released by β-glucan hydrolysis due to bowel commensal Coprococcus sp., indicating that cross feeding of tetrasaccharide between bacteria could occur. Therefore, the RNA-seq screen and subsequent experiments had utility in revealing foraging attributes of gut commensal Lactobacillus ruminis. PMID:23851085
Łabaj, Paweł P; Leparc, Germán G; Linggi, Bryan E; Markillie, Lye Meng; Wiley, H Steven; Kreil, David P
2011-07-01
Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error <20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision.
Rozenberg, Andrey; Leese, Florian; Weiss, Linda C; Tollrian, Ralph
2016-01-01
Tag-Seq is a high-throughput approach used for discovering SNPs and characterizing gene expression. In comparison to RNA-Seq, Tag-Seq eases data processing and allows detection of rare mRNA species using only one tag per transcript molecule. However, reduced library complexity raises the issue of PCR duplicates, which distort gene expression levels. Here we present a novel Tag-Seq protocol that uses the least biased methods for RNA library preparation combined with a novel approach for joint PCR template and sample labeling. In our protocol, input RNA is fragmented by hydrolysis, and poly(A)-bearing RNAs are selected and directly ligated to mixed DNA-RNA P5 adapters. The P5 adapters contain i5 barcodes composed of sample-specific (moderately) degenerate base regions (mDBRs), which later allow detection of PCR duplicates. The P7 adapter is attached via reverse transcription with individual i7 barcodes added during the amplification step. The resulting libraries can be sequenced on an Illumina sequencer. After sample demultiplexing and PCR duplicate removal with a free software tool we designed, the data are ready for downstream analysis. Our protocol was tested on RNA samples from predator-induced and control Daphnia microcrustaceans.
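To make the PCR-duplicate logic in the Tag-Seq abstract above concrete, the following sketch assumes that each read has already been demultiplexed and reduced to a (gene, tag position, degenerate barcode) triple. Reads sharing all three values are treated as copies of one original template and counted once. The tuple layout and the function name are hypothetical, and this is not the authors' software tool.

```python
from collections import defaultdict


def count_unique_templates(reads):
    """Collapse PCR duplicates and return template counts per gene.

    reads -- iterable of (gene, position, barcode) tuples, where 'barcode'
             is the degenerate base region read from the adapter.
    Two reads are considered duplicates when gene, position and barcode
    are all identical.
    """
    seen = defaultdict(set)
    for gene, position, barcode in reads:
        seen[gene].add((position, barcode))
    return {gene: len(keys) for gene, keys in seen.items()}


if __name__ == "__main__":
    reads = [
        ("geneA", 100, "ACGT"),
        ("geneA", 100, "ACGT"),  # PCR duplicate of the first read
        ("geneA", 100, "TTGA"),  # same tag, different template
        ("geneB", 7, "ACGT"),
    ]
    print(count_unique_templates(reads))
```

Exact matching is the simplest choice. It ignores sequencing errors within the barcode and the chance that two distinct templates receive the same barcode, both of which become more relevant when the degenerate region is short.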
Xie, Rangjin; Zhang, Jin; Ma, Yanyan; Pan, Xiaoting; Dong, Cuicui; Pang, Shaoping; He, Shaolan; Deng, Lie; Yi, Shilai; Zheng, Yongqiang; Lv, Qiang
2017-02-06
Citrus is one of the most economically important fruit crops around the world. Drought and salinity stresses adversely affect its productivity and fruit quality. However, the genetic regulatory networks and signaling pathways involved in drought and salinity remain to be elucidated. With RNA-seq and sRNA-seq, an integrative analysis of miRNA and mRNA expression profiling and their regulatory networks was conducted using citrus roots subjected to dehydration and salt treatment. Differentially expressed (DE) mRNA and miRNA profiles were obtained according to fold change analysis and the relationships between miRNAs and target mRNAs were found to be coherent and incoherent in the regulatory networks. GO enrichment analysis revealed that some crucial biological processes related to signal transduction (e.g. 'MAPK cascade'), hormone-mediated signaling pathways (e.g. 'abscisic acid-activated signaling pathway'), reactive oxygen species (ROS) metabolic process (e.g. 'hydrogen peroxide catabolic process') and transcription factors (e.g. 'MYB, ZFP and bZIP') were involved in dehydration and/or salt treatment. The molecular players in response to dehydration and salt treatment were partially overlapping. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-seq and sRNA-seq analysis. This study provides new insights into the molecular mechanisms by which citrus roots respond to dehydration and salt treatment.
Xie, Rangjin; Zhang, Jin; Ma, Yanyan; Pan, Xiaoting; Dong, Cuicui; Pang, Shaoping; He, Shaolan; Deng, Lie; Yi, Shilai; Zheng, Yongqiang; Lv, Qiang
2017-01-01
Citrus is one of the most economically important fruit crops around the world. Drought and salinity stresses adversely affect its productivity and fruit quality. However, the genetic regulatory networks and signaling pathways involved in drought and salinity remain to be elucidated. With RNA-seq and sRNA-seq, an integrative analysis of miRNA and mRNA expression profiling and their regulatory networks was conducted using citrus roots subjected to dehydration and salt treatment. Differentially expressed (DE) mRNA and miRNA profiles were obtained according to fold change analysis and the relationships between miRNAs and target mRNAs were found to be coherent and incoherent in the regulatory networks. GO enrichment analysis revealed that some crucial biological processes related to signal transduction (e.g. ‘MAPK cascade’), hormone-mediated signaling pathways (e.g. ‘abscisic acid-activated signaling pathway’), reactive oxygen species (ROS) metabolic process (e.g. ‘hydrogen peroxide catabolic process’) and transcription factors (e.g. ‘MYB, ZFP and bZIP’) were involved in dehydration and/or salt treatment. The molecular players in response to dehydration and salt treatment were partially overlapping. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-seq and sRNA-seq analysis. This study provides new insights into the molecular mechanisms by which citrus roots respond to dehydration and salt treatment. PMID:28165059
Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.
2016-01-01
Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding genes and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. PMID:26554030
USDA-ARS's Scientific Manuscript database
Endogenous mRNA-antisense transcripts are involved in regulation of a wide range of biological processes including muscle development and quality traits of farm animals. Standard RNA-Seq can be used to identify sense-antisense transcripts. However, strand-specific RNA-Seq is required to resolve ambi...
High-confidence coding and noncoding transcriptome maps
2017-01-01
The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519
TSSAR: TSS annotation regime for dRNA-seq data.
Amman, Fabian; Wolfinger, Michael T; Lorenz, Ronny; Hofacker, Ivo L; Stadler, Peter F; Findeiß, Sven
2014-03-27
Differential RNA sequencing (dRNA-seq) is a high-throughput screening technique designed to examine the architecture of bacterial operons in general and the precise position of transcription start sites (TSS) in particular. Hitherto, dRNA-seq data were analyzed by visualizing the sequencing reads mapped to the reference genome and manually annotating reliable positions. This is very labor intensive and, due to the subjectivity, biased. Here, we present TSSAR, a tool for automated de novo TSS annotation from dRNA-seq data that respects the statistics of dRNA-seq libraries. TSSAR uses the premise that the number of sequencing reads starting at a certain genomic position within a transcriptionally active region follows a Poisson distribution with a parameter that depends on the local strength of expression. The differences of two dRNA-seq library counts thus follow a Skellam distribution. This provides a statistical basis to identify significantly enriched primary transcripts. We assessed the performance by analyzing a publicly available dRNA-seq data set using TSSAR and two simple approaches that utilize user-defined score cutoffs. We evaluated the power of reproducing the manual TSS annotation. Furthermore, the same data set was used to reproduce 74 experimentally validated TSS in H. pylori from reliable techniques such as RACE or primer extension. Both analyses showed that TSSAR outperforms the static cutoff-dependent approaches. Having an automated and efficient tool for analyzing dRNA-seq data facilitates the use of the dRNA-seq technique and promotes its application to more sophisticated analysis. For instance, monitoring the plasticity and dynamics of the transcriptomal architecture triggered by different stimuli and growth conditions becomes possible. The main asset of a novel tool for dRNA-seq analysis that reaches out to a broad user community is usability. As such, we provide TSSAR both as an intuitive RESTful Web service (http://rna.tbi.univie.ac.at/TSSAR) together with a set of post-processing and analysis tools, as well as a stand-alone version for use in high-throughput dRNA-seq data analysis pipelines.
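The statistical premise in the TSSAR abstract (Poisson read-start counts per library, Skellam-distributed differences) can be illustrated with a minimal sketch. This is not TSSAR's implementation: the function names, the single shared `local_rate` under the null hypothesis, and the truncation rule are assumptions made here for illustration, and how TSSAR estimates the local expression parameter or normalizes libraries is not described in the abstract.

```python
import math

def poisson_pmf(k, mu):
    """Probability of observing k events under a Poisson distribution with mean mu."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0 if k == 0 else 0.0
    # Work in log space so large counts do not overflow.
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def _upper(mu):
    """Truncation point beyond which the Poisson mass is negligible (assumed heuristic)."""
    return int(mu + 10 * math.sqrt(mu) + 30)

def skellam_upper_tail(d, mu1, mu2):
    """P(X - Y >= d) for independent X ~ Poisson(mu1) and Y ~ Poisson(mu2)."""
    total = 0.0
    for y in range(_upper(mu2) + 1):
        start = max(d + y, 0)
        tail_x = sum(poisson_pmf(x, mu1) for x in range(start, _upper(mu1) + 1))
        total += poisson_pmf(y, mu2) * tail_x
    return total

def tss_p_value(enriched_starts, control_starts, local_rate):
    """One-sided p-value for enrichment of read starts at a single position.

    Hypothetical null model: both libraries share the same Poisson rate
    (local_rate) at this position, so their difference is Skellam distributed.
    """
    d = enriched_starts - control_starts
    return skellam_upper_tail(d, local_rate, local_rate)
```

A position with 40 read starts in the enriched library and 5 in the control, against an assumed local rate of 5, yields a very small p-value, whereas 6 versus 5 does not. The direct double sum is only practical for modest rates; a production tool would likely use a closed-form or library routine instead.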
Transcriptional profiling of murine osteoblast differentiation based on RNA-seq expression analyses.
Khayal, Layal Abo; Grünhagen, Johannes; Provazník, Ivo; Mundlos, Stefan; Kornak, Uwe; Robinson, Peter N; Ott, Claus-Eric
2018-04-11
Osteoblastic differentiation is a multistep process characterized by osteogenic induction of mesenchymal stem cells, which then differentiate into proliferative pre-osteoblasts that produce copious amounts of extracellular matrix, followed by stiffening of the extracellular matrix, and matrix mineralization by hydroxylapatite deposition. Although these processes have been well characterized biologically, a detailed transcriptional analysis of murine primary calvaria osteoblast differentiation based on RNA sequencing (RNA-seq) analyses has not previously been reported. Here, we used RNA-seq to obtain expression values of 29,148 genes at four time points as murine primary calvaria osteoblasts differentiate in vitro until onset of mineralization was clearly detectable by microscopic inspection. Expression of marker genes confirmed osteogenic differentiation. We explored differential expression of 1386 protein-coding genes using unsupervised clustering and GO analyses. 100 differentially expressed lncRNAs were investigated by co-expression with protein-coding genes that are localized within the same topologically associated domain. Additionally, we monitored expression of 237 genes that are silent or active at distinct time points and compared differential exon usage. Our data represent an in-depth profiling of murine primary calvaria osteoblast differentiation by RNA-seq and contribute to our understanding of genetic regulation of this key process in osteoblast biology. Copyright © 2018 Elsevier Inc. All rights reserved.
Blood-induced differential gene expression in Anopheles dirus evaluated using RNA sequencing.
Mongkol, W; Nguitragool, W; Sattabongkot, J; Kubera, A
2018-06-08
Malaria parasites are transmitted through blood feeding by female Anopheline mosquitoes. Unveiling the blood-feeding process will improve understanding of vector biology. Anopheles dirus (Diptera: Culicidae) is one of the primary malaria vectors in the Greater Mekong Subregion, the epicentre of malaria drug resistance. In this study, differential gene expression between sugar- and blood-fed An. dirus was investigated by RNA sequencing (RNA-seq). A total of 589 transcripts were found to be upregulated and 703 transcripts downregulated as a result of blood feeding. Transcriptional differences were found in genes involved in blood digestion, peritrophic matrix formation, oogenesis and vitellogenesis. The expression levels of several genes were validated by quantitative reverse transcription polymerase chain reaction. The present results provide better understanding of An. dirus biology in relation to its blood feeding. © 2018 The Royal Entomological Society.
Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq.
Liu, Peng; Sanalkumar, Rajendran; Bresnick, Emery H; Keleş, Sündüz; Dewey, Colin N
2016-08-01
RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of unique sequence. To improve quantification accuracy in such difficult cases, we developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data. We found that ChIP-seq data of RNA polymerase II and histone modifications were particularly informative in this approach. In qRT-PCR validations, pRSEM was shown to be superior to competing methods in estimating relative isoform abundances within or across conditions. Data-driven simulations suggested that pRSEM has a greatly decreased false-positive rate at the expense of a small increase in false-negative rate. In aggregate, our study demonstrates that pRSEM transforms existing capacity to precisely estimate transcript abundances, especially at the isoform level. © 2016 Liu et al.; Published by Cold Spring Harbor Laboratory Press.
RNA-Seq for gene identification and transcript profiling of three Stevia rebaudiana genotypes.
Chen, Junwen; Hou, Kai; Qin, Peng; Liu, Hongchang; Yi, Bin; Yang, Wenting; Wu, Wei
2014-07-07
Stevia (Stevia rebaudiana) is an important medicinal plant that yields diterpenoid steviol glycosides (SGs). SGs are currently used in the preparation of medicines, food products and nutraceuticals because of their sweetening property (zero calories and about 300 times sweeter than sugar). Recently, some progress has been made in understanding the biosynthesis of SGs in Stevia, but little is known about the molecular mechanisms underlying this process. Additionally, the genomics of Stevia, a non-model species, remains uncharacterized. The recent advent of RNA-Seq, a next generation sequencing technology, provides an opportunity to expand the identification of Stevia genes through in-depth transcript profiling. We present a comprehensive landscape of the transcriptome profiles of three genotypes of Stevia with divergent SG compositions characterized using RNA-seq. 191,590,282 high-quality reads were generated and then assembled into 171,837 transcripts with an average sequence length of 969 base pairs. A total of 80,160 unigenes were annotated, and 14,211 of the unique sequences were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. Gene sequences of all enzymes known to be involved in SG synthesis were examined. A total of 143 UDP-glucosyltransferase (UGT) unigenes were identified, some of which might be involved in SG biosynthesis. The expression patterns of eight of these genes were further confirmed by RT-QPCR. RNA-seq analysis identified candidate genes encoding enzymes responsible for the biosynthesis of SGs in Stevia, a non-model plant without a reference genome. The transcriptome data from this study yielded new insights into the process of SG accumulation in Stevia. Our results demonstrate that RNA-Seq can be successfully used for gene identification and transcript profiling in a non-model species.
Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq
Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim
2014-01-01
The advent of both microarray and massively parallel sequencing has revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.
RNA-Seq Profiling Reveals Novel Hepatic Gene Expression Pattern in Aflatoxin B1 Treated Rats
Merrick, B. Alex; Phadke, Dhiral P.; Auerbach, Scott S.; Mav, Deepak; Stiegelmeyer, Suzy M.; Shah, Ruchir R.; Tice, Raymond R.
2013-01-01
Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1’s carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT’s) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. 
We find the rat transcriptome contains many previously unidentified, AFB1-responsive exons and transcripts supporting RNA-Seq’s capabilities to provide new insights into AFB1-mediated gene expression leading to hepatocellular carcinoma. PMID:23630614
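The selection rule quoted in the abstract above (greater than two-fold change, p<0.005) can be written as a small filter. This is only a sketch under stated assumptions: the rows are made-up illustration values rather than data from the study, the helper name `is_deg` is hypothetical, and treating "two-fold" as applying in both directions (up- and down-regulation) is an assumption, since the abstract does not say.

```python
import math

def is_deg(fold_change, p_value, min_fold=2.0, max_p=0.005):
    """True when the change exceeds min_fold in either direction and p < max_p."""
    # Compare on the log2 scale so 0.5x and 2x are treated symmetrically.
    return abs(math.log2(fold_change)) > math.log2(min_fold) and p_value < max_p

# Hypothetical (transcript, fold change treated/control, p-value) rows.
rows = [
    ("tx_a", 12.0, 0.0001),  # strongly up, significant
    ("tx_b", 0.4, 0.001),    # 2.5-fold down, significant
    ("tx_c", 1.5, 0.0001),   # significant but below the fold threshold
    ("tx_d", 3.0, 0.02),     # large change but p too high
]

degs = [name for name, fc, p in rows if is_deg(fc, p)]
```

Requiring both criteria at once is what keeps small but statistically significant shifts, and large but noisy ones, out of the list.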
Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry
2018-06-25
The extent to which genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to provide better insights into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at the gene and alternative splicing isoform levels. However, leveraging RNA-Seq data requires the development of new data mining and analytics methods. Supervised machine learning methods are commonly used for biological data analysis, and have recently gained attention for their applications to RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that isoform-level expression data are more informative for biological classification tasks than gene-level expression data. Our large-scale assessment is done through utilizing multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage for the samples from the cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods. 
The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput.
Gierahn, Todd M; Wadsworth, Marc H; Hughes, Travis K; Bryson, Bryan D; Butler, Andrew; Satija, Rahul; Fortune, Sarah; Love, J Christopher; Shalek, Alex K
2017-04-01
Single-cell RNA-seq can precisely resolve cellular states, but applying this method to low-input samples is challenging. Here, we present Seq-Well, a portable, low-cost platform for massively parallel single-cell RNA-seq. Barcoded mRNA capture beads and single cells are sealed in an array of subnanoliter wells using a semipermeable membrane, enabling efficient cell lysis and transcript capture. We use Seq-Well to profile thousands of primary human macrophages exposed to Mycobacterium tuberculosis.
Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior
2012-01-01
Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time. PMID:22383036
Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A. C.; Ning, Zemin; Slagboom, P. Eline; Ye, Kai
2012-01-01
Motivation: RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon–exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. In the two public RNA-Seq datasets, PASSion predicted ∼137 000 and 173 000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. Availability: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion Contact: y.zhang@lumc.nl; k.ye@lumc.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22219203
PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.
Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng
2018-05-01
Single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) based on the Pacific Biosciences (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Thus, comprehensive tools for Iso-Seq bioinformatics analysis are extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI, to analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs) comprehensively. PRAPI is capable of combining Iso-Seq full-length isoforms with short read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq), for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI. Contact: lfgu@fafu.edu.cn.
Möller, Philip; Overlöper, Aaron; Förstner, Konrad U.; Wen, Tuan-Nan; Sharma, Cynthia M.; Lai, Erh-Min; Narberhaus, Franz
2014-01-01
As matchmaker between mRNA and sRNA interactions, the RNA chaperone Hfq plays a key role in riboregulation of many bacteria. Often, the global influence of Hfq on the transcriptome is reflected by substantially altered proteomes and pleiotropic phenotypes in hfq mutants. Using quantitative proteomics and co-immunoprecipitation combined with RNA-sequencing (RIP-seq) of Hfq-bound RNAs, we demonstrate the pervasive role of Hfq in nutrient acquisition, metabolism and motility of the plant pathogen Agrobacterium tumefaciens. 136 of 2544 proteins identified by iTRAQ (isobaric tags for relative and absolute quantitation) were affected in the absence of Hfq. Most of them were associated with ABC transporters, general metabolism and motility. RIP-seq of chromosomally encoded Hfq3xFlag revealed 1697 mRNAs and 209 non-coding RNAs (ncRNAs) associated with Hfq. 56 ncRNAs were previously undescribed. Interestingly, 55% of the Hfq-bound ncRNAs were encoded antisense (as) to a protein-coding sequence suggesting that A. tumefaciens Hfq plays an important role in asRNA-target interactions. The exclusive enrichment of 296 mRNAs and 31 ncRNAs under virulence conditions further indicates a role for post-transcriptional regulation in A. tumefaciens-mediated plant infection. On the basis of the iTRAQ and RIP-seq data, we assembled a comprehensive model of the Hfq core regulon in A. tumefaciens. PMID:25330313
Xu, Joshua; Gong, Binsheng; Wu, Leihong; Thakkar, Shraddha; Hong, Huixiao; Tong, Weida
2016-03-15
Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. 
Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.
Zhu, Chunhui; Li, Xuefeng; Zheng, Jingyuan
2018-05-03
Hot pepper (Capsicum annuum L.), which is a member of the Solanaceae family, is becoming an increasingly important vegetable crop worldwide. Cucumber mosaic virus (CMV) is a destructive virus that can cause leaf distortion and fruit lesions, affecting pepper production. However, studies on the responses to CMV infection in pepper at the transcriptional level are limited. In this study, the transcript profiles of pepper leaves after CMV infection were investigated using Illumina and single-molecule real-time (SMRT) RNA-sequencing (RNA-seq). A total of 2143 differentially expressed genes (DEGs) were identified at five different stages. Gene ontology (GO) and KEGG analysis revealed that these DEGs were involved in the response to stress, defense response and plant-pathogen interaction pathways. Among these DEGs, several key genes that consistently appeared in studies of plant-pathogen interactions had increased transcript abundance after inoculation, including chitinase, pathogenesis-related (PR) protein, TMV resistance protein, WRKY transcription factor and jasmonate ZIM-domain protein. Nine of these DEGs were further validated by quantitative real-time-PCR (qRT-PCR). Furthermore, a total of 73,597 alternative splicing (AS) events were identified in the pepper leaves after CMV infection, distributed in 12,615 genes. The intron retention of WRKY33 (Capana09g001251) might be involved in the regulation of CMV infection. Taken together, our study provides a transcriptome-wide insight into the molecular basis of resistance to CMV infection in pepper leaves and potential candidate genes for improving resistant cultivars. Copyright © 2017. Published by Elsevier B.V.
2014-01-01
Background Mosquito control programmes using chemical insecticides are increasingly threatened by the development of resistance. Such resistance can be the consequence of changes in proteins targeted by insecticides (target site mediated resistance), increased insecticide biodegradation (metabolic resistance), altered transport, sequestration or other mechanisms. As opposed to target site resistance, other mechanisms are far from being fully understood. Indeed, insecticide selection often affects a large number of genes and various biological processes can hypothetically confer resistance. In this context, the aim of the present study was to use RNA sequencing (RNA-seq) for comparing transcription level and polymorphism variations associated with adaptation to chemical insecticides in the mosquito Aedes aegypti. Biological materials consisted of a parental susceptible strain together with three child strains selected across multiple generations with three insecticides from different classes: the pyrethroid permethrin, the neonicotinoid imidacloprid and the carbamate propoxur. Results After ten generations, insecticide-selected strains showed elevated resistance levels to the insecticides used for selection. RNA-seq data allowed detecting over 13,000 transcripts, of which 413 were differentially transcribed in insecticide-selected strains as compared to the susceptible strain. Among them, a significant enrichment of transcripts encoding cuticle proteins, transporters and enzymes was observed. Polymorphism analysis revealed over 2500 SNPs showing > 50% allele frequency variations in insecticide-selected strains as compared to the susceptible strain, affecting over 1000 transcripts. Comparing gene transcription and polymorphism patterns revealed marked differences among strains. While imidacloprid selection was linked to the over transcription of many genes, permethrin selection was rather linked to polymorphism variations. 
Focusing on detoxification enzymes revealed that permethrin selection strongly affected the polymorphism of several transcripts encoding cytochrome P450 monooxygenases likely involved in insecticide biodegradation. Conclusions The present study confirmed the power of RNA-seq for identifying concomitantly quantitative and qualitative transcriptome changes associated with insecticide resistance in mosquitoes. Our results suggest that transcriptome modifications can be selected rapidly by insecticides and affect multiple biological functions. Previously neglected by molecular screenings, polymorphism variations of detoxification enzymes may play an important role in the adaptive response of mosquitoes to insecticides. PMID:24593293
David, Jean-Philippe; Faucon, Frédéric; Chandor-Proust, Alexia; Poupardin, Rodolphe; Riaz, Muhammad Asam; Bonin, Aurélie; Navratil, Vincent; Reynaud, Stéphane
2014-03-05
Mosquito control programmes using chemical insecticides are increasingly threatened by the development of resistance. Such resistance can be the consequence of changes in proteins targeted by insecticides (target site mediated resistance), increased insecticide biodegradation (metabolic resistance), altered transport, sequestration or other mechanisms. As opposed to target site resistance, other mechanisms are far from being fully understood. Indeed, insecticide selection often affects a large number of genes and various biological processes can hypothetically confer resistance. In this context, the aim of the present study was to use RNA sequencing (RNA-seq) for comparing transcription level and polymorphism variations associated with adaptation to chemical insecticides in the mosquito Aedes aegypti. Biological materials consisted of a parental susceptible strain together with three child strains selected across multiple generations with three insecticides from different classes: the pyrethroid permethrin, the neonicotinoid imidacloprid and the carbamate propoxur. After ten generations, insecticide-selected strains showed elevated resistance levels to the insecticides used for selection. RNA-seq data allowed detecting over 13,000 transcripts, of which 413 were differentially transcribed in insecticide-selected strains as compared to the susceptible strain. Among them, a significant enrichment of transcripts encoding cuticle proteins, transporters and enzymes was observed. Polymorphism analysis revealed over 2500 SNPs showing > 50% allele frequency variations in insecticide-selected strains as compared to the susceptible strain, affecting over 1000 transcripts. Comparing gene transcription and polymorphism patterns revealed marked differences among strains. While imidacloprid selection was linked to the over transcription of many genes, permethrin selection was rather linked to polymorphism variations. 
Focusing on detoxification enzymes revealed that permethrin selection strongly affected the polymorphism of several transcripts encoding cytochrome P450 monooxygenases likely involved in insecticide biodegradation. The present study confirmed the power of RNA-seq for identifying concomitantly quantitative and qualitative transcriptome changes associated with insecticide resistance in mosquitoes. Our results suggest that transcriptome modifications can be selected rapidly by insecticides and affect multiple biological functions. Previously neglected by molecular screenings, polymorphism variations of detoxification enzymes may play an important role in the adaptive response of mosquitoes to insecticides.
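The "> 50% allele frequency variation" criterion mentioned above can be illustrated with a minimal sketch. The read counts, SNP identifiers, and function names below are hypothetical and are not taken from the study; the sketch only assumes that each SNP has alternate-allele and total read counts in the susceptible and the selected strain.

```python
def allele_frequency_shift(alt_a, total_a, alt_b, total_b):
    """Absolute difference in alternate-allele frequency between two strains."""
    return abs(alt_a / total_a - alt_b / total_b)


def flag_shifted_snps(snps, threshold=0.5):
    """Return IDs of SNPs whose frequency shift exceeds the threshold."""
    return [
        snp_id
        for snp_id, (susceptible, selected) in snps.items()
        if allele_frequency_shift(*selected, *susceptible) > threshold
    ]


# Hypothetical data: (alternate-allele reads, total reads) for the
# susceptible strain and for one insecticide-selected strain.
example_snps = {
    "snp_1": ((5, 100), (80, 100)),   # 0.05 -> 0.80
    "snp_2": ((40, 100), (60, 100)),  # 0.40 -> 0.60
    "snp_3": ((90, 120), (12, 120)),  # 0.75 -> 0.10
}
shifted = flag_shifted_snps(example_snps)
```

A real analysis would also need coverage filters and a statistical test, which this sketch leaves out.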
Venkata Narayanan, Ishwarya; Paulsen, Michelle T.; Bedi, Karan; Berg, Nathan; Ljungman, Emily A.; Francia, Sofia; Veloso, Artur; Magnuson, Brian; di Fagagna, Fabrizio d’Adda; Wilson, Thomas E.; Ljungman, Mats
2017-01-01
In response to ionizing radiation (IR), cells activate a DNA damage response (DDR) pathway to re-program gene expression. Previous studies using total cellular RNA analyses have shown that the stress kinase ATM and the transcription factor p53 are integral components required for induction of IR-induced gene expression. These studies did not distinguish between changes in RNA synthesis and RNA turnover and did not address the role of enhancer elements in DDR-mediated transcriptional regulation. To determine the contribution of synthesis and degradation of RNA and monitor the activity of enhancer elements following exposure to IR, we used the recently developed Bru-seq, BruChase-seq and BruUV-seq techniques. Our results show that ATM and p53 regulate both RNA synthesis and stability as well as enhancer element activity following exposure to IR. Importantly, many genes in the p53-signaling pathway were coordinately up-regulated by both increased synthesis and RNA stability while down-regulated genes were suppressed either by reduced synthesis or stability. Our study is the first of its kind that independently assessed the effects of ionizing radiation on transcription and post-transcriptional regulation in normal human cells. PMID:28256581
Dysregulated microRNA Activity in Shwachman-Diamond Syndrome
2016-09-01
define transcriptional signatures of bone marrow failure in SDS using single cell RNA-seq of patient cells. We will analyze these datasets to test the...microRNA expression profiles from HSPCs to be overlaid onto mRNA profiles. Subject terms: single cell RNA-seq; bone marrow failure; hematopoiesis...myelopoiesis; targeted RNA-seq
Yan, Yong-Wei; Zou, Bin; Zhu, Ting; Hozzein, Wael N.
2017-01-01
RNA-seq-based SSU (small subunit) rRNA (ribosomal RNA) analysis has provided a better understanding of potentially active microbial community within environments. However, for RNA-seq library construction, high quantities of purified RNA are typically required. We propose a modified RNA-seq method for SSU rRNA-based microbial community analysis that depends on the direct ligation of a 5’ adaptor to RNA before reverse-transcription. The method requires only a low-input quantity of RNA (10–100 ng) and does not require a DNA removal step. The method was initially tested on three mock communities synthesized with enriched SSU rRNA of archaeal, bacterial and fungal isolates at different ratios, and was subsequently used for environmental samples of high or low biomass. For high-biomass salt-marsh sediments, enriched SSU rRNA and total nucleic acid-derived RNA-seq datasets revealed highly consistent community compositions for all of the SSU rRNA sequences, and as much as 46.4%-59.5% of 16S rRNA sequences were suitable for OTU (operational taxonomic unit)-based community and diversity analyses with complete coverage of V1-V2 regions. OTU-based community structures for the two datasets were also highly consistent with those determined by all of the 16S rRNA reads. For low-biomass samples, total nucleic acid-derived RNA-seq datasets were analyzed, and highly active bacterial taxa were also identified by the OTU-based method, notably including members of the previously underestimated genus Nitrospira and phylum Acidobacteria in tap water, members of the phylum Actinobacteria on a shower curtain, and members of the phylum Cyanobacteria on leaf surfaces. More than half of the bacterial 16S rRNA sequences covered the complete region of primer 8F, and non-coverage rates as high as 38.7% were obtained for phylum-unclassified sequences, providing many opportunities to identify novel bacterial taxa. 
This modified RNA-seq method will provide a better snapshot of diverse microbial communities, most notably by OTU-based analysis, even communities with low-biomass samples. PMID:29016661
Predicting survival times for neuroblastoma patients using RNA-seq expression profiles.
Grimes, Tyler; Walker, Alejandro R; Datta, Susmita; Datta, Somnath
2018-05-30
Neuroblastoma is the most common tumor of early childhood and is notorious for its high variability in clinical presentation. Accurate prognosis has remained a challenge for many patients. In this study, expression profiles from RNA-sequencing are used to predict survival times directly. Several models are investigated using various annotation levels of expression profiles (genes, transcripts, and introns), and an ensemble predictor is proposed as a heuristic for combining these different profiles. The use of RNA-seq data is shown to improve accuracy in comparison to using clinical data alone for predicting overall survival times. Furthermore, clinically high-risk patients can be subclassified based on their predicted overall survival times. In this effort, the best performing model was the elastic net using both transcripts and introns together. This model separated patients into two groups with 2-year overall survival rates of 0.40±0.11 (n=22) versus 0.80±0.05 (n=68). The ensemble approach gave similar results, with groups 0.42±0.10 (n=25) versus 0.82±0.05 (n=65). This suggests that the ensemble is able to effectively combine the individual RNA-seq datasets. Using predicted survival times based on RNA-seq data can provide improved prognosis by subclassifying clinically high-risk neuroblastoma patients. This article was reviewed by Subharup Guha and Isabel Nepomuceno.
Chakraborty, Sandeep; Britton, Monica; Martínez-García, P J; Dandekar, Abhaya M
2016-03-01
Deep RNA-Seq profiling, a revolutionary method used for quantifying transcriptional levels, often includes non-specific transcripts from other co-existing organisms in spite of stringent protocols. Using the recently published walnut genome sequence as a filter, we present a broad analysis of the RNA-Seq derived transcriptome profiles obtained from twenty different tissues to extract the biodiversity and possible plant-microbe interactions in the walnut ecosystem in California. Since the residual nature of the transcripts being analyzed does not provide sufficient information to identify the exact strain, inferences made are constrained to the genus level. The presence of the pathogenic oomycete Phytophthora was detected in the root through the presence of a glyceraldehyde-3-phosphate dehydrogenase. Cryptococcus, the causal agent of cryptococcosis, was found in the catkins and vegetative buds, corroborating previous work indicating that the plant surface supported the sexual cycle of this human pathogen. The RNA-Seq profile revealed several species of the endophytic nitrogen fixing Actinobacteria. Another bacterial species implicated in aerobic biodegradation of methyl tert-butyl ether (Methylibium petroleiphilum) was also found in the root. RNA encoding proteins from the pea aphid were found in the leaves and vegetative buds, while a serine protease from mosquito with significant homology to a female reproductive tract protease from Drosophila mojavensis in the vegetative bud suggests egg-laying activities. The comprehensive analysis of the RNA-seq data also unraveled detailed, tissue-specific information of ~400 transcripts encoded by the largest family of resistance (R) genes (NBS-LRR), which possibly rationalizes the resistance of the specific walnut plant to the pathogens detected. Thus, we elucidate the biodiversity and possible plant-microbe interactions in several walnut (Juglans regia) tissues in California using deep RNA-Seq profiling.
rnaQUAST: a quality assessment tool for de novo transcriptome assemblies.
Bushmanova, Elena; Antipov, Dmitry; Lapidus, Alla; Suvorov, Vladimir; Prjibelski, Andrey D
2016-07-15
The ability to generate large RNA-Seq datasets has created a demand for both de novo and reference-based transcriptome assemblers. However, while many transcriptome assemblers are now available, there is still no unified quality assessment tool for RNA-Seq assemblies. We present rnaQUAST, a tool for evaluating RNA-Seq assembly quality and benchmarking transcriptome assemblers using a reference genome and gene database. rnaQUAST calculates various metrics that demonstrate completeness and correctness levels of the assembled transcripts, and outputs them in a user-friendly report. rnaQUAST is implemented in Python and is freely available at http://bioinf.spbau.ru/en/rnaquast. Contact: ap@bioinf.spbau.ru. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Eicher, John D.; Wakabayashi, Yoshiyuki; Vitseva, Olga; Esa, Nada; Yang, Yanqin; Zhu, Jun; Freedman, Jane E.; McManus, David D.; Johnson, Andrew D.
2016-01-01
Transcripts in platelets are largely produced in precursor megakaryocytes but remain physiologically active as platelets translate RNAs and regulate protein/RNA levels. Recent studies using transcriptome sequencing (RNA-seq) characterized the platelet transcriptome in limited numbers of non-diseased individuals. Here, we expand upon these RNA-seq studies by completing RNA-seq in platelets from 32 patients with acute myocardial infarction (MI). Our goals were to characterize the platelet transcriptome using a population of patients with acute MI and relate gene expression to platelet aggregation measures and ST-segment elevation MI (STEMI) (n=16) versus non-STEMI (NSTEMI) (n=16) subtypes. Similar to other studies, we detected 9,565 expressed transcripts, including several known platelet-enriched markers (e.g., PPBP, OST4). Our RNA-seq data strongly correlated with independently ascertained platelet expression data and showed enrichment for platelet-related pathways (e.g., wound response, hemostasis, and platelet activation), as well as actin-related and post-transcriptional processes. Several transcripts displayed suggestively higher (FBXL4, ECHDC3, KCNE1, TAOK2, AURKB, ERG, and FKBP5) and lower (MIAT, PVRL3 and PZP) expression in STEMI platelets compared to NSTEMI. We also identified transcripts correlated with platelet aggregation to TRAP (ATP6V1G2, SLC2A3), collagen (CEACAM1, ITGA2), and ADP (PDGFB, PDGFC, ST3GAL6). Our study adds to current platelet gene expression resources by providing transcriptome-wide analyses in platelets isolated from patients with acute MI. In concert with prior studies, we identify various genes for further study with regard to platelet function and acute MI. Future platelet RNA-seq studies examining more diverse sets of healthy and diseased samples will add to our understanding of platelet thrombotic and non-thrombotic functions. PMID:26367242
Potts, Anastasia H; Leng, Yuanyuan; Babitzke, Paul; Romeo, Tony
2018-03-29
The Csr global regulatory system coordinates gene expression in response to metabolic status. This system utilizes the RNA binding protein CsrA to regulate gene expression by binding to transcripts of structural and regulatory genes, thus affecting their structure, stability, translation, and/or transcription elongation. CsrA activity is controlled by sRNAs, CsrB and CsrC, which sequester CsrA away from other transcripts. CsrB/C levels are partly determined by their rates of turnover, which requires CsrD to render them susceptible to RNase E cleavage. Previous epistasis analysis suggested that CsrD affects gene expression through the other Csr components, CsrB/C and CsrA. However, those conclusions were based on a limited analysis of reporters. Here, we reassessed the global behavior of the Csr circuitry using epistasis analysis with RNA seq (Epi-seq). Because CsrD effects on mRNA levels were entirely lost in the csrA mutant and largely eliminated in a csrB/C mutant under our experimental conditions, while the majority of CsrA effects persisted in the absence of csrD, the original model accounts for the global behavior of the Csr system. Our present results also reflect a more nuanced role of CsrA as terminal regulator of the Csr system than has been recognized.
Sokol, Martin; Jessen, Karen Margrethe; Pedersen, Finn Skou
2016-01-01
Several studies have shown that human endogenous retroviruses and endogenous retrovirus-like repeats (here collectively HERVs) impose direct regulation on human genes through enhancer and promoter motifs present in their long terminal repeats (LTRs). Although chimeric transcription, in which novel gene isoforms containing retroviral and human sequence are transcribed from viral promoters, is commonly associated with disease, regulation by HERVs is beneficial in other settings; for example, in human testis chimeric isoforms of TP63 induced by an ERV9 LTR protect the male germ line upon DNA damage by inducing apoptosis, whereas in the human globin locus the γ- and β-globin switch during normal hematopoiesis is mediated by complex interactions of an ERV9 LTR and surrounding human sequence. The advent of deep sequencing or next-generation sequencing (NGS) has revolutionized the way researchers solve important scientific questions and develop novel hypotheses in relation to human genome regulation. We recently applied next-generation paired-end RNA-sequencing (RNA-seq) together with chromatin immunoprecipitation with sequencing (ChIP-seq) to examine ERV9 chimeric transcription in human reference cell lines from Encyclopedia of DNA Elements (ENCODE). This led to the discovery of advanced regulation mechanisms by ERV9s and other HERVs across numerous human loci including transcription of large gene-unannotated genomic regions, as well as cooperative regulation by multiple HERVs and non-LTR repeats such as Alu elements. In this article, well-established examples of human gene regulation by HERVs are reviewed followed by a description of paired-end RNA-seq, and its application in identifying chimeric transcription genome-wide. Based on integrative analyses of RNA-seq and ChIP-seq data, we then present novel examples of regulation by ERV9s of tumor suppressor genes CADM2 and SEMA3A, as well as transcription of an unannotated region. 
Taken together, this article highlights the high suitability of contemporary sequencing methods in future analyses of human biology in relation to evolutionary acquired retroviruses in the human genome. © 2016 APMIS. Published by John Wiley & Sons Ltd.
Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.
Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D
2015-07-30
RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. 
This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove challenging.
Indel detection from DNA and RNA sequencing data with transIndel.
Yang, Rendong; Van Etten, Jamie L; Dehm, Scott M
2018-04-19
Insertions and deletions (indels) are a major class of genomic variation associated with human disease. Indels are primarily detected from DNA sequencing (DNA-seq) data but their transcriptional consequences remain unexplored due to challenges in discriminating medium-sized and large indels from splicing events in RNA-seq data. Here, we developed transIndel, a splice-aware algorithm that parses the chimeric alignments predicted by a short read aligner and reconstructs the mid-sized insertions and large deletions based on the linear alignments of split reads from DNA-seq or RNA-seq data. TransIndel exhibits competitive or superior performance over eight state-of-the-art indel detection tools on benchmarks using both synthetic and real DNA-seq data. Additionally, we applied transIndel to DNA-seq and RNA-seq datasets from 333 primary prostate cancer patients from The Cancer Genome Atlas (TCGA) and 59 metastatic prostate cancer patients from AACR-PCF Stand-Up-To-Cancer (SU2C) studies. TransIndel enhanced the taxonomy of DNA- and RNA-level alterations in prostate cancer by identifying recurrent FOXA1 indels as well as exitron splicing in genes implicated in disease progression. Our study demonstrates that transIndel is a robust tool for elucidation of medium- and large-sized indels from DNA-seq and RNA-seq data. Including RNA-seq in indel discovery efforts leads to significant improvements in sensitivity for identification of medium-sized and large indels missed by DNA-seq, and reveals non-canonical RNA-splicing events in genes associated with disease pathology.
DBATE: database of alternative transcripts expression.
Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2013-01-01
The use of high-throughput RNA sequencing technology (RNA-seq) allows whole transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing variant-specific expression with its functional inference is still an open and difficult issue for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotation of alternative splicing variants. We processed 13 large RNA-seq panels from human healthy tissues and in disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. The possibility to perform complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values that can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, the transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.
Rizvi, Abbas H.; Camara, Pablo G.; Kandror, Elena K.; Roberts, Thomas J.; Schieren, Ira; Maniatis, Tom; Rabadan, Raul
2017-01-01
Transcriptional programs control cellular lineage commitment and differentiation during development. Understanding cell fate has been advanced by studying single-cell RNA-seq, but is limited by the assumptions of current analytic methods regarding the structure of data. We present single-cell topological data analysis (scTDA), an algorithm for topology-based computational analyses to study temporal, unbiased transcriptional regulation. Compared to other methods, scTDA is a non-linear, model-independent, unsupervised statistical framework that can characterize transient cellular states. We applied scTDA to the analysis of murine embryonic stem cell (mESC) differentiation in vitro in response to inducers of motor neuron differentiation. scTDA resolved asynchrony and continuity in cellular identity over time, and identified four transient states (pluripotent, precursor, progenitor, and fully differentiated cells) based on changes in stage-dependent combinations of transcription factors, RNA-binding proteins and long non-coding RNAs. scTDA can be applied to study asynchronous cellular responses to either developmental cues or environmental perturbations. PMID:28459448
Time Series Expression Analyses Using RNA-seq: A Statistical Approach
Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.
2013-01-01
RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021
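To make the AR(1) idea above concrete, the following is a minimal sketch of a first-order autoregressive fit for a single gene, where each time point is regressed on the previous one. It is a generic least-squares fit, not the authors' implementation; the function name and the example values are assumptions chosen for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + phi * x[t-1] + noise."""
    if len(series) < 3:
        raise ValueError("need at least three time points")
    lagged = series[:-1]   # x[t-1]
    current = series[1:]   # x[t]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((x - mean_lag) ** 2 for x in lagged)
    if sxx == 0:
        raise ValueError("lagged values are constant; phi is undefined")
    sxy = sum((x - mean_lag) * (y - mean_cur) for x, y in zip(lagged, current))
    phi = sxy / sxx
    intercept = mean_cur - phi * mean_lag
    return intercept, phi


# Hypothetical log-expression values that follow x[t] = 1 + 0.5 * x[t-1] exactly.
log_expression = [0.0, 1.0, 1.5, 1.75, 1.875]
intercept, phi = fit_ar1(log_expression)
```

A coefficient `phi` far from zero indicates that expression at one time point depends on the preceding one, which is the dependency that time-course methods are meant to model explicitly.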
Lattimore, Vanessa L.; Pearson, John F.; Currie, Margaret J.; Spurdle, Amanda B.; Robinson, Bridget A.; Walker, Logan C.
2018-01-01
PCR-based RNA splicing assays are commonly used in diagnostic and research settings to assess the potential effects of variants of uncertain clinical significance in BRCA1 and BRCA2. The Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium completed a multicentre investigation to evaluate differences in assay design and the integrity of published data, raising a number of methodological questions associated with cell culture conditions and PCR-based protocols. We utilized targeted RNA-seq to re-assess BRCA1 and BRCA2 mRNA isoform expression patterns in lymphoblastoid cell lines (LCLs) previously used in the multicentre ENIGMA study. Capture of the targeted cDNA sequences was carried out using 34 BRCA1 and 28 BRCA2 oligonucleotides from the Illumina Truseq Targeted RNA Expression platform. Our results show that targeted RNA-seq analysis of LCLs overcomes many of the methodology limitations associated with PCR-based assays leading us to make the following observations and recommendations: (1) technical replicates (n > 2) of variant carriers to capture methodology induced variability associated with RNA-seq assays, (2) LCLs can undergo multiple freeze/thaw cycles and can be cultured up to 2 weeks without noticeably influencing isoform expression levels, (3) nonsense-mediated decay inhibitors are essential prior to splicing assays for comprehensive mRNA isoform detection, (4) quantitative assessment of exon:exon junction levels across BRCA1 and BRCA2 can help distinguish between normal and aberrant isoform expression patterns. Experimentally derived recommendations from this study will facilitate the application of targeted RNA-seq platforms for the quantitation of BRCA1 and BRCA2 mRNA aberrations associated with sequence variants of uncertain clinical significance. PMID:29774201
Experimental Design and Power Calculation for RNA-seq Experiments.
Wu, Zhijin; Wu, Hao
2016-01-01
Power calculation is a critical component of RNA-seq experimental design. The flexibility of RNA-seq experiments and the wide dynamic range of transcription they measure make RNA-seq an attractive technology for whole transcriptome analysis. These features, in addition to the high dimensionality of RNA-seq data, bring complexity in experimental design, making an analytical power calculation no longer realistic. In this chapter we review the major factors that influence the statistical power of detecting differential expression, and give examples of power assessment using the R package PROPER.
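When an analytical power calculation is impractical, power can be estimated by simulation. The sketch below illustrates that general idea for a single gene. It does not use PROPER or its API: it assumes normally distributed log2 expression and a crude normal-approximation test, whereas dedicated tools typically simulate count data. All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean, variance


def simulate_power(n_per_group, log2_fold_change, sd,
                   alpha=0.05, n_sim=2000, seed=1):
    """Fraction of simulated experiments in which the gene is called significant."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sim):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        treated = [rng.gauss(log2_fold_change, sd) for _ in range(n_per_group)]
        # Standard error of the difference in means (equal group sizes).
        se = ((variance(control) + variance(treated)) / n_per_group) ** 0.5
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sim


power_small = simulate_power(n_per_group=3, log2_fold_change=1.0, sd=0.5)
power_large = simulate_power(n_per_group=10, log2_fold_change=1.0, sd=0.5)
```

Comparing `power_small` with `power_large` shows how the number of replicates changes the chance of detecting the same fold change. The normal approximation is anti-conservative for very small groups, so the values should be read as rough guides only.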
Comparative Analysis of Single-Cell RNA Sequencing Methods.
Ziegenhain, Christoph; Vieth, Beate; Parekh, Swati; Reinius, Björn; Guillaumet-Adkins, Amy; Smets, Martha; Leonhardt, Heinrich; Heyn, Holger; Hellmann, Ines; Enard, Wolfgang
2017-02-16
Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols. Copyright © 2017 Elsevier Inc. All rights reserved.
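The reason UMIs reduce amplification noise is that reads carrying the same UMI for the same gene in the same cell are treated as copies of one original molecule. A minimal sketch of that counting step follows. The read tuples and the function name are hypothetical, and the sketch ignores sequencing errors within UMIs, which real pipelines usually correct for.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per (cell, gene); repeated reads of one UMI count once."""
    umis = defaultdict(set)
    for cell, gene, umi in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(values) for key, values in umis.items()}


# Hypothetical reads as (cell barcode, gene, UMI) tuples.
reads = [
    ("cell_1", "Actb", "AACG"),
    ("cell_1", "Actb", "AACG"),   # amplification duplicate of the read above
    ("cell_1", "Actb", "TTGA"),
    ("cell_2", "Actb", "AACG"),   # same UMI, different cell: separate molecule
    ("cell_1", "Nanog", "CCAT"),
]
molecule_counts = count_molecules(reads)
```

Here three reads for Actb in `cell_1` collapse to two molecules, so the count reflects the number of captured transcripts and not how often each was amplified.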
2018-01-01
Primary infection with human cytomegalovirus (HCMV) results in a lifelong infection due to its ability to establish latent infection, with one characterized viral reservoir being hematopoietic cells. Although reactivation from latency causes serious disease in immunocompromised individuals, our molecular understanding of latency is limited. Here, we delineate viral gene expression during natural HCMV persistent infection by analyzing the massive transcriptome RNA sequencing (RNA-seq) atlas generated by the Genotype-Tissue Expression (GTEx) project. This systematic analysis reveals that HCMV persistence in vivo is prevalent in diverse tissues. Notably, we find only viral transcripts that resemble gene expression during various stages of lytic infection with no evidence of any highly restricted latency-associated viral gene expression program. To further define the transcriptional landscape during HCMV latent infection, we also used single-cell RNA-seq and a tractable experimental latency model. In contrast to some current views on latency, we also find no evidence for any highly restricted latency-associated viral gene expression program. Instead, we reveal that latency-associated gene expression largely mirrors a late lytic viral program, albeit at much lower levels of expression. Overall, our work has the potential to revolutionize our understanding of HCMV persistence and suggests that latency is governed mainly by quantitative changes, with a limited number of qualitative changes, in viral gene expression. PMID:29535194
The bench scientist's guide to RNA-Seq analysis
USDA-ARS's Scientific Manuscript database
RNA sequencing (RNA-Seq) is emerging as a highly accurate method to quantify transcript abundance. However, analyses of the large data sets obtained by sequencing the entire transcriptome of organisms have generally been performed by bioinformatic specialists. Here we outline a methods strategy desi...
Yassour, Moran; Grabherr, Manfred; Blood, Philip D.; Bowden, Joshua; Couger, Matthew Brian; Eccles, David; Li, Bo; Lieber, Matthias; MacManes, Matthew D.; Ott, Michael; Orvis, Joshua; Pochet, Nathalie; Strozzi, Francesco; Weeks, Nathan; Westerman, Rick; William, Thomas; Dewey, Colin N.; Henschel, Robert; LeDuc, Richard D.; Friedman, Nir; Regev, Aviv
2013-01-01
De novo assembly of RNA-Seq data allows us to study transcriptomes without the need for a genome sequence, such as in non-model organisms of ecological and evolutionary importance, cancer samples, or the microbiome. In this protocol, we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms. We also present Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples, and approaches to identify protein coding genes. In an included tutorial we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sf.net. PMID:23845962
Regulating RNA polymerase pausing and transcription elongation in embryonic stem cells
Min, Irene M.; Waterfall, Joshua J.; Core, Leighton J.; Munroe, Robert J.; Schimenti, John; Lis, John T.
2011-01-01
Transitions between pluripotent stem cells and differentiated cells are executed by key transcription regulators. Comparative measurements of RNA polymerase distribution over the genome's primary transcription units in different cell states can identify the genes and steps in the transcription cycle that are regulated during such transitions. To identify the complete transcriptional profiles of RNA polymerases with high sensitivity and resolution, as well as the critical regulated steps upon which regulatory factors act, we used genome-wide nuclear run-on (GRO-seq) to map the density and orientation of transcriptionally engaged RNA polymerases in mouse embryonic stem cells (ESCs) and mouse embryonic fibroblasts (MEFs). In both cell types, progression of a promoter-proximal, paused RNA polymerase II (Pol II) into productive elongation is a rate-limiting step in transcription of ∼40% of mRNA-encoding genes. Importantly, quantitative comparisons between cell types reveal that transcription is controlled frequently at paused Pol II's entry into elongation. Furthermore, “bivalent” ESC genes (exhibiting both active and repressive histone modifications) bound by Polycomb group complexes PRC1 (Polycomb-repressive complex 1) and PRC2 show dramatically reduced levels of paused Pol II at promoters relative to an average gene. In contrast, bivalent promoters bound by only PRC2 allow Pol II pausing, but it is confined to extremely 5′ proximal regions. Altogether, these findings identify rate-limiting targets for transcription regulation during cell differentiation. PMID:21460038
APAtrap: identification and quantification of alternative polyadenylation sites from RNA-seq data.
Ye, Congting; Long, Yuqi; Ji, Guoli; Li, Qingshun Quinn; Wu, Xiaohui
2018-06-01
Alternative polyadenylation (APA) has been increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and gene expression regulation. As RNA-seq has become a routine protocol for transcriptome analysis, it is of great interest to leverage such an unprecedented collection of RNA-seq data with new computational methods to extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely on either transcript assembly to determine transcript 3' ends or annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage considering more than two poly(A) sites. We developed an approach called APAtrap based on the mean squared error model to identify and quantify APA sites from RNA-seq data. APAtrap is capable of identifying novel 3' UTRs and 3' UTR extensions, which contributes to locating potential poly(A) sites in previously overlooked regions and improving genome annotations. APAtrap also aims to tally all potential poly(A) sites and detect genes with differential APA site usage between conditions. Extensive comparisons of APAtrap with two other recent methods, ChangePoint and DaPars, using various RNA-seq datasets from simulation studies, human and Arabidopsis demonstrate the efficacy and flexibility of APAtrap for any organism with an annotated genome. Freely available for download at https://apatrap.sourceforge.io. Contact: liqq@xmu.edu.cn or xhuister@xmu.edu.cn. Supplementary data are available at Bioinformatics online.
Genome-wide mapping of infection-induced SINE RNAs reveals a role in selective mRNA export.
Karijolich, John; Zhao, Yang; Alla, Ravi; Glaunsinger, Britt
2017-06-02
Short interspersed nuclear elements (SINEs) are retrotransposons evolutionarily derived from endogenous RNA Polymerase III RNAs. Though SINE elements have undergone exaptation into gene regulatory elements, how transcribed SINE RNA impacts transcriptional and post-transcriptional regulation is largely unknown. This is partly due to a lack of information regarding which of the loci have transcriptional potential. Here, we present an approach (short interspersed nuclear element sequencing, SINE-seq), which selectively profiles RNA Polymerase III-derived SINE RNA, thereby identifying transcriptionally active SINE loci. Applying SINE-seq to monitor murine B2 SINE expression during a gammaherpesvirus infection revealed transcription from 28 270 SINE loci, with ∼50% of active SINE elements residing within annotated RNA Polymerase II loci. Furthermore, B2 RNA can form intermolecular RNA-RNA interactions with complementary mRNAs, leading to nuclear retention of the targeted mRNA via a mechanism involving p54nrb. These findings illuminate a pathway for the selective regulation of mRNA export during stress via retrotransposon activation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome
USDA-ARS's Scientific Manuscript database
A first analysis of the Glycine max (L.) Merr. (soybean) transcriptome using next generation sequencing technology and RNA-Sequencing (RNA-Seq) is presented. This analysis will provide an important resource for understanding transcription and gene co-regulatory networks in soybean, the most economic...
Analytical workflow profiling gene expression in murine macrophages
Nixon, Scott E.; González-Peña, Dianelys; Lawson, Marcus A.; McCusker, Robert H.; Hernandez, Alvaro G.; O’Connor, Jason C.; Dantzer, Robert; Kelley, Keith W.
2015-01-01
Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution), then tested for differential transcript expression (False Discovery Rate-adjusted p-value < 0.05). Enrichment of functional categories utilized the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. The 92 enriched categories from Gene Ontology Biological Processes and Molecular Functions, and from KEGG pathways, were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides a context to the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge with high throughput RNA-Seq. PMID:25708305
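The abstract above selects transcripts by a False Discovery Rate-adjusted p-value below 0.05 but does not say which adjustment was used. The sketch below assumes the common Benjamini-Hochberg procedure, with hypothetical p-values, to illustrate how such a threshold is applied. It is not the authors' pipeline.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five transcripts.
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(pvals)
significant = [p < 0.05 for p in adj]
print(significant)
```

Note that two transcripts with raw p-values below 0.05 no longer pass after adjustment, which is the point of controlling the FDR across many simultaneous tests.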
Prunus transcription factors: breeding perspectives
Bianchi, Valmor J.; Rubio, Manuel; Trainotti, Livio; Verde, Ignazio; Bonghi, Claudio; Martínez-Gómez, Pedro
2015-01-01
Many plant processes depend on differential gene expression, which is generally controlled by complex proteins called transcription factors (TFs). In peach, 1533 TFs have been identified, accounting for about 5.5% of the 27,852 protein-coding genes. These TFs are the reference for the rest of the Prunus species. TF studies in Prunus have been performed on the gene expression analysis of different agronomic traits, including control of the flowering process, fruit quality, and biotic and abiotic stress resistance. These studies, using quantitative RT-PCR, have mainly been performed in peach, and to a lesser extent in other species, including almond, apricot, black cherry, Fuji cherry, Japanese apricot, plum, and sour and sweet cherry. Other tools have also been used in TF studies, including cDNA-AFLP, LC-ESI-MS, RNA, and DNA blotting or mapping. More recently, new tools assayed include microarray and high-throughput DNA sequencing (DNA-Seq) and RNA sequencing (RNA-Seq). New functional genomics opportunities include genome resequencing and the well-known synteny among Prunus genomes and transcriptomes. These new functional studies should be applied in breeding programs in the development of molecular markers. With the genome sequences available, some strategies that have been used in model systems (such as SNP genotyping assays and genotyping-by-sequencing) may be applicable in the functional analysis of Prunus TFs as well. In addition, the knowledge of the gene functions and position in the peach reference genome of the TFs represents an additional advantage. These facts could greatly facilitate the isolation of genes via QTL (quantitative trait loci) map-based cloning in the different Prunus species, following the association of these TFs with the identified QTLs using the peach reference genome. PMID:26124770
A deep learning method for lincRNA detection using auto-encoder algorithm.
Yu, Ning; Yu, Zeng; Pan, Yi
2017-12-06
The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two observations give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, it is validated through the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. 
Driven by those newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application of deep learning techniques to identifying lincRNA transcription sequences.
Ho, Ming-Fen; Lummertz da Rocha, Edroaldo; Zhang, Cheng; Ingle, James N; Goss, Paul E; Shepherd, Lois E; Kubo, Michiaki; Wang, Liewei; Li, Hu; Weinshilboum, Richard M
2018-06-01
T-cell leukemia 1A (TCL1A) single-nucleotide polymorphisms (SNPs) have been associated with aromatase inhibitor-induced musculoskeletal adverse events. We previously demonstrated that TCL1A is inducible by estradiol (E2) and plays a critical role in the regulation of cytokines, chemokines, and Toll-like receptors in a TCL1A SNP genotype and estrogen-dependent fashion. Furthermore, TCL1A SNP-dependent expression phenotypes can be "reversed" by exposure to selective estrogen receptor modulators such as 4-hydroxytamoxifen (4OH-TAM). The present study was designed to comprehensively characterize the role of TCL1A in transcriptional regulation across the genome by performing RNA sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) assays with lymphoblastoid cell lines. RNA-seq identified 357 genes that were regulated in a TCL1A SNP- and E2-dependent fashion with expression patterns that were 4OH-TAM reversible. ChIP-seq for the same cells identified 57 TCL1A binding sites that could be regulated by E2 in a SNP-dependent fashion. Even more striking, nuclear factor-κB (NF-κB) p65 bound to those same DNA regions. In summary, TCL1A is a novel transcription factor with expression that is regulated in a SNP- and E2-dependent fashion, a pattern of expression that can be reversed by 4OH-TAM. Integrated RNA-seq and ChIP-seq results suggest that TCL1A also acts as a transcriptional coregulator with NF-κB p65, an important immune system transcription factor. Copyright © 2018 by The American Society for Pharmacology and Experimental Therapeutics.
Genome-wide mapping of infection-induced SINE RNAs reveals a role in selective mRNA export
Zhao, Yang; Alla, Ravi
2017-01-01
Short interspersed nuclear elements (SINEs) are retrotransposons evolutionarily derived from endogenous RNA Polymerase III RNAs. Though SINE elements have undergone exaptation into gene regulatory elements, how transcribed SINE RNA impacts transcriptional and post-transcriptional regulation is largely unknown. This is partly due to a lack of information regarding which of the loci have transcriptional potential. Here, we present an approach (short interspersed nuclear element sequencing, SINE-seq), which selectively profiles RNA Polymerase III-derived SINE RNA, thereby identifying transcriptionally active SINE loci. Applying SINE-seq to monitor murine B2 SINE expression during a gammaherpesvirus infection revealed transcription from 28 270 SINE loci, with ∼50% of active SINE elements residing within annotated RNA Polymerase II loci. Furthermore, B2 RNA can form intermolecular RNA–RNA interactions with complementary mRNAs, leading to nuclear retention of the targeted mRNA via a mechanism involving p54nrb. These findings illuminate a pathway for the selective regulation of mRNA export during stress via retrotransposon activation. PMID:28334904
Zipper plot: visualizing transcriptional activity of genomic regions.
Avila Cobos, Francisco; Anckaert, Jasper; Volders, Pieter-Jan; Everaert, Celine; Rombaut, Dries; Vandesompele, Jo; De Preter, Katleen; Mestdagh, Pieter
2017-05-02
Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. Current state-of-the-art tools for long non-coding RNA (lncRNA) annotation are mainly based on evolutionary constraints, which may result in false negatives due to the overall limited conservation of lncRNAs. To tackle this problem we have developed the Zipper plot, a novel visualization and analysis method that enables users to simultaneously interrogate thousands of human putative transcription start sites (TSSs) in relation to various features that are indicative for transcriptional activity. These include publicly available CAGE-sequencing, ChIP-sequencing and DNase-sequencing datasets. Our method only requires three tab-separated fields (chromosome, genomic coordinate of the TSS and strand) as input and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot. Using the Zipper plot, we found evidence of transcription for a set of well-characterized lncRNAs and observed that fewer mono-exonic lncRNAs have CAGE peaks overlapping with their TSSs compared to multi-exonic lncRNAs. Using publicly available RNA-seq data, we found more than one hundred cases where junction reads connected protein-coding gene exons with a downstream mono-exonic lncRNA, revealing the need for a careful evaluation of lncRNA 5'-boundaries. Our method is implemented using the statistical programming language R and is freely available as a webtool.
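The Zipper plot abstract above states that the method needs only three tab-separated fields per TSS: chromosome, genomic coordinate, and strand. The following is a minimal sketch of reading and validating such input. The `TSS` type, the `parse_tss` function, and the example coordinates are hypothetical, and the actual webtool is implemented in R and may impose further requirements.

```python
import csv
import io
from typing import NamedTuple

class TSS(NamedTuple):
    chrom: str
    position: int
    strand: str

def parse_tss(handle):
    """Parse tab-separated (chromosome, coordinate, strand) records."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}: {row}")
        chrom, position, strand = row
        if strand not in {"+", "-"}:
            raise ValueError(f"strand must be '+' or '-', got {strand!r}")
        records.append(TSS(chrom, int(position), strand))
    return records

# Hypothetical input with two putative transcription start sites.
example = io.StringIO("chr1\t11869\t+\nchrX\t73852753\t-\n")
tss_list = parse_tss(example)
print(tss_list)
```

Validating the strand and coordinate up front keeps malformed rows from silently reaching any downstream overlap with CAGE, ChIP, or DNase signal.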
Tu, Ying; Xu, Dan; Feng, Jiaqi; He, Li
2017-01-01
Sensitive skin (SS) is a condition of subjective cutaneous hyper-reactivity. The role of long non-coding RNAs (lncRNAs) in subjects with SS is unclear. Therefore, the aim of the present study was to provide a comprehensive profile of the mRNAs and lncRNAs in subjects with SS. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis presented the characteristics of associated protein-coding genes. In addition, a co-expression network of lncRNA and mRNA was constructed to identify potential underlying regulation targets; the results were verified by quantitative real-time PCR (qRT-PCR) and RNA-seq analyses in patients with SS and normal samples. Compared with the normal skin group, 266 novel lncRNAs and 6750 annotated lncRNAs were identified in the SS group. A total of 71 lncRNA transcripts and 2615 mRNA transcripts were differentially expressed (P < 0.05). The heat signature of the SS samples could be distinguished from the normal skin samples, whereas the majority of the genes that were present in enriched pathways were those that participated in focal adhesion, PI3K-Akt signaling, and cancer-related pathways. Five transcripts were selected for qRT-PCR analysis and the results were consistent with RNA-seq. The results suggested that LNC_000265 may play a role in the epidermal barrier structure of patients with SS. The data suggest novel genes and pathways that may be involved in the pathogenesis of SS and highlight potential targets that could be used for individualized treatment applications. PMID:29383128
Lan, DaoLiang; Xiong, XianRong; Wei, YanLi; Xu, Tong; Zhong, JinCheng; Zhi, XiangDong; Wang, Yong; Li, Jian
2014-09-01
RNA-Seq, a high-throughput (HT) sequencing technique, has been used effectively in large-scale transcriptomic studies, and is particularly useful for improving gene structure information and mining of new genes. In this study, RNA-Seq HT technology was employed to analyze the transcriptome of yak ovary. After Illumina-Solexa deep sequencing, 26826516 clean reads with a total of 4828772880 bp were obtained from the ovary library. Alignment analysis showed that 16992 yak genes mapped to the yak genome and 3734 of these genes were involved in alternative splicing. Gene structure refinement analysis showed that 7340 genes that were annotated in the yak genome could be extended at the 5' or 3' ends based on the alignments between the transcripts and the genome sequence. Novel transcript prediction analysis identified 6321 new transcripts with lengths ranging from 180 to 14884 bp, and 2267 of them were predicted to code proteins. BLAST analysis of the new transcripts showed that 1200–4933 mapped to the non-redundant (nr), nucleotide (nt) and/or SwissProt sequence databases. Comparative statistical analysis of the new mapped transcripts showed that the majority of them were similar to genes in Bos taurus (41.4%), Bos grunniens mutus (33.0%), Ovis aries (6.3%), Homo sapiens (2.8%), Mus musculus (1.6%) and other species. Functional analysis showed that these expressed genes were involved in various Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes pathways. GO analysis of the new transcripts found that the largest proportion of them was associated with reproduction. The results of this study will provide a basis for describing the normal transcriptome map of yak ovary and for future studies on yak breeding performance. Moreover, the results confirmed that RNA-Seq HT technology is highly advantageous in improving gene structure information and mining of new genes, as well as in providing valuable data to expand the yak genome information.
Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data
Kumar, Shailesh; Vo, Angie Duy; Qin, Fujun; Li, Hui
2016-01-01
RNA-Seq made possible the global identification of fusion transcripts, i.e. “chimeric RNAs”. Even though various software packages have been developed to serve this purpose, they behave differently in different datasets provided by different developers. It is important for both users and developers to have an unbiased assessment of the performance of existing fusion detection tools. Toward this goal, we compared the performance of 12 well-known fusion detection software packages. We evaluated the sensitivity, false discovery rate, computing time, and memory usage of these tools in four different datasets (positive, negative, mixed, and test). We conclude that some tools are better than others in terms of sensitivity, positive prediction value, time consumption and memory usage. We also observed small overlaps of the fusions detected by different tools in the real dataset (test dataset). This could be due to false discoveries by various tools, but could also be because none of the tools is inclusive. We have found that the performance of the tools depends on the quality, read length, and number of reads of the RNA-Seq data. We recommend that users choose the proper tools for their purpose based on the properties of their RNA-Seq data. PMID:26862001
Improving RNA-Seq expression estimates by correcting for fragment bias
2011-01-01
The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies. PMID:21410973
Nalpas, Nicolas C; Park, Stephen D E; Magee, David A; Taraktsoglou, Maria; Browne, John A; Conlon, Kevin M; Rue-Albrecht, Kévin; Killick, Kate E; Hokamp, Karsten; Lohan, Amanda J; Loftus, Brendan J; Gormley, Eamonn; Gordon, Stephen V; MacHugh, David E
2013-04-08
Mycobacterium bovis, the causative agent of bovine tuberculosis, is an intracellular pathogen that can persist inside host macrophages during infection via a diverse range of mechanisms that subvert the host immune response. In the current study, we have analysed and compared the transcriptomes of M. bovis-infected monocyte-derived macrophages (MDM) purified from six Holstein-Friesian females with the transcriptomes of non-infected control MDM from the same animals over a 24 h period using strand-specific RNA sequencing (RNA-seq). In addition, we compare gene expression profiles generated using RNA-seq with those previously generated by us using the high-density Affymetrix® GeneChip® Bovine Genome Array platform from the same MDM-extracted RNA. A mean of 7.2 million reads from each MDM sample mapped uniquely and unambiguously to single Bos taurus reference genome locations. Analysis of these mapped reads showed 2,584 genes (1,392 upregulated; 1,192 downregulated) and 757 putative natural antisense transcripts (558 upregulated; 199 downregulated) that were differentially expressed based on sense and antisense strand data, respectively (adjusted P-value ≤ 0.05). Of the differentially expressed genes, 694 were common to both the sense and antisense data sets, with the direction of expression (i.e. up- or downregulation) positively correlated for 693 genes and negatively correlated for the remaining gene. Gene ontology analysis of the differentially expressed genes revealed an enrichment of immune, apoptotic and cell signalling genes. Notably, the number of differentially expressed genes identified from RNA-seq sense strand analysis was greater than the number of differentially expressed genes detected from microarray analysis (2,584 genes versus 2,015 genes). Furthermore, our data reveal a greater dynamic range in the detection and quantification of gene transcripts for RNA-seq compared to microarray technology. 
This study highlights the value of RNA-seq in identifying novel immunomodulatory mechanisms that underlie host-mycobacterial pathogen interactions during infection, including possible complex post-transcriptional regulation of host gene expression involving antisense RNA.
RNA-seq Analysis of Early Hepatic Response to Handling and Confinement Stress in Rainbow Trout
Liu, Sixin; Gao, Guangtu; Palti, Yniv; Cleveland, Beth M.; Weber, Gregory M.; Rexroad, Caird E.
2014-01-01
Fish under intensive rearing conditions experience various stressors which have negative impacts on survival, growth, reproduction and fillet quality. Identifying and characterizing the molecular mechanisms underlying stress responses will facilitate the development of strategies that aim to improve animal welfare and aquaculture production efficiency. In this study, we used RNA-seq to identify transcripts which are differentially expressed in the rainbow trout liver in response to handling and confinement stress. These stressors were selected due to their relevance in aquaculture production. Total RNA was extracted from the livers of individual fish in five tanks having eight fish each, including three tanks of fish subjected to a 3 hour handling and confinement stress and two control tanks. Equal amounts of total RNA from six individual fish were pooled by tank to create five RNA-seq libraries, which were sequenced in one lane of Illumina HiSeq 2000. Three sequencing runs were conducted to obtain a total of 491,570,566 reads which were mapped onto the previously generated stress reference transcriptome to identify 316 differentially expressed transcripts (DETs). Twenty-one DETs were selected for qPCR to validate the RNA-seq approach. The fold changes in gene expression identified by RNA-seq and qPCR were highly correlated (R² = 0.88). Several gene ontology terms including transcription factor activity and biological process such as glucose metabolic process were enriched among these DETs. Pathways involved in response to handling and confinement stress were implicated by mapping the DETs to reference pathways in the KEGG database. Accession Numbers: Raw RNA-seq reads have been submitted to the NCBI Short Read Archive under accession number SRP022881. Customized Perl Scripts: All customized scripts described in this paper are available from Dr. Guangtu Gao or the corresponding author. PMID:24558395
DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts
Paraskevopoulou, Maria D.; Vlachos, Ioannis S.; Karagkouni, Dimitra; Georgakilas, Georgios; Kanellos, Ilias; Vergoulis, Thanasis; Zagganas, Konstantinos; Tsanakas, Panayiotis; Floros, Evangelos; Dalamagas, Theodore; Hatzigeorgiou, Artemis G.
2016-01-01
microRNAs (miRNAs) are short non-coding RNAs (ncRNAs) that act as post-transcriptional regulators of coding gene expression. Long non-coding RNAs (lncRNAs) have been recently reported to interact with miRNAs. The sponge-like function of lncRNAs introduces an extra layer of complexity in the miRNA interactome. DIANA-LncBase v1 provided a database of experimentally supported and in silico predicted miRNA Recognition Elements (MREs) on lncRNAs. The second version of LncBase (www.microrna.gr/LncBase) presents an extensive collection of miRNA:lncRNA interactions. The significantly enhanced database includes more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions, derived from manually curated publications and the analysis of 153 AGO CLIP-Seq libraries. The new experimental module presents a 14-fold increase compared to the previous release. LncBase v2 hosts in silico predicted miRNA targets on lncRNAs, identified with the DIANA-microT algorithm. The relevant module provides millions of predicted miRNA binding sites, accompanied with detailed metadata and MRE conservation metrics. LncBase v2 caters information regarding cell type specific miRNA:lncRNA regulation and enables users to easily identify interactions in 66 different cell types, spanning 36 tissues for human and mouse. Database entries are also supported by accurate lncRNA expression information, derived from the analysis of more than 6 billion RNA-Seq reads. PMID:26612864
Proteogenomic database construction driven from large scale RNA-seq data.
Woo, Sunghee; Cha, Seong Won; Merrihew, Gennifer; He, Yupeng; Castellana, Natalie; Guest, Clark; MacCoss, Michael; Bafna, Vineet
2014-01-03
The advent of inexpensive RNA-seq technologies and other deep sequencing technologies for RNA has the promise to radically improve genomic annotation, providing information on transcribed regions and splicing events in a variety of cellular conditions. Using MS-based proteogenomics, many of these events can be confirmed directly at the protein level. However, the integration of large amounts of redundant RNA-seq data and mass spectrometry data poses a challenging problem. Our paper addresses this by construction of a compact database that contains all useful information expressed in RNA-seq reads. Applying our method to cumulative C. elegans data reduced 496.2 GB of aligned RNA-seq SAM files to 410 MB of splice graph database written in FASTA format. This corresponds to 1000× compression of data size, without loss of sensitivity. We performed a proteogenomics study using the custom data set, using a completely automated pipeline, and identified a total of 4044 novel events, including 215 novel genes, 808 novel exons, 12 alternative splicings, 618 gene-boundary corrections, 245 exon-boundary changes, 938 frame shifts, 1166 reverse strands, and 42 translated UTRs. Our results highlight the usefulness of transcript + proteomic integration for improved genome annotations.
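The compression figure quoted in the abstract above can be checked with simple arithmetic. The sketch below computes the ratio of 496.2 GB to 410 MB under both decimal and binary unit conventions, since the abstract does not say which was used. The function name is hypothetical. Either way the ratio comes out somewhat above 1000, so the abstract's "1000×" reads as an order-of-magnitude figure.

```python
def compression_ratio(original_gb, compressed_mb, mb_per_gb=1000):
    """Ratio of original size to compressed size, both converted to MB."""
    return original_gb * mb_per_gb / compressed_mb

# Sizes taken from the abstract; the unit convention is an assumption.
ratio_decimal = compression_ratio(496.2, 410)                 # 1 GB = 1000 MB
ratio_binary = compression_ratio(496.2, 410, mb_per_gb=1024)  # 1 GB = 1024 MB
print(round(ratio_decimal), round(ratio_binary))
```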
Grape RNA-Seq analysis pipeline environment
Knowles, David G.; Röder, Maik; Merkel, Angelika; Guigó, Roderic
2013-01-01
Motivation: The avalanche of data arriving since the development of NGS technologies has prompted the need for developing fast, accurate and easily automated bioinformatic tools capable of dealing with massive datasets. Among the most productive applications of NGS technologies is the sequencing of cellular RNA, known as RNA-Seq. Although RNA-Seq provides a dynamic range similar or superior to that of microarrays at similar or lower cost, the lack of standard and user-friendly pipelines is a bottleneck preventing RNA-Seq from becoming the standard for transcriptome analysis. Results: In this work we present a pipeline for processing and analyzing RNA-Seq data, which we have named Grape (Grape RNA-Seq Analysis Pipeline Environment). Grape supports raw sequencing reads produced by a variety of technologies, either in FASTA or FASTQ format, or as prealigned reads in SAM/BAM format. A minimal Grape configuration consists of the file location of the raw sequencing reads, the genome of the species and the corresponding gene and transcript annotation. Grape first runs a set of quality control steps, and then aligns the reads to the genome, a step that is omitted for prealigned read formats. Grape next estimates gene and transcript expression levels, calculates exon inclusion levels and identifies novel transcripts. Grape can be run on a single computer or in parallel on a computer cluster. It is distributed with specific mapping and quantification tools, but given its modular design, any tool supporting popular data interchange formats can be integrated. Availability: Grape can be obtained from the Bioinformatics and Genomics website at: http://big.crg.cat/services/grape. Contact: david.gonzalez@crg.eu or roderic.guigo@crg.eu PMID:23329413
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output, affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.
2014-01-01
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449
Lu, Jun; Bushel, Pierre R.
2013-01-01
RNA sequencing (RNA-Seq) allows for the identification of novel exon-exon junctions and quantification of gene expression levels. We show that from RNA-Seq data one may also detect utilization of alternative polyadenylation (APA) in 3′ untranslated regions (3′ UTRs) known to play a critical role in the regulation of mRNA stability, cellular localization and translation efficiency. Given the dynamic nature of APA, it is desirable to examine the APA on a sample-by-sample basis. We used a Poisson hidden Markov model (PHMM) of RNA-Seq data to identify potential APA in human liver and brain cortex tissues leading to shortened 3′ UTRs. Over three hundred transcripts with shortened 3′ UTRs were detected with sensitivity >75% and specificity >60%. Tissue-specific 3′ UTR shortening was observed for 32 genes with a q-value ≤ 0.1. When compared to alternative isoforms detected by Cufflinks or MISO, our PHMM method agreed on over 100 transcripts with shortened 3′ UTRs. Given the increasing usage of RNA-Seq for gene expression profiling, using PHMM to investigate sample-specific 3′ UTR shortening could be an added benefit from this emerging technology. PMID:23845781
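To illustrate the general idea of a Poisson hidden Markov model on read coverage, here is a minimal sketch of Viterbi decoding with two states. It is not the authors' PHMM: the two-state layout, the Poisson rates, the transition probabilities and the toy counts are all hypothetical values chosen for illustration.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_poisson(counts, rates, log_trans, log_init):
    """Most likely state path for counts emitted by a Poisson HMM."""
    n_states = len(rates)
    # score[s] = best log-probability of any path that ends in state s
    score = [log_init[s] + poisson_logpmf(counts[0], rates[s])
             for s in range(n_states)]
    backptrs = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + poisson_logpmf(k, rates[s]))
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical per-bin read counts along a 3' UTR: coverage drops midway.
counts = [20, 22, 19, 21, 18, 3, 2, 4, 1, 2]
rates = [20.0, 2.0]  # state 0 = high coverage, state 1 = low coverage (assumed)
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]

path = viterbi_poisson(counts, rates, log_trans, log_init)
print(path)
```

In this toy setting the switch from state 0 to state 1 marks the bin where coverage falls, which is the kind of signal one would interpret as a candidate shortened 3′ UTR.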
Picardi, Ernesto; Gallo, Angela; Galeano, Federica; Tomaselli, Sara; Pesole, Graziano
2012-01-01
RNA editing is a post-transcriptional process occurring in a wide range of organisms. In human brain, the A-to-I RNA editing, in which individual adenosine (A) bases in pre-mRNA are modified to yield inosine (I), is the most frequent event. Modulating gene expression, RNA editing is essential for cellular homeostasis. Indeed, its deregulation has been linked to several neurological and neurodegenerative diseases. To date, many RNA editing sites have been identified by next generation sequencing technologies employing massive transcriptome sequencing together with whole genome or exome sequencing. While genome and transcriptome reads are not always available for single individuals, RNA-Seq data are widespread through public databases and represent a relevant source of yet unexplored RNA editing sites. In this context, we propose a simple computational strategy to identify genomic positions enriched in novel hypothetical RNA editing events by means of a new two-step mapping procedure that requires only RNA-Seq data, with no a priori knowledge of RNA editing characteristics and no genomic reads. We assessed the suitability of our procedure by confirming A-to-I candidates using conventional Sanger sequencing and performing RNA-Seq as well as whole exome sequencing of human spinal cord tissue from a single individual. PMID:22957051
Transcription start site associated RNAs (TSSaRNAs) are ubiquitous in all domains of life.
Zaramela, Livia S; Vêncio, Ricardo Z N; ten-Caten, Felipe; Baliga, Nitin S; Koide, Tie
2014-01-01
A plethora of non-coding RNAs has been discovered using high-resolution transcriptomics tools, indicating that transcriptional and post-transcriptional regulation is much more complex than previously appreciated. Small RNAs associated with transcription start sites of annotated coding regions (TSSaRNAs) are pervasive in both eukaryotes and bacteria. Here, we provide evidence for existence of TSSaRNAs in several archaeal transcriptomes including: Halobacterium salinarum, Pyrococcus furiosus, Methanococcus maripaludis, and Sulfolobus solfataricus. We validated TSSaRNAs from the model archaeon Halobacterium salinarum NRC-1 by deep sequencing two independent small-RNA enriched (RNA-seq) and a primary-transcript enriched (dRNA-seq) strand-specific libraries. We identified 652 transcripts, of which 179 were shown to be primary transcripts (∼7% of the annotated genome). Distinct growth-associated expression patterns between TSSaRNAs and their cognate genes were observed, indicating a possible role in environmental responses that may result from RNA polymerase with varying pausing rhythms. This work shows that TSSaRNAs are ubiquitous across all domains of life.
Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud
Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.
2015-01-01
Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053
Sudhagar, Arun; El-Matbouli, Mansour
2018-01-01
In recent years, with the advent of next-generation sequencing along with the development of various bioinformatics tools, RNA sequencing (RNA-Seq)-based transcriptome analysis has become much more affordable in the field of biological research. This technique has even opened up avenues to explore the transcriptome of non-model organisms for which a reference genome is not available. This has led fish health researchers to adopt this technology to understand pathogenic processes and immune reactions in fish during infection. Recent studies using this technology have altered and updated the previous understanding of many diseases in fish. RNA-Seq has been employed in the understanding of fish pathogens such as bacteria, viruses, parasites, and oomycetes. It has also been helpful in unraveling the immune mechanisms in fish. Additionally, RNA-Seq technology has paved the way for future work, such as genetic linkage mapping, quantitative trait analysis, disease-resistant strain or broodstock selection, and the development of effective vaccines and therapies. To date, no review has comprehensively summarized the studies that used RNA-Seq to explore the mechanisms of infection of pathogens and the defense strategies of fish hosts. This review aims to summarize the contemporary understanding and findings with regard to infectious pathogens and the immune system of fish that have been achieved through RNA-Seq technology. PMID:29342931
Zhang, Zijun; Xing, Yi
2017-09-19
Crosslinking or RNA immunoprecipitation followed by sequencing (CLIP-seq or RIP-seq) allows transcriptome-wide discovery of RNA regulatory sites. As CLIP-seq/RIP-seq reads are short, existing computational tools focus on uniquely mapped reads, while reads mapped to multiple loci are discarded. We present CLAM (CLIP-seq Analysis of Multi-mapped reads). CLAM uses an expectation-maximization algorithm to assign multi-mapped reads and calls peaks combining uniquely and multi-mapped reads. To demonstrate the utility of CLAM, we applied it to a wide range of public CLIP-seq/RIP-seq datasets involving numerous splicing factors, microRNAs and m6A RNA methylation. CLAM recovered a large number of novel RNA regulatory sites inaccessible by uniquely mapped reads. The functional significance of these sites was demonstrated by consensus motif patterns and association with alternative splicing (splicing factors), transcript abundance (AGO2) and mRNA half-life (m6A). CLAM provides a useful tool to discover novel protein-RNA interactions and RNA modification sites from CLIP-seq and RIP-seq data, and reveals the significant contribution of repetitive elements to the RNA regulatory landscape of the human transcriptome. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
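The abstract does not spell out CLAM's model, so the following is only a generic sketch of how an expectation-maximization loop can split multi-mapped reads across loci: each read is fractionally assigned in proportion to the current locus abundances (E-step), and abundances are then re-estimated from those fractional assignments (M-step). The function name, the read representation and the toy data are assumptions.

```python
def em_assign(reads, n_loci, n_iter=50):
    """Estimate relative locus abundances from reads with candidate loci.

    reads: list of lists; each inner list holds the locus indices a read maps to.
    Returns a list of abundances that sums to 1.
    """
    abundance = [1.0 / n_loci] * n_loci
    for _ in range(n_iter):
        totals = [0.0] * n_loci
        for candidates in reads:
            # E-step: share the read among its candidate loci in proportion
            # to the current abundance estimates.
            norm = sum(abundance[locus] for locus in candidates)
            for locus in candidates:
                totals[locus] += abundance[locus] / norm
        # M-step: abundances are the normalized fractional read counts.
        abundance = [t / len(reads) for t in totals]
    return abundance


# Toy data (hypothetical): 3 reads unique to locus 0, 1 unique to locus 1,
# and 4 reads that map equally well to both loci.
reads = [[0]] * 3 + [[1]] + [[0, 1]] * 4
abundance = em_assign(reads, n_loci=2)
print([round(a, 3) for a in abundance])
```

In this toy case the uniquely mapped reads pull the multi-mapped reads toward locus 0, so the estimates settle at 0.75 and 0.25 rather than the even split that discarding or halving multi-mapped reads would suggest.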
Bradford, James R.; Farren, Matthew; Powell, Steve J.; Runswick, Sarah; Weston, Susie L.; Brown, Helen; Delpuech, Oona; Wappett, Mark; Smith, Neil R.; Carr, T. Hedley; Dry, Jonathan R.; Gibson, Neil J.; Barry, Simon T.
2013-01-01
Pre-clinical models of tumour biology often rely on propagating human tumour cells in a mouse. In order to gain insight into the alignment of these models to human disease segments or investigate the effects of different therapeutics, approaches such as PCR or array based expression profiling are often employed despite suffering from biased transcript coverage, and a requirement for specialist experimental protocols to separate tumour and host signals. Here, we describe a computational strategy to profile transcript expression in both the tumour and host compartments of pre-clinical xenograft models from the same RNA sample using RNA-Seq. Key to this strategy is a species-specific mapping approach that removes the need for manipulation of the RNA population, customised sequencing protocols, or prior knowledge of the species component ratio. The method demonstrates comparable performance to species-specific RT-qPCR and a standard microarray platform, and allowed us to quantify gene expression changes in both the tumour and host tissue following treatment with cediranib, a potent vascular endothelial growth factor receptor tyrosine kinase inhibitor, including the reduction of multiple murine transcripts associated with endothelium or vessels, and an increase in genes associated with the inflammatory response in response to cediranib. In the human compartment, we observed a robust induction of hypoxia genes and a reduction in cell cycle associated transcripts. In conclusion, the study establishes that RNA-Seq can be applied to pre-clinical models to gain deeper understanding of model characteristics and compound mechanism of action, and to identify both tumour and host biomarkers. PMID:23840389
do Nascimento, Naíla C; Guimaraes, Ana M S; Dos Santos, Andrea P; Chu, Yuefeng; Marques, Lucas M; Messick, Joanne B
2018-06-18
Pigs are popular animal models in biomedical research. RNA-Seq is becoming the predominant tool to investigate transcriptional changes of the pig's response to infection. The high sensitivity of this tool requires a strict control of the study design beginning with the selection of healthy animals to provide accurate interpretation of research data. Pigs chronically infected with Mycoplasma suis often show no obvious clinical signs; however, the infection may affect the validity of animal research. The goal of this study was to investigate whether or not this silent infection is also silent at the host transcriptional level. Therefore, immunocompetent pigs were experimentally infected with M. suis and transcriptional profiles of whole blood, generated by RNA-Seq, were analyzed and compared to non-infected animals. RNA-Seq showed 55 differentially expressed (DE) genes in the M. suis infected pigs. Down-regulation of genes related to innate immunity (tlr8, chemokines, chemokine receptors) and genes containing IFN gamma-activated sequence (gbp1, gbp2, il15, cxcl10, casp1, cd274) suggests a general suppression of the immune response in the infected animals. Sixteen (29.09%) of the DE genes were involved in two protein interaction networks: one involving chemokines, chemokine receptors and interleukin-15 and another involving the complement cascade. Genes related to vascular permeability, blood coagulation, and endothelium integrity were also DE in infected pigs. These findings suggest that M. suis subclinical infection causes significant alterations in blood mRNA levels, which could impact data interpretation of research using pigs. Screening of pigs for M. suis infection before initiating animal studies is strongly recommended.
Kandpal, Raj P; Rajasimha, Harsha K; Brooks, Matthew J; Nellissery, Jacob; Wan, Jun; Qian, Jiang; Kern, Timothy S; Swaroop, Anand
2012-01-01
To define gene expression changes associated with diabetic retinopathy in a mouse model using next generation sequencing, and to utilize transcriptome signatures to assess molecular pathways by which pharmacological agents inhibit diabetic retinopathy. We applied a high throughput RNA sequencing (RNA-seq) strategy using Illumina GAIIx to characterize the entire retinal transcriptome from nondiabetic and from streptozotocin-treated mice 32 weeks after induction of diabetes. Some of the diabetic mice were treated with inhibitors of receptor for advanced glycation endproducts (RAGE) and p38 mitogen activated protein (MAP) kinase, which have previously been shown to inhibit diabetic retinopathy in rodent models. The transcripts and alternatively spliced variants were determined in all experimental groups. Next generation sequencing-based RNA-seq profiles provided comprehensive signatures of transcripts that are altered in early stages of diabetic retinopathy. These transcripts encoded proteins involved in distinct yet physiologically relevant disease-associated pathways such as inflammation, microvasculature formation, apoptosis, glucose metabolism, Wnt signaling, xenobiotic metabolism, and photoreceptor biology. Significant upregulation of crystallin transcripts was observed in diabetic animals, and the diabetes-induced upregulation of these transcripts was inhibited in diabetic animals treated with inhibitors of either RAGE or p38 MAP kinase. These two therapies also showed dissimilar regulation of some subsets of transcripts that included alternatively spliced versions of arrestin, neutral sphingomyelinase activation associated factor (Nsmaf), SH3-domain GRB2-like interacting protein 1 (Sgip1), and axin. Diabetes alters many transcripts in the retina, and two therapies that inhibit the vascular pathology similarly inhibit a portion of these changes, pointing to possible molecular mechanisms for their beneficial effects. 
These therapies also changed the abundance of various alternatively spliced versions of signaling transcripts, suggesting a possible role of alternative splicing in disease etiology. Our studies clearly demonstrate RNA-seq as a comprehensive strategy for identifying disease-specific transcripts, and for determining comparative profiles of molecular changes mediated by candidate drugs.
Transcriptional and Chromatin Dynamics of Muscle Regeneration After Severe Trauma
2016-10-12
performed pathway analysis of the time-clustered RNA-Seq data16 and showed an initial burst of pro-inflammatory and immune-response transcripts in the...143 showed dynamic behavior (See Methods) and analysis of the dynamic miRNAs reinforced many of the results observed from the RNA-Seq datasets...excellent agreement was viewed. Hierarchical clustering of the datasets through time revealed 5 clusters, and gene ontology (GO) analysis of the
Gong, Ting; Szustakowski, Joseph D
2013-04-15
For heterogeneous tissues, measurements of gene expression through mRNA-Seq data are confounded by relative proportions of cell types involved. In this note, we introduce an efficient pipeline: DeconRNASeq, an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data. It adopts a globally optimized non-negative decomposition algorithm through quadratic programming for estimating the mixing proportions of distinctive tissue types in next-generation sequencing data. We demonstrated the feasibility and validity of DeconRNASeq across a range of mixing levels and sources using mRNA-Seq data mixed in silico at known concentrations. We validated our computational approach for various benchmark data, with high correlation between our predicted cell proportions and the real fractions of tissues. Our study provides a rigorous, quantitative and high-resolution tool as a prerequisite to use mRNA-Seq data. The modularity of package design allows an easy deployment of custom analytical pipelines for data from other high-throughput platforms. DeconRNASeq is written in R, and is freely available at http://bioconductor.org/packages. Supplementary data are available at Bioinformatics online.
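As a minimal sketch of the deconvolution idea, consider the special case of exactly two tissue types. With fractions constrained to be non-negative and to sum to one, the least-squares problem reduces to a single unknown and has a closed-form solution, so no quadratic-programming solver is needed. This is not DeconRNASeq's implementation (the package is written in R and handles the general case); the function name and the toy signatures below are hypothetical.

```python
def estimate_two_tissue_fraction(mixture, sig_a, sig_b):
    """Fraction p of tissue A minimizing ||mixture - (p*A + (1-p)*B)||^2, 0 <= p <= 1."""
    diff = [a - b for a, b in zip(sig_a, sig_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("signatures are identical; the fraction is not identifiable")
    num = sum((m - b) * d for m, b, d in zip(mixture, sig_b, diff))
    # Clip the unconstrained least-squares solution to the feasible range.
    return min(1.0, max(0.0, num / denom))


# Hypothetical expression signatures for two pure tissues over four genes.
sig_a = [10.0, 0.0, 5.0, 2.0]
sig_b = [0.0, 8.0, 5.0, 6.0]

# In silico mixture at a known proportion, mirroring the validation design
# described in the abstract.
true_fraction = 0.3
mixture = [true_fraction * a + (1 - true_fraction) * b
           for a, b in zip(sig_a, sig_b)]

estimate = estimate_two_tissue_fraction(mixture, sig_a, sig_b)
print(f"estimated fraction of tissue A: {estimate:.3f}")
```

With more than two tissue types the same objective has no such shortcut, which is where a constrained quadratic-programming solver of the kind the abstract mentions comes in.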
Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao
2017-12-07
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.
Polstein, Lauren R.; Perez-Pinera, Pablo; Kocak, D. Dewran; Vockley, Christopher M.; Bledsoe, Peggy; Song, Lingyun; Safi, Alexias; Crawford, Gregory E.; Reddy, Timothy E.; Gersbach, Charles A.
2015-01-01
Genome engineering technologies based on the CRISPR/Cas9 and TALE systems are enabling new approaches in science and biotechnology. However, the specificity of these tools in complex genomes and the role of chromatin structure in determining DNA binding are not well understood. We analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators in human cells using ChIP-seq to assess DNA-binding specificity and RNA-seq to measure the specificity of perturbing the transcriptome. Additionally, DNase-seq was used to assess genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these transcription factors are highly specific in both DNA binding and gene regulation and are able to open targeted regions of closed chromatin independent of gene activation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. PMID:26025803
Polyester: simulating RNA-seq datasets with differential transcript expression.
Frazee, Alyssa C; Jaffe, Andrew E; Langmead, Ben; Leek, Jeffrey T
2015-09-01
Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) requires software tools to assess accuracy and error rate control. Since true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be utilized, either by generating costly spike-in experiments or by simulating RNA-seq data. Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads indicating isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester are a reasonable approximation to real RNA-seq data, and standard differential expression workflows can recover differential expression set in the simulation by the user. Polyester is freely available from Bioconductor (http://bioconductor.org/). Contact: jtleek@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts.
Paraskevopoulou, Maria D; Vlachos, Ioannis S; Karagkouni, Dimitra; Georgakilas, Georgios; Kanellos, Ilias; Vergoulis, Thanasis; Zagganas, Konstantinos; Tsanakas, Panayiotis; Floros, Evangelos; Dalamagas, Theodore; Hatzigeorgiou, Artemis G
2016-01-04
microRNAs (miRNAs) are short non-coding RNAs (ncRNAs) that act as post-transcriptional regulators of coding gene expression. Long non-coding RNAs (lncRNAs) have been recently reported to interact with miRNAs. The sponge-like function of lncRNAs introduces an extra layer of complexity in the miRNA interactome. DIANA-LncBase v1 provided a database of experimentally supported and in silico predicted miRNA Recognition Elements (MREs) on lncRNAs. The second version of LncBase (www.microrna.gr/LncBase) presents an extensive collection of miRNA:lncRNA interactions. The significantly enhanced database includes more than 70 000 low and high-throughput, (in)direct miRNA:lncRNA experimentally supported interactions, derived from manually curated publications and the analysis of 153 AGO CLIP-Seq libraries. The new experimental module presents a 14-fold increase compared to the previous release. LncBase v2 hosts in silico predicted miRNA targets on lncRNAs, identified with the DIANA-microT algorithm. The relevant module provides millions of predicted miRNA binding sites, accompanied with detailed metadata and MRE conservation metrics. LncBase v2 caters information regarding cell type specific miRNA:lncRNA regulation and enables users to easily identify interactions in 66 different cell types, spanning 36 tissues for human and mouse. Database entries are also supported by accurate lncRNA expression information, derived from the analysis of more than 6 billion RNA-Seq reads. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
The bHLH transcription factor GmPIB1 facilitates resistance to Phytophthora sojae in Glycine max
Cheng, Qun; Dong, Lidong; Gao, Tianjiao; Liu, Tengfei; Li, Ninghui; Wang, Le; Chang, Xin; Wu, Junjiang; Xu, Pengfei
2018-01-01
Abstract Phytophthora sojae Kaufmann and Gerdemann causes Phytophthora root rot, a destructive soybean disease worldwide. A basic helix–loop–helix (bHLH) transcription factor is thought to be involved in the response to P. sojae infection in soybean, as revealed by RNA sequencing (RNA-seq). However, the molecular mechanism underlying this response is currently unclear. Here, we explored the function and underlying mechanisms of a bHLH transcription factor in soybean, designated GmPIB1 (P. sojae-inducible bHLH transcription factor), during host responses to P. sojae. GmPIB1 was significantly induced by P. sojae in the resistant soybean cultivar ‘L77-1863’. Analysis of transgenic soybean hairy roots with elevated or reduced expression of GmPIB1 demonstrated that GmPIB1 enhances resistance to P. sojae and reduces reactive oxygen species (ROS) accumulation. Quantitative reverse transcription PCR and chromatin immunoprecipitation–quantitative PCR assays revealed that GmPIB1 binds directly to the promoter of GmSPOD1 and represses its expression; this gene encodes a key enzyme in ROS production. Moreover, transgenic soybean hairy roots with GmSPOD1 silencing through RNA interference exhibited improved resistance to P. sojae and reduced ROS generation. These findings suggest that GmPIB1 enhances resistance to P. sojae by repressing the expression of GmSPOD1. PMID:29579245
CapZyme-Seq Comprehensively Defines Promoter-Sequence Determinants for RNA 5' Capping with NAD.
Vvedenskaya, Irina O; Bird, Jeremy G; Zhang, Yuanchao; Zhang, Yu; Jiao, Xinfu; Barvík, Ivan; Krásný, Libor; Kiledjian, Megerditch; Taylor, Deanne M; Ebright, Richard H; Nickels, Bryce E
2018-05-03
Nucleoside-containing metabolites such as NAD+ can be incorporated as 5' caps on RNA by serving as non-canonical initiating nucleotides (NCINs) for transcription initiation by RNA polymerase (RNAP). Here, we report CapZyme-seq, a high-throughput-sequencing method that employs NCIN-decapping enzymes NudC and Rai1 to detect and quantify NCIN-capped RNA. By combining CapZyme-seq with multiplexed transcriptomics, we determine efficiencies of NAD+ capping by Escherichia coli RNAP for ∼16,000 promoter sequences. The results define preferred transcription start site (TSS) positions for NAD+ capping and define a consensus promoter sequence for NAD+ capping: HRRASWW (TSS underlined). By applying CapZyme-seq to E. coli total cellular RNA, we establish that sequence determinants for NCIN capping in vivo match the NAD+-capping consensus defined in vitro, and we identify and quantify NCIN-capped small RNAs (sRNAs). Our findings define the promoter-sequence determinants for NCIN capping with NAD+ and provide a general method for analysis of NCIN capping in vitro and in vivo. Copyright © 2018 Elsevier Inc. All rights reserved.
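The consensus HRRASWW is written in IUPAC nucleotide codes (H = A, C or T; R = A or G; S = C or G; W = A or T). A small sketch of how such a consensus can be expanded into a regular expression and scanned against a sequence follows. The helper names and the test sequence are hypothetical, and because the underlining that marks the TSS position is lost in this text, the code reports only where the 7-mer starts.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Expand an IUPAC motif into an equivalent regex pattern string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets the scan report matches that overlap one another.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Hypothetical DNA sequence containing one match (CAGACTT) at position 2.
sequence = "TTCAGACTTGG"
positions = find_motif("HRRASWW", sequence)
print(positions)
```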
Genome-wide Discovery of Circular RNAs in the Leaf and Seedling Tissues of Arabidopsis Thaliana
Dou, Yongchao; Li, Shengjun; Yang, Weilong; Liu, Kan; Du, Qian; Ren, Guodong; Yu, Bin; Zhang, Chi
2017-01-01
Background: Recently, identification and functional studies of circular RNAs, a type of non-coding RNA arising from the ligation of the 3’ and 5’ ends of a linear RNA molecule, were conducted in mammalian cells with the development of RNA-seq technology. Method: Because circular RNAs have been studied less thoroughly in plants than in animals, we conducted a genome-wide identification of circular RNA candidates in Arabidopsis by applying our own bioinformatics tool to several existing RNA-seq datasets generated specifically for non-coding RNAs. Results: A total of 164 circular RNA candidates were identified from RNA-seq data, and 4 circular RNA transcripts, including both exonic and intronic circular RNAs, were experimentally validated. Interestingly, our results show that circular RNA transcripts are enriched in the photosynthesis system for the leaf tissue and correlated to the higher expression levels of their parent genes. Sixteen out of all 40 genes that have circular RNA candidates are related to the photosynthesis system, and out of the total 146 exonic circular RNA candidates, 63 are found in chloroplast. PMID:29081691
Townsley, Brad T; Covington, Michael F; Ichihashi, Yasunori; Zumstein, Kristina; Sinha, Neelima R
2015-01-01
Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand specific RNA-seq libraries utilizing the terminal breathing of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries and can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.
Mao, Shihong; Goodrich, Robert J; Hauser, Russ; Schrader, Steven M; Chen, Zhen; Krawetz, Stephen A
2013-10-01
Different semen storage and sperm purification methods may affect the integrity of isolated spermatozoal RNA. RNA-Seq was applied to determine whether semen storage methods (pelleted vs. liquefied) and somatic cell lysis buffer (SCLB) vs. PureSperm (PS) purification methods affect the quantity and quality of sperm RNA. The results indicate that the method of semen storage does not markedly impact RNA profiling whereas the choice of purification can yield significant differences. RNA-Seq showed that the majority of mitochondrial and mid-piece associated transcripts were lost after SCLB purification, which indicated that the mid-piece of spermatozoa may have been compromised. In addition, the number of stable transcript pairs from SCLB-samples was less than that from the PS samples. This study supports the view that PS purification better maintains the integrity of spermatozoal RNAs.
How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives.
Dal Molin, Alessandra; Di Camillo, Barbara
2018-01-31
The sequencing of the transcriptome of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types in heterogeneous cell populations or for the study of stochastic gene expression. In recent years, various experimental methods and computational tools for analysing single-cell RNA-sequencing data have been proposed. However, most of them are tailored to different experimental designs or biological questions, and in many cases, their performance has not been benchmarked yet, thus increasing the difficulty for a researcher to choose the optimal single-cell transcriptome sequencing (scRNA-seq) experiment and analysis workflow. In this review, we aim to provide an overview of the current available experimental and computational methods developed to handle single-cell RNA-sequencing data and, based on their peculiarities, we suggest possible analysis frameworks depending on specific experimental designs. Together, we propose an evaluation of challenges and open questions and future perspectives in the field. In particular, we go through the different steps of scRNA-seq experimental protocols such as cell isolation, messenger RNA capture, reverse transcription, amplification and use of quantitative standards such as spike-ins and Unique Molecular Identifiers (UMIs). We then analyse the current methodological challenges related to preprocessing, alignment, quantification, normalization, batch effect correction and methods to control for confounding effects. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Tong, Ann-Jay; Kollmann, Tobias R.; Smale, Stephen T.
2015-01-01
A variety of age-related differences in the innate and adaptive immune systems have been proposed to contribute to the increased susceptibility to infection of human neonates and older adults. The emergence of RNA sequencing (RNA-seq) provides an opportunity to obtain an unbiased, comprehensive, and quantitative view of gene expression differences in defined cell types from different age groups. An examination of ex vivo human monocyte responses to lipopolysaccharide stimulation or Listeria monocytogenes infection by RNA-seq revealed extensive similarities between neonates, young adults, and older adults, with an unexpectedly small number of genes exhibiting statistically significant age-dependent differences. By examining the differentially induced genes in the context of transcription factor binding motifs and RNA-seq data sets from mutant mouse strains, a previously described deficiency in interferon response factor-3 activity could be implicated in most of the differences between newborns and young adults. Contrary to these observations, older adults exhibited elevated expression of inflammatory genes at baseline, yet the responses following stimulation correlated more closely with those observed in younger adults. Notably, major differences in the expression of constitutively expressed genes were not observed, suggesting that the age-related differences are driven by environmental influences rather than cell-autonomous differences in monocyte development. PMID:26147648
Genome-Wide Characterization of Light-Regulated Genes in Neurospora crassa
Wu, Cheng; Yang, Fei; Smith, Kristina M.; Peterson, Matthew; Dekhang, Rigzin; Zhang, Ying; Zucker, Jeremy; Bredeweg, Erin L.; Mallappa, Chandrashekara; Zhou, Xiaoying; Lyubetskaya, Anna; Townsend, Jeffrey P.; Galagan, James E.; Freitag, Michael; Dunlap, Jay C.; Bell-Pedersen, Deborah; Sachs, Matthew S.
2014-01-01
The filamentous fungus Neurospora crassa responds to light in complex ways. To thoroughly study the transcriptional response of this organism to light, RNA-seq was used to analyze capped and polyadenylated mRNA prepared from mycelium grown for 24 hr in the dark and then exposed to light for 0 (control), 15, 60, 120, and 240 min. More than three-quarters of all defined protein coding genes (79%) were expressed in these cells. The increased sensitivity of RNA-seq compared with previous microarray studies revealed that the RNA levels for 31% of expressed genes were affected two-fold or more by exposure to light. Additionally, a large class of mRNAs, enriched for transcripts specifying products involved in rRNA metabolism, showed decreased expression in response to light, indicating a heretofore undocumented effect of light on this pathway. Based on measured changes in mRNA levels, light generally increases cellular metabolism and at the same time causes significant oxidative stress to the organism. To deal with this stress, protective photopigments are made, antioxidants are produced, and genes involved in ribosome biogenesis are transiently repressed. PMID:25053707
SEASTAR: systematic evaluation of alternative transcription start sites in RNA.
Qin, Zhiyi; Stoilov, Peter; Zhang, Xuegong; Xing, Yi
2018-05-04
Alternative first exons diversify the transcriptomes of eukaryotes by producing variants of the 5' Untranslated Regions (5'UTRs) and N-terminal coding sequences. Accurate transcriptome-wide detection of alternative first exons typically requires specialized experimental approaches that are designed to identify the 5' ends of transcripts. We developed a computational pipeline SEASTAR that identifies first exons from RNA-seq data alone then quantifies and compares alternative first exon usage across multiple biological conditions. The exons inferred by SEASTAR coincide with transcription start sites identified directly by CAGE experiments and bear epigenetic hallmarks of active promoters. To determine if differential usage of alternative first exons can yield insights into the mechanism controlling gene expression, we applied SEASTAR to an RNA-seq dataset that tracked the reprogramming of mouse fibroblasts into induced pluripotent stem cells. We observed dynamic temporal changes in the usage of alternative first exons, along with correlated changes in transcription factor expression. Using a combined sequence motif and gene set enrichment analysis we identified N-Myc as a regulator of alternative first exon usage in the pluripotent state. Our results demonstrate that SEASTAR can leverage the available RNA-seq data to gain insights into the control of gene expression and alternative transcript variation in eukaryotic transcriptomes.
Nepal, Chirag; Coolen, Marion; Hadzhiev, Yavor; Cussigh, Delphine; Mydel, Piotr; Steen, Vidar M.; Carninci, Piero; Andersen, Jesper B.; Bally-Cuif, Laure; Müller, Ferenc; Lenhard, Boris
2016-01-01
MicroRNAs (miRNAs) play a major role in the post-transcriptional regulation of target genes, especially in development and differentiation. Our understanding about the transcriptional regulation of miRNA genes is limited by inadequate annotation of primary miRNA (pri-miRNA) transcripts. Here, we used CAGE-seq and RNA-seq to provide genome-wide identification of the pri-miRNA core promoter repertoire and its dynamic usage during zebrafish embryogenesis. We assigned pri-miRNA promoters to 152 precursor-miRNAs (pre-miRNAs), the majority of which were supported by promoter associated post-translational histone modifications (H3K4me3, H2A.Z) and RNA polymerase II (RNAPII) occupancy. We validated seven miR-9 pri-miRNAs by in situ hybridization and showed similar expression patterns as mature miR-9. In addition, processing of an alternative intronic promoter of miR-9–5 was validated by 5′ RACE PCR. Developmental profiling revealed a subset of pri-miRNAs that are maternally inherited. Moreover, we show that promoter-associated H3K4me3, H2A.Z and RNAPII marks are not only present at pri-miRNA promoters but are also specifically enriched at pre-miRNAs, suggesting chromatin level regulation of pre-miRNAs. Furthermore, we demonstrated that CAGE-seq also detects 3′-end processing of pre-miRNAs on Drosha cleavage site that correlates with miRNA-offset RNAs (moRNAs) production and provides a new tool for detecting Drosha processing events and predicting pre-miRNA processing by a genome-wide assay. PMID:26673698
USDA-ARS's Scientific Manuscript database
The transcriptome provides a functional footprint of the genome by enumerating the molecular components of cells and tissues. The field of transcript discovery has been revolutionized through high-throughput mRNA sequencing (RNA-seq). Here, we present a methodology that replicates and improves exist...
Poss, Zachary C; Ebmeier, Christopher C; Odell, Aaron T; Tangpeerachaikul, Anupong; Lee, Thomas; Pelish, Henry E; Shair, Matthew D; Dowell, Robin D; Old, William M; Taatjes, Dylan J
2016-04-12
Cortistatin A (CA) is a highly selective inhibitor of the Mediator kinases CDK8 and CDK19. Using CA, we now report a large-scale identification of Mediator kinase substrates in human cells (HCT116). We identified over 16,000 quantified phosphosites including 78 high-confidence Mediator kinase targets within 64 proteins, including DNA-binding transcription factors and proteins associated with chromatin, DNA repair, and RNA polymerase II. Although RNA-seq data correlated with Mediator kinase targets, the effects of CA on gene expression were limited and distinct from CDK8 or CDK19 knockdown. Quantitative proteome analyses, tracking around 7,000 proteins across six time points (0-24 hr), revealed that CA selectively affected pathways implicated in inflammation, growth, and metabolic regulation. Contrary to expectations, increased turnover of Mediator kinase targets was not generally observed. Collectively, these data support Mediator kinases as regulators of chromatin and RNA polymerase II activity and suggest their roles extend beyond transcription to metabolism and DNA repair. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Li, Jun; Tibshirani, Robert
2015-01-01
We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or ‘sequencing depths’. Existing methods for this problem are based on Poisson or negative binomial models: they are useful but can be heavily influenced by ‘outliers’ in the data. We introduce a simple, nonparametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class or multiple-class outcomes. We compare our proposed method to Poisson and negative binomial-based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods. PMID:22127579
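The idea of combining a rank-based statistic with resampling to equalize sequencing depth can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a two-class outcome, substitutes simple downsampling without replacement to the smallest library size for whatever resampling scheme the paper uses, and all function names are hypothetical.

```python
import random

def downsample_counts(counts, target_depth, rng):
    """Thin one sample's count vector to target_depth reads (sampling without replacement)."""
    # Expand counts into one gene index per read, then keep a random subset.
    reads = [g for g, c in enumerate(counts) for _ in range(c)]
    kept = rng.sample(reads, target_depth)
    out = [0] * len(counts)
    for g in kept:
        out[g] += 1
    return out

def rank_sum(values, labels):
    """Wilcoxon rank-sum statistic for the samples labelled 1, using midranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return sum(r for r, lab in zip(ranks, labels) if lab == 1)

def resampled_rank_stats(samples, labels, n_resamples=20, seed=0):
    """Average per-gene rank-sum statistic over repeated depth-equalizing resamples."""
    rng = random.Random(seed)
    depth = min(sum(s) for s in samples)  # common depth: the smallest library
    n_genes = len(samples[0])
    totals = [0.0] * n_genes
    for _ in range(n_resamples):
        thinned = [downsample_counts(s, depth, rng) for s in samples]
        for g in range(n_genes):
            totals[g] += rank_sum([t[g] for t in thinned], labels)
    return [t / n_resamples for t in totals]
```

Because the statistic depends only on ranks, a single extreme count shifts it far less than it would shift a Poisson or negative binomial likelihood, which is the robustness property the abstract refers to.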
Deng, Yue; Bao, Feng; Yang, Yang; Ji, Xiangyang; Du, Mulong; Zhang, Zhengdong
2017-01-01
The automated transcript discovery and quantification of high-throughput RNA sequencing (RNA-seq) data are important tasks of next-generation sequencing (NGS) research. However, these tasks are challenging due to the uncertainties that arise in the inference of complete splicing isoform variants from partially observed short reads. Here, we address this problem by explicitly reducing the inherent uncertainties in a biological system caused by missing information. In our approach, the RNA-seq procedure for transforming transcripts into short reads is considered an information transmission process. Consequently, the data uncertainties are substantially reduced by exploiting the information transduction capacity of information theory. The experimental results obtained from the analyses of simulated datasets and RNA-seq datasets from cell lines and tissues demonstrate the advantages of our method over state-of-the-art competitors. Our algorithm is an open-source implementation of MaxInfo. PMID:28911101
Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.
Davidson, Nadia M; Oshlack, Alicia
2018-05-01
RNA sequencing (RNA-seq) analyses can benefit from performing a genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating an assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Necklace allows a comprehensive transcriptome to be built from a combination of assembled and annotated transcripts, which results in a more comprehensive transcriptome for the majority of organisms. In addition RNA-seq data are mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods.
Sinicropi, Dominick; Qu, Kunbin; Collin, Francois; Crager, Michael; Liu, Mei-Lan; Pelham, Robert J.; Pho, Mylan; Rossi, Andrew Dei; Jeong, Jennie; Scott, Aaron; Ambannavar, Ranjana; Zheng, Christina; Mena, Raul; Esteban, Jose; Stephans, James; Morlan, John; Baker, Joffre
2012-01-01
RNA biomarkers discovered by RT-PCR-based gene expression profiling of archival formalin-fixed paraffin-embedded (FFPE) tissue form the basis for widely used clinical diagnostic tests; however, RT-PCR is practically constrained in the number of transcripts that can be interrogated. We have developed and optimized RNA-Seq library chemistry as well as bioinformatics and biostatistical methods for whole transcriptome profiling from FFPE tissue. The chemistry accommodates low RNA inputs and sample multiplexing. These methods both enable rediscovery of RNA biomarkers for disease recurrence risk that were previously identified by RT-PCR analysis of a cohort of 136 patients, and also identify a high percentage of recurrence risk markers that were previously discovered using DNA microarrays in a separate cohort of patients, evidence that this RNA-Seq technology has sufficient precision and sensitivity for biomarker discovery. More than two thousand RNAs are strongly associated with breast cancer recurrence risk in the 136 patient cohort (FDR <10%). Many of these are intronic RNAs for which corresponding exons are not also associated with disease recurrence. A number of the RNAs associated with recurrence risk belong to novel RNA networks. It will be important to test the validity of these novel associations in whole transcriptome RNA-Seq screens of other breast cancer cohorts. PMID:22808097
Structure-seq2: sensitive and accurate genome-wide profiling of RNA structure in vivo
Ritchey, Laura E.; Su, Zhao; Tang, Yin; Tack, David C.
2017-01-01
RNA serves many functions in biology such as splicing, temperature sensing, and innate immunity. These functions are often determined by the structure of RNA. There is thus a pressing need to understand RNA structure and how it changes during diverse biological processes both in vivo and genome-wide. Here, we present Structure-seq2, which provides nucleotide-resolution RNA structural information in vivo and genome-wide. This optimized version of our original Structure-seq method increases sensitivity by at least 4-fold and improves data quality by minimizing formation of a deleterious by-product, reducing ligation bias, and improving read coverage. We also present a variation of Structure-seq2 in which a biotinylated nucleotide is incorporated during reverse transcription, which greatly facilitates the protocol by eliminating two PAGE purification steps. We benchmark Structure-seq2 on both mRNA and rRNA structure in rice (Oryza sativa). We demonstrate that Structure-seq2 can lead to new biological insights. Our Structure-seq2 datasets uncover hidden breaks in chloroplast rRNA and identify a previously unreported N1-methyladenosine (m1A) in a nuclear-encoded Oryza sativa rRNA. Overall, Structure-seq2 is a rapid, sensitive, and unbiased method to probe RNA in vivo and genome-wide that facilitates new insights into RNA biology. PMID:28637286
Accurate identification of RNA editing sites from primitive sequence with deep neural networks.
Ouyang, Zhangyi; Liu, Feng; Zhao, Chenghui; Ren, Chao; An, Gaole; Mei, Chuan; Bo, Xiaochen; Shu, Wenjie
2018-04-16
RNA editing is a post-transcriptional RNA sequence alteration. Current methods have identified editing sites and facilitated research but require sufficient genomic annotations and prior-knowledge-based filtering steps, resulting in a cumbersome, time-consuming identification process. Moreover, these methods have limited generalizability and applicability in species with insufficient genomic annotations or in conditions of limited prior knowledge. We developed DeepRed, a deep learning-based method that identifies RNA editing from primitive RNA sequences without prior-knowledge-based filtering steps or genomic annotations. DeepRed achieved 98.1% and 97.9% area under the curve (AUC) in training and test sets, respectively. We further validated DeepRed using experimentally verified U87 cell RNA-seq data, achieving 97.9% positive predictive value (PPV). We demonstrated that DeepRed offers better prediction accuracy and computational efficiency than current methods with large-scale, mass RNA-seq data. We used DeepRed to assess the impact of multiple factors on editing identification with RNA-seq data from the Association of Biomolecular Resource Facilities and Sequencing Quality Control projects. We explored developmental RNA editing pattern changes during human early embryogenesis and evolutionary patterns in Drosophila species and the primate lineage using DeepRed. Our work illustrates DeepRed's state-of-the-art performance; it may decipher the hidden principles behind RNA editing, making editing detection convenient and effective.
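The two metrics quoted above, area under the curve (AUC) and positive predictive value (PPV), can be made concrete with a short sketch. It uses the standard definitions (AUC as the fraction of positive/negative pairs ranked correctly, PPV as true positives over all predicted positives). The function names, toy labels, and 0.5 threshold are illustrative assumptions and are not taken from DeepRed.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(pos) * len(neg))

def ppv(y_true, scores, threshold=0.5):
    """Positive predictive value: TP / (TP + FP) among sites called as edited."""
    called = [y for y, s in zip(y_true, scores) if s >= threshold]
    if not called:
        raise ValueError("no sites were called positive at this threshold")
    return sum(called) / len(called)

# Toy example: 1 = true editing site, 0 = not an editing site (hypothetical data).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))  # 0.75
print(ppv(labels, scores))  # 0.5
```

AUC summarizes ranking quality across all thresholds, whereas PPV depends on the chosen threshold and on how common true editing sites are in the evaluated set, so the two figures are not interchangeable.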
Liu, Ruolin; Dickerson, Julie
2017-11-01
We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. Under the evaluation of a real data set, the estimated transcript expression by Strawberry has the highest correlation with Nanostring probe counts, an independent experiment measure for transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
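The general idea of using an EM algorithm to assign ambiguous read counts to transcripts can be sketched as follows. This is a generic textbook-style EM, not Strawberry's latent class model: it ignores transcript length and sequencing bias, and the function name and input layout (groups of reads sharing the same set of compatible transcripts) are assumptions made for illustration.

```python
def em_abundance(read_classes, n_transcripts, n_iter=200):
    """Estimate relative transcript abundances from ambiguous read counts.

    read_classes: list of (compatible_transcript_ids, read_count) pairs, where
    each pair groups reads compatible with the same set of transcripts.
    """
    theta = [1.0 / n_transcripts] * n_transcripts  # start from uniform abundances
    total = sum(count for _, count in read_classes)
    for _ in range(n_iter):
        expected = [0.0] * n_transcripts
        # E-step: split each class's reads among its compatible transcripts
        # in proportion to the current abundance estimates.
        for compatible, count in read_classes:
            denom = sum(theta[t] for t in compatible)
            for t in compatible:
                expected[t] += count * theta[t] / denom
        # M-step: renormalize the expected counts into abundances.
        theta = [e / total for e in expected]
    return theta

# Hypothetical example: 30 reads unique to transcript 0, 10 unique to
# transcript 1, and 60 reads compatible with both.
theta = em_abundance([((0,), 30), ((1,), 10), ((0, 1), 60)], n_transcripts=2)
print([round(x, 3) for x in theta])  # [0.75, 0.25]
```

In this example the shared reads are split 3:1, matching the ratio of the uniquely assigned reads, which is the fixed point the iteration converges to.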
Haas, Brian J; Papanicolaou, Alexie; Yassour, Moran; Grabherr, Manfred; Blood, Philip D; Bowden, Joshua; Couger, Matthew Brian; Eccles, David; Li, Bo; Lieber, Matthias; MacManes, Matthew D; Ott, Michael; Orvis, Joshua; Pochet, Nathalie; Strozzi, Francesco; Weeks, Nathan; Westerman, Rick; William, Thomas; Dewey, Colin N; Henschel, Robert; LeDuc, Richard D; Friedman, Nir; Regev, Aviv
2013-08-01
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
Shen, Shihao; Park, Juw Won; Lu, Zhi-xiang; Lin, Lan; Henry, Michael D; Wu, Ying Nian; Zhou, Qing; Xing, Yi
2014-12-23
Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.
Choi, Sun Young; Park, Byeonghyeok; Choi, In-Geol; Sim, Sang Jun; Lee, Sun-Mi; Um, Youngsoon; Woo, Han Min
2016-01-01
The development of high-throughput technology using RNA-seq has allowed understanding of cellular mechanisms and regulations of bacterial transcription. In addition, transcriptome analysis with RNA-seq has been used to accelerate strain improvement through systems metabolic engineering. Synechococcus elongatus PCC 7942, a photosynthetic bacterium, has remarkable potential for biochemical and biofuel production due to photoautotrophic cell growth and direct CO2 conversion. Here, we performed a transcriptome analysis of S. elongatus PCC 7942 using RNA-seq to understand the changes of cellular metabolism and regulation for nitrogen starvation responses. As a result, differentially expressed genes (DEGs) were identified and functionally categorized. With mapping onto metabolic pathways, we probed transcriptional perturbation and regulation of carbon and nitrogen metabolisms relating to nitrogen starvation responses. Experimental evidence such as chlorophyll a and phycobilisome content and the measurement of CO2 uptake rate validated the transcriptome analysis. The analysis suggests that S. elongatus PCC 7942 reacts to nitrogen starvation by not only rearranging the cellular transport capacity involved in carbon and nitrogen assimilation pathways but also by reducing protein synthesis and photosynthesis activities. PMID:27488818
Takahashi, Melissa K.; Watters, Kyle E.; Gasper, Paul M.; Abbott, Timothy R.; Carlson, Paul D.; Chen, Alan A.
2016-01-01
Antisense RNA-mediated transcriptional regulators are powerful tools for controlling gene expression and creating synthetic gene networks. RNA transcriptional repressors derived from natural mechanisms called attenuators are particularly versatile, though their mechanistic complexity has made them difficult to engineer. Here we identify a new structure–function design principle for attenuators that enables the forward engineering of new RNA transcriptional repressors. Using in-cell SHAPE-Seq to characterize the structures of attenuator variants within Escherichia coli, we show that attenuator hairpins that facilitate interaction with antisense RNAs require interior loops for proper function. Molecular dynamics simulations of these attenuator variants suggest these interior loops impart structural flexibility. We further observe hairpin flexibility in the cellular structures of natural RNA mechanisms that use antisense RNA interactions to repress translation, confirming earlier results from in vitro studies. Finally, we design new transcriptional attenuators in silico using an interior loop as a structural requirement and show that they function as desired in vivo. This work establishes interior loops as an important structural element for designing synthetic RNA gene regulators. We anticipate that the coupling of experimental measurement of cellular RNA structure and function with computational modeling will enable rapid discovery of structure–function design principles for a diverse array of natural and synthetic RNA regulators. PMID:27103533
Lai, Tongfei; Wang, Ying; Fan, Yaya; Zhou, Yingying; Bao, Ying; Zhou, Ting
2017-03-06
In this study, the effects of exogenous potassium phosphite (Phi) on growth and patulin production of postharvest pathogen Penicillium expansum were assessed. The results indicated that P. expansum under 5mmol/L Phi stress presented obvious development retardation, yield reduction of patulin and lower infectivity to apple fruit. Meanwhile, expression analysis of 15 genes related to patulin biosynthesis suggested that Phi mainly affected the early steps of patulin synthetic route at transcriptional level. Furthermore, a global view of proteome and transcriptome alteration of P. expansum spores during 6h of Phi stress was evaluated by iTRAQ (isobaric tags for relative and absolute quantitation) and RNA-seq (RNA sequencing) approaches. A total of 582 differentially expressed proteins (DEPs) and 177 differentially expressed genes (DEGs) were acquired, most of which participated in carbohydrate metabolism, amino acid metabolism, lipid metabolism, genetic information processing and biosynthesis of secondary metabolites. Finally, 39 overlapped candidates were screened out through correlational analysis between iTRAQ and RNA-seq datasets. These findings will afford more precise and directional clues to explore the inhibitory mechanism of Phi on growth and patulin biosynthesis of P. expansum, and be beneficial to develop effective controlling approaches based on Phi. Copyright © 2016 Elsevier B.V. All rights reserved.
Single-nucleus RNA-seq of differentiating human myoblasts reveals the extent of fate heterogeneity
Zeng, Weihua; Jiang, Shan; Kong, Xiangduo; El-Ali, Nicole; Ball, Alexander R.; Ma, Christopher I-Hsing; Hashimoto, Naohiro; Yokomori, Kyoko; Mortazavi, Ali
2016-01-01
Myoblasts are precursor skeletal muscle cells that differentiate into fused, multinucleated myotubes. Current single-cell microfluidic methods are not optimized for capturing very large, multinucleated cells such as myotubes. To circumvent the problem, we performed single-nucleus transcriptome analysis. Using immortalized human myoblasts, we performed RNA-seq analysis of single cells (scRNA-seq) and single nuclei (snRNA-seq) and found them comparable, with a distinct enrichment for long non-coding RNAs (lncRNAs) in snRNA-seq. We then compared snRNA-seq of myoblasts before and after differentiation. We observed the presence of mononucleated cells (MNCs) that remained unfused and analyzed separately from multi-nucleated myotubes. We found that while the transcriptome profiles of myoblast and myotube nuclei are relatively homogeneous, MNC nuclei exhibited significant heterogeneity, with the majority of them adopting a distinct mesenchymal state. Primary transcripts for microRNAs (miRNAs) that participate in skeletal muscle differentiation were among the most differentially expressed lncRNAs, which we validated using NanoString. Our study demonstrates that snRNA-seq provides reliable transcriptome quantification for cells that are otherwise not amenable to current single-cell platforms. Our results further indicate that snRNA-seq has unique advantage in capturing nucleus-enriched lncRNAs and miRNA precursors that are useful in mapping and monitoring differential miRNA expression during cellular differentiation. PMID:27566152
Awazu, Akinori; Tanabe, Takahiro; Kamitani, Mari; Tezuka, Ayumi; Nagano, Atsushi J
2018-05-29
Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution, although the physiological basis of this assumption remains unclear. In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21-27 replicates), and the characteristics of gene-dependent empirical probability density function (ePDF) profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, various types of ePDF of gene expression levels were obtained that were classified as Gaussian, power law-like containing a long tail, or intermediate. These ePDF profiles were well fitted with a Gauss-power mixing distribution function derived from a simple model of a stochastic transcriptional network containing a feedback loop. The fitting function suggested that gene expression levels with long-tailed ePDFs would be strongly influenced by feedback regulation. Furthermore, the features of gene expression levels are correlated with their functions, with the levels of essential genes tending to follow a Gaussian-like ePDF while those of genes encoding nucleic acid-binding proteins and transcription factors exhibit long-tailed ePDF.
Wu, Yongyan; Zhang, Yuliang; Niu, Min; Shi, Yong; Liu, Hongliang; Yang, Dongli; Li, Fei; Lu, Yan; Bo, Yunfeng; Zhang, Ruiping; Li, Zhenyu; Luo, Hongjie; Cui, Jiajia; Sang, Jiangwei; Xiang, Caixia; Gao, Wei; Wen, Shuxin
2018-06-27
CD133+CD44+ cancer stem cells previously isolated from laryngeal squamous cell carcinoma (LSCC) cell lines showed strong malignancy and tumorigenicity. However, the molecular mechanism underlying the enhanced malignancy remained unclear. Cell proliferation assay, spheroid-formation experiment, RNA sequencing (RNA-seq), miRNA-seq, bioinformatic analysis, quantitative real-time PCR, migration assay, invasion assay, and luciferase reporter assay were used to identify differentially expressed mRNAs, lncRNAs, circRNAs and miRNAs, construct a transcription regulatory network, and investigate the functional roles and mechanism of circRNA in CD133+CD44+ laryngeal cancer stem cells. Differentially expressed genes in TDP cells were mainly enriched in the biological processes of cell differentiation, regulation of autophagy, negative regulation of cell death, regulation of cell growth, response to hypoxia, telomere maintenance, cellular response to gamma radiation, and regulation of apoptotic signaling, which are closely related to the malignant features of tumor cells. We constructed the regulatory network of differentially expressed circRNAs, miRNAs and mRNAs. qPCR findings for the expression of key genes in the network were consistent with the sequencing data. Moreover, our data revealed that circRNA hg19_circ_0005033 promotes proliferation, migration, invasion, and chemotherapy resistance of laryngeal cancer stem cells. This study provides potential biomarkers and targets for LSCC diagnosis and therapy, and provides important evidence for the heterogeneity of LSCC cells at the transcription level. © 2018 The Author(s). Published by S. Karger AG, Basel.
RNA Editing During Sexual Development Occurs in Distantly Related Filamentous Ascomycetes
Teichert, Ines; Dahlmann, Tim A.; Kück, Ulrich
2017-01-01
RNA editing is a post-transcriptional process that modifies RNA molecules leading to transcript sequences that differ from their template DNA. A-to-I editing was found to be widely distributed in nuclear transcripts of metazoa, but was detected in fungi only recently in a study of the filamentous ascomycete Fusarium graminearum that revealed extensive A-to-I editing of mRNAs in sexual structures (fruiting bodies). Here, we searched for putative RNA editing events in RNA-seq data from Sordaria macrospora and Pyronema confluens, two distantly related filamentous ascomycetes, and in data from the Taphrinomycete Schizosaccharomyces pombe. Like F. graminearum, S. macrospora is a member of the Sordariomycetes, whereas P. confluens belongs to the early-diverging group of Pezizomycetes. We found extensive A-to-I editing in RNA-seq data from sexual mycelium from both filamentous ascomycetes, but not in vegetative structures. A-to-I editing was not detected in different stages of meiosis of S. pombe. A comparison of A-to-I editing in S. macrospora with F. graminearum and P. confluens, respectively, revealed little conservation of individual editing sites. An analysis of RNA-seq data from two sterile developmental mutants of S. macrospora showed that A-to-I editing is strongly reduced in these strains. Sequencing of cDNA fragments containing more than one editing site from P. confluens showed that at the beginning of sexual development, transcripts were incompletely edited or unedited, whereas in later stages transcripts were more extensively edited. Taken together, these data suggest that A-to-I RNA editing is an evolutionarily conserved feature during fruiting body development in filamentous ascomycetes. PMID:28338982
RNA Editing During Sexual Development Occurs in Distantly Related Filamentous Ascomycetes.
Teichert, Ines; Dahlmann, Tim A; Kück, Ulrich; Nowrousian, Minou
2017-04-01
RNA editing is a post-transcriptional process that modifies RNA molecules leading to transcript sequences that differ from their template DNA. A-to-I editing was found to be widely distributed in nuclear transcripts of metazoa, but was detected in fungi only recently in a study of the filamentous ascomycete Fusarium graminearum that revealed extensive A-to-I editing of mRNAs in sexual structures (fruiting bodies). Here, we searched for putative RNA editing events in RNA-seq data from Sordaria macrospora and Pyronema confluens, two distantly related filamentous ascomycetes, and in data from the Taphrinomycete Schizosaccharomyces pombe. Like F. graminearum, S. macrospora is a member of the Sordariomycetes, whereas P. confluens belongs to the early-diverging group of Pezizomycetes. We found extensive A-to-I editing in RNA-seq data from sexual mycelium from both filamentous ascomycetes, but not in vegetative structures. A-to-I editing was not detected in different stages of meiosis of S. pombe. A comparison of A-to-I editing in S. macrospora with F. graminearum and P. confluens, respectively, revealed little conservation of individual editing sites. An analysis of RNA-seq data from two sterile developmental mutants of S. macrospora showed that A-to-I editing is strongly reduced in these strains. Sequencing of cDNA fragments containing more than one editing site from P. confluens showed that at the beginning of sexual development, transcripts were incompletely edited or unedited, whereas in later stages transcripts were more extensively edited. Taken together, these data suggest that A-to-I RNA editing is an evolutionarily conserved feature during fruiting body development in filamentous ascomycetes. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
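Because inosine is read as guanosine during sequencing, A-to-I editing shows up in RNA-seq data as A-to-G mismatches against the genomic reference. The Python sketch below illustrates that screening idea on a toy pileup; it is not the pipeline used in the study. The data layout, the function names and the thresholds `min_depth` and `min_fraction` are all assumptions made for illustration, and transcripts from the reverse strand (where edits appear as T-to-C) are deliberately ignored.

```python
from collections import Counter

def a_to_g_fraction(ref_base, read_bases):
    """Fraction of read bases that are G at a site whose genomic base is A."""
    if ref_base.upper() != "A" or not read_bases:
        return 0.0
    counts = Counter(base.upper() for base in read_bases)
    return counts["G"] / len(read_bases)

def candidate_sites(pileup, min_depth=10, min_fraction=0.1):
    """Return {position: G fraction} for putative editing sites.

    pileup maps a position to (reference base, string of aligned read bases).
    Both thresholds are illustrative placeholders, not published values.
    """
    hits = {}
    for pos, (ref, bases) in pileup.items():
        if len(bases) >= min_depth:
            frac = a_to_g_fraction(ref, bases)
            if frac >= min_fraction:
                hits[pos] = frac
    return hits

# Toy example: one edited site, one unedited site, one low-coverage site.
toy = {100: ("A", "AAAAAGGGGG"), 101: ("C", "CCCCCCCCCC"), 102: ("A", "AG")}
print(candidate_sites(toy))
```

In practice such candidates would also need to be checked against genomic DNA reads to exclude polymorphisms, which this sketch does not attempt.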
Krause, Sue A; Pandit, Aniruddha; Davies, Shireen A
2018-01-01
FlyAtlas 2 (www.flyatlas2.org) is part successor, part complement to the FlyAtlas database and web application for studying the expression of the genes of Drosophila melanogaster in different tissues of adults and larvae. Although generated in the same lab with the same fly line raised on the same diet as FlyAtlas, the FlyAtlas2 resource employs a completely new set of expression data based on RNA-Seq, rather than microarray analysis, and so it allows the user to obtain information for the expression of different transcripts of a gene. Furthermore, the data for somatic tissues are now available for both male and female adult flies, allowing studies of sexual dimorphism. Gene coverage has been extended by the inclusion of microRNAs and many of the RNA genes included in Release 6 of the Drosophila reference genome. The web interface has been modified to accommodate the extra data, but at the same time has been adapted for viewing on small mobile devices. Users also have access to the RNA-Seq reads displayed alongside the annotated Drosophila genome in the (external) UCSC browser, and are able to link out to the previous FlyAtlas resource to compare the data obtained by RNA-Seq with that obtained using microarrays. PMID:29069479
Gallardo-Escárate, Cristian; Valenzuela-Muñoz, Valentina; Nuñez-Acuña, Gustavo
2014-01-01
Despite the economic and environmental impacts that sea lice infestations have on salmon farming worldwide, genomic data generated by high-throughput transcriptome sequencing for different developmental stages, sexes, and strains of sea lice remain limited or unavailable. In this study, RNA-seq analysis was performed using a de novo transcriptome assembly as a reference to reveal transcriptional changes across six developmental stages of the salmon louse Caligus rogercresseyi. EST-datasets were generated from the nauplius I, nauplius II, copepodid and chalimus stages and from female and male adults using MiSeq Illumina sequencing. A total of 151,788,682 transcripts were obtained, which were assembled into 83,444 high quality contigs and subsequently annotated into roughly 24,000 genes based on known proteins. To identify differential transcription patterns among salmon louse stages, cluster analyses were performed using normalized gene expression values. Herein, four clusters were differentially expressed between nauplius I–II and copepodid stages (604 transcripts), five clusters between copepodid and chalimus stages (2,426 transcripts), and six clusters between female and male adults (2,478 transcripts). Gene ontology analysis revealed that the nauplius I–II, copepodid and chalimus stages are mainly annotated to amino acid transfer/repair/breakdown, metabolism, molting cycle, and nervous system development. Additionally, genes showing differential transcription in female and male adults were highly related to cytoskeletal and contractile elements, reproduction, cell development, morphogenesis, and transcription-translation processes. The data presented in this study provide the most comprehensive transcriptome resource available for C. rogercresseyi, which should be used for future genomic studies linked to host-parasite interactions. PMID:24691066
Qu, Xiancheng; Hu, Menghong; Shang, Yueyong; Pan, Lisha; Jia, Peixuan; Fu, Chunxue; Liu, Qigen; Wang, Youji
2018-01-01
Next-generation sequencing was used to analyze the effects of toxic microcystin-LR (MC-LR) on silver carp (Hypophthalmichthys molitrix). Silver carp were intraperitoneally injected with MC-LR, and RNA-seq and miRNA-seq in the liver were analyzed at 0.25, 0.5, and 1 h. The expression of glutathione S-transferase (GST), which acts as a marker gene for MC-LR, was tested to determine the earliest time point at which GST transcription was initiated in the liver tissues of the MC-LR-treated silver carp. Hepatic RNA-seq/miRNA-seq analysis and data integration analysis were conducted with reference to the identified time point. Quantitative PCR (qPCR) was performed to detect the expression of the following genes at the three time points: heme oxygenase 1 (HO-1), interleukin-10 receptor 1 (IL-10R1), apolipoprotein A-I (apoA-I), and heme binding protein 2 (HBP2). Results showed that the liver GST expression was remarkably decreased at 0.25 h (P < 0.05). RNA-seq at this time point revealed that the liver tissue contained 97,505 unigenes, including 184 significantly different unigenes and 75 unknown genes. Gene Ontology (GO) term enrichment analysis suggested that 35 of the 145 enriched GO terms were significantly enriched and mainly related to the immune system regulation network. KEGG pathway enrichment analysis showed that 18 of the 189 pathways were significantly enriched, and the most significant was a ribosome pathway containing 77 differentially expressed genes. miRNA-seq analysis indicated that the longest miRNA had 22 nucleotides (nt), followed by 21 and 23 nt. A total of 286 known miRNAs, 332 known miRNA precursor sequences, and 438 new miRNAs were predicted. A total of 1,048,575 mRNA–miRNA interaction sites were obtained, and 21,252 and 21,241 target genes were respectively predicted in known and new miRNAs. 
qPCR revealed that HO-1, IL-10R1, apoA-I, and HBP2 were significantly differentially expressed and might play important roles in the toxicity and liver detoxification of MC-LR in fish. These results were consistent with those of high-throughput sequencing, thereby verifying the accuracy of our sequencing data. RNA-seq and miRNA-seq analyses of silver carp liver injected with MC-LR provided valuable and new insights into the toxic effects of MC-LR and the antitoxic mechanisms of MC-LR in fish. The RNA/miRNA data are available from the NCBI database under Registration No. SRP075165. PMID:29692738
Polstein, Lauren R; Perez-Pinera, Pablo; Kocak, D Dewran; Vockley, Christopher M; Bledsoe, Peggy; Song, Lingyun; Safi, Alexias; Crawford, Gregory E; Reddy, Timothy E; Gersbach, Charles A
2015-08-01
Genome engineering technologies based on the CRISPR/Cas9 and TALE systems are enabling new approaches in science and biotechnology. However, the specificity of these tools in complex genomes and the role of chromatin structure in determining DNA binding are not well understood. We analyzed the genome-wide effects of TALE- and CRISPR-based transcriptional activators in human cells using ChIP-seq to assess DNA-binding specificity and RNA-seq to measure the specificity of perturbing the transcriptome. Additionally, DNase-seq was used to assess genome-wide chromatin remodeling that occurs as a result of their action. Our results show that these transcription factors are highly specific in both DNA binding and gene regulation and are able to open targeted regions of closed chromatin independent of gene activation. Collectively, these results underscore the potential for these technologies to make precise changes to gene expression for gene and cell therapies or fundamental studies of gene function. © 2015 Polstein et al.; Published by Cold Spring Harbor Laboratory Press.
Polycomb repressive complex 1 modifies transcription of active genes
Pherson, Michelle; Misulovin, Ziva; Gause, Maria; Mihindukulasuriya, Kathie; Swain, Amanda; Dorsett, Dale
2017-01-01
This study examines the role of Polycomb repressive complex 1 (PRC1) at active genes. The PRC1 and PRC2 complexes are crucial for epigenetic silencing during development of an organism. They are recruited to Polycomb response elements (PREs) and establish silenced domains over several kilobases. Recent studies show that PRC1 is also directly recruited to active genes by the cohesin complex. Cohesin participates broadly in control of gene transcription, but it is unknown whether cohesin-recruited PRC1 also plays a role in transcriptional control of active genes. We address this question using genome-wide RNA sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq). The results show that PRC1 influences transcription of active genes, and a significant fraction of its effects are likely direct. The roles of different PRC1 subunits can also vary depending on the gene. Depletion of PRC1 subunits by RNA interference alters phosphorylation of RNA polymerase II (Pol II) and occupancy by the Spt5 pausing-elongation factor at most active genes. These effects on Pol II phosphorylation and Spt5 are likely linked to changes in elongation and RNA processing detected by nascent RNA-seq, although the mechanisms remain unresolved. The experiments also reveal that PRC1 facilitates association of Spt5 with enhancers and PREs. Reduced Spt5 levels at these regulatory sequences upon PRC1 depletion coincide with changes in Pol II occupancy and phosphorylation. Our findings indicate that, in addition to its repressive roles in epigenetic gene silencing, PRC1 broadly influences transcription of active genes and may suppress transcription of nonpromoter regulatory sequences. PMID:28782042
Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney
2012-01-01
RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
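The counting step described above, where abundance is the number of distinct barcode pairs rather than the number of reads, can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: reads are already assigned to a cDNA, barcodes are compared by exact match (the error-tolerant barcode design mentioned in the abstract is not modeled), and the name `digital_counts` and the input layout are hypothetical.

```python
from collections import defaultdict

def digital_counts(reads):
    """Count distinct (left, right) barcode pairs per cDNA.

    reads is an iterable of (cdna_id, left_barcode, right_barcode) tuples.
    PCR duplicates share a barcode pair and therefore collapse to one count.
    """
    seen = defaultdict(set)
    for cdna, left, right in reads:
        seen[cdna].add((left, right))
    return {cdna: len(pairs) for cdna, pairs in seen.items()}

# Toy input: geneA has three reads but only two original molecules.
reads = [
    ("geneA", "AAC", "GTT"),
    ("geneA", "AAC", "GTT"),  # PCR duplicate of the first read
    ("geneA", "CCA", "GTT"),
    ("geneB", "TTG", "ACA"),
]
print(digital_counts(reads))
```

Using a set per cDNA makes the result independent of how unevenly each molecule was amplified, which is the property the abstract attributes to digital counting.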
Jorjani, Hadi; Zavolan, Mihaela
2014-04-01
Accurate identification of transcription start sites (TSSs) is an essential step in the analysis of transcription regulatory networks. In higher eukaryotes, the capped analysis of gene expression technology enabled comprehensive annotation of TSSs in genomes such as those of mice and humans. In bacteria, an equivalent approach, termed differential RNA sequencing (dRNA-seq), has recently been proposed, but the application of this approach to a large number of genomes is hindered by the paucity of computational analysis methods. With few exceptions, when the method has been used, annotation of TSSs has been largely done manually. In this work, we present a computational method called 'TSSer' that enables the automatic inference of TSSs from dRNA-seq data. The method rests on a probabilistic framework for identifying both genomic positions that are preferentially enriched in the dRNA-seq data as well as preferentially captured relative to neighboring genomic regions. Evaluating our approach for TSS calling on several publicly available datasets, we find that TSSer achieves high consistency with the curated lists of annotated TSSs, but identifies many additional TSSs. Therefore, TSSer can accelerate genome-wide identification of TSSs in bacterial genomes and can aid in further characterization of bacterial transcription regulatory networks. TSSer is freely available under GPL license at http://www.clipz.unibas.ch/TSSer/index.php
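The two criteria named above, enrichment in the dRNA-seq library and preferential capture relative to neighboring positions, can be illustrated with a deliberately simple threshold rule in Python. This is not TSSer's probabilistic framework; it replaces it with a fixed ratio cutoff and a local-maximum test purely for illustration. The function name `call_tss` and the parameters `min_ratio`, `window` and `pseudocount` are assumptions.

```python
def call_tss(enriched, control, min_ratio=2.0, window=2, pseudocount=1.0):
    """Return positions that pass a toy TSS rule.

    enriched, control: equal-length lists of 5' read-start counts per
    genomic position in the enriched and the control library.
    A position is called if (a) its enriched/control ratio is at least
    min_ratio and (b) its enriched count is strictly higher than every
    neighbor within `window` positions.
    """
    calls = []
    for i, count in enumerate(enriched):
        ratio = (count + pseudocount) / (control[i] + pseudocount)
        lo, hi = max(0, i - window), min(len(enriched), i + window + 1)
        neighbours = enriched[lo:i] + enriched[i + 1:hi]
        if count > 0 and ratio >= min_ratio and all(count > n for n in neighbours):
            calls.append(i)
    return calls

# Toy coverage: position 2 is enriched and a local peak; position 6 is a
# local peak but not enriched relative to the control library.
enriched = [0, 1, 30, 2, 0, 0, 5, 0]
control = [0, 1, 5, 2, 0, 0, 6, 0]
print(call_tss(enriched, control))
```

A hard cutoff like this is sensitive to sequencing depth, which is one motivation for the probabilistic treatment the abstract describes.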
Single-cell transcriptional dynamics of flavivirus infection
Bekerman, Elena
2018-01-01
Dengue and Zika viral infections affect millions of people annually and can be complicated by hemorrhage and shock or neurological manifestations, respectively. However, a thorough understanding of the host response to these viruses is lacking, partly because conventional approaches ignore heterogeneity in virus abundance across cells. We present viscRNA-Seq (virus-inclusive single cell RNA-Seq), an approach to probe the host transcriptome together with intracellular viral RNA at the single cell level. We applied viscRNA-Seq to monitor dengue and Zika virus infection in cultured cells and discovered extreme heterogeneity in virus abundance. We exploited this variation to identify host factors that show complex dynamics and a high degree of specificity for either virus, including proteins involved in the endoplasmic reticulum translocon, signal peptide processing, and membrane trafficking. We validated the viscRNA-Seq hits and discovered novel proviral and antiviral factors. viscRNA-Seq is a powerful approach to assess the genome-wide virus-host dynamics at single cell level. PMID:29451494
Hurley, Jennifer M.; Dasgupta, Arko; Emerson, Jillian M.; Zhou, Xiaoying; Ringelberg, Carol S.; Knabe, Nicole; Lipzen, Anna M.; Lindquist, Erika A.; Daum, Christopher G.; Barry, Kerrie W.; Grigoriev, Igor V.; Smith, Kristina M.; Galagan, James E.; Bell-Pedersen, Deborah; Freitag, Michael; Cheng, Chao; Loros, Jennifer J.; Dunlap, Jay C.
2014-01-01
Neurospora crassa has been for decades a principal model for filamentous fungal genetics and physiology as well as for understanding the mechanism of circadian clocks. Eukaryotic fungal and animal clocks comprise transcription-translation–based feedback loops that control rhythmic transcription of a substantial fraction of these transcriptomes, yielding the changes in protein abundance that mediate circadian regulation of physiology and metabolism: Understanding circadian control of gene expression is key to understanding eukaryotic, including fungal, physiology. Indeed, the isolation of clock-controlled genes (ccgs) was pioneered in Neurospora where circadian output begins with binding of the core circadian transcription factor WCC to a subset of ccg promoters, including those of many transcription factors. High temporal resolution (2-h) sampling over 48 h using RNA sequencing (RNA-Seq) identified circadianly expressed genes in Neurospora, revealing that from ∼10% to as much as 40% of the transcriptome can be expressed under circadian control. Functional classifications of these genes revealed strong enrichment in pathways involving metabolism, protein synthesis, and stress responses; in broad terms, daytime metabolic potential favors catabolism, energy production, and precursor assembly, whereas night activities favor biosynthesis of cellular components and growth. Discriminative regular expression motif elicitation (DREME) identified key promoter motifs highly correlated with the temporal regulation of ccgs. Correlations between ccg abundance from RNA-Seq, the degree of ccg-promoter activation as reported by ccg-promoter–luciferase fusions, and binding of WCC as measured by ChIP-Seq, are not strong. Therefore, although circadian activation is critical to ccg rhythmicity, posttranscriptional regulation plays a major role in determining rhythmicity at the mRNA level. PMID:25362047
Thomason, Maureen K.; Bischler, Thorsten; Eisenbart, Sara K.; Förstner, Konrad U.; Zhang, Aixia; Herbig, Alexander; Nieselt, Kay
2014-01-01
While the model organism Escherichia coli has been the subject of intense study for decades, the full complement of its RNAs is only now being examined. Here we describe a survey of the E. coli transcriptome carried out using a differential RNA sequencing (dRNA-seq) approach, which can distinguish between primary and processed transcripts, and an automated prediction algorithm for transcriptional start sites (TSS). With the criterion of expression under at least one of three growth conditions examined, we predicted 14,868 TSS candidates, including 5,574 internal to annotated genes (iTSS) and 5,495 TSS corresponding to potential antisense RNAs (asRNAs). We examined expression of 14 candidate asRNAs by Northern analysis using RNA from wild-type E. coli and from strains defective for RNases III and E, two RNases reported to be involved in asRNA processing. Interestingly, nine asRNAs detected as distinct bands by Northern analysis were differentially affected by the rnc and rne mutations. We also compared our asRNA candidates with previously published asRNA annotations from RNA-seq data and discuss the challenges associated with these cross-comparisons. Our global transcriptional start site map represents a valuable resource for identification of transcription start sites, promoters, and novel transcripts in E. coli and is easily accessible, together with the cDNA coverage plots, in an online genome browser. PMID:25266388
Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.
Paulson, Joseph N; Chen, Cho-Yi; Lopes-Ramos, Camila M; Kuijjer, Marieke L; Platig, John; Sonawane, Abhijeet R; Fagny, Maud; Glass, Kimberly; Quackenbush, John
2017-10-03
Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies typically profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data - critical first steps for any subsequent analysis. We find that analysis of large RNA-Seq data sets requires both careful quality control and the need to account for sparsity due to the heterogeneity intrinsic in multi-group studies. We developed Yet Another RNA Normalization software pipeline (YARN), that includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project. An R package instantiating YARN is available at http://bioconductor.org/packages/yarn .
Recovery of high-quality RNA from laser capture microdissected human and rodent pancreas.
Butler, Alexandra E; Matveyenko, Aleksey V; Kirakossian, David; Park, Johanna; Gurlo, Tatyana; Butler, Peter C
Laser capture microdissection (LCM) is a powerful method to isolate specific populations of cells for subsequent analysis such as gene expression profiling, for example, microarrays or ribonucleic acid (RNA)-Seq. This technique has been applied to frozen as well as formalin-fixed, paraffin-embedded (FFPE) specimens with variable outcomes regarding quality and quantity of extracted RNA. The goal of the study was to develop methods to isolate high-quality RNA from islets of Langerhans and pancreatic duct glands (PDG) isolated by LCM. We report an optimized protocol for frozen sections to minimize RNA degradation and maximize recovery of expected transcripts from the samples using quantitative real-time polymerase chain reaction (RT-PCR) by adding RNase inhibitors at multiple steps during the experiment. This technique reproducibly delivered intact RNA (RIN values 6-7). Using quantitative RT-PCR, the expected profiles of insulin, glucagon, mucin6 (Muc6), and cytokeratin-19 (CK-19) mRNA in PDGs and pancreatic islets were detected. The described experimental protocol for frozen pancreas tissue might also be useful for other tissues with moderate to high levels of intrinsic ribonuclease (RNase) activity.
Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona
2014-01-01
The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq data from the brains of PD patients and unaffected controls revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. 
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases. PMID:24651478
Takahashi, Melissa K; Watters, Kyle E; Gasper, Paul M; Abbott, Timothy R; Carlson, Paul D; Chen, Alan A; Lucks, Julius B
2016-06-01
Antisense RNA-mediated transcriptional regulators are powerful tools for controlling gene expression and creating synthetic gene networks. RNA transcriptional repressors derived from natural mechanisms called attenuators are particularly versatile, though their mechanistic complexity has made them difficult to engineer. Here we identify a new structure-function design principle for attenuators that enables the forward engineering of new RNA transcriptional repressors. Using in-cell SHAPE-Seq to characterize the structures of attenuator variants within Escherichia coli, we show that attenuator hairpins that facilitate interaction with antisense RNAs require interior loops for proper function. Molecular dynamics simulations of these attenuator variants suggest these interior loops impart structural flexibility. We further observe hairpin flexibility in the cellular structures of natural RNA mechanisms that use antisense RNA interactions to repress translation, confirming earlier results from in vitro studies. Finally, we design new transcriptional attenuators in silico using an interior loop as a structural requirement and show that they function as desired in vivo. This work establishes interior loops as an important structural element for designing synthetic RNA gene regulators. We anticipate that the coupling of experimental measurement of cellular RNA structure and function with computational modeling will enable rapid discovery of structure-function design principles for a diverse array of natural and synthetic RNA regulators. © 2016 Takahashi et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Genome-wide profiling of DNA-binding proteins using barcode-based multiplex Solexa sequencing.
Raghav, Sunil Kumar; Deplancke, Bart
2012-01-01
Chromatin immunoprecipitation (ChIP) is a commonly used technique to detect the in vivo binding of proteins to DNA. ChIP is now routinely paired to microarray analysis (ChIP-chip) or next-generation sequencing (ChIP-Seq) to profile the DNA occupancy of proteins of interest on a genome-wide level. Because ChIP-chip introduces several biases, most notably due to the use of a fixed number of probes, ChIP-Seq has quickly become the method of choice as, depending on the sequencing depth, it is more sensitive, quantitative, and provides a greater binding site location resolution. With the ever increasing number of reads that can be generated per sequencing run, it has now become possible to analyze several samples simultaneously while maintaining sufficient sequence coverage, thus significantly reducing the cost per ChIP-Seq experiment. In this chapter, we provide a step-by-step guide on how to perform multiplexed ChIP-Seq analyses. As a proof-of-concept, we focus on the genome-wide profiling of RNA Polymerase II as measuring its DNA occupancy at different stages of any biological process can provide insights into the gene regulatory mechanisms involved. However, the protocol can also be used to perform multiplexed ChIP-Seq analyses of other DNA-binding proteins such as chromatin modifiers and transcription factors.
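Multiplexing as described above relies on splitting pooled reads back into samples by their barcode. The Python sketch below shows one simple way this could look, assuming the barcode is a fixed-length prefix at the 5' end of each read and is matched exactly. The barcode sequences, sample names and the function `demultiplex` are hypothetical and are not taken from the chapter's protocol.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by an exact 5' barcode prefix.

    barcodes maps barcode sequence -> sample name; all barcodes are
    assumed to have the same length. Returns (per-sample reads with the
    barcode trimmed off, list of reads whose prefix matched no barcode).
    """
    length = len(next(iter(barcodes)))
    samples = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        sample = barcodes.get(read[:length])
        if sample is None:
            unassigned.append(read)
        else:
            samples[sample].append(read[length:])
    return samples, unassigned

# Toy pool with two hypothetical samples and one unreadable barcode.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
pooled = ["ACGTGGGG", "TGCACCCC", "NNNNAAAA"]
print(demultiplex(pooled, barcodes))
```

Exact matching discards reads with a sequencing error in the barcode; tolerating one mismatch is a common refinement but requires barcodes that differ from each other at several positions.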
A Guide for Designing and Analyzing RNA-Seq Data.
Chatterjee, Aniruddha; Ahn, Antonio; Rodger, Euan J; Stockwell, Peter A; Eccles, Michael R
2018-01-01
The identity of a cell or an organism is at least in part defined by its gene expression and therefore analyzing gene expression remains one of the most frequently performed experimental techniques in molecular biology. The development of the RNA-Sequencing (RNA-Seq) method allows an unprecedented opportunity to analyze expression of protein-coding, noncoding RNA and also de novo transcript assembly of a new species or organism. However, the planning and design of RNA-Seq experiments has important implications for addressing the desired biological question and maximizing the value of the data obtained. In addition, RNA-Seq generates a huge volume of data and accurate analysis of this data involves several different steps and choices of tools. This can be challenging and overwhelming, especially for bench scientists. In this chapter, we describe an entire workflow for performing RNA-Seq experiments. We describe critical aspects of wet lab experiments such as RNA isolation, library preparation and the initial design of an experiment. Further, we provide a step-by-step description of the bioinformatics workflow for different steps involved in RNA-Seq data analysis. This includes power calculations, setting up a computational environment, acquisition and processing of publicly available data if desired, quality control measures, preprocessing steps for the raw data, differential expression analysis, and data visualization. We particularly mention important considerations for each step to provide a guide for designing and analyzing RNA-Seq data.
Advances in single-cell RNA sequencing and its applications in cancer research.
Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming
2017-08-08
Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years' development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise target therapy by identifying druggable subclones. Indeed, progresses have been made regarding solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances and progresses of single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5. Perspectives
Advances in single-cell RNA sequencing and its applications in cancer research
Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming
2017-01-01
Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years’ development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise target therapy by identifying druggable subclones. Indeed, progresses have been made regarding solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances and progresses of single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5. Perspectives PMID:28881849
From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing.
Marinov, Georgi K; Williams, Brian A; McCue, Ken; Schroth, Gary P; Gertz, Jason; Myers, Richard M; Wold, Barbara J
2014-03-01
Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30-100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.
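The spike-in based estimate of absolute RNA molecules per cell mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a log-log linear relation between the measured expression of each spike-in and the known number of spike-in molecules added per cell; the function names and all numbers are hypothetical and are not taken from the paper, whose actual calibration procedure may differ.

```python
import math

def fit_loglog(spike_expr, spike_molecules):
    """Least-squares fit of log10(molecules) = a * log10(expr) + b."""
    xs = [math.log10(x) for x in spike_expr]
    ys = [math.log10(y) for y in spike_molecules]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def estimate_molecules(expr, a, b):
    """Convert a measured expression value into an estimated molecule count."""
    return 10 ** (a * math.log10(expr) + b)

# Hypothetical spike-ins: measured expression vs. known copies added per cell.
spike_expr = [1.0, 10.0, 100.0, 1000.0]
spike_molecules = [2.0, 20.0, 200.0, 2000.0]

a, b = fit_loglog(spike_expr, spike_molecules)
gene_estimate = estimate_molecules(50.0, a, b)
```

Summing such per-gene estimates over all detected genes would give a per-cell total of the kind the abstract reports, under the stated assumption that the spike-in calibration transfers to endogenous transcripts.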
Tian, Yao; Smith, David Roy
2016-05-01
Thousands of mitochondrial genomes have been sequenced, but there are comparatively few available mitochondrial transcriptomes. This might soon be changing. High-throughput RNA sequencing (RNA-Seq) techniques have made it fast and cheap to generate massive amounts of mitochondrial transcriptomic data. Here, we explore the utility of RNA-Seq for assembling mitochondrial genomes and studying their expression patterns. Specifically, we investigate the mitochondrial transcriptomes from Polytomella non-photosynthetic green algae, which have among the smallest, most reduced mitochondrial genomes from the Archaeplastida as well as fragmented rRNA-coding regions, palindromic genes, and linear chromosomes with telomeres. Isolation of whole genomic RNA from the four known Polytomella species followed by Illumina paired-end sequencing generated enough mitochondrial-derived reads to easily recover almost-entire mitochondrial genome sequences. Read-mapping and coverage statistics also gave insights into Polytomella mitochondrial transcriptional architecture, revealing polycistronic transcripts and the expression of telomeres and palindromic genes. Ultimately, RNA-Seq is a promising, cost-effective technique for studying mitochondrial genetics, but it does have drawbacks, which are discussed. One of its greatest potentials, as shown here, is that it can be used to generate near-complete mitochondrial genome sequences, which could be particularly useful in situations where there is a lack of available mtDNA data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
DuanMu, Huizi; Wang, Yang; Bai, Xi; Cheng, Shufei; Deyholos, Michael K; Wong, Gane Ka-Shu; Li, Dan; Zhu, Dan; Li, Ran; Yu, Yang; Cao, Lei; Chen, Chao; Zhu, Yanming
2015-11-01
Soil alkalinity is an important environmental problem limiting agricultural productivity. Wild soybean (Glycine soja) shows strong alkaline stress tolerance, so it is an ideal plant candidate for studying the molecular mechanisms of alkaline tolerance and identifying alkaline stress-responsive genes. However, limited information is available about G. soja responses to alkaline stress on a genomic scale. Therefore, in the present study, we used RNA sequencing to compare transcript profiles of G. soja root responses to sodium bicarbonate (NaHCO3) at six time points, and a total of 68,138,478 pairs of clean reads were obtained using the Illumina GAIIX. Expression patterns of 46,404 G. soja genes were profiled in all six samples based on RNA-seq data using Cufflinks software. Then, 12 transcription factors from MYB, WRKY, NAC, bZIP, C2H2, HB, and TIFY families and 12 oxidation reduction related genes were chosen and verified to be induced in response to alkaline stress by using quantitative real-time polymerase chain reaction (qRT-PCR). The GO functional annotation analysis showed that besides "transcriptional regulation" and "oxidation reduction," these genes were involved in a variety of processes, such as "binding" and "response to stress." This is the first comprehensive transcriptome profiling analysis of wild soybean root under alkaline stress by RNA sequencing. Our results highlight changes in the gene expression patterns and identify a set of genes induced by NaHCO3 stress. These findings provide a base for the global analyses of G. soja alkaline stress tolerance mechanisms.
Wei, Hui; Fu, Yan; Magnusson, Lauren; Baker, John O.; Maness, Pin-Ching; Xu, Qi; Yang, Shihui; Bowersox, Andrew; Bogorad, Igor; Wang, Wei; Tucker, Melvin P.; Himmel, Michael E.; Ding, Shi-You
2014-01-01
The anaerobic, thermophilic bacterium, Clostridium thermocellum, secretes multi-protein enzyme complexes, termed cellulosomes, which synergistically interact with the microbial cell surface and efficiently disassemble plant cell wall biomass. C. thermocellum has also been considered a potential consolidated bioprocessing (CBP) organism due to its ability to produce the biofuel products, hydrogen, and ethanol. We found that C. thermocellum fermentation of pretreated yellow poplar (PYP) produced 30 and 39% of ethanol and hydrogen product concentrations, respectively, compared to fermentation of cellobiose. RNA-seq was used to analyze the transcriptional profiles of these cells. The PYP-grown cells taken for analysis at the late stationary phase showed 1211 genes up-regulated and 314 down-regulated by more than two-fold compared to the cellobiose-grown cells. These affected genes cover a broad spectrum of specific functional categories. The transcriptional analysis was further validated by sub-proteomics data taken from the literature, as well as by quantitative reverse transcription-PCR (qRT-PCR) analyses of selected genes. Specifically, 47 cellulosomal protein-encoding genes, genes for 4 pairs of SigI-RsgI for polysaccharide sensing, 7 cellodextrin ABC transporter genes, and a set of NAD(P)H hydrogenase and alcohol dehydrogenase genes were up-regulated for cells growing on PYP compared to cellobiose. These genes could be potential candidates for future studies aimed at gaining insight into the regulatory mechanism of this organism as well as for improvement of C. thermocellum in its role as a CBP organism. PMID:24782837
A practical guide to single-cell RNA-sequencing for biomedical research and clinical applications.
Haque, Ashraful; Engel, Jessica; Teichmann, Sarah A; Lönnberg, Tapio
2017-08-18
RNA sequencing (RNA-seq) is a genomic approach for the detection and quantitative analysis of messenger RNA molecules in a biological sample and is useful for studying cellular responses. RNA-seq has fueled much discovery and innovation in medicine over recent years. For practical reasons, the technique is usually conducted on samples comprising thousands to millions of cells. However, this has hindered direct assessment of the fundamental unit of biology-the cell. Since the first single-cell RNA-sequencing (scRNA-seq) study was published in 2009, many more have been conducted, mostly by specialist laboratories with unique skills in wet-lab single-cell genomics, bioinformatics, and computation. However, with the increasing commercial availability of scRNA-seq platforms, and the rapid ongoing maturation of bioinformatics approaches, a point has been reached where any biomedical researcher or clinician can use scRNA-seq to make exciting discoveries. In this review, we present a practical guide to help researchers design their first scRNA-seq studies, including introductory information on experimental hardware, protocol choice, quality control, data analysis and biological interpretation.
Isolation of ripening-related genes from ethylene/1-MCP treated papaya through RNA-seq.
Shen, Yan Hong; Lu, Bing Guo; Feng, Li; Yang, Fei Ying; Geng, Jiao Jiao; Ming, Ray; Chen, Xiao Jing
2017-08-31
Since papaya is a typical climacteric fruit, exogenous ethylene (ETH) applications can induce premature and quicker ripening, while 1-methylcyclopropene (1-MCP) slows down the ripening processes. Differential gene expression in ETH or 1-MCP-treated papaya fruits accounts for the ripening processes. To isolate the key ripening-related genes and better understand fruit ripening mechanisms, transcriptomes of ETH or 1-MCP-treated, and non-treated (Control Group, CG) papaya fruits were sequenced using Illumina Hiseq2500. A total of 18,648 (1-MCP), 19,093 (CG), and 15,321 (ETH) genes were detected, with the genes detected in the ETH-treatment being the least. This suggests that ETH may inhibit the expression of some genes. Based on the differential gene expression (DGE) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, 53 fruit ripening-related genes were selected: 20 cell wall-related genes, 18 chlorophyll and carotenoid metabolism-related genes, four proteinases and their inhibitors, six plant hormone signal transduction pathway genes, four transcription factors, and one senescence-associated gene. Reverse transcription quantitative PCR (RT-qPCR) analyses confirmed the results of RNA-seq and verified that the expression pattern of six genes is consistent with the fruit senescence process. Based on the expression profiling of genes in carbohydrate metabolic process, chlorophyll metabolism pathway, and carotenoid metabolism pathway, the mechanism of pulp softening and coloration of papaya was deduced and discussed. We illustrate that papaya fruit softening is a complex process with significant cell wall hydrolases, such as pectinases, cellulases, and hemicellulases involved in the process. Exogenous ethylene accelerates the coloration of papaya changing from green to yellow. This is likely due to the inhibition of chlorophyll biosynthesis and the α-branch of carotenoid metabolism. Chy-b may play an important role in the yellow color of papaya fruit. Comparing the differential gene expression in ETH/1-MCP-treated papaya using RNA-seq is a sound approach to isolate ripening-related genes. The results of this study can improve our understanding of papaya fruit ripening molecular mechanism and reveal candidate fruit ripening-related genes for further research.
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.
Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A
2018-04-24
mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
Radiation-induced alternative transcripts as detected in total and polysome-bound mRNA.
Wahba, Amy; Ryan, Michael C; Shankavaram, Uma T; Camphausen, Kevin; Tofilon, Philip J
2018-01-02
Alternative splicing is a critical event in the posttranscriptional regulation of gene expression. To investigate whether this process influences radiation-induced gene expression we defined the effects of ionizing radiation on the generation of alternative transcripts in total cellular mRNA (the transcriptome) and polysome-bound mRNA (the translatome) of the human glioblastoma stem-like cell line NSC11. For these studies, RNA-Seq profiles from control and irradiated cells were compared using the program SpliceSeq to identify transcripts and splice variations induced by radiation. As compared to the transcriptome (total RNA) of untreated cells, the radiation-induced transcriptome contained 92 splice events suggesting that radiation induced alternative splicing. As compared to the translatome (polysome-bound RNA) of untreated cells, the radiation-induced translatome contained 280 splice events of which only 24 were overlapping with the radiation-induced transcriptome. These results suggest that radiation not only modifies alternative splicing of precursor mRNA, but also results in the selective association of existing mRNA isoforms with polysomes. Comparison of radiation-induced alternative transcripts to radiation-induced gene expression in total RNA revealed little overlap (about 3%). In contrast, in the radiation-induced translatome, about 38% of the induced alternative transcripts corresponded to genes whose expression level was affected in the translatome. This study suggests that whereas radiation induces alternate splicing, the alternative transcripts present at the time of irradiation may play a role in the radiation-induced translational control of gene expression and thus cellular radioresponse.
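The overlap figures in the abstract above (92 radiation-induced splice events in the transcriptome, 280 in the translatome, 24 shared) can be checked with a small set-based sketch. The event identifiers below are hypothetical placeholders constructed only to reproduce those counts; they are not SpliceSeq output.

```python
def overlap_fraction(events_a, events_b):
    """Fraction of events in events_a that also occur in events_b."""
    a, b = set(events_a), set(events_b)
    if not a:
        return 0.0
    return len(a & b) / len(a)

# Hypothetical event IDs arranged to match the counts reported in the abstract.
translatome = {f"event_{i}" for i in range(280)}
transcriptome = {f"event_{i}" for i in range(24)} | {f"tx_only_{i}" for i in range(68)}

shared = translatome & transcriptome
frac_of_translatome = overlap_fraction(translatome, transcriptome)
frac_of_transcriptome = overlap_fraction(transcriptome, translatome)
```

With these counts, the shared events make up 24/280 of the translatome events and 24/92 of the transcriptome events, so the overlap looks small from either side.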
Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome
Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José
2016-01-01
RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude. PMID:27377755
Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome.
Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José
2016-07-05
RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude.
Genome-wide mapping of alternative splicing in Arabidopsis thaliana
Filichkin, Sergei A.; Priest, Henry D.; Givan, Scott A.; Shen, Rongkun; Bryant, Douglas W.; Fox, Samuel E.; Wong, Weng-Keen; Mockler, Todd C.
2010-01-01
Alternative splicing can enhance transcriptome plasticity and proteome diversity. In plants, alternative splicing can be manifested at different developmental stages, and is frequently associated with specific tissue types or environmental conditions such as abiotic stress. We mapped the Arabidopsis transcriptome at single-base resolution using the Illumina platform for ultrahigh-throughput RNA sequencing (RNA-seq). Deep transcriptome sequencing confirmed a majority of annotated introns and identified thousands of novel alternatively spliced mRNA isoforms. Our analysis suggests that at least ∼42% of intron-containing genes in Arabidopsis are alternatively spliced; this is significantly higher than previous estimates based on cDNA/expressed sequence tag sequencing. Random validation confirmed that novel splice isoforms empirically predicted by RNA-seq can be detected in vivo. Novel introns detected by RNA-seq were substantially enriched in nonconsensus terminal dinucleotide splice signals. Alternative isoforms with premature termination codons (PTCs) comprised the majority of alternatively spliced transcripts. Using an example of an essential circadian clock gene, we show that intron retention can generate relatively abundant PTC+ isoforms and that this specific event is highly conserved among diverse plant species. Alternatively spliced PTC+ isoforms can be potentially targeted for degradation by the nonsense mediated mRNA decay (NMD) surveillance machinery or regulate the level of functional transcripts by the mechanism of regulated unproductive splicing and translation (RUST). We demonstrate that the relative ratios of the PTC+ and reference isoforms for several key regulatory genes can be considerably shifted under abiotic stress treatments. Taken together, our results suggest that like in animals, NMD and RUST may be widespread in plants and may play important roles in regulating gene expression. PMID:19858364
Pazhamala, Lekha T.; Agarwal, Gaurav; Bajaj, Prasad; Kumar, Vinay; Kulshreshtha, Akanksha; Saxena, Rachit K.; Varshney, Rajeev K.
2016-01-01
Seed development is an important event in the plant life cycle that has interested humankind for ages, especially in crops of economic importance. Pigeonpea is an important grain legume of the semi-arid tropics, used mainly for its protein rich seeds. In order to understand the transcriptional programming during the pod and seed development, RNA-seq data was generated from embryo sac from the day of anthesis (0 DAA), seed and pod wall (5, 10, 20 and 30 DAA) of pigeonpea variety “Asha” (ICPL 87119) using Illumina HiSeq 2500. About 684 million sequencing reads have been generated from nine samples, which resulted in the identification of 27,441 expressed genes after sequence analysis. These genes have been studied for their differential expression, co-expression, and temporal and spatial gene expression. We have also used the RNA-seq data to identify important seed-specific transcription factors, biological processes and associated pathways during seed development process in pigeonpea. The comprehensive gene expression study from flowering to mature pod development in pigeonpea would be crucial in identifying candidate genes involved in seed traits directly or indirectly related to yield and quality. The dataset will serve as an important resource for gene discovery and deciphering the molecular mechanisms underlying various seed related traits. PMID:27760186
Lin, Yang; Lewallen, Eric A.; Camilleri, Emily T.; Bonin, Carolina A.; Jones, Dakota L.; Dudakovic, Amel; Galeano-Garces, Catalina; Wang, Wei; Karperien, Marcel J.; Larson, Annalise N.; Dahm, Diane L.; Stuart, Michael J.; Levy, Bruce A.; Smith, Jay; Ryssman, Daniel B.; Westendorf, Jennifer J.; Im, Hee-Jeong; van Wijnen, Andre J.; Riester, Scott M.; Krych, Aaron J.
2016-01-01
Preservation of osteochondral allografts used for transplantation is critical to ensure favorable outcomes for patients after surgical treatment of cartilage defects. To study the biological effects of protocols currently used for cartilage storage, we investigated differences in gene expression between stored allograft cartilage and fresh cartilage from living donors using high throughput molecular screening strategies. We applied next generation RNA sequencing (RNA-seq) and real-time reverse transcription quantitative polymerase chain reaction (RT-qPCR) to assess genome-wide differences in mRNA expression between stored allograft cartilage and fresh cartilage tissue from living donors. Gene ontology analysis was used to characterize biological pathways associated with differentially expressed genes. Our studies establish reduced levels of mRNAs encoding cartilage related extracellular matrix (ECM) proteins (i.e., COL1A1, COL2A1, COL10A1, ACAN, DCN, HAPLN1, TNC, and COMP) in stored cartilage. These changes occur concomitantly with increased expression of “early response genes” that encode transcription factors mediating stress/cytoprotective responses (i.e., EGR1, EGR2, EGR3, MYC, FOS, FOSB, FOSL1, FOSL2, JUN, JUNB, and JUND). The elevated expression of “early response genes” and reduced levels of ECM-related mRNAs in stored cartilage allografts suggests that tissue viability may be maintained by a cytoprotective program that reduces cell metabolic activity. These findings have potential implications for future studies focused on quality assessment and clinical optimization of osteochondral allografts used for cartilage transplantation. PMID:26909883
Luck, Ashley N; Slatko, Barton E; Foster, Jeremy M
2017-01-01
Efficient transcriptomic sequencing of microbial mRNA derived from host-microbe associations is often compromised by the much lower relative abundance of microbial RNA in the mixed total RNA sample. One solution to this problem is to perform extensive sequencing until an acceptable level of transcriptome coverage is obtained. More cost-effective methods include use of prokaryotic and/or eukaryotic rRNA depletion strategies, sometimes in conjunction with depletion of polyadenylated eukaryotic mRNA. Here, we report use of Cappable-seq™ to specifically enrich, in a single step, Wolbachia endobacterial mRNA transcripts from total RNA prepared from the parasitic filarial nematode, Brugia malayi. The obligate Wolbachia endosymbiont is a proven drug target for many human filarial infections, yet the precise nature of its symbiosis with the nematode host is poorly understood. Insightful analysis of the expression levels of Wolbachia genes predicted to underpin the mutualistic association and of known drug target genes at different life cycle stages or in response to drug treatments is typically challenged by low transcriptomic coverage. Cappable-seq resulted in up to a ~5-fold increase in the number of reads mapping to Wolbachia. On average, coverage of Wolbachia transcripts from B. malayi microfilariae was enriched ~40-fold by Cappable-seq. This method has the additional benefit of selectively removing abundant prokaryotic ribosomal RNAs. The deeper microbial transcriptome sequencing afforded by Cappable-seq facilitates more detailed characterization of gene expression levels of pathogens and symbionts present in animal tissues.
dCLIP: a computational approach for comparative CLIP-seq analyses
2014-01-01
Although comparison of RNA-protein interaction profiles across different conditions has become increasingly important to understanding the function of RNA-binding proteins (RBPs), few computational approaches have been developed for quantitative comparison of CLIP-seq datasets. Here, we present an easy-to-use command line tool, dCLIP, for quantitative CLIP-seq comparative analysis. The two-stage method implemented in dCLIP, including a modified MA normalization method and a hidden Markov model, is shown to be able to effectively identify differential binding regions of RBPs in four CLIP-seq datasets, generated by HITS-CLIP, iCLIP and PAR-CLIP protocols. dCLIP is freely available at http://qbrc.swmed.edu/software/. PMID:24398258
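For readers unfamiliar with MA normalization, the following is a minimal sketch of the standard MA transform applied to per-bin read counts from two conditions, followed by simple median centering of M. It is a generic illustration only: dCLIP uses a modified MA normalization whose details are not given in the abstract, and the function names, pseudocount, and centering step here are assumptions.

```python
import math

def ma_values(count_a, count_b, pseudocount=1.0):
    """Return (M, A) for one genomic bin: log2 ratio and mean log2 intensity."""
    log_a = math.log2(count_a + pseudocount)
    log_b = math.log2(count_b + pseudocount)
    return log_a - log_b, 0.5 * (log_a + log_b)

def median_center(m_values):
    """Shift M values so that their median is zero (a simple global normalization)."""
    ordered = sorted(m_values)
    n = len(ordered)
    if n % 2:
        median = ordered[n // 2]
    else:
        median = 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    return [m - median for m in m_values]

# Hypothetical read counts per bin in two CLIP-seq conditions.
condition_a = [3, 7, 15, 31]
condition_b = [1, 3, 7, 15]

pairs = [ma_values(a, b) for a, b in zip(condition_a, condition_b)]
centered_m = median_center([m for m, _ in pairs])
```

In this toy input every bin has M = 1 before centering, so centering removes the global two-fold difference and leaves no bin flagged as differential; a downstream model such as the hidden Markov model named in the abstract would operate on normalized values of this kind.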
Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex.
Konermann, Silvana; Brigham, Mark D; Trevino, Alexandro E; Joung, Julia; Abudayyeh, Omar O; Barcena, Clea; Hsu, Patrick D; Habib, Naomi; Gootenberg, Jonathan S; Nishimasu, Hiroshi; Nureki, Osamu; Zhang, Feng
2015-01-29
Systematic interrogation of gene function requires the ability to perturb gene expression in a robust and generalizable manner. Here we describe structure-guided engineering of a CRISPR-Cas9 complex to mediate efficient transcriptional activation at endogenous genomic loci. We used these engineered Cas9 activation complexes to investigate single-guide RNA (sgRNA) targeting rules for effective transcriptional activation, to demonstrate multiplexed activation of ten genes simultaneously, and to upregulate long intergenic non-coding RNA (lincRNA) transcripts. We also synthesized a library consisting of 70,290 guides targeting all human RefSeq coding isoforms to screen for genes that, upon activation, confer resistance to a BRAF inhibitor. The top hits included genes previously shown to be able to confer resistance, and novel candidates were validated using individual sgRNA and complementary DNA overexpression. A gene expression signature based on the top screening hits correlated with markers of BRAF inhibitor resistance in cell lines and patient-derived samples. These results collectively demonstrate the potential of Cas9-based activators as a powerful genetic perturbation technology.
Isoform Evolution in Primates through Independent Combination of Alternative RNA Processing Events
Zhang, Shi-Jian; Wang, Chenqu; Yan, Shouyu; Fu, Aisi; Luan, Xuke; Li, Yumei; Sunny Shen, Qing; Zhong, Xiaoming; Chen, Jia-Yu; Wang, Xiangfeng; Chin-Ming Tan, Bertrand; He, Aibin; Li, Chuan-Yun
2017-01-01
Recent RNA-seq technology revealed thousands of splicing events that are under rapid evolution in primates, but the reliability of these events, as well as their combination at the isoform level, has not been adequately addressed because of the limited read length of RNA-seq. Here, we performed comparative transcriptome analyses in human and rhesus macaque cerebellum using single molecule long-read sequencing (Iso-seq) and matched RNA-seq. Besides 359 million RNA-seq reads, 4,165,527 Iso-seq reads were generated with a mean length of 14,875 bp, covering 11,466 human genes, and 10,159 macaque genes. With Iso-seq data, we substantially expanded the repertoire of alternative RNA processing events in primates, and found that intron retention and alternative polyadenylation are surprisingly more prevalent in primates than previously estimated. We then investigated the combinatorial mode of these alternative events at the whole-transcript level, and found that the combination of these events is largely independent along the transcript, leading to thousands of novel isoforms missed by current annotations. Notably, these novel isoforms are selectively constrained in general, and 1,119 isoforms have even higher expression than the previously annotated major isoforms in human, indicating that the complexity of the human transcriptome is still significantly underestimated. Comparative transcriptome analysis further revealed 502 genes encoding selectively constrained, lineage-specific isoforms in human but not in rhesus macaque, linking them to some lineage-specific functions. Overall, we propose that the independent combination of alternative RNA processing events has contributed to complex isoform evolution in primates, which provides a new foundation for the study of phenotypic difference among primates. PMID:28957512
Atak, Zeynep Kalender; Gianfelici, Valentina; Hulselmans, Gert; De Keersmaecker, Kim; Devasia, Arun George; Geerdens, Ellen; Mentens, Nicole; Chiaretti, Sabina; Durinck, Kaat; Uyttebroeck, Anne; Vandenberghe, Peter; Wlodarska, Iwona; Cloos, Jacqueline; Foà, Robin; Speleman, Frank; Cools, Jan; Aerts, Stein
2013-01-01
RNA-seq is a promising technology to re-sequence protein coding genes for the identification of single nucleotide variants (SNV), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for the detection of driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors and cooperative point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein altering variations, as well as several new candidates including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalizations and batch effect removal, and used these to classify patients into T-ALL subtypes. Finally, we detected gene fusions, of which several can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1; and others result in novel fusion transcripts encoding activated kinases (SSBP2-FER and TPM3-JAK2) or involving MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and successfully applied these for the detection of translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.
Das, Pranab J.; McCarthy, Fiona; Vishnoi, Monika; Paria, Nandina; Gresham, Cathy; Li, Gang; Kachroo, Priyanka; Sudderth, A. Kendrick; Teague, Sheila; Love, Charles C.; Varner, Dickson D.; Chowdhary, Bhanu P.; Raudsepp, Terje
2013-01-01
Mature mammalian sperm contain a complex population of RNAs some of which might regulate spermatogenesis while others probably play a role in fertilization and early development. Due to this limited knowledge, the biological functions of sperm RNAs remain enigmatic. Here we report the first characterization of the global transcriptome of the sperm of fertile stallions. The findings improved understanding of the biological significance of sperm RNAs which in turn will allow the discovery of sperm-based biomarkers for stallion fertility. The stallion sperm transcriptome was interrogated by analyzing sperm and testes RNA on a 21,000-element equine whole-genome oligoarray and by RNA-seq. Microarray analysis revealed 6,761 transcripts in the sperm, of which 165 were sperm-enriched, and 155 were differentially expressed between the sperm and testes. Next, 70 million raw reads were generated by RNA-seq of which 50% could be aligned with the horse reference genome. A total of 19,257 sequence tags were mapped to all horse chromosomes and the mitochondrial genome. The highest density of mapped transcripts was in gene-rich ECA11, 12 and 13, and the lowest in gene-poor ECA9 and X; 7 gene transcripts originated from ECAY. Structural annotation aligned sperm transcripts with 4,504 known horse and/or human genes, rRNAs and 82 miRNAs, whereas 13,354 sequence tags remained anonymous. The data were aligned with selected equine gene models to identify additional exons and splice variants. Gene Ontology annotations showed that sperm transcripts were associated with molecular processes (chemoattractant-activated signal transduction, ion transport) and cellular components (membranes and vesicles) related to known sperm functions at fertilization, while some messenger and micro RNAs might be critical for early development. 
The findings suggest that the rich repertoire of coding and non-coding RNAs in stallion sperm is not a random remnant from spermatogenesis in testes but a selectively retained and functionally coherent collection of RNAs. PMID:23409192
Ryan, Michael C; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N
2012-09-15
SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. Contact: mryan@insilico.us.com. Supplementary data are available at Bioinformatics online.
Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits.
Gong, Haibiao; Do, Devin; Ramakrishnan, Ramesh
2018-01-01
Single-cell mRNA-seq is a valuable tool to dissect expression profiles and to understand the regulatory network of genes. Microfluidics is well suited for single-cell analysis owing both to the small volume of the reaction chambers and to the ease of automation. Here we describe the workflow of single-cell mRNA-seq using the C1 IFC, which can isolate and process up to 96 cells. Both the on-chip procedure (lysis, reverse transcription, and preamplification PCR) and the off-chip sequencing library preparation protocols are described. The workflow generates full-length mRNA information, which for many applications is more valuable than the output of 3' end counting methods.
Edsgärd, Daniel; Iglesias, Maria Jesus; Reilly, Sarah-Jayne; Hamsten, Anders; Tornvall, Per; Odeberg, Jacob; Emanuelsson, Olof
2016-01-01
Allele-specific expression (ASE) is the imbalance in transcription between maternal and paternal alleles at a locus and can be probed in single individuals using massively parallel DNA sequencing technology. Assessing ASE within a single sample provides a static picture of the ASE, but the magnitude of ASE for a given transcript may vary between different biological conditions in an individual. Such condition-dependent ASE could indicate a genetic variation with a functional role in the phenotypic difference. We investigated ASE through RNA-sequencing of primary white blood cells from eight human individuals before and after the controlled induction of an inflammatory response, and detected condition-dependent and static ASE at 211 and 13021 variants, respectively. We developed a method, GeneiASE, to detect genes exhibiting static or condition-dependent ASE in single individuals. GeneiASE performed consistently over a range of read depths and ASE effect sizes, and did not require phasing of variants to estimate haplotypes. We observed condition-dependent ASE related to the inflammatory response in 19 genes, and static ASE in 1389 genes. Allele-specific expression was confirmed by validation of variants through real-time quantitative RT-PCR, with RNA-seq and RT-PCR ASE effect-size correlations r = 0.67 and r = 0.94 for static and condition-dependent ASE, respectively. PMID:26887787
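A simple way to see what an allelic imbalance test looks like at a single heterozygous variant is an exact two-sided binomial test against a 50:50 expectation. This is only an illustrative sketch and not the GeneiASE method described above, which works at the gene level; the function name and the read counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(ref_count, alt_count, p=0.5):
    """Exact two-sided binomial p-value for allelic read counts.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Intended for modest read depths; very deep counts
    would need a log-space implementation.
    """
    n = ref_count + alt_count
    probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = probs[ref_count]
    # Small relative tolerance so ties with the observed outcome are included.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical variant: 10 reads carry the reference allele, none the alternative.
p_imbalanced = binomial_two_sided_p(10, 0)
# Hypothetical variant with perfectly balanced coverage.
p_balanced = binomial_two_sided_p(5, 5)
```

A condition-dependent analysis would compare the imbalance between two conditions rather than test each sample against 0.5, which this sketch does not attempt.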
Auerbach, Scott S; Phadke, Dhiral P; Mav, Deepak; Holmgren, Stephanie; Gao, Yuan; Xie, Bin; Shin, Joo Heon; Shah, Ruchir R; Merrick, B Alex; Tice, Raymond R
2015-07-01
Formalin-fixed, paraffin-embedded (FFPE) pathology specimens represent a potentially vast resource for transcriptomic-based biomarker discovery. We present here a comparison of results from a whole transcriptome RNA-Seq analysis of RNA extracted from fresh frozen and FFPE livers. The samples were derived from rats exposed to aflatoxin B1 (AFB1) and a corresponding set of control animals. Principal components analysis indicated that samples separated into two groups representing presence or absence of chemical exposure, in both fresh frozen and FFPE sample types. Sixty-five percent of the differentially expressed transcripts (AFB1 vs. controls) in fresh frozen samples were also differentially expressed in FFPE samples (overlap significance: P < 0.0001). Genomic signature and gene set analysis of AFB1 differentially expressed transcript lists indicated highly similar results between fresh frozen and FFPE at the level of chemogenomic signatures (i.e., single chemical/dose/duration elicited transcriptomic signatures), mechanistic and pathology signatures, biological processes, canonical pathways and transcription factor networks. Overall, our results suggest that similar hypotheses about the biological mechanism of toxicity would be formulated from fresh frozen and FFPE samples. These results indicate that phenotypically anchored archival specimens represent a potentially informative resource for signature-based biomarker discovery and mechanistic characterization of toxicity. Copyright © 2014 John Wiley & Sons, Ltd.
Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.
2017-01-01
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623
Wang, Jingkui; Yeung, Jake; Gobet, Cédric; Sobel, Jonathan; Lück, Sarah; Molina, Nacho; Naef, Felix
2018-01-01
The mammalian circadian clock coordinates physiology with environmental cycles through the regulation of daily oscillations of gene expression. Thousands of transcripts exhibit rhythmic accumulations across mouse tissues, as determined by the balance of their synthesis and degradation. While diurnally rhythmic transcription regulation is well studied and often thought to be the main factor generating rhythmic mRNA accumulation, the extent of rhythmic posttranscriptional regulation is debated, and the kinetic parameters (e.g., half-lives), as well as the underlying regulators (e.g., mRNA-binding proteins) are relatively unexplored. Here, we developed a quantitative model for cyclic accumulations of pre-mRNA and mRNA from total RNA-seq data, and applied it to mouse liver. This allowed us to identify that about 20% of mRNA rhythms were driven by rhythmic mRNA degradation, and another 15% of mRNAs regulated by both rhythmic transcription and mRNA degradation. The method could also estimate mRNA half-lives and processing times in intact mouse liver. We then showed that, depending on mRNA half-life, rhythmic mRNA degradation can either amplify or tune phases of mRNA rhythms. By comparing mRNA rhythms in wild-type and Bmal1−/− animals, we found that the rhythmic degradation of many transcripts did not depend on a functional BMAL1. Interestingly clock-dependent and -independent degradation rhythms peaked at distinct times of day. We further predicted mRNA-binding proteins (mRBPs) that were implicated in the posttranscriptional regulation of mRNAs, either through stabilizing or destabilizing activities. Together, our results demonstrate how posttranscriptional regulation temporally shapes rhythmic mRNA accumulation in mouse liver. PMID:29432155
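The dependence of mRNA rhythms on half-life can be illustrated with the simplest textbook case: sinusoidal synthesis and constant first-order degradation, dm/dt = s(t) - k·m(t). This is an assumption-laden sketch, not the authors' model, which also allows rhythmic degradation; the function name and parameter values are hypothetical.

```python
import math

def mrna_rhythm(half_life_h, period_h=24.0):
    """Phase delay and relative amplitude of mRNA under rhythmic synthesis.

    Assumes dm/dt = s(t) - k*m with s(t) = s0*(1 + eps*cos(omega*t)) and a
    constant degradation rate k = ln(2)/half-life. At steady state the mRNA
    oscillates with relative amplitude eps * damping and peaks delay_h hours
    after the synthesis peak.
    """
    k = math.log(2) / half_life_h          # degradation rate, per hour
    omega = 2 * math.pi / period_h         # angular frequency, rad per hour
    delay_h = math.atan(omega / k) / omega
    damping = k / math.sqrt(k**2 + omega**2)
    return delay_h, damping

# Compare a short-lived and a long-lived transcript (hypothetical half-lives).
for half_life in (0.5, 2.0, 8.0):
    delay, damping = mrna_rhythm(half_life)
    print(f"half-life {half_life:4.1f} h: delay {delay:4.2f} h, damping {damping:4.2f}")
```

Under these assumptions, longer half-lives delay the mRNA peak (approaching a quarter of the period) and flatten the rhythm, which is why a constant-degradation model alone cannot produce strongly rhythmic, long-lived transcripts.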
Majoros, William H.; Campbell, Michael S.; Holt, Carson; DeNardo, Erin K.; Ware, Doreen; Allen, Andrew S.; Yandell, Mark; Reddy, Timothy E.
2017-01-01
Motivation: The accurate interpretation of genetic variants is critical for characterizing genotype–phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. Results: We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (‘Assessing Changes to Exons’) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. Availability and Implementation: ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information: Supplementary information is available at Bioinformatics online. PMID:28011790
Majoros, William H; Campbell, Michael S; Holt, Carson; DeNardo, Erin K; Ware, Doreen; Allen, Andrew S; Yandell, Mark; Reddy, Timothy E
2017-05-15
The accurate interpretation of genetic variants is critical for characterizing genotype-phenotype associations. Because the effects of genetic variants can depend strongly on their local genomic context, accurate genome annotations are essential. Furthermore, as some variants have the potential to disrupt or alter gene structure, variant interpretation efforts stand to gain from the use of individualized annotations that account for differences in gene structure between individuals or strains. We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE ('Assessing Changes to Exons') converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detects gene-structure changes and their possible repercussions, and identifies several classes of possible loss of function. Novel transcripts predicted by ACE are commonly supported by spliced RNA-seq reads, and can be used to improve read alignment and transcript quantification when an individual-specific genome sequence is available. Using publicly available RNA-seq data, we show that ACE predictions confirm earlier results regarding the quantitative effects of nonsense-mediated decay, and we show that predicted loss-of-function events are highly concordant with patterns of intolerance to mutations across the human population. ACE can be readily applied to diverse species including animals and plants, making it a broadly useful tool for use in eukaryotic population-based resequencing projects, particularly for assessing the joint impact of all variants at a locus. ACE is written in open-source C++ and Perl and is available from geneprediction.org/ACE. Contact: myandell@genetics.utah.edu or tim.reddy@duke.edu. Supplementary information is available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.
2014-01-01
RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209
Chuang, Trees-Juen; Wu, Chan-Shuo; Chen, Chia-Ying; Hung, Li-Yuan; Chiang, Tai-Wei; Yang, Min-Yu
2016-02-18
Analysis of RNA-seq data often detects numerous 'non-co-linear' (NCL) transcripts, which comprise sequence segments that are topologically inconsistent with their corresponding DNA sequences in the reference genome. However, detection of NCL transcripts involves two major challenges: removal of false positives arising from alignment artifacts and discrimination between different types of NCL transcripts (trans-spliced, circular or fusion transcripts). Here, we developed a new NCL-transcript-detecting method ('NCLscan'), which utilizes a stepwise alignment strategy to almost completely eliminate false calls (>98% precision) without sacrificing true positives, enabling NCLscan to outperform 18 other publicly available tools (including fusion- and circular-RNA-detecting tools) in terms of sensitivity and precision, regardless of the generation strategy of the simulated dataset, type of intragenic or intergenic NCL event, read depth of coverage, read length or expression level of the NCL transcript. Given this high accuracy, NCLscan was applied to distinguishing between trans-spliced, circular and fusion transcripts on the basis of poly(A)- and nonpoly(A)-selected RNA-seq data. We showed that circular RNAs were expressed more ubiquitously, more abundantly and less cell type-specifically than trans-spliced and fusion transcripts. Our study thus describes a robust pipeline for the discovery of NCL transcripts, and sheds light on the fundamental biology of these non-canonical RNA events in the human transcriptome. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Mandelin, Arthur M; Homan, Philip J; Shaffer, Alexander M; Cuda, Carla M; Dominguez, Salina T; Bacalao, Emily; Carns, Mary; Hinchcliff, Monique; Lee, Jungwha; Aren, Kathleen; Thakrar, Anjali; Montgomery, Anna B; Bridges, S Louis; Bathon, Joan M; Atkinson, John P; Fox, David A; Matteson, Eric L; Buckley, Christopher D; Pitzalis, Costantino; Parks, Deborah; Hughes, Laura B; Geraldino-Pardilla, Laura; Ike, Robert; Phillips, Kristine; Wright, Kerry; Filer, Andrew; Kelly, Stephen; Ruderman, Eric M; Morgan, Vince; Abdala-Valencia, Hiam; Misharin, Alexander V; Budinger, G Scott; Bartom, Elizabeth T; Pope, Richard M; Perlman, Harris; Winter, Deborah R
2018-06-01
Currently, there are no reliable biomarkers for predicting therapeutic response in patients with rheumatoid arthritis (RA). The synovium may unlock critical information for determining efficacy, since a reduction in the numbers of sublining synovial macrophages remains the most reproducible biomarker. Thus, a clinically actionable method for the collection of synovial tissue, which can be analyzed using high-throughput strategies, must become a reality. This study was undertaken to assess the feasibility of utilizing synovial biopsies as a precision medicine-based approach for patients with RA. Rheumatologists at 6 US academic sites were trained in minimally invasive ultrasound-guided synovial tissue biopsy. Biopsy specimens obtained from patients with RA and synovial tissue from patients with osteoarthritis (OA) were subjected to histologic analysis, fluorescence-activated cell sorting, and RNA sequencing (RNA-seq). An optimized protocol for digesting synovial tissue was developed to generate high-quality RNA-seq libraries from isolated macrophage populations. Associations were determined between macrophage transcriptional profiles and clinical parameters in RA patients. Patients with RA reported minimal adverse effects in response to synovial biopsy. Comparable RNA quality was observed from synovial tissue and isolated macrophages between patients with RA and patients with OA. Whole tissue samples from patients with RA demonstrated a high degree of transcriptional heterogeneity. In contrast, the transcriptional profile of isolated RA synovial macrophages highlighted different subpopulations of patients and identified 6 novel transcriptional modules that were associated with disease activity and therapy. Performance of synovial tissue biopsies by rheumatologists in the US is feasible and generates high-quality samples for research. 
Through the use of cutting-edge technologies to analyze synovial biopsy specimens in conjunction with corresponding clinical information, a precision medicine-based approach for patients with RA is attainable. © 2018, American College of Rheumatology.
Thomason, Maureen K; Bischler, Thorsten; Eisenbart, Sara K; Förstner, Konrad U; Zhang, Aixia; Herbig, Alexander; Nieselt, Kay; Sharma, Cynthia M; Storz, Gisela
2015-01-01
While the model organism Escherichia coli has been the subject of intense study for decades, the full complement of its RNAs is only now being examined. Here we describe a survey of the E. coli transcriptome carried out using a differential RNA sequencing (dRNA-seq) approach, which can distinguish between primary and processed transcripts, and an automated prediction algorithm for transcriptional start sites (TSS). With the criterion of expression under at least one of three growth conditions examined, we predicted 14,868 TSS candidates, including 5,574 internal to annotated genes (iTSS) and 5,495 TSS corresponding to potential antisense RNAs (asRNAs). We examined expression of 14 candidate asRNAs by Northern analysis using RNA from wild-type E. coli and from strains defective for RNases III and E, two RNases reported to be involved in asRNA processing. Interestingly, nine asRNAs detected as distinct bands by Northern analysis were differentially affected by the rnc and rne mutations. We also compared our asRNA candidates with previously published asRNA annotations from RNA-seq data and discuss the challenges associated with these cross-comparisons. Our global transcriptional start site map represents a valuable resource for identification of transcription start sites, promoters, and novel transcripts in E. coli and is easily accessible, together with the cDNA coverage plots, in an online genome browser. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Rathe, Susan K; Moriarity, Branden S; Stoltenberg, Christopher B; Kurata, Morito; Aumann, Natalie K; Rahrmann, Eric P; Bailey, Natashay J; Melrose, Ellen G; Beckmann, Dominic A; Liska, Chase R; Largaespada, David A
2014-08-13
The evolution from microarrays to transcriptome deep-sequencing (RNA-seq) and from RNA interference to gene knockouts using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) and Transcription Activator-Like Effector Nucleases (TALENs) has provided a new experimental partnership for identifying and quantifying the effects of gene changes on drug resistance. Here we describe the results from deep-sequencing of RNA derived from two cytarabine (Ara-C) resistant acute myeloid leukemia (AML) cell lines, and present CRISPR- and TALEN-based methods for accomplishing complete gene knockout (KO) in AML cells. We found protein-modifying loss-of-function mutations in Dck in both Ara-C resistant cell lines. CRISPR- and TALEN-based KO of Dck dramatically increased the IC₅₀ of Ara-C, and introduction of a DCK overexpression vector into Dck KO clones resulted in a significant increase in Ara-C sensitivity. This effort demonstrates the power of using transcriptome analysis and CRISPR/TALEN-based KOs to identify and verify genes associated with drug resistance.
DRME: Count-based differential RNA methylation analysis at small sample size scenario.
Liu, Lian; Zhang, Shao-Wu; Gao, Fan; Zhang, Yixin; Huang, Yufei; Chen, Runsheng; Meng, Jia
2016-04-15
Differential methylation, which concerns the difference in the degree of epigenetic regulation via methylation between two conditions, has been modeled with a beta or beta-binomial distribution to address the within-group biological variability in sequencing data. However, a beta or beta-binomial model is usually difficult to infer at small sample sizes from the discrete read counts in sequencing data. On the other hand, as an emerging research field, RNA methylation has drawn more and more attention recently, and the differential analysis of RNA methylation is significantly different from that of DNA methylation due to the impact of transcriptional regulation. We developed DRME to better address the differential RNA methylation problem. The proposed model can effectively describe within-group biological variability at small sample sizes and handles the impact of transcriptional regulation on RNA methylation. We tested the newly developed DRME algorithm on simulated data and 4 MeRIP-Seq case-control studies and compared it with Fisher's exact test. It is in principle widely applicable to several other RNA-related data types as well, including RNA Bisulfite sequencing and PAR-CLIP. The code together with an MeRIP-Seq dataset is available online (https://github.com/lzcyzm/DRME) for evaluation and reproduction of the figures shown in this article. Copyright © 2016 Elsevier Inc. All rights reserved.
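The baseline that DRME is compared against, Fisher's exact test, can be sketched for a single candidate site as a 2x2 table of read counts. This is an illustrative sketch of the baseline only, not of DRME itself; the table layout (rows as conditions, columns as IP and input reads) and the counts are assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums all tables no more likely than the
    observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)

# Hypothetical site: rows are control and treated samples,
# columns are IP (methylated fraction) and input reads.
p_site = fisher_exact_two_sided(30, 70, 60, 40)
```

Because the test pools reads into one table per site, it ignores variability between replicates, which is the limitation the abstract says a count-based model is meant to address.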
2014-01-01
Background: RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results: We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions: eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312
Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang
2014-03-05
RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
Taoka, Masato; Nobe, Yuko; Hori, Masayuki; Takeuchi, Aiko; Masaki, Shunpei; Yamauchi, Yoshio; Nakayama, Hiroshi; Takahashi, Nobuhiro; Isobe, Toshiaki
2015-01-01
We present a liquid chromatography–mass spectrometry (LC-MS)-based method for comprehensive quantitative identification of post-transcriptional modifications (PTMs) of RNA. We incorporated an in vitro-transcribed, heavy isotope-labeled reference RNA into a sample RNA solution, digested the mixture with a number of RNases and detected the post-transcriptionally modified oligonucleotides quantitatively based on shifts in retention time and the MS signal in subsequent LC-MS. This allowed the determination and quantitation of all PTMs in Schizosaccharomyces pombe ribosomal (r)RNAs and generated the first complete PTM maps of eukaryotic rRNAs at single-nucleotide resolution. There were 122 modified sites, most of which appear to locate at the interface of ribosomal subunits where translation takes place. We also identified PTMs at specific locations in rRNAs that were altered in response to growth conditions of yeast cells, suggesting that the cells coordinately regulate the modification levels of RNA. PMID:26013808
Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh
2018-06-03
Gene expression in cerebral ischemia has been a subject of intense investigations for several years. Studies utilizing probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA-sequencing (RNA-seq) enables comprehensive examinations of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites, and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we conducted an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion using RNA-seq. We identified a total of 1007 transcripts at 6 h, 1878 transcripts at 12 h, and 1618 transcripts at 24 h of reperfusion that were significantly altered as compared to sham controls. With isoform-level resolution, we identified 23 splice variants arising from 23 genes that were novel mRNA isoforms. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time-points, we discovered multiple, distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth to provide several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.
Tubulin C-terminal Post-translational Modifications Do Not Occur in Wood Forming Tissue of Populus
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hu, Hao; Gu, Xi; Xue, Liang-Jiao
Cortical microtubules (MTs) are evolutionarily conserved cytoskeletal components with specialized roles in plants, including regulation of cell wall biogenesis. MT functions and dynamics are dictated by the composition of their monomeric subunits, α- (TUA) and β-tubulins (TUB), which in animals and protists are subject to both transcriptional regulation and post-translational modifications (PTM). While spatiotemporal regulation of tubulin gene expression has been reported in plants, whether and to what extent tubulin PTMs occur in these species remain poorly understood. We chose the woody perennial Populus for investigation of tubulin PTMs in this study, with a particular focus on developing xylem where high tubulin transcript levels support MT-dependent secondary cell wall deposition. Mass spectrometry and immunodetection concurred that detyrosination, non-tyrosination and glutamylation were essentially absent in tubulins isolated from wood-forming tissues of P. deltoides and P. tremula ×alba. Label-free quantification of tubulin isotypes and RNA-Seq estimation of tubulin transcript abundance were largely consistent with transcriptional regulation. However, two TUB isotypes were detected at noticeably lower levels than expected based on RNA-Seq transcript abundance in both Populus species. These findings led us to conclude that MT composition during wood formation depends exclusively on transcriptional and, to a lesser extent, translational regulation of tubulin isotypes.
Tubulin C-terminal Post-translational Modifications Do Not Occur in Wood Forming Tissue of Populus
Hu, Hao; Gu, Xi; Xue, Liang-Jiao; ...
2016-10-13
Cortical microtubules (MTs) are evolutionarily conserved cytoskeletal components with specialized roles in plants, including regulation of cell wall biogenesis. MT functions and dynamics are dictated by the composition of their monomeric subunits, α- (TUA) and β-tubulins (TUB), which in animals and protists are subject to both transcriptional regulation and post-translational modifications (PTM). While spatiotemporal regulation of tubulin gene expression has been reported in plants, whether and to what extent tubulin PTMs occur in these species remain poorly understood. We chose the woody perennial Populus for investigation of tubulin PTMs in this study, with a particular focus on developing xylem where high tubulin transcript levels support MT-dependent secondary cell wall deposition. Mass spectrometry and immunodetection concurred that detyrosination, non-tyrosination and glutamylation were essentially absent in tubulins isolated from wood-forming tissues of P. deltoides and P. tremula ×alba. Label-free quantification of tubulin isotypes and RNA-Seq estimation of tubulin transcript abundance were largely consistent with transcriptional regulation. However, two TUB isotypes were detected at noticeably lower levels than expected based on RNA-Seq transcript abundance in both Populus species. These findings led us to conclude that MT composition during wood formation depends exclusively on transcriptional and, to a lesser extent, translational regulation of tubulin isotypes.
Systems-level identification of PKA-dependent signaling in epithelial cells.
Isobe, Kiyoshi; Jung, Hyun Jun; Yang, Chin-Rang; Claxton, J'Neka; Sandoval, Pablo; Burg, Maurice B; Raghuram, Viswanathan; Knepper, Mark A
2017-10-17
G protein stimulatory α-subunit (Gαs)-coupled heptahelical receptors regulate cell processes largely through activation of protein kinase A (PKA). To identify signaling processes downstream of PKA, we deleted both PKA catalytic subunits using CRISPR-Cas9, followed by a "multiomic" analysis in mouse kidney epithelial cells expressing the Gαs-coupled V2 vasopressin receptor. RNA-seq (sequencing)-based transcriptomics and SILAC (stable isotope labeling of amino acids in cell culture)-based quantitative proteomics revealed a complete loss of expression of the water-channel gene Aqp2 in PKA knockout cells. SILAC-based quantitative phosphoproteomics identified 229 PKA phosphorylation sites. Most of these PKA targets are thus far unannotated in public databases. Surprisingly, 1,915 phosphorylation sites with the motif x-(S/T)-P showed increased phosphooccupancy, pointing to increased activity of one or more MAP kinases in PKA knockout cells. Indeed, phosphorylation changes associated with activation of ERK2 were seen in PKA knockout cells. The ERK2 site is downstream of a direct PKA site in the Rap1GAP, Sipa1l1, that indirectly inhibits Raf1. In addition, a direct PKA site that inhibits the MAP kinase kinase kinase Map3k5 (ASK1) is upstream of JNK1 activation. The datasets were integrated to identify a causal network describing PKA signaling that explains vasopressin-mediated regulation of membrane trafficking and gene transcription. The model predicts that, through PKA activation, vasopressin stimulates AQP2 exocytosis by inhibiting MAP kinase signaling. The model also predicts that, through PKA activation, vasopressin stimulates Aqp2 transcription through induction of nuclear translocation of the acetyltransferase EP300, which increases histone H3K27 acetylation of vasopressin-responsive genes (confirmed by ChIP-seq).
DIMM-SC: a Dirichlet mixture model for clustering droplet-based single cell transcriptomic data.
Sun, Zhe; Wang, Ting; Deng, Ke; Wang, Xiao-Feng; Lafyatis, Robert; Ding, Ying; Hu, Ming; Chen, Wei
2018-01-01
Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool to study cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells with direct counting of transcript copies using Unique Molecular Identifier (UMI). Despite the technology advances, statistical methods and computational tools are still lacking for analyzing droplet-based scRNA-Seq data. Particularly, model-based approaches for clustering large-scale single cell transcriptomic data are still under-explored. We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variations across different cell clusters via a Dirichlet mixture prior. We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis with prior biological knowledge to benchmark and validate DIMM-SC. Both simulation studies and real data applications demonstrated that overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability compared to other existing clustering methods. More importantly, as a model-based approach, DIMM-SC is able to quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretations, which are typically unavailable from existing clustering methods. DIMM-SC has been implemented in a user-friendly R package with a detailed tutorial available on www.pitt.edu/~wec47/singlecell.html. Contact: wei.chen@chp.edu or hum@ccf.org. Supplementary data are available at Bioinformatics online. © The Author 2017.
Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
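The idea of per-cell clustering uncertainty described in the DIMM-SC abstract can be illustrated with a Dirichlet-multinomial mixture. The sketch below is not the DIMM-SC R package or its API: it assumes the per-cluster Dirichlet parameters and mixing weights are already known (fitting them, for example by EM, is omitted), and the function names, parameters and counts are all hypothetical.

```python
from math import exp, lgamma, log


def dirichlet_multinomial_loglik(counts, alpha):
    """Log-likelihood of one cell's UMI counts under a Dirichlet-multinomial.

    The multinomial coefficient is omitted because it is identical for every
    cluster and cancels when posterior probabilities are normalized.
    """
    n = sum(counts)
    a0 = sum(alpha)
    ll = lgamma(a0) - lgamma(a0 + n)
    for x, a in zip(counts, alpha):
        ll += lgamma(a + x) - lgamma(a)
    return ll


def cluster_responsibilities(counts, alphas, weights):
    """Posterior probability of each cluster for one cell."""
    logs = [
        log(w) + dirichlet_multinomial_loglik(counts, alpha)
        for w, alpha in zip(weights, alphas)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    unnorm = [exp(v - top) for v in logs]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Hypothetical example: two clusters over three genes, equal mixing weights.
alphas = [[10.0, 1.0, 1.0], [1.0, 1.0, 10.0]]
weights = [0.5, 0.5]
cell = [8, 1, 0]  # UMI counts for one cell
resp = cluster_responsibilities(cell, alphas, weights)
print([round(r, 4) for r in resp])
```

A responsibility close to 1 indicates a confident assignment, while values near 0.5 flag cells whose cluster membership is ambiguous under the assumed parameters.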
Budak, Gungor; Srivastava, Rajneesh; Janga, Sarath Chandra
2017-06-01
RNA-binding proteins (RBPs) control the regulation of gene expression in eukaryotic genomes at post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, currently there are very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten uses BED files resulting from most peak calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, to provide both traditional functional enrichment as well as gene set enrichment results for a number of gene set collections including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking of Seten using eCLIP data for IGF2BP1, SRSF7, and PTBP1 against their corresponding CRISPR RNA-seq in K562 cells as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with scores significantly contributing to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich. 
Seten's web interface currently provides precomputed results for about 200 CLIP-seq data sets and both command line as well as web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available on http://www.iupui.edu/~sysbio/seten/. © 2017 Budak et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Ryan, Michael C.; Cleland, James; Kim, RyangGuk; Wong, Wing Chung; Weinstein, John N.
2012-01-01
Summary: SpliceSeq is a resource for RNA-Seq data that provides a clear view of alternative splicing and identifies potential functional changes that result from splice variation. It displays intuitive visualizations and prioritized lists of results that highlight splicing events and their biological consequences. SpliceSeq unambiguously aligns reads to gene splice graphs, facilitating accurate analysis of large, complex transcript variants that cannot be adequately represented in other formats. Availability and implementation: SpliceSeq is freely available at http://bioinformatics.mdanderson.org/main/SpliceSeq:Overview. The application is a Java program that can be launched via a browser or installed locally. Local installation requires MySQL and Bowtie. Contact: mryan@insilico.us.com Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22820202
Pervasive Targeting of Nascent Transcripts by Hfq.
Kambara, Tracy K; Ramsey, Kathryn M; Dove, Simon L
2018-05-01
Hfq is an RNA chaperone and an important post-transcriptional regulator in bacteria. Using chromatin immunoprecipitation coupled with high-throughput DNA sequencing (ChIP-seq), we show that Hfq associates with hundreds of different regions of the Pseudomonas aeruginosa chromosome. These associations are abolished when transcription is inhibited, indicating that they reflect Hfq binding to transcripts during their synthesis. Analogous ChIP-seq analyses with the post-transcriptional regulator Crc reveal that it associates with many of the same nascent transcripts as Hfq, an activity we show is Hfq dependent. Our findings indicate that Hfq binds many transcripts co-transcriptionally in P. aeruginosa, often in concert with Crc, and uncover direct regulatory targets of these proteins. They also highlight a general approach for studying the interactions of RNA-binding proteins with nascent transcripts in bacteria. The binding of post-transcriptional regulators to nascent mRNAs may represent a prevalent means of controlling translation in bacteria where transcription and translation are coupled. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E
2015-01-01
Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.
Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons
Krishnaswami, Suguna Rani; Grindberg, Rashel V; Novotny, Mark; Venepally, Pratap; Lacar, Benjamin; Bhutani, Kunal; Linker, Sara B; Pham, Son; Erwin, Jennifer A; Miller, Jeremy A; Hodge, Rebecca; McCarthy, James K; Kelder, Martin; McCorrison, Jamison; Aevermann, Brian D; Fuertes, Francisco Diez; Scheuermann, Richard H; Lee, Jun; Lein, Ed S; Schork, Nicholas; McConnell, Michael J; Gage, Fred H; Lasken, Roger S
2016-01-01
A protocol is described for sequencing the transcriptome of a cell nucleus. Nuclei are isolated from specimens and sorted by FACS, cDNA libraries are constructed and RNA-seq is performed, followed by data analysis. Some steps follow published methods (Smart-seq2 for cDNA synthesis and Nextera XT barcoded library preparation) and are not described in detail here. Previous single-cell approaches for RNA-seq from tissues include cell dissociation using protease treatment at 30 °C, which is known to alter the transcriptome. We isolate nuclei at 4 °C from tissue homogenates, which cause minimal damage. Nuclear transcriptomes can be obtained from postmortem human brain tissue stored at −80 °C, making brain archives accessible for RNA-seq from individual neurons. The method also allows investigation of biological features unique to nuclei, such as enrichment of certain transcripts and precursors of some noncoding RNAs. By following this procedure, it takes about 4 d to construct cDNA libraries that are ready for sequencing. PMID:26890679
Chavan, Shweta S; Bauer, Michael A; Peterson, Erich A; Heuck, Christoph J; Johann, Donald J
2013-01-01
Transcriptome analysis by microarrays has produced important advances in biomedicine. For instance in multiple myeloma (MM), microarray approaches led to the development of an effective disease subtyping via cluster assignment, and a 70 gene risk score. Both enabled an improved molecular understanding of MM, and have provided prognostic information for the purposes of clinical management. Many researchers are now transitioning to Next Generation Sequencing (NGS) approaches and RNA-seq in particular, due to its discovery-based nature, improved sensitivity, and dynamic range. Additionally, RNA-seq allows for the analysis of gene isoforms, splice variants, and novel gene fusions. Given the voluminous amounts of historical microarray data, there is now a need to associate and integrate microarray and RNA-seq data via advanced bioinformatic approaches. Custom software was developed following a model-view-controller (MVC) approach to integrate Affymetrix probe set-IDs, and gene annotation information from a variety of sources. The tool/approach employs an assortment of strategies to integrate, cross reference, and associate microarray and RNA-seq datasets. Output from a variety of transcriptome reconstruction and quantitation tools (e.g., Cufflinks) can be directly integrated, and/or associated with Affymetrix probe set data, as well as necessary gene identifiers and/or symbols from a diversity of sources. Strategies are employed to maximize the annotation and cross referencing process. Custom gene sets (e.g., MM 70 risk score (GEP-70)) can be specified, and the tool can be directly assimilated into an RNA-seq pipeline. A novel bioinformatic approach to aid in the facilitation of both annotation and association of historic microarray data, in conjunction with richer RNA-seq data, is now assisting with the study of MM cancer biology.
SPAR: small RNA-seq portal for analysis of sequencing experiments.
Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee
2018-05-04
The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.
Oikonomopoulos, Spyros; Wang, Yu Chang; Djambazian, Haig; Badescu, Dunarel; Ragoussis, Jiannis
2016-08-24
To assess the performance of the Oxford Nanopore Technologies MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts with known concentration. cDNA libraries were generated using a template switching protocol to facilitate the direct comparison between different sequencing platforms. The MinION performance was assessed for its ability to sequence the cDNAs directly with good accuracy in terms of abundance and full length. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentration. No length or GC content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from a human HEK-293 cell line was sequenced on Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. We observed that there was a good agreement in the measured cDNA abundance between PacBio RS II and ONT MinION (Pearson r = 0.82, isoforms with length more than 700 bp) and between Illumina HiSeq 2500 and ONT MinION (Pearson r = 0.75). This indicates that the ONT MinION can sequence quantitatively both long and short full length cDNA molecules.
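The cross-platform agreement reported above is a Pearson correlation between abundance estimates. A minimal sketch of that calculation follows; the abundance values are hypothetical, and whether the study correlated raw or log-transformed abundances is not stated in the abstract, so the log transform here is an assumption.

```python
from math import log10, sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-isoform abundances measured on two platforms.
platform_a = [12.0, 150.0, 3.0, 980.0, 45.0]
platform_b = [10.0, 210.0, 5.0, 760.0, 30.0]

# Assumption: compare on a log scale (pseudocount of 1) so that a few highly
# expressed isoforms do not dominate the coefficient.
log_a = [log10(v + 1) for v in platform_a]
log_b = [log10(v + 1) for v in platform_b]
print(f"Pearson r (log scale): {pearson_r(log_a, log_b):.3f}")
```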
Notaguchi, Michitaka; Higashiyama, Tetsuya; Suzuki, Takamasa
2015-02-01
Phloem is a conductive tissue that allocates nutrients from mature source leaves to sinks such as young developing tissues. Phloem also delivers proteins and RNA species, such as small RNAs and mRNAs. Intensive studies on plant systemic signaling revealed the essential roles of proteins and RNA species. However, many of their functions are still largely unknown, with the roles of transported mRNAs being particularly poorly understood. A major difficulty is the absence of an accurate and comprehensive list of mobile transcripts. In this study, we used a hetero-graft system with Nicotiana benthamiana as the recipient scion and Arabidopsis as the donor stock, to identify transcripts that moved long distances across the graft union. We identified 138 Arabidopsis transcripts as mobile mRNAs, which we collectively termed the mRNA mobilome. Reverse transcription-PCR, quantitative real-time PCR and droplet digital PCR analyses confirmed the mobility. The transcripts included potential signaling factors and, unexpectedly, more general factors. In our investigations, we found no preferred transcript length, no previously known sequence motifs in promoter or transcript sequences and no similarities between the level of the transcripts and that in the source leaves. Grafting experiments regarding the function of ERECTA, an identified transcript, showed that no function of the transcript mobilized. To our knowledge, this is the first report identifying transcripts that move over long distances using a hetero-graft system between different plant taxa. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Miao, Xiangyang; Luo, Qingmiao; Qin, Xiaoyu
2016-05-10
Goats are widely kept as livestock throughout the world. Two excellent domestic breeds in China, the Laiwu Black and Jining Grey goats, have different fecundities and prolificacies. Although the goat genome sequences have been resolved recently, little is known about the gene regulations at the transcriptional level in goat. To understand the molecular and genetic mechanisms related to the fecundities and prolificacies, we performed genome-wide sequencing of the mRNAs from two breeds of goat using the next-generation RNA-Seq technology and used functional annotation to identify pathways of interest. Digital gene expression analysis showed 338 genes were up-regulated in the Jining Grey goats and 404 were up-regulated in the Laiwu Black goats. Quantitative real-time PCR verified the reliability of the RNA-Seq data. This study suggests that multiple genes responsible for various biological functions and signaling pathways are differentially expressed in the two different goat breeds, and these genes might be involved in the regulation of goat fecundity and prolificacy. Taken together, our study provides insight into the transcriptional regulation in the ovaries of 2 species of goats that might serve as a key resource for understanding goat fecundity, prolificacy and genetic diversity between species. Copyright © 2016 Elsevier B.V. All rights reserved.
Vukmirovic, Milica; Herazo-Maya, Jose D; Blackmon, John; Skodric-Trifunovic, Vesna; Jovanovic, Dragana; Pavlovic, Sonja; Stojsic, Jelena; Zeljkovic, Vesna; Yan, Xiting; Homer, Robert; Stefanovic, Branko; Kaminski, Naftali
2017-01-12
Idiopathic Pulmonary Fibrosis (IPF) is a lethal lung disease of unknown etiology. A major limitation in transcriptomic profiling of lung tissue in IPF has been a dependence on snap-frozen fresh tissues (FF). In this project we sought to determine whether genome scale transcript profiling using RNA Sequencing (RNA-Seq) could be applied to archived Formalin-Fixed Paraffin-Embedded (FFPE) IPF tissues. We isolated total RNA from 7 IPF and 5 control FFPE lung tissues and performed 50 base pair paired-end sequencing on an Illumina HiSeq 2000. TopHat2 was used to map sequencing reads to the human genome. On average ~62 million reads (53.4% of ~116 million reads) were mapped per sample. 4,131 genes were differentially expressed between IPF and controls (1,920 increased and 2,211 decreased; FDR < 0.05). We compared our results to differentially expressed genes calculated from a previously published dataset generated from FF tissues analyzed on Agilent microarrays (GSE47460). The overlap of differentially expressed genes was very high (760 increased and 1,413 decreased, FDR < 0.05). Only 92 differentially expressed genes changed in opposite directions. Pathway enrichment analysis performed using MetaCore confirmed numerous IPF relevant genes and pathways including extracellular remodeling, TGF-beta, and WNT. Gene network analysis of MMP7, a highly differentially expressed gene in both datasets, revealed the same canonical pathways and gene network candidates in RNA-Seq and microarray data. For validation by NanoString nCounter® we selected 35 genes that had a fold change of 2 in at least one dataset (10 discordant, 10 significantly differentially expressed in one dataset only and 15 concordant genes). High concordance of fold change and FDR was observed for each type of the samples (FF vs FFPE) with both microarrays (r = 0.92) and RNA-Seq (r = 0.90) and the number of discordant genes was reduced to four.
Our results demonstrate that RNA sequencing of RNA obtained from archived FFPE lung tissues is feasible. The results obtained from FFPE tissue are highly comparable to FF tissues. The ability to perform RNA-Seq on archived FFPE IPF tissues should greatly enhance the availability of tissue biopsies for research in IPF.
Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang
2014-01-01
Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggests that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box–like motif (CPGDMM1, “TATANNNATNA”), and an unknown motif (CPGDMM2, “WNYANTGAW”). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed the significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1 motif (p<0.01). Taken together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in the chloroplast genome. PMID:24914614
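The two motifs quoted above are written with IUPAC nucleotide codes (N = any base, W = A or T, Y = C or T). A minimal sketch of how such motifs could be located in a sequence is shown below. It is not the SMRT Portal method: the helper names and test sequences are hypothetical, and only the forward strand is scanned.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'WNYANTGAW' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


# Hypothetical sequences containing one occurrence of each motif.
print(find_motif("GGTATACCCATGAGG", "TATANNNATNA"))  # CPGDMM1-style motif
print(find_motif("ACTAGTGAT", "WNYANTGAW"))          # CPGDMM2-style motif
```

Scanning the reverse complement as well would be needed to count motif occurrences on both strands.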
A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages
Yu, Ying; Fuscoe, James C.; Zhao, Chen; Guo, Chao; Jia, Meiwen; Qing, Tao; Bannon, Desmond I.; Lancashire, Lee; Bao, Wenjun; Du, Tingting; Luo, Heng; Su, Zhenqiang; Jones, Wendell D.; Moland, Carrie L.; Branham, William S.; Qian, Feng; Ning, Baitang; Li, Yan; Hong, Huixiao; Guo, Lei; Mei, Nan; Shi, Tieliu; Wang, Kevin Y.; Wolfinger, Russell D.; Nikolsky, Yuri; Walker, Stephen J.; Duerksen-Hughes, Penelope; Mason, Christopher E.; Tong, Weida; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Shi, Leming; Wang, Charles
2014-01-01
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model. PMID:24510058
A technical assessment of the porcine ejaculated spermatozoa for a sperm-specific RNA-seq analysis.
Gòdia, Marta; Mayer, Fabiana Quoos; Nafissi, Julieta; Castelló, Anna; Rodríguez-Gil, Joan Enric; Sánchez, Armand; Clop, Alex
2018-04-26
The study of the boar sperm transcriptome by RNA-seq can provide relevant information on sperm quality and fertility and might contribute to animal breeding strategies. However, the analysis of spermatozoa RNA is challenging, as these cells harbor very low amounts of highly fragmented RNA, and ejaculates also contain other cell types with larger amounts of non-fragmented RNA. Here, we describe a strategy for successful boar sperm purification, RNA extraction and RNA-seq library preparation. Using these approaches, our objectives were: (i) to evaluate the sperm recovery rate (SRR) after boar spermatozoa purification by density centrifugation using the non-porcine-specific commercial reagent BoviPure™; (ii) to assess the correlation between SRR and sperm quality characteristics; (iii) to evaluate the relationship between sperm cell RNA load and sperm quality traits; and (iv) to compare different library preparation kits for both total RNA-seq (SMARTer Universal Low Input RNA and TruSeq RNA Library Prep kit) and small RNA-seq (NEBNext Small RNA and TailorMix miRNA Sample Prep v2) for high-throughput sequencing. Our results show that pig SRR (~22%) is lower than in other mammalian species and that it is not significantly dependent on the sperm quality parameters analyzed in our study. Moreover, no relationship between the RNA yield per sperm cell and sperm phenotypes was found. We compared an RNA-seq library preparation kit optimized for low amounts of fragmented RNA with a standard kit designed for high amounts of high-quality input RNA and found that, for sperm, a protocol designed to work on low-quality RNA is essential. We also compared two small RNA-seq kits and did not find substantial differences in their performance. We propose the methodological workflow described here for the RNA-seq screening of the boar spermatozoa transcriptome.
Abbreviations: FPKM: fragments per kilobase of transcript per million mapped reads; KRT1: keratin 1; miRNA: micro-RNA; miscRNA: miscellaneous RNA; Mt rRNA: mitochondrial ribosomal RNA; Mt tRNA: mitochondrial transfer RNA; OAZ3: ornithine decarboxylase antizyme 3; ORT: osmotic resistance test; piRNA: Piwi-interacting RNA; PRM1: protamine 1; PTPRC: protein tyrosine phosphatase receptor type C; rRNA: ribosomal RNA; snoRNA: small nucleolar RNA; snRNA: small nuclear RNA; SRR: sperm recovery rate; tRNA: transfer RNA.
The RNASeq-er API: a gateway to systematically updated analysis of public RNA-seq data.
Petryszak, Robert; Fonseca, Nuno A; Füllgrabe, Anja; Huerta, Laura; Keays, Maria; Tang, Y Amy; Brazma, Alvis
2017-07-15
The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in the European Nucleotide Archive (ENA), using Representational State Transfer. The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1 PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api. The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library. Contact: rnaseq@ebi.ac.uk; rpetry@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
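The quantification matrices named in this abstract (FPKM, TPM, raw counts) are related by simple arithmetic. The sketch below shows the standard textbook formulas for deriving FPKM and TPM from raw fragment counts and feature lengths. It is an illustration only, not the iRAP pipeline behind the API; the function names are hypothetical, and the total number of mapped fragments is assumed to equal the sum of the counts supplied.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of feature per million mapped fragments.

    Assumption: the library size is the sum of `counts`.
    """
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    return [r * 1e6 / denom for r in rates]


# Toy input: three genes with raw counts and lengths in base pairs.
counts = [100, 200, 700]
lengths = [1000, 2000, 1000]
print(fpkm(counts, lengths))
print(tpm(counts, lengths))
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM values.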
Makita, Yuko; Kawashima, Mika; Lau, Nyok Sean; Othman, Ahmad Sofiman; Matsui, Minami
2018-01-19
Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis, is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies have brought draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome-wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publicly available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers working on rubber and on economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/.
Filichkin, Sergei A.; Hamilton, Michael; Dharmawardhana, Palitha D.; Singh, Sunil K.; Sullivan, Christopher; Ben-Hur, Asa; Reddy, Anireddy S. N.; Jaiswal, Pankaj
2018-01-01
Abiotic stresses affect plant physiology, development, and growth, and alter pre-mRNA splicing. Western poplar is a model woody tree and a potential bioenergy feedstock. To investigate the extent of stress-regulated alternative splicing (AS), we conducted an in-depth survey of leaf, root, and stem xylem transcriptomes under drought, salt, or temperature stress. Analysis of approximately one billion genome-aligned RNA-Seq reads from tissue- or stress-specific libraries revealed over fifteen million novel splice junctions. Transcript models supported by both RNA-Seq and single molecule isoform sequencing (Iso-Seq) data revealed a broad array of novel stress- and/or tissue-specific isoforms. Analysis of Iso-Seq data also resulted in the discovery of 15,087 novel transcribed regions, of which 164 show AS. Our findings demonstrate that abiotic stresses profoundly perturb transcript isoform profiles and trigger widespread intron retention (IR) events. Stress treatments often increased or decreased retention of specific introns, a phenomenon described here as differential intron retention (DIR). Many differentially retained introns were regulated in a stress- and/or tissue-specific manner. A subset of transcripts harboring super stress-responsive DIR events showed persisting fluctuations in the degree of IR across all treatments and tissue types. To investigate coordinated dynamics of intron-containing transcripts, we quantified the absolute copy number of isoforms of two conserved transcription factors (TFs) using Droplet Digital PCR. This case study suggests that stress treatments can be associated with coordinated switches in relative ratios between fully spliced and intron-retaining isoforms, which may play a role in adjusting the transcriptome to abiotic stresses. PMID:29483921
Lott, Steffen C; Wolfien, Markus; Riege, Konstantin; Bagnacani, Andrea; Wolkenhauer, Olaf; Hoffmann, Steve; Hess, Wolfgang R
2017-11-10
RNA-Sequencing (RNA-Seq) has become a widely used approach to study quantitative and qualitative aspects of transcriptome data. The variety of RNA-Seq protocols, experimental study designs and the characteristic properties of the organisms under investigation greatly affect downstream and comparative analyses. In this review, we aim to explain the impact of structured pre-selection, classification and integration of best-performing tools within modularized data analysis workflows and ready-to-use computing infrastructures on experimental data analyses. We highlight examples of workflows and use cases for prokaryotic, eukaryotic and mixed dual RNA-Seq (meta-transcriptomics) experiments. In addition, we summarize the expertise of the laboratories participating in the project consortium "Structured Analysis and Integration of RNA-Seq experiments" (de.STAIR) and its integration with the Galaxy-workbench of the RNA Bioinformatics Center (RBC). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Parallel factor ChIP provides essential internal control for quantitative differential ChIP-seq.
Guertin, Michael J; Cullen, Amy E; Markowetz, Florian; Holding, Andrew N
2018-04-17
A key challenge in quantitative ChIP combined with high-throughput sequencing (ChIP-seq) is the normalization of data in the presence of genome-wide changes in occupancy. Analysis-based normalization methods were developed for transcriptomic data, and these depend on the underlying assumption that total transcription does not change between conditions. For genome-wide changes in transcription factor (TF) binding, these assumptions do not hold true. The challenges in normalization are confounded by experimental variability during sample preparation, processing and recovery. We present a novel normalization strategy utilizing an internal standard of unchanged peaks for reference. Our method can be readily applied to monitor genome-wide changes by ChIP-seq that are otherwise lost or misrepresented through analytical normalization. We compare our approach to normalization by total read depth and two alternative methods that utilize external experimental controls to study TF binding. We successfully resolve the key challenges in quantitative ChIP-seq analysis and demonstrate its application by monitoring the loss of Estrogen Receptor-alpha (ER) binding upon fulvestrant treatment, ER binding in response to estradiol, ER-mediated change in H4K12 acetylation, and by profiling ER binding in patient-derived xenografts. This is supported by an adaptable pipeline to normalize and quantify differential TF binding genome-wide and generate metrics for differential binding at individual sites.
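The idea of normalizing against an internal standard of unchanged peaks can be illustrated with a small calculation. The sketch below derives one scaling factor from read counts at control peaks assumed to be unchanged between conditions and applies it to the target peaks. It is a simplified illustration under that assumption, not the authors' pipeline; the median-of-ratios estimator and all names are choices made here for the example.

```python
from statistics import median


def control_scale_factor(control_ref, control_sample):
    """Median of per-peak ratios (reference / sample) over the control peaks.

    Peaks with a zero count in either condition are skipped.
    """
    ratios = [r / s for r, s in zip(control_ref, control_sample) if r > 0 and s > 0]
    if not ratios:
        raise ValueError("no usable control peaks")
    return median(ratios)


def normalize(target_counts, factor):
    """Rescale target-peak counts of the sample onto the reference scale."""
    return [c * factor for c in target_counts]


# Toy counts: control peaks were recovered at half the depth in the treated sample.
factor = control_scale_factor([100, 200, 400], [50, 100, 200])
treated = normalize([30, 10], factor)
untreated = [120, 40]
# Fold changes after normalization reflect the loss of binding, not the depth difference.
print(factor, treated, [t / u for t, u in zip(treated, untreated)])
```

Because the factor comes only from the control peaks, a genuine genome-wide loss of target binding is preserved rather than normalized away, which is the failure mode of scaling by total read depth.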
Isoform Sequencing and State-of-Art Applications for Unravelling Complexity of Plant Transcriptomes
An, Dong; Li, Changsheng; Humbeck, Klaus
2018-01-01
Single-molecule real-time (SMRT) sequencing developed by PacBio, also called third-generation sequencing (TGS), offers longer reads than second-generation sequencing (SGS). Given its ability to obtain full-length transcripts without assembly, isoform sequencing (Iso-Seq) of transcriptomes by PacBio is advantageous for genome annotation, identification of novel genes and isoforms, as well as the discovery of long non-coding RNA (lncRNA). In addition, Iso-Seq gives access to the direct detection of alternative splicing, alternative polyadenylation (APA), gene fusion, and DNA modifications. Such applications of Iso-Seq facilitate the understanding of gene structure, post-transcriptional regulatory networks, and subsequently proteomic diversity. In this review, we summarize its applications in plant transcriptome studies, specifically pointing out challenges associated with each step in the experimental design, and highlight the development of bioinformatic pipelines. We aim to provide the community with an integrative overview of and comprehensive guidance on Iso-Seq, and thus to promote its applications in plant research. PMID:29346292
Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H
2017-01-09
The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
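To make the contrast between affinity scores and presence/absence calls concrete, the sketch below scores every window of an open-chromatin sequence against a position weight matrix and sums the exponentiated window scores, so that weak sites still contribute. This is a deliberately simplified stand-in and not TEPIC's actual affinity model; the matrix values, the function names and the multiplication by a signal value are assumptions made for illustration.

```python
import math


def window_scores(pwm, seq):
    """Log-odds score of each window; pwm is a list of {base: log-odds} dicts.

    The sequence is expected to contain only upper-case A, C, G and T.
    """
    width = len(pwm)
    return [
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]


def region_affinity(pwm, seq, signal=1.0):
    """Sum of exp(score) over all windows, scaled by an open-chromatin signal."""
    return signal * sum(math.exp(s) for s in window_scores(pwm, seq))


def has_site(pwm, seq, threshold):
    """Presence/absence call: does any window reach the threshold?"""
    return any(s >= threshold for s in window_scores(pwm, seq))


# Hypothetical two-column matrix that prefers the dinucleotide "AC".
pwm = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
]
print(window_scores(pwm, "ACGT"))
print(region_affinity(pwm, "ACGT"), has_site(pwm, "ACGT", 2.0))
```

The binary call discards everything below the threshold, whereas the summed score changes continuously with the number and strength of weak matches.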
Adriaenssens, Evelien M; Farkas, Kata; Harrison, Christian; Jones, David L; Allison, Heather E; McCarthy, Alan J
2018-01-01
Detection of viruses in the environment is heavily dependent on PCR-based approaches that require reference sequences for primer design. While this strategy can accurately detect known viruses, it will not find novel genotypes or emerging and invasive viral species. In this study, we investigated the use of viromics, i.e., high-throughput sequencing of the biosphere’s viral fraction, to detect human-/animal-pathogenic RNA viruses in the Conwy river catchment area in Wales, United Kingdom. Using a combination of filtering and nuclease treatment, we extracted the viral fraction from wastewater and estuarine river water and sediment, followed by high-throughput RNA sequencing (RNA-Seq) analysis on the Illumina HiSeq platform, for the discovery of RNA virus genomes. We found a higher richness of RNA viruses in wastewater samples than in river water and sediment, and we assembled a complete norovirus genotype GI.2 genome from wastewater effluent, which was not contemporaneously detected by conventional reverse transcription-quantitative PCR (qRT-PCR). The simultaneous presence of diverse rotavirus signatures in wastewater indicated the potential for zoonotic infections in the area and suggested runoff from pig farms as a possible origin of these viruses. Our results show that viromics can be an important tool in the discovery of pathogenic viruses in the environment and can be used to inform and optimize reference-based detection methods provided appropriate and rigorous controls are included. IMPORTANCE: Enteric viruses cause gastrointestinal illness and are commonly transmitted through the fecal-oral route. When wastewater is released into river systems, these viruses can contaminate the environment. Our results show that we can use viromics to find the range of potentially pathogenic viruses that are present in the environment and identify prevalent genotypes. The ultimate goal is to trace the fate of these pathogenic viruses from origin to the point where they are a threat to human health, informing reference-based detection methods and water quality management. PMID:29795788
Tuerk, Andreas; Wiktorin, Gregor; Güler, Serhat
2017-05-01
Accuracy of transcript quantification with RNA-Seq is negatively affected by positional fragment bias. This article introduces Mix2 (read "mixquare"), a transcript quantification method which uses a mixture of probability distributions to model and thereby neutralize the effects of positional fragment bias. The parameters of Mix2 are trained by Expectation Maximization, resulting in simultaneous transcript abundance and bias estimates. We compare Mix2 to Cufflinks, RSEM, eXpress and PennSeq, state-of-the-art quantification methods implementing some form of bias correction. On four synthetic biases we show that the accuracy of Mix2 overall exceeds the accuracy of the other methods and that its bias estimates converge to the correct solution. We further evaluate Mix2 on real RNA-Seq data from the Microarray and Sequencing Quality Control (MAQC, SEQC) Consortia. On MAQC data, Mix2 achieves improved correlation to qPCR measurements with a relative increase in R² between 4% and 50%. Mix2 also yields repeatable concentration estimates across technical replicates with a relative increase in R² between 8% and 47% and reduced standard deviation across the full concentration range. We further observe more accurate detection of differential expression with a relative increase in true positives between 74% and 378% for 5% false positives. In addition, Mix2 reveals 5 dominant biases in MAQC data deviating from the common assumption of a uniform fragment distribution. On SEQC data, Mix2 yields higher consistency between measured and predicted concentration ratios. A relative error of 20% or less is obtained for 51% of transcripts by Mix2, 40% of transcripts by Cufflinks and RSEM and 30% by eXpress. Titration order consistency is correct for 47% of transcripts for Mix2, 41% for Cufflinks and RSEM and 34% for eXpress. We further observe improved repeatability across laboratory sites with a relative increase in R² between 8% and 44% and reduced standard deviation.
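As background to the Expectation Maximization training mentioned in this abstract, the sketch below shows the generic EM loop used to apportion ambiguously mapped reads among transcripts. It is not Mix2: it contains no positional bias mixture and ignores transcript lengths, and the function name and toy read set are assumptions introduced for illustration.

```python
def em_abundance(read_compat, transcripts, n_iter=100):
    """Estimate relative abundances from reads given as tuples of compatible transcripts.

    E-step: split each read among its compatible transcripts in proportion
    to the current abundances. M-step: renormalise the expected read counts.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in transcripts}
        for compat in read_compat:
            norm = sum(theta[t] for t in compat)
            if norm == 0.0:
                continue  # no support left for any compatible transcript
            for t in compat:
                expected[t] += theta[t] / norm
        total = sum(expected.values())
        theta = {t: expected[t] / total for t in transcripts}
    return theta


# Toy data: 3 reads unique to A, 1 unique to B, 4 compatible with both.
reads = [("A",)] * 3 + [("B",)] * 1 + [("A", "B")] * 4
print(em_abundance(reads, ["A", "B"]))
```

In this toy case the ambiguous reads end up split in the same 3:1 ratio as the uniquely assigned reads, so the estimates approach 0.75 and 0.25.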
Savage, Sara R.; McCollum, Gary W.; Yang, Rong
2015-01-01
Purpose: The peroxisome proliferator-activated receptor beta/delta (PPARβ/δ) is a transcription factor with roles in metabolism, angiogenesis, and inflammation. It has as yet undefined roles in retinal inflammation and diabetic retinopathy (DR). We used RNA-seq to better understand the role of the antagonist and inverse agonist of PPARβ/δ, GSK0660, in TNFα-induced inflammation. Understanding the underlying mechanisms of vascular inflammation could lead to new treatments for DR. Methods: RNA was isolated from human retinal microvascular endothelial cells treated with a vehicle, TNFα, or TNFα plus GSK0660. RNA-seq was performed with a 50 bp single read protocol. Differential expression was determined using edgeR, and gene ontology and pathway analyses were performed using DAVID. RNA-seq validation was performed by qRT-PCR with primers for ANGPTL4, CCL8, NOV, CXCL10, and PDPK1. Results: TNFα differentially regulated 1,830 transcripts, many of which are involved in the cytokine–cytokine receptor interaction, chemokine signaling, and inflammatory response. Additionally, TNFα highly upregulated genes involved in leukocyte recruitment, including CCL5, CX3CL1, and CXCL10. GSK0660 differentially regulated 273 transcripts in TNFα-treated cells compared to TNFα alone. A pathway analysis revealed the enrichment of cytokine–cytokine receptor signaling. In particular, GSK0660 blocks the TNFα-induced upregulation of CCL8, a chemokine involved in leukocyte recruitment. Conclusions: TNFα regulates several genes related to retinal leukostasis in retinal endothelial cells. GSK0660 blocks the effect of TNFα on the expression of cytokines involved in leukocyte recruitment, including CCL8, CCL17, and CXCL10, and it may therefore block TNFα-induced retinal leukostasis. PMID:26015769
Bequette, Carlton J.; Fu, Zheng Qing; Loraine, Ann E.
2016-01-01
AINTEGUMENTA (ANT) and AINTEGUMENTA-LIKE6 (AIL6) are two related transcription factors in Arabidopsis (Arabidopsis thaliana) that have partially overlapping roles in several aspects of flower development, including floral organ initiation, identity specification, growth, and patterning. To better understand the biological processes regulated by these two transcription factors, we performed RNA sequencing (RNA-Seq) on ant ail6 double mutants. We identified thousands of genes that are differentially expressed in the double mutant compared with the wild type. Analyses of these genes suggest that ANT and AIL6 regulate floral organ initiation and growth through modifications to the cell wall polysaccharide pectin. We found reduced levels of demethylesterified homogalacturonan and altered patterns of auxin accumulation in early stages of ant ail6 flower development. The RNA-Seq experiment also revealed cross-regulation of AIL gene expression at the transcriptional level. The presence of a number of overrepresented Gene Ontology terms related to plant defense in the set of genes differentially expressed in ant ail6 suggests that ANT and AIL6 also regulate plant defense pathways. Furthermore, we found that ant ail6 plants have elevated levels of two defense hormones, salicylic acid and jasmonic acid, and show increased resistance to the bacterial pathogen Pseudomonas syringae. These results suggest that ANT and AIL6 regulate biological pathways that are critical for both development and defense. PMID:27208279
Stranded Whole Transcriptome RNA-Seq for All RNA Types
Yan, Pearlly X.; Fang, Fang; Buechlein, Aaron; Ford, James B.; Tang, Haixu; Huang, Tim H.; Burow, Matthew E.; Liu, Yunlong; Rusch, Douglas B.
2015-01-01
Stranded whole transcriptome RNA-Seq described in this unit captures quantitative expression data for all types of RNA including, but not limited to miRNA (microRNA), piRNA (Piwi-interacting RNA), snoRNA (small nucleolar RNA), lincRNA (large non-coding intergenic RNA), SRP RNA (signal recognition particle RNA), tRNA (transfer RNA), mtRNA (mitochondrial RNA) and mRNA (messenger RNA). The size and nature of these types of RNA are irrelevant to the approach described here. Barcoded libraries for multiplexing on the Illumina platform are generated with this approach but it can be applied to other platforms with a few modifications. PMID:25599667
Transcriptome Profiling of Rust Resistance in Switchgrass Using RNA-Seq Analysis
Serba, Desalegn D.; Uppalapati, Srinivasa Rao; Mukherjee, Shreyartha; ...
2015-03-16
Switchgrass rust caused by Puccinia emaculata is a major limiting factor for switchgrass (Panicum virgatum L.) production, especially in monoculture. Natural populations of switchgrass displayed diverse reactions to P. emaculata when evaluated in an Ardmore, OK, field. In order to identify the differentially expressed genes during the rust infection process and the mechanisms of switchgrass rust resistance, transcriptome analysis using RNA-Seq was conducted in two pseudo-F1 parents ('PV281' and 'NFGA472'), and three moderately resistant and three susceptible progenies selected from a three-generation, four-founder switchgrass population (K5 x A4) x (AP13 x VS16). On average, 23.5 million reads per sample (leaf tissue was collected at 0, 24, and 60 h post-inoculation (hpi)) were obtained from paired-end (2 x 100 bp) sequencing on the Illumina HiSeq2000 platform. Furthermore, mapping of the RNA-Seq reads to the switchgrass reference genome (AP13 ver. 1.1 assembly) constructed a total of 84,209 transcripts from 98,007 gene loci among all of the samples. Further analysis revealed that host defense-related genes, including the nucleotide binding site-leucine-rich repeat domain containing disease resistance gene analogs, play an important role in resistance to rust infection. Rust-induced gene (RIG) transcripts inherited across generations were identified. The rust-resistant gene transcripts can be a valuable resource for developing molecular markers for rust resistance. Finally, we identified the rust-resistant genotypes and gene transcripts which can expedite rust-resistant cultivar development in switchgrass.
Aguilar, Carlos A.; Shcherbina, Anna; Ricke, Darrell O.; Pop, Ramona; Carrigan, Christopher T.; Gifford, Casey A.; Urso, Maria L.; Kottke, Melissa A.; Meissner, Alexander
2015-01-01
Traumatic lower-limb musculoskeletal injuries are pervasive amongst athletes and the military and typically an individual returns to activity prior to fully healing, increasing a predisposition for additional injuries and chronic pain. Monitoring healing progression after a musculoskeletal injury typically involves different types of imaging but these approaches suffer from several disadvantages. Isolating and profiling transcripts from the injured site would abrogate these shortcomings and provide enumerative insights into the regenerative potential of an individual’s muscle after injury. In this study, a traumatic injury was administered to a mouse model and healing progression was examined from 3 hours to 1 month using high-throughput RNA-Sequencing (RNA-Seq). Comprehensive dissection of the genome-wide datasets revealed the injured site to be a dynamic, heterogeneous environment composed of multiple cell types and thousands of genes undergoing significant expression changes in highly regulated networks. Four independent approaches were used to determine the set of genes, isoforms, and genetic pathways most characteristic of different time points post-injury and two novel approaches were developed to classify injured tissues at different time points. These results highlight the possibility to quantitatively track healing progression in situ via transcript profiling using high-throughput sequencing. PMID:26381351
Li, Jinming
2017-01-01
Cyclin D1 is a critical regulator of cell cycle progression and works at the G1 to S-phase transition. Here, we report the isolation and characterization of the novel c-Myc-regulated lncRNA LAST (LncRNA-Assisted Stabilization of Transcripts), which acts as a CCND1 mRNA stabilizer. Mechanistically, LAST was shown to cooperate with CNBP to bind to the 5′UTR of CCND1 mRNA to protect against possible nuclease targeting. In addition, data from CNBP RIP-seq and LAST RNA-seq showed that CCND1 mRNA might not be the only target of LAST and CNBP; three additional mRNAs were shown to be post-transcriptional targets of LAST and CNBP. In a xenograft model, depletion of LAST diminished and ectopic expression of LAST induced tumor formation, which are suggestive of its oncogenic function. We thus report a previously unknown lncRNA involved in the fine-tuned regulation of CCND1 mRNA stability, without which CCND1 exhibits, at most, partial expression. PMID:29199958
Tarallo, Roberta; Giurato, Giorgio; Bruno, Giuseppina; Ravo, Maria; Rizzo, Francesca; Salvati, Annamaria; Ricciardi, Luca; Marchese, Giovanna; Cordella, Angela; Rocco, Teresa; Gigantino, Valerio; Pierri, Biancamaria; Cimmino, Giovanni; Milanesi, Luciano; Ambrosino, Concetta; Nyman, Tuula A; Nassa, Giovanni; Weisz, Alessandro
2017-10-06
The RNA-binding protein Argonaute 2 (AGO2) is a key effector of RNA-silencing pathways. It exerts a pivotal role in microRNA maturation and activity and can modulate chromatin remodeling, transcriptional gene regulation and RNA splicing. Estrogen receptor beta (ERβ) is endowed with oncosuppressive activities, antagonizing hormone-induced carcinogenesis and inhibiting growth and oncogenic functions in luminal-like breast cancers (BCs), where its expression correlates with a better prognosis of the disease. Applying interaction proteomics coupled to mass spectrometry to characterize nuclear factors cooperating with ERβ in gene regulation, we identify AGO2 as a novel partner of ERβ in human BC cells. ERβ-AGO2 association was confirmed in vitro and in vivo in both the nucleus and cytoplasm and is shown to be RNA-mediated. ChIP-Seq demonstrates AGO2 association with a large number of ERβ binding sites, and total and nascent RNA-Seq in ERβ+ vs ERβ- cells, and before and after AGO2 knock-down in ERβ+ cells, reveals a widespread involvement of this factor in ERβ-mediated regulation of gene transcription rate and RNA splicing. Moreover, isolation and sequencing by RIP-Seq of ERβ-associated long and small RNAs in the cytoplasm suggests involvement of the nuclear receptor in RISC loading, indicating that it may also be able to directly control mRNA translation efficiency and stability. These results demonstrate that AGO2 can act as a pleiotropic functional partner of ERβ, indicating that both factors are endowed with multiple roles in the control of key cellular functions.
Spinelli, Roberta; Pirola, Alessandra; Redaelli, Sara; Sharma, Nitesh; Raman, Hima; Valletta, Simona; Magistroni, Vera; Piazza, Rocco; Gambacorti-Passerini, Carlo
2013-11-01
Point mutations in intronic regions near mRNA splice junctions can affect the splicing process. To identify novel splicing variants from exome sequencing data, we developed a bioinformatics splice-site prediction procedure to analyze next-generation sequencing (NGS) data (SpliceFinder). SpliceFinder integrates two functional annotation tools for NGS, ANNOVAR and MutationTaster, and two canonical splice site prediction programs for single mutation analysis, SSPNN and NetGene2. By SpliceFinder, we identified somatic mutations affecting RNA splicing in a colon cancer sample, in eight atypical chronic myeloid leukemia (aCML), and eight CML patients. A novel homozygous splicing mutation was found in APC (NM_000038.4:c.1312+5G>A) and six heterozygous in GNAQ (NM_002072.2:c.735+1C>T), ABCC3 (NM_003786.3:c.1783-1G>A), KLHDC1 (NM_172193.1:c.568-2A>G), HOOK1 (NM_015888.4:c.1662-1G>A), SMAD9 (NM_001127217.2:c.1004-1C>T), and DNAH9 (NM_001372.3:c.10242+5G>A). Integrating whole-exome and RNA sequencing in aCML and CML, we assessed the phenotypic effect of mutations on mRNA splicing for GNAQ, ABCC3, and HOOK1. In ABCC3 and HOOK1, RNA-Seq showed the presence of aberrant transcripts with activation of a cryptic splice site or intron retention, validated by the reverse transcription-polymerase chain reaction (RT-PCR) in the case of HOOK1. In GNAQ, RNA-Seq showed 22% of wild-type transcript and 78% of mRNA skipping exon 5, resulting in a 4-6 frameshift fusion confirmed by RT-PCR. The pipeline can be useful to identify intronic variants affecting RNA sequence by complementing conventional exome analysis.
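The variants listed above are written in HGVS coding notation, where `+N` counts into the intron from the last exonic base (donor side) and `-N` counts back from the first base of the next exon (acceptor side). The following is a minimal sketch of how such strings could be parsed and classified. It is not part of SpliceFinder; the function name, the returned fields, and the rule that offsets 1 and 2 count as the canonical dinucleotide are assumptions made for illustration, and the pattern covers only simple intronic substitutions.

```python
import re

# Illustrative pattern for simple intronic substitutions such as
# "NM_000038.4:c.1312+5G>A"; it does not cover the full HGVS grammar.
HGVS_INTRONIC = re.compile(
    r"^(?P<transcript>[A-Z]{2}_\d+\.\d+):c\."
    r"(?P<exon_pos>\d+)(?P<sign>[+-])(?P<offset>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_intronic_variant(hgvs):
    """Parse an intronic substitution (hypothetical helper, not a SpliceFinder API)."""
    match = HGVS_INTRONIC.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple intronic substitution: {hgvs!r}")
    offset = int(match["offset"])
    return {
        "transcript": match["transcript"],
        "exon_pos": int(match["exon_pos"]),
        # "+" counts from the end of the upstream exon, "-" from the start
        # of the downstream exon.
        "site": "donor" if match["sign"] == "+" else "acceptor",
        "offset": offset,
        # Assumption: offsets 1 and 2 are the canonical splice dinucleotide.
        "canonical": offset <= 2,
        "ref": match["ref"],
        "alt": match["alt"],
    }


if __name__ == "__main__":
    for variant in ("NM_000038.4:c.1312+5G>A", "NM_172193.1:c.568-2A>G"):
        print(parse_intronic_variant(variant))
```

Under these assumptions the APC variant is classified as a non-canonical donor-side change (five bases into the intron), while the KLHDC1 variant falls in the canonical acceptor dinucleotide.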
TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis
Ji, Zhicheng; Ji, Hongkai
2016-01-01
When analyzing single-cell RNA-seq data, constructing a pseudo-temporal path to order cells based on the gradual transition of their transcriptomes is a useful way to study gene expression dynamics in a heterogeneous cell population. Currently, a limited number of computational tools are available for this task, and quantitative methods for comparing different tools are lacking. Tools for Single Cell Analysis (TSCAN) is a software tool developed to better support in silico pseudo-Time reconstruction in Single-Cell RNA-seq ANalysis. TSCAN uses a cluster-based minimum spanning tree (MST) approach to order cells. Cells are first grouped into clusters and an MST is then constructed to connect cluster centers. Pseudo-time is obtained by projecting each cell onto the tree, and the ordered sequence of cells can be used to study dynamic changes of gene expression along the pseudo-time. Clustering cells before MST construction reduces the complexity of the tree space. This often leads to improved cell ordering. It also allows users to conveniently adjust the ordering based on prior knowledge. TSCAN has a graphical user interface (GUI) to support data visualization and user interaction. Furthermore, quantitative measures are developed to objectively evaluate and compare different pseudo-time reconstruction methods. TSCAN is available at https://github.com/zji90/TSCAN and as a Bioconductor package. PMID:27179027
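The cluster-then-MST idea described in this abstract can be sketched in a few lines. This is not the TSCAN implementation: it assumes cluster labels are already available, uses Euclidean distance between cluster centers, keeps only the longest path through the tree, and orders cells within a cluster by projecting them onto the edge to a neighbouring center. All function names are hypothetical, and at least two clusters are required.

```python
import math
from collections import defaultdict


def cluster_centers(cells, labels):
    """Mean coordinate of each cluster."""
    groups = defaultdict(list)
    for cell, label in zip(cells, labels):
        groups[label].append(cell)
    return {lab: tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for lab, pts in groups.items()}


def mst_edges(centers):
    """Prim's algorithm on the complete graph of cluster centers."""
    labs = sorted(centers)
    in_tree, edges = {labs[0]}, []
    while len(in_tree) < len(labs):
        _, a, b = min((math.dist(centers[a], centers[b]), a, b)
                      for a in in_tree for b in labs if b not in in_tree)
        edges.append((a, b))
        in_tree.add(b)
    return edges


def longest_path(edges):
    """Path with the most clusters in the tree (two farthest-node sweeps)."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def farthest(start):
        best, stack = [start], [(start, [start])]
        while stack:
            node, path = stack.pop()
            if len(path) > len(best):
                best = path
            stack.extend((n, path + [n]) for n in adj[node] if n not in path)
        return best

    return farthest(farthest(next(iter(adj)))[-1])


def pseudotime_order(cells, labels):
    """Return cell indices ordered along the main path of the cluster MST."""
    centers = cluster_centers(cells, labels)
    path = longest_path(mst_edges(centers))
    keyed = []
    for idx, (cell, lab) in enumerate(zip(cells, labels)):
        if lab not in path:
            continue  # cells on side branches are ignored in this sketch
        i = path.index(lab)
        a, b = (path[i], path[i + 1]) if i + 1 < len(path) else (path[i - 1], path[i])
        direction = [q - p for p, q in zip(centers[a], centers[b])]
        offset = [x - c for x, c in zip(cell, centers[lab])]
        keyed.append((i, sum(d * o for d, o in zip(direction, offset)), idx))
    return [idx for _, _, idx in sorted(keyed)]
```

The direction of the returned ordering is arbitrary (either end of the path may come first), which is one reason a tool of this kind lets the user adjust the ordering from prior knowledge.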
TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.
Ji, Zhicheng; Ji, Hongkai
2016-07-27
© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data.
Yu, Ke; Gong, Binsheng; Lee, Mikyung; Liu, Zhichao; Xu, Joshua; Perkins, Roger; Tong, Weida
2014-09-15
Toxicogenomics (TGx) endeavors to elucidate the underlying molecular mechanisms through exploring gene expression profiles in response to toxic substances. Recently, RNA-Seq is increasingly regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from the complex TGx data. Considering read counts as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to a word-by-document matrix used in text mining. Topic modeling, which aims to discover the latent structures in text corpora, would be helpful for exploring RNA-Seq based TGx data. In this study, topic modeling was applied to a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated by compounds with the same modes of action (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted by gene enrichment analysis with MoAs and then confirmed by compound and pathway associations mined from literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discover functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu
2017-01-04
The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) contributes to various biological processes and is linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, which represents about a 20-fold expansion compared with the previously released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed a 'Regulator' module to predict hundreds of TFs and histone modifications that were involved in or affected transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulation of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
A human haploid gene trap collection to study lncRNAs with unusual RNA biology.
Kornienko, Aleksandra E; Vlatkovic, Irena; Neesen, Jürgen; Barlow, Denise P; Pauler, Florian M
2016-01-01
Many thousands of long non-coding (lnc) RNAs are mapped in the human genome. Time-consuming studies using reverse genetic approaches by post-transcriptional knock-down or genetic modification of the locus demonstrated diverse biological functions for a few of these transcripts. The Human Gene Trap Mutant Collection in haploid KBM7 cells is a ready-to-use tool for studying protein-coding gene function. As lncRNAs show remarkable differences in RNA biology compared to protein-coding genes, it is unclear if this gene trap collection is useful for functional analysis of lncRNAs. Here we use the uncharacterized LOC100288798 lncRNA as a model to answer this question. Using public RNA-seq data we show that LOC100288798 is ubiquitously expressed, but inefficiently spliced. The minor spliced LOC100288798 isoforms are exported to the cytoplasm, whereas the major unspliced isoform is nuclear localized. This shows that LOC100288798 RNA biology differs markedly from typical mRNAs. De novo assembly from RNA-seq data suggests that LOC100288798 extends 289 kb beyond its annotated 3' end and overlaps the downstream SLC38A4 gene. Three cell lines with independent gene trap insertions in LOC100288798 were available from the KBM7 gene trap collection. RT-qPCR and RNA-seq confirmed successful lncRNA truncation and its extended length. Expression analysis from RNA-seq data shows significant deregulation of 41 protein-coding genes upon LOC100288798 truncation. Our data show that gene trap collections in human haploid cell lines are useful tools to study lncRNAs, and identify the previously uncharacterized LOC100288798 as a potential gene regulator.
2017-01-01
Tight and tunable control of gene expression is a highly desirable goal in synthetic biology for constructing predictable gene circuits and achieving preferred phenotypes. Elucidating the sequence–function relationship of promoters is crucial for manipulating gene expression at the transcriptional level, particularly for inducible systems dependent on transcriptional regulators. Sort-seq methods employing fluorescence-activated cell sorting (FACS) and high-throughput sequencing allow for the quantitative analysis of sequence–function relationships in a robust and rapid way. Here we utilized a massively parallel sort-seq approach to analyze the formaldehyde-inducible Escherichia coli promoter (Pfrm) with single-nucleotide resolution. A library of mutated formaldehyde-inducible promoters was cloned upstream of gfp on a plasmid. The library was partitioned into bins via FACS on the basis of green fluorescent protein (GFP) expression level, and mutated promoters falling into each expression bin were identified with high-throughput sequencing. The resulting analysis identified two 19 base pair repressor binding sites, one upstream of the −35 RNA polymerase (RNAP) binding site and one overlapping with the −10 site, and assessed the relative importance of each position and base therein. Key mutations were identified for tuning expression levels and were used to engineer formaldehyde-inducible promoters with predictable activities. Engineered variants demonstrated up to 14-fold lower basal expression, 13-fold higher induced expression, and a 3.6-fold stronger response as indicated by relative dynamic range. Finally, an engineered formaldehyde-inducible promoter was employed to drive the expression of heterologous methanol assimilation genes and achieved increased biomass levels on methanol, a non-native substrate of E. coli. PMID:28463494
Beta-Poisson model for single-cell RNA-seq data analyses.
Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Rantalainen, Mattias; Pawitan, Yudi
2016-07-15
Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is a bimodality in the cellular distribution even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA-sequencing are not able to capture this property. We introduce a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. We further integrate the model into the generalized linear model framework in order to perform differential expression analyses. The whole analytical procedure is called BPSC. The results from several real single-cell RNA-seq datasets indicate that ∼90% of the transcripts are well characterized by the beta-Poisson model; the model-fit from BPSC is better than the fit of the standard gamma-Poisson model in > 80% of the transcripts. Moreover, in differential expression analyses of simulated and real datasets, BPSC performs well against edgeR, a conventional method widely used in bulk-cell RNA-sequencing data, and against scde and MAST, two recent methods specifically designed for single-cell RNA-seq data. An R package BPSC for model fitting and differential expression analyses of single-cell RNA-seq data is available under GPL-3 license at https://github.com/nghiavtr/BPSC. Contact: yudi.pawitan@ki.se or mattias.rantalainen@ki.se. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
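To make the bimodality argument concrete, the sketch below simulates a basic three-parameter beta-Poisson: each cell draws an activity level p from Beta(alpha, beta) and then a count from Poisson(lam * p). This is an illustration only, not the BPSC package or its exact parameterization; the parameter values, the seed, and the function names are assumptions chosen so that the beta distribution is U-shaped.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_beta_poisson(rng, alpha, beta, lam, n_cells):
    """Counts for n_cells, each with its own activity p ~ Beta(alpha, beta)."""
    return [sample_poisson(rng, lam * rng.betavariate(alpha, beta))
            for _ in range(n_cells)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # alpha, beta < 1 gives a U-shaped Beta: many cells nearly off, many nearly on.
    counts = sample_beta_poisson(rng, alpha=0.3, beta=0.3, lam=50, n_cells=5000)
    zero_fraction = sum(c == 0 for c in counts) / len(counts)
    print("mean count:", sum(counts) / len(counts))
    print("fraction of zero-count cells:", zero_fraction)
```

With a mean near 25, a plain Poisson model would make zero counts essentially impossible, whereas the simulated beta-Poisson counts contain a substantial share of zeros alongside cells with counts near `lam`; that gap is the property the abstract says bulk-style models fail to capture.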
Hah, Nasun; Danko, Charles G.; Core, Leighton; Waterfall, Joshua J.; Siepel, Adam; Lis, John T.; Kraus, W. Lee
2011-01-01
Summary: We report the immediate effects of estrogen signaling on the transcriptome of breast cancer cells using Global Run-On and sequencing (GRO-seq). The data were analyzed using a new bioinformatic approach that allowed us to identify transcripts directly from the GRO-seq data. We found that estrogen signaling directly regulates a strikingly large fraction of the transcriptome in a rapid, robust, and unexpectedly transient manner. In addition to protein coding genes, estrogen regulates the distribution and activity of all three RNA polymerases, and virtually every class of non-coding RNA that has been described to date. We also identified a large number of previously undetected estrogen-regulated intergenic transcripts, many of which are found proximal to estrogen receptor binding sites. Collectively, our results provide the most comprehensive measurement of the primary and immediate estrogen effects to date and a resource for understanding rapid signal-dependent transcription in other systems. PMID:21549415
Transcriptional risk scores link GWAS to eQTLs and predict complications in Crohn's disease.
Marigorta, Urko M; Denson, Lee A; Hyams, Jeffrey S; Mondal, Kajari; Prince, Jarod; Walters, Thomas D; Griffiths, Anne; Noe, Joshua D; Crandall, Wallace V; Rosh, Joel R; Mack, David R; Kellermayer, Richard; Heyman, Melvin B; Baker, Susan S; Stephens, Michael C; Baldassano, Robert N; Markowitz, James F; Kim, Mi-Ok; Dubinsky, Marla C; Cho, Judy; Aronow, Bruce J; Kugathasan, Subra; Gibson, Greg
2017-10-01
Gene expression profiling can be used to uncover the mechanisms by which loci identified through genome-wide association studies (GWAS) contribute to pathology. Given that most GWAS hits are in putative regulatory regions and transcript abundance is physiologically closer to the phenotype of interest, we hypothesized that summation of risk-allele-associated gene expression, namely a transcriptional risk score (TRS), should provide accurate estimates of disease risk. We integrate summary-level GWAS and expression quantitative trait locus (eQTL) data with RNA-seq data from the RISK study, an inception cohort of pediatric Crohn's disease. We show that TRSs based on genes regulated by variants linked to inflammatory bowel disease (IBD) not only outperform genetic risk scores (GRSs) in distinguishing Crohn's disease from healthy samples, but also serve to identify patients who in time will progress to complicated disease. Our dissection of eQTL effects may be used to distinguish genes whose association with disease is through promotion versus protection, thereby linking statistical association to biological mechanism. The TRS approach constitutes a potential strategy for personalized medicine that enhances inference from static genotypic risk assessment.
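The "summation of risk-allele-associated gene expression" can be illustrated with a small sketch. The abstract does not spell out the arithmetic, so the choices here are assumptions: each gene is z-scored across samples, the z-score is multiplied by +1 when the risk allele is associated with higher expression and by -1 when it is associated with lower expression, and the signed values are summed per sample. The gene names are placeholders.

```python
from statistics import mean, pstdev


def transcriptional_risk_score(expression, risk_direction):
    """Sum of direction-polarized z-scores per sample (illustrative only).

    expression: {gene: [expression value per sample]}
    risk_direction: {gene: +1 if the risk allele raises expression, -1 if it lowers it}
    """
    n_samples = len(next(iter(expression.values())))
    scores = [0.0] * n_samples
    for gene, direction in risk_direction.items():
        values = expression[gene]
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # a gene with no variation carries no information
        for i, value in enumerate(values):
            scores[i] += direction * (value - mu) / sd
    return scores


if __name__ == "__main__":
    # Hypothetical genes and values for three samples.
    expression = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [3.0, 2.0, 1.0]}
    risk_direction = {"GENE_A": +1, "GENE_B": -1}
    print(transcriptional_risk_score(expression, risk_direction))
```

In this toy input the third sample has high expression of the gene whose risk allele raises expression and low expression of the gene whose risk allele lowers it, so it receives the highest score. The sign convention is what separates genes acting through promotion from genes acting through protection.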
RNA-seq: technical variability and sampling
2011-01-01
Background: RNA-seq is revolutionizing the way we study transcriptomes. mRNA can be surveyed without prior knowledge of gene transcripts. Alternative splicing of transcript isoforms and the identification of previously unknown exons are being reported. Initial reports of differences in exon usage and splicing between samples, as well as quantitative differences among samples, are beginning to surface. Biological variation has been reported to be larger than technical variation. In addition, technical variation has been reported to be in line with expectations due to random sampling. However, strategies for dealing with technical variation will differ depending on the magnitude. The size of technical variance and the role of sampling are examined in this manuscript. Results: In this study three independent Solexa/Illumina experiments containing technical replicates are analyzed. When coverage is low, large disagreements between technical replicates are apparent. Exon detection between technical replicates is highly variable when the coverage is less than 5 reads per nucleotide, and estimates of gene expression are more likely to disagree when coverage is low, although large disagreements in the estimates of expression are observed at all levels of coverage. Conclusions: Technical variability is too high to ignore. Technical variability results in inconsistent detection of exons at low levels of coverage. Further, the estimate of the relative abundance of a transcript can substantially disagree, even when coverage levels are high. This may be due to the low sampling fraction and if so, it will persist as an issue needing to be addressed in experimental design even as the next wave of technology produces larger numbers of reads. We provide practical recommendations for dealing with the technical variability, without dramatic cost increases. PMID:21645359
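As a point of reference for the sampling argument, the sketch below computes average coverage per nucleotide and the variability expected if read counts were purely Poisson. It is an idealized baseline, not the analysis used in the study, which reports disagreement beyond what sampling alone predicts. The function names and the example read length of 36 bp are assumptions.

```python
import math


def coverage_per_nucleotide(n_reads, read_length, region_length):
    """Average number of reads covering each base of a region."""
    return n_reads * read_length / region_length


def poisson_cv(expected_reads):
    """Coefficient of variation of a Poisson count: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_reads)


def prob_region_missed(expected_reads):
    """Probability that a replicate yields no reads at all for the region."""
    return math.exp(-expected_reads)


if __name__ == "__main__":
    # Hypothetical 360 bp exon sequenced with 36 bp reads.
    for n_reads in (2, 10, 50, 500):
        cov = coverage_per_nucleotide(n_reads, 36, 360)
        print(n_reads, "reads:", "coverage", cov,
              "CV", round(poisson_cv(n_reads), 3),
              "P(missed)", prob_region_missed(n_reads))
```

Under this baseline the relative spread between technical replicates shrinks only with the square root of the read count, which is consistent with detection being unreliable at low coverage.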
Sun, Yuhao; Pan, Sijian; Gu, Changwei; Chen, Xiao; Wang, Weiqing; Ning, Guang; Bian, Liuguan; Sun, Qingfang
2018-01-01
Cushing's disease is primarily caused by pituitary adrenocorticotropin-secreting adenoma. However, its pathogenesis has remained obscure. In the present study, whole transcriptome analysis was performed by RNA sequencing (RNA-Seq) and expression of secreted frizzled-related protein 2 (SFRP2) was decreased in corticotroph tumors compared with normal pituitary glands. Furthermore, the RNA-Seq results were validated and the expression of SFRP2 in tumor tissues was analyzed by comparing another cohort of 23 patients with Cushing's disease and 3 normal human pituitary samples using reverse transcription-quantitative polymerase chain reaction, western blot and immunohistochemistry staining. Clinically, there was an association between lower SFRP2 expression and aggressive adenoma characteristics, including larger size and invasiveness. Conversely, SFRP2 overexpression reduced the ability of AtT20 cells to proliferate and migrate, and reduced production of the adrenocorticotrophic hormone in vitro. Mechanistically, overexpressed SFRP2 reduced the level of β-catenin in the cytoplasm and nucleus, and decreased Wnt signaling activity in AtT20 cells. Therefore, SFRP2 appears to act as a tumor suppressor in Cushing's disease by regulating the activity of the Wnt signaling pathway. PMID:29620167
Cornwell, MacIntosh; Vangala, Mahesh; Taing, Len; Herbert, Zachary; Köster, Johannes; Li, Bo; Sun, Hanfei; Li, Taiwen; Zhang, Jian; Qiu, Xintao; Pun, Matthew; Jeselsohn, Rinath; Brown, Myles; Liu, X Shirley; Long, Henry W
2018-04-12
RNA sequencing has become a ubiquitous technology used throughout life sciences as an effective method of measuring RNA abundance quantitatively in tissues and cells. The increase in use of RNA-seq technology has led to the continuous development of new tools for every step of analysis from alignment to downstream pathway analysis. However, effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts. Using the workflow management system Snakemake, we have developed a user-friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been created in a modular fashion to allow for the rapid incorporation of new tools to expand the capabilities. This capacity has already been exploited to include very recently developed tools that explore immune infiltrate and T-cell CDR (Complementarity-Determining Regions) reconstruction abilities. The pipeline has been conveniently packaged such that minimal computational skills are required to download and install the dozens of software packages that VIPER uses. VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively with a built-in capacity for customization and expansion.
Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei
2018-01-01
Accurately reconstructing gene co-expression networks is of great importance for uncovering the genetic architecture underlying complex and varied phenotypes. The recent availability of high-throughput RNA-seq has made genome-wide detection and quantification of novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression networks have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building a genome-scale and high-quality Gene Co-expression Network (GCN) by integrating three frequently used inference algorithms. We constructed an RNA-seq-based GCN in rice, a monocot species. The quality of the network obtained by our method was verified and evaluated against curated gene functional association data sets, and it clearly outperformed each single method. In addition, the powerful capability of the network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provide a valuable and highly reliable data source to select key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
2010-01-01
Background: Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results: RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion: This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals. PMID:20707909
LeBlanc, Megan; Kim, Gunjune; Patel, Beneeta; Stromberg, Verlyn; Westwood, James
2013-12-01
The cross-species movement of mRNA from hosts to the parasitic plant Cuscuta pentagona has been reported previously, but has not been characterized quantitatively or with attention to uptake patterns and the fate of specific mRNAs. Real-time PCR and RNA-Seq approaches were used to identify and characterize mobile transcripts from tomato and Arabidopsis hosts into C. pentagona. Tomato transcripts of Gibberellic Acid Insensitive (SlGAI) and Cathepsin D Proteinase Inhibitor (SlPI) differed significantly in the rate of uptake into the parasite, but were then distributed over the length of the parasite shoot. When parasite shoots were detached from the hosts, the SlPI transcript concentrations in the parasite showed the greatest decrease within the first 8 h. Arabidopsis transcripts also varied in mobility into the parasite, and assay of specific regions of a Salt-inducible Zinc Finger Protein (AtSZF1) transcript revealed distinct patterns of abundance in the parasite. The uptake and distribution of host mRNAs into C. pentagona appears to vary among mRNAs, and perhaps even with the region of the mRNA under investigation. We propose that mRNAs traffic into the parasite via multiple routes, or that other mechanisms for selective uptake and mobility exist between host and parasite. © 2013 The Authors. New Phytologist © 2013 New Phytologist Trust.
Comprehensive analysis of RNA-seq data reveals the complexity of the transcriptome in Brassica rapa.
Tong, Chaobo; Wang, Xiaowu; Yu, Jingyin; Wu, Jian; Li, Wanshun; Huang, Junyan; Dong, Caihua; Hua, Wei; Liu, Shengyi
2013-10-07
The species Brassica rapa (2n=20, AA) is an important vegetable and oilseed crop, and serves as an excellent model for genomic and evolutionary research in Brassica species. With the availability of the whole genome sequence of B. rapa, it is essential to further determine the activity of all functional elements of the B. rapa genome and explore the transcriptome on a genome-wide scale. Here, RNA-seq data were employed to provide a genome-wide transcriptional landscape and characterization of the annotated and novel transcripts and alternative splicing events across tissues. RNA-seq reads were generated using the Illumina platform from six different tissues (root, stem, leaf, flower, silique and callus) of the B. rapa accession Chiifu-401-42, the same line used for whole genome sequencing. First, these data detected the widespread transcription of the B. rapa genome, leading to the identification of numerous novel transcripts and definition of 5'/3' UTRs of known genes. Second, 78.8% of the total annotated genes were detected as expressed and 45.8% were constitutively expressed across all tissues. We further defined several groups of genes: housekeeping genes, tissue-specific expressed genes and co-expressed genes across tissues, which will serve as a valuable repository for future crop functional genomics research. Third, alternative splicing (AS) is estimated to occur in more than 29.4% of intron-containing B. rapa genes, and 65% of them were commonly detected in more than two tissues. Interestingly, genes with a high rate of AS were over-represented in GO categories relating to transcriptional regulation and signal transduction, suggesting that AS may play an important regulatory role in these genes. Further, we observed that intron retention (IR) is predominant among the AS events and seems to occur preferentially in genes with short introns. The high-resolution RNA-seq analysis provides a global transcriptional landscape as a complement to the B. rapa genome sequence, which will advance our understanding of the dynamics and complexity of the B. rapa transcriptome. The atlas of gene expression in different tissues will be useful for accelerating research on functional genomics and genome evolution in Brassica species.
Error baseline rates of five sample preparation methods used to characterize RNA virus populations
Kugelman, Jeffrey R.; Wiley, Michael R.; Nagle, Elyse R.; Reyes, Daniel; Pfeffer, Brad P.; Kuhn, Jens H.; Sanchez-Lockhart, Mariano; Palacios, Gustavo F.
2017-01-01
Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic “no amplification” method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a “targeted” amplification method, sequence-independent single-primer amplification (SISPA) as a “random” amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced “no amplification” method, and Illumina TruSeq RNA Access as a “targeted” enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4 × 10^-5) of all compared methods. PMID:28182717
Zhao, Shanrong; Prenger, Kurt; Smith, Lance
2013-01-01
RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression studies. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, open-source-based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of the box to process Illumina RNA-Seq datasets. PMID:25937948
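As a back-of-envelope check on the figures quoted in the abstract above, the following Python sketch estimates the total cost of the 178-sample test run. It assumes the reported average of $3.50 applies uniformly to every sample, which is a simplification; the function name is illustrative and is not part of Stormbow.

```python
def estimate_batch_cost(n_samples: int, cost_per_sample: float) -> float:
    """Total cost, assuming every sample costs the reported average."""
    return n_samples * cost_per_sample


# Figures taken from the abstract: 178 samples at an average of $3.50 each.
total = estimate_batch_cost(178, 3.50)
print(f"Estimated cost for 178 samples: ${total:.2f}")
```

Because the samples are processed in parallel, wall-clock time would be governed by the slowest sample rather than by the sample count, provided enough instances are available.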
Brauer, Chris J; Unmack, Peter J; Beheregaray, Luciano B
2017-12-01
Understanding whether small populations with low genetic diversity can respond to rapid environmental change via phenotypic plasticity is an outstanding research question in biology. RNA sequencing (RNA-seq) has recently provided the opportunity to examine variation in gene expression, a surrogate for phenotypic variation, in nonmodel species. We used a comparative RNA-seq approach to assess expression variation within and among adaptively divergent populations of a threatened freshwater fish, Nannoperca australis, found across a steep hydroclimatic gradient in the Murray-Darling Basin, Australia. These populations evolved under contrasting selective environments (e.g., dry/hot lowland; wet/cold upland) and represent opposite ends of the species' spectrum of genetic diversity and population size. We tested the hypothesis that environmental variation among isolated populations has driven the evolution of divergent expression at ecologically important genes using differential expression (DE) analysis and an anova-based comparative phylogenetic expression variance and evolution model framework based on 27,425 de novo assembled transcripts. Additionally, we tested whether gene expression variance within populations was correlated with levels of standing genetic diversity. We identified 290 DE candidate transcripts, 33 transcripts with evidence for high expression plasticity, and 50 candidates for divergent selection on gene expression after accounting for phylogenetic structure. Variance in gene expression appeared unrelated to levels of genetic diversity. Functional annotation of the candidate transcripts revealed that variation in water quality is an important factor influencing expression variation for N. australis. Our findings suggest that gene expression variation can contribute to the evolutionary potential of small populations. © 2017 John Wiley & Sons Ltd.
GENCODE: the reference human genome annotation for The ENCODE Project.
Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J
2012-09-01
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Functional regression method for whole genome eQTL epistasis analysis with sequencing data.
Xu, Kelin; Jin, Li; Xiong, Momiao
2017-05-18
Epistasis plays an essential role in understanding the regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and limited data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used for microarray-measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairwise SNPs, the FRGM takes a gene as a basic unit for epistasis analysis, which tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project. The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. The proposed FRGM for epistasis analysis of RNA-seq can capture isoform and position-level information and will have a broad application. Both simulations and real data analysis highlight the potential for the FRGM to be a good choice for epistatic analysis with sequencing data.
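The abstract's point that a single expression number can hide position-level variation can be illustrated with a small Python sketch. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads); the two read-count profiles, the gene length, and the library size are hypothetical values chosen only for illustration, and summing binned counts to obtain a gene-level read count is a simplification.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Two hypothetical genes, each split into four position bins. The per-bin
# read counts differ in shape but have the same total.
uniform_profile = [10, 10, 10, 10]
skewed_profile = [40, 0, 0, 0]

gene_length_bp = 2000            # hypothetical gene length
total_mapped_reads = 20_000_000  # hypothetical library size

rpkm_uniform = rpkm(sum(uniform_profile), gene_length_bp, total_mapped_reads)
rpkm_skewed = rpkm(sum(skewed_profile), gene_length_bp, total_mapped_reads)

# Both genes collapse to the same summary value despite different profiles.
print(rpkm_uniform, rpkm_skewed)
```

A functional model such as the one described would instead treat each profile as a curve over genomic position, so the two genes would no longer be indistinguishable.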
RIPiT-Seq: A high-throughput approach for footprinting RNA:protein complexes
Singh, Guramrit; Ricci, Emiliano P.; Moore, Melissa J.
2013-01-01
Development of high-throughput approaches to map the RNA interaction sites of individual RNA binding proteins (RBPs) transcriptome-wide is rapidly transforming our understanding of post-transcriptional gene regulatory mechanisms. Here we describe a ribonucleoprotein (RNP) footprinting approach we recently developed for identifying occupancy sites of both individual RBPs and multi-subunit RNP complexes. RNA:protein immunoprecipitation in tandem (RIPiT) yields highly specific RNA footprints of cellular RNPs isolated via two sequential purifications; the resulting RNA footprints can then be identified by high-throughput sequencing (Seq). RIPiT-Seq is broadly applicable to all RBPs regardless of their RNA binding mode and thus provides a means to map the RNA binding sites of RBPs with poor inherent ultraviolet (UV) crosslinkability. Further, among current high-throughput approaches, RIPiT has the unique capacity to differentiate binding sites of RNPs with overlapping protein composition. It is therefore particularly suited for studying dynamic RNP assemblages whose composition evolves as gene expression proceeds. PMID:24096052
Post-transcriptional Mechanisms Contribute Little to Phenotypic Variation in Snake Venoms.
Rokyta, Darin R; Margres, Mark J; Calvin, Kate
2015-09-09
Protein expression is a major link in the genotype-phenotype relationship, and processes affecting protein abundances, such as rates of transcription and translation, could contribute to phenotypic evolution if they generate heritable variation. Recent work has suggested that mRNA abundances do not accurately predict final protein abundances, which would imply that post-transcriptional regulatory processes contribute significantly to phenotypes. Post-transcriptional processes also appear to buffer changes in transcriptional patterns as species diverge, suggesting that the transcriptional changes have little or no effect on the phenotypes undergoing study. We tested for concordance between mRNA and protein expression levels in snake venoms by means of mRNA-seq and quantitative mass spectrometry for 11 snakes representing 10 species, six genera, and three families. In contrast to most previous work, we found high correlations between venom gland transcriptomes and venom proteomes for 10 of our 11 comparisons. We tested for protein-level buffering of transcriptional changes during species divergence by comparing the difference between transcript abundance and protein abundance for three pairs of species and one intraspecific pair. We found no evidence for buffering during divergence of our three species pairs but did find evidence for protein-level buffering for our single intraspecific comparison, suggesting that buffering, if present, was a transient phenomenon in venom divergence. Our results demonstrated that post-transcriptional mechanisms did not contribute significantly to phenotypic evolution in venoms and suggest a more prominent and direct role for cis-regulatory evolution in phenotypic variation, particularly for snake venoms. Copyright © 2015 Rokyta et al.
Detection and Analysis of Circular RNAs by RT-PCR.
Panda, Amaresh C; Gorospe, Myriam
2018-03-20
Gene expression in eukaryotic cells is tightly regulated at the transcriptional and posttranscriptional levels. Posttranscriptional processes, including pre-mRNA splicing, mRNA export, mRNA turnover, and mRNA translation, are controlled by RNA-binding proteins (RBPs) and noncoding (nc)RNAs. The vast family of ncRNAs comprises diverse regulatory RNAs, such as microRNAs and long noncoding (lnc)RNAs, but also the poorly explored class of circular (circ)RNAs. Although first discovered more than three decades ago by electron microscopy, only the advent of high-throughput RNA-sequencing (RNA-seq) and the development of innovative bioinformatic pipelines have begun to allow the systematic identification of circRNAs (Szabo and Salzman, 2016; Panda et al., 2017b; Panda et al., 2017c). However, the validation of true circRNAs identified by RNA sequencing requires other molecular biology techniques including reverse transcription (RT) followed by conventional or quantitative (q) polymerase chain reaction (PCR), and Northern blot analysis (Jeck and Sharpless, 2014). RT-qPCR analysis of circular RNAs using divergent primers has been widely used for the detection, validation, and sometimes quantification of circRNAs (Abdelmohsen et al., 2015 and 2017; Panda et al., 2017b). As detailed here, divergent primers designed to span the circRNA backsplice junction sequence can specifically amplify the circRNAs and not the counterpart linear RNA. In sum, RT-PCR analysis using divergent primers allows direct detection and quantification of circRNAs.
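The reason a backsplice-spanning amplicon is specific to the circular form can be sketched in a few lines of Python. The sketch assumes a circRNA formed by joining the 3' end of a single exonic sequence back to its own 5' end; the toy sequence and the function name are hypothetical, and real primer design would also need to consider melting temperature, amplicon length, and off-target matches.

```python
def backsplice_junction(exon_seq: str, flank: int) -> str:
    """Return the sequence spanning the backsplice junction.

    The junction joins the last `flank` nucleotides of the circularized
    sequence to its first `flank` nucleotides.
    """
    if flank <= 0 or flank > len(exon_seq):
        raise ValueError("flank must be between 1 and the sequence length")
    return exon_seq[-flank:] + exon_seq[:flank]


# Toy exonic sequence (hypothetical, for illustration only).
linear = "ATGGCCTTAGCAGGTCCATTGACC"
junction = backsplice_junction(linear, 5)

print(junction)
# The junction sequence does not occur in the linear transcript, so an
# amplicon spanning it can only arise from the circular form.
print(junction in linear)
```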
Safikhani, Zhaleh; Sadeghi, Mehdi; Pezeshk, Hamid; Eslahchi, Changiz
2013-01-01
Recent advances in the sequencing technologies have provided a handful of RNA-seq datasets for transcriptome analysis. However, reconstruction of full-length isoforms and estimation of the expression level of transcripts with a low cost are challenging tasks. We propose a novel de novo method named SSP that incorporates interval integer linear programming to resolve alternatively spliced isoforms and reconstruct the whole transcriptome from short reads. Experimental results show that SSP is fast and precise in determining different alternatively spliced isoforms along with the estimation of reconstructed transcript abundances. The SSP software package is available at http://www.bioinf.cs.ipm.ir/software/ssp. © 2013.
Bent, Zachary W.; Poorey, Kunal; LaBauve, Annette E.; ...
2016-12-21
When analyzing pathogen transcriptomes during the infection of host cells, the signal-to-background (pathogen-to-host) ratio of nucleic acids (NA) in infected samples is very small. Despite the advancements in next-generation sequencing, the minute amount of pathogen NA makes standard RNA-seq library preps inadequate for effective gene-level analysis of the pathogen in cases with low bacterial loads. In order to provide a more complete picture of the pathogen transcriptome during an infection, we developed a novel pathogen enrichment technique, which can enrich for transcripts from any cultivable bacteria or virus, using common, readily available laboratory equipment and reagents. To evenly enrich for pathogen transcripts, we generate biotinylated pathogen-targeted capture probes in an enzymatic process using the entire genome of the pathogen as a template. The capture probes are hybridized to a strand-specific cDNA library generated from an RNA sample. The biotinylated probes are captured on a monomeric avidin resin in a miniature spin column, and enriched pathogen-specific cDNA is eluted following a series of washes. To test this method, we performed an in vitro time-course infection using Klebsiella pneumoniae to infect murine macrophage cells. K. pneumoniae transcript enrichment efficiency was evaluated using RNA-seq. Bacterial transcripts were enriched up to ~400-fold, and allowed the recovery of transcripts from ~2000–3600 genes not observed in untreated control samples. These additional transcripts revealed interesting aspects of K. pneumoniae biology including the expression of putative virulence factors and the expression of several genes responsible for antibiotic resistance even in the absence of drugs.
DEsingle for detecting three types of differential expression in single-cell RNA-seq data.
Miao, Zhun; Deng, Ke; Wang, Xiaowo; Zhang, Xuegong
2018-04-24
The excessive number of zeros in single-cell RNA-seq data includes "real" zeros due to the on-off nature of gene transcription in single cells and "dropout" zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor's consideration now. Contact: zhangxg@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.
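For readers unfamiliar with the model family named above, the following Python sketch writes out a generic zero-inflated negative binomial probability mass function: a point mass at zero mixed with a negative binomial component. The parameter names (theta, r, p) and values are assumptions made for illustration; this is not the DEsingle implementation, and how each mixture component is interpreted as real versus dropout zeros follows the authors' model and is not reproduced here.

```python
import math


def nb_pmf(k: int, r: float, p: float) -> float:
    """Negative binomial pmf with size r and success probability p."""
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log(1.0 - p))


def zinb_pmf(k: int, theta: float, r: float, p: float) -> float:
    """Zero-inflated NB: extra zeros with weight theta, NB with weight 1 - theta."""
    nb = nb_pmf(k, r, p)
    if k == 0:
        # Zeros come from both the point mass and the NB component.
        return theta + (1.0 - theta) * nb
    return (1.0 - theta) * nb


# Hypothetical parameters: 30% excess zeros, NB with r = 2 and p = 0.5.
p_zero = zinb_pmf(0, theta=0.3, r=2.0, p=0.5)
total_mass = sum(zinb_pmf(k, 0.3, 2.0, 0.5) for k in range(200))
print(p_zero, total_mass)
```

An observed zero alone cannot be assigned to one component or the other; the model only estimates what proportion of zeros each component accounts for.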
Gao, Liangliang; Tu, Zheng Jin; Millett, Benjamin P; Bradeen, James M
2013-05-23
The late blight pathogen Phytophthora infestans can attack both potato foliage and tubers. Although interaction transcriptome dynamics between potato foliage and various pathogens have been reported, no transcriptome study has focused specifically upon how potato tubers respond to pathogen infection. When inoculated with P. infestans, tubers of nontransformed 'Russet Burbank' (WT) potato develop late blight disease while those of transgenic 'Russet Burbank' line SP2211 (+RB), which expresses the potato late blight resistance gene RB (Rpi-blb1), do not. We compared transcriptome responses to P. infestans inoculation in tubers of these two lines. We demonstrated the practicality of RNA-seq to study tetraploid potato and present the first RNA-seq study of potato tuber diseases. A total of 483 million paired end Illumina RNA-seq reads were generated, representing the transcription of around 30,000 potato genes. Differentially expressed genes, gene groups and ontology bins that exhibited differences between the WT and +RB lines were identified. P. infestans transcripts, including those of known effectors, were also identified. Faster and stronger activation of defense related genes, gene groups and ontology bins correlate with successful tuber resistance against P. infestans. Our results suggest that the hypersensitive response is likely a general form of resistance against the hemibiotrophic P. infestans-even in potato tubers, organs that develop below ground. PMID:23702331
Qi, Lei; Yue, Lei; Feng, Deqin; Qi, Fengxia
2017-01-01
Unlike stable RNAs that require processing for maturation, prokaryotic cellular mRNAs generally follow an ‘all-or-none’ pattern. Herein, we used 5′ monophosphate transcript sequencing (5′P-seq), which specifically captures the 5′-end of processed transcripts, to map the genome-wide RNA processing sites (PSSs) in a methanogenic archaeon. Following statistical analysis and stringent filtration, we identified 1429 PSSs, among which 23.5% and 5.4% were located in 5′ untranslated regions (uPSS) and intergenic regions (iPSS), respectively. A predominant uridine downstream of PSSs served as a processing signature. Remarkably, 5′P-seq detected overrepresented uPSS and iPSS in the polycistronic operons encoding ribosomal proteins, the majority of them upstream of and proximal to ribosome binding sites, suggesting a regulatory role of processing on translation initiation. The processed transcripts showed increased stability and translation efficiency. Particularly, processing within the tricistronic transcript of rplA-rplJ-rplL enhanced the translation of rplL, which can provide a driving force for the 1:4 stoichiometry of L10 to L12 in the ribosome. Growth-associated mRNA processing intensities were also correlated with the cellular ribosomal protein levels, thereby suggesting that mRNA processing is involved in tuning growth-dependent ribosome synthesis. In conclusion, our findings suggest that mRNA processing-mediated post-transcriptional regulation is a potential mechanism of ribosomal protein synthesis and stoichiometry. PMID:28520982
2014-01-01
Background Locating the protein-coding genes in novel genomes is essential to understanding and exploiting the genomic information but it is still difficult to accurately predict all the genes. The recent availability of detailed information about transcript structure from high-throughput sequencing of messenger RNA (RNA-Seq) delineates many expressed genes and promises increased accuracy in gene prediction. Computational gene predictors have been intensively developed for and tested in well-studied animal genomes. Hundreds of fungal genomes are now or will soon be sequenced. The differences of fungal genomes from animal genomes and the phylogenetic sparsity of well-studied fungi call for gene-prediction tools tailored to them. Results SnowyOwl is a new gene prediction pipeline that uses RNA-Seq data to train and provide hints for the generation of Hidden Markov Model (HMM)-based gene predictions and to evaluate the resulting models. The pipeline has been developed and streamlined by comparing its predictions to manually curated gene models in three fungal genomes and validated against the high-quality gene annotation of Neurospora crassa; SnowyOwl predicted N. crassa genes with 83% sensitivity and 65% specificity. SnowyOwl gains sensitivity by repeatedly running the HMM gene predictor Augustus with varied input parameters and selectivity by choosing the models with best homology to known proteins and best agreement with the RNA-Seq data. Conclusions SnowyOwl efficiently uses RNA-Seq data to produce accurate gene models in both well-studied and novel fungal genomes. The source code for the SnowyOwl pipeline (in Python) and a web interface (in PHP) is freely available from http://sourceforge.net/projects/snowyowl/. PMID:24980894
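The sensitivity and specificity figures quoted above can be made concrete with a short Python sketch. It assumes the convention common in gene-prediction benchmarking, in which sensitivity is the fraction of reference genes recovered and specificity is the fraction of predicted genes that are correct; the abstract does not state which definitions SnowyOwl's evaluation used, and the gene identifiers below are hypothetical.

```python
def sensitivity_specificity(reference: set, predicted: set):
    """Gene-level sensitivity and specificity for a set of predictions.

    Assumes both sets are non-empty and that a prediction counts as correct
    when its identifier matches a reference gene.
    """
    true_pos = len(reference & predicted)
    false_neg = len(reference - predicted)
    false_pos = len(predicted - reference)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_pos / (true_pos + false_pos)
    return sensitivity, specificity


# Hypothetical toy example: five curated genes, six predictions.
reference_genes = {"g1", "g2", "g3", "g4", "g5"}
predicted_genes = {"g1", "g2", "g3", "g4", "x1", "x2"}

sn, sp = sensitivity_specificity(reference_genes, predicted_genes)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
```

Running a predictor repeatedly with varied parameters, as described, tends to raise sensitivity by enlarging the predicted set; the subsequent model selection step is what keeps specificity from falling.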
Mapping RNA-seq Reads with STAR
Dobin, Alexander; Gingeras, Thomas R.
2015-01-01
Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920
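As a rough illustration of the two-step workflow this abstract refers to (building a genome index, then mapping reads), the sketch below only assembles STAR command lines in Python; it does not run them. The file names, thread count and read length are hypothetical placeholders, and the options shown are a small, commonly used subset rather than the protocols described in the unit.

```python
import shlex
from typing import List


def star_index_cmd(genome_dir: str, fasta: str, gtf: str,
                   read_length: int, threads: int = 4) -> List[str]:
    """Build the command that generates a STAR genome index."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        # Commonly set to read length minus one.
        "--sjdbOverhang", str(read_length - 1),
    ]


def star_align_cmd(genome_dir: str, reads: List[str], out_prefix: str,
                   threads: int = 4) -> List[str]:
    """Build the command that maps single- or paired-end reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        # Coordinate-sorted BAM output plus per-gene read counts.
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "GeneCounts",
        "--outFileNamePrefix", out_prefix,
    ]


if __name__ == "__main__":
    # Hypothetical paths, shown for illustration only.
    index = star_index_cmd("star_index", "genome.fa", "genes.gtf", read_length=100)
    align = star_align_cmd("star_index", ["r1.fastq", "r2.fastq"], "sample1_")
    print(shlex.join(index))
    print(shlex.join(align))
```

Passing the returned lists to `subprocess.run(cmd, check=True)` would execute them, provided STAR is installed and the input files exist.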
Mapping RNA-seq Reads with STAR.
Dobin, Alexander; Gingeras, Thomas R
2015-09-03
Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.
Lima, Leandro; Sinaimeri, Blerina; Sacomoto, Gustavo; Lopez-Maestre, Helene; Marchet, Camille; Miele, Vincent; Sagot, Marie-France; Lacroix, Vincent
2017-01-01
The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them. The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and when assemblers try to traverse these regions, they can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. 
The originality of our work when compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and not read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134-1144, 5) on both real and simulated datasets for detecting chimeras, and therefore is able to capture assembly errors missed by these methods.
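To make the role of the de Bruijn graph concrete, here is a minimal sketch, not the authors' algorithm, of how such a graph can be built from reads and how a sequence shared by two transcripts produces a branching node. The reads, the value of k and the function names are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_dbg(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix
    return graph


def branching_nodes(graph: Dict[str, Set[str]]) -> List[str]:
    """Return nodes with more than one successor, i.e. points of ambiguity."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


if __name__ == "__main__":
    # Two toy "transcripts" sharing the internal sequence GATT.
    reads = ["AAGATTC", "CCGATTG"]
    graph = build_dbg(reads, k=4)
    print(branching_nodes(graph))  # ['ATT']
```

The shared stretch collapses into common nodes, so an assembler leaving node `ATT` cannot tell from topology alone which continuation belongs to which transcript. That is the kind of ambiguity repeats introduce, here at toy scale.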
2012-01-01
Background Vitis vinifera berry development is characterised by an initial phase where the fruit is small, hard and acidic, followed by a lag phase known as veraison. In the final phase, berries become larger, softer and sweeter and accumulate an array of organoleptic compounds. Since the physiological and biochemical makeup of grape berries at harvest has a profound impact on the characteristics of wine, there is great interest in characterising the molecular and biophysical changes that occur from flowering through veraison and ripening, including the coordination and temporal regulation of metabolic gene pathways. Advances in deep-sequencing technologies, combined with the availability of increasingly accurate V. vinifera genomic and transcriptomic data, have enabled us to carry out RNA-transcript expression analysis on a global scale at key points during berry development. Results A total of 162 million 100-base pair reads were generated from pooled Vitis vinifera (cv. Shiraz) berries sampled at 3-weeks post-anthesis, 10- and 11-weeks post-anthesis (corresponding to early and late veraison) and at 17-weeks post-anthesis (harvest). Mapping reads from each developmental stage (36-45 million) onto the NCBI RefSeq transcriptome of 23,720 V. vinifera mRNAs revealed that at least 75% of these transcripts were detected in each sample. RNA-Seq analysis uncovered 4,185 transcripts that were significantly upregulated at a single developmental stage, including 161 transcription factors. Clustering transcripts according to distinct patterns of transcription revealed coordination in metabolic pathways such as organic acid, stilbene and terpenoid metabolism. From the phenylpropanoid/stilbene biosynthetic pathway at least 46 transcripts were upregulated in ripe berries when compared to veraison and immature berries, and 12 terpene synthases were predominantly detected only in a single sample. 
Quantitative real-time PCR was used to validate the expression pattern of 12 differentially expressed genes from primary and secondary metabolic pathways. Conclusions In this study we report the global transcriptional profile of Shiraz grapes at key stages of development. We have undertaken a comprehensive analysis of gene families contributing to commercially important berry characteristics and present examples of co-regulation and differential gene expression. The data reported here will provide an invaluable resource for the on-going molecular investigation of wine grapes. PMID:23227855
Analysis of miRNA expression profiles in melatonin-exposed GC-1 spg cell line.
Zhu, Xiaoling; Chen, Shuxiong; Jiang, Yanwen; Xu, Ying; Zhao, Yun; Chen, Lu; Li, Chunjin; Zhou, Xu
2018-02-05
Melatonin is an endocrine neurohormone secreted by pinealocytes in the pineal gland. It exerts diverse physiological effects, acting for example as a circadian rhythm regulator and an antioxidant. However, the functional importance of melatonin in the regulation of spermatogenesis remains unclear. The objectives of this study were to: (1) detect the effect of melatonin on miRNA expression profiles in GC-1 spg cells by miRNA deep sequencing (DeepSeq) and (2) define melatonin-affected miRNA-mRNA interactions and associated biological processes using bioinformatics analysis. GC-1 spg cells were cultured with melatonin (10⁻⁷ M) for 24 h. DeepSeq data were validated using quantitative real-time reverse transcription polymerase chain reaction analysis (qRT-PCR). A total of 176 miRNAs were found to be significantly differentially expressed between the two groups (fold change of >2 or <0.5 and FDR<0.05). Of these, 171 were up-regulated and 5 were down-regulated. Ontology analysis of the biological processes of the predicted targets indicated a variety of biological functions. Pathway analysis indicated that the predicted targets were involved in cancers, apoptosis and signaling pathways, such as VEGF, TNF, Ras and Notch. The results indicate that melatonin could regulate the expression of miRNAs to exert its physiological effects in GC-1 spg cells. These results should be useful for investigating the biological function of miRNAs regulated by melatonin in spermatogenesis and testicular germ cell tumors. Copyright © 2017 Elsevier B.V. All rights reserved.
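The selection rule quoted in this abstract (fold change >2 or <0.5 together with FDR <0.05) can be written as a small filter. The sketch below assumes fold changes and FDR values have already been computed by some upstream tool; the miRNA names and numbers are invented for illustration and are not results from the study.

```python
from typing import Dict, Tuple


def classify(fold_change: float, fdr: float,
             fc_cutoff: float = 2.0, fdr_cutoff: float = 0.05) -> str:
    """Label one miRNA as 'up', 'down' or 'unchanged'."""
    if fdr >= fdr_cutoff:
        return "unchanged"          # not statistically significant
    if fold_change > fc_cutoff:
        return "up"
    if fold_change < 1.0 / fc_cutoff:
        return "down"
    return "unchanged"              # significant but small effect


if __name__ == "__main__":
    # Hypothetical (fold change, FDR) pairs.
    table: Dict[str, Tuple[float, float]] = {
        "miR-example-1": (3.1, 0.001),
        "miR-example-2": (0.4, 0.020),
        "miR-example-3": (2.5, 0.200),
        "miR-example-4": (1.3, 0.010),
    }
    for name, (fc, fdr) in table.items():
        print(name, classify(fc, fdr))
```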
Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda
2017-06-26
The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB ( http://paramecium.i2bc.paris-saclay.fr ). TrUC software is freely distributed under a GNU GPL v3 licence ( https://github.com/oarnaiz/TrUC ).
RNA-SeQC: RNA-seq metrics for quality control and process optimization.
DeLuca, David S; Levin, Joshua Z; Sivachenko, Andrey; Fennell, Timothy; Nazaire, Marc-Danie; Williams, Chris; Reich, Michael; Winckler, Wendy; Getz, Gad
2012-06-01
RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program which provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3'/5' bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis. See www.genepattern.org to run online, or www.broadinstitute.org/rna-seqc/ for a command line tool.
Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K
2016-01-01
RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.
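The "filtering and normalization" step mentioned here is often based on counts per million (CPM). The sketch below is written in plain Python rather than R and is not edgeR code. It computes CPM from a small count table and keeps genes that exceed a threshold in a minimum number of samples. The threshold, the minimum sample number and the toy counts are assumptions chosen for illustration.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each sample's counts by its library size (column sum)."""
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
        for gene, row in counts.items()
    }


def filter_genes(counts: Dict[str, List[int]],
                 min_cpm: float = 0.5, min_samples: int = 2) -> List[str]:
    """Keep genes whose CPM exceeds min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return [gene for gene, values in cpm.items()
            if sum(v > min_cpm for v in values) >= min_samples]


if __name__ == "__main__":
    # Toy counts for three genes in two samples (hypothetical values).
    toy = {
        "geneA": [500, 1000],
        "geneB": [0, 1],
        "geneC": [499500, 998999],
    }
    print(filter_genes(toy))  # ['geneA', 'geneC']
```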
Global regulation of alternative RNA splicing by the SR-rich protein RBM39.
Mai, Sanyue; Qu, Xiuhua; Li, Ping; Ma, Qingjun; Cao, Cheng; Liu, Xuan
2016-08-01
RBM39 is a serine/arginine-rich RNA-binding protein that is highly homologous to the splicing factor U2AF65. However, the role of RBM39 in alternative splicing is poorly understood. In this study, RBM39-mediated global alternative splicing was investigated using RNA-Seq and genome-wide RBM39-RNA interactions were mapped via cross-linking and immunoprecipitation coupled with deep sequencing (CLIP-Seq) in wild-type and RBM39-knockdown MCF-7 cells. RBM39 was involved in the up- or down-regulation of the transcript levels of various genes. Hundreds of alternative splicing events regulated by endogenous RBM39 were identified. The majority of these events were cassette exons. Genes containing RBM39-regulated alternative exons were found to be linked to G2/M transition, cellular response to DNA damage, adherens junctions and endocytosis. CLIP-Seq analysis showed that the binding site of RBM39 was mainly in proximity to 5' and 3' splicing sites. Considerable RBM39 binding to mRNAs encoding proteins involved in translation was observed. Of particular importance, ~20% of the alternative splicing events that were significantly regulated by RBM39 were similarly regulated by U2AF65. RBM39 is extensively involved in alternative splicing of RNA and helps regulate transcript levels. RBM39 may modulate alternative splicing similarly to U2AF65 by either directly binding to RNA or recruiting other splicing factors, such as U2AF65. The current study offers a genome-wide view of RBM39's regulatory function in alternative splicing. RBM39 may play important roles in multiple cellular processes by regulating both alternative splicing of RNA molecules and transcript levels. Copyright © 2016 Elsevier B.V. All rights reserved.
Sheynkman, Gloria M.; Shortreed, Michael R.; Frey, Brian L.; Scalf, Mark; Smith, Lloyd M.
2013-01-01
Each individual carries thousands of non-synonymous single nucleotide variants (nsSNVs) in their genome, each corresponding to a single amino acid polymorphism (SAP) in the encoded proteins. It is important to be able to directly detect and quantify these variations at the protein level in order to study post-transcriptional regulation, differential allelic expression, and other important biological processes. However, such variant peptides are not generally detected in standard proteomic analyses, due to their absence from the generic databases that are employed for mass spectrometry searching. Here, we extend previous work that demonstrated the use of customized SAP databases constructed from sample-matched RNA-Seq data. We collected deep coverage RNA-Seq data from the Jurkat cell line, compiled the set of nsSNVs that are expressed, used this information to construct a customized SAP database, and searched it against deep coverage shotgun MS data obtained from the same sample. This approach enabled detection of 421 SAP peptides mapping to 395 nsSNVs. We compared these peptides to peptides identified from a large generic search database containing all known nsSNVs (dbSNP) and found that more than 70% of the SAP peptides from this dbSNP-derived search were not supported by the RNA-Seq data, and thus are likely false positives. Next, we increased the SAP coverage from the RNA-Seq derived database by utilizing multiple protease digestions, thereby increasing variant detection to 695 SAP peptides mapping to 504 nsSNV sites. These detected SAP peptides corresponded to moderate to high abundance transcripts (30+ transcripts per million, TPM). The SAP peptides included 192 allelic pairs; the relative expression levels of the two alleles were evaluated for 51 of those pairs, and found to be comparable in all cases. PMID:24175627
2010-01-01
Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097
Chan, Jasper Fuk-Woo; Choi, Garnet Kwan-Yue; Tsang, Alan Ka-Lun; Tee, Kah-Meng; Lam, Ho-Yin; Yip, Cyril Chik-Yan; To, Kelvin Kai-Wang; Cheng, Vincent Chi-Chung; Yeung, Man-Lung; Lau, Susanna Kar-Pui; Woo, Patrick Chiu-Yat; Chan, Kwok-Hung; Tang, Bone Siu-Fai
2015-01-01
Based on findings in small RNA-sequencing (Seq) data analysis, we developed highly sensitive and specific real-time reverse transcription (RT)-PCR assays with locked nucleic acid probes targeting the abundantly expressed leader sequences of Middle East respiratory syndrome coronavirus (MERS-CoV) and other human coronaviruses. Analytical and clinical evaluations showed their noninferiority to a commercial multiplex PCR test for the detection of these coronaviruses. PMID:26019210
Li, Shengjie; Shen, Li; Sun, Lianjie; Xu, Jiao; Jin, Ping; Chen, Liming; Ma, Fei
2017-05-01
Drosophila have served as a model for research on innate immunity for decades. However, knowledge of the post-transcriptional regulation of immune gene expression by microRNAs (miRNAs) remains rudimentary. In the present study, using small RNA-seq and bioinformatics analysis, we identified 67 differentially expressed miRNAs in Drosophila infected with Escherichia coli compared to injured flies at three time-points. Furthermore, we found that 21 of these miRNAs were potentially involved in the regulation of Imd pathway-related genes. Strikingly, based on UAS-miRNAs line screening and Dual-luciferase assay, we identified that miR-9a and miR-981 could both negatively regulate Drosophila antibacterial defenses and decrease the level of the antibacterial peptide, Diptericin. Taken together, these data support the involvement of miRNAs in the regulation of the Drosophila Imd pathway. Copyright © 2017 Elsevier Ltd. All rights reserved.
Ma, Chuang; Wang, Xiangfeng
2012-09-01
One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey's biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses.
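For readers unfamiliar with the Gini correlation, the sketch below implements the usual rank-based estimator in plain Python: the values of x are weighted by (2i - n - 1), once in the order induced by the ranks of y and once in their own sorted order, and the two weighted sums are divided. This is written from the general definition and is not taken from the rsgcc source code; tie handling here is simply whatever a stable sort produces.

```python
from typing import Sequence


def gini_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Gini correlation of x with respect to the ranks of y.

    Note the measure is asymmetric: gini_correlation(x, y) need not
    equal gini_correlation(y, x).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    weights = [2 * i - n - 1 for i in range(1, n + 1)]
    x_by_own_rank = sorted(x)
    # x values reordered by increasing y.
    x_by_y_rank = [xv for _, xv in sorted(zip(y, x), key=lambda pair: pair[0])]
    denominator = sum(w * v for w, v in zip(weights, x_by_own_rank))
    if denominator == 0:
        raise ValueError("x is constant; the Gini correlation is undefined")
    numerator = sum(w * v for w, v in zip(weights, x_by_y_rank))
    return numerator / denominator


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(gini_correlation(x, [10, 20, 30, 40]))  # 1.0
    print(gini_correlation(x, [40, 30, 20, 10]))  # -1.0
    print(gini_correlation(x, [2, 1, 4, 3]))      # 0.6
```

Because the numerator uses the actual values of one variable and only the ranks of the other, the measure sits between Pearson (values on both sides) and Spearman (ranks on both sides).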
Ma, Chuang; Wang, Xiangfeng
2012-01-01
One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Despite several correlation methods that are applied in biology to analyze microarray data, concerns regarding the compatibility of these methods with the gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology have been raised. These concerns are mainly due to the fact that the distribution of read counts in RNA-Seq experiments is different from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, introduction of novel methods into biology is appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method called the Gini correlation coefficient on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson correlation, the Spearman correlation, the Kendall correlation, and the Tukey’s biweight correlation. The Gini correlation method, with the other four evaluated methods in this study, was implemented as an R package named rsgcc that can be utilized as an alternative option for biologists to perform clustering analyses of gene expression patterns or transcriptional network analyses. PMID:22797655
Golumbeanu, Monica; Cristinelli, Sara; Rato, Sylvie; Munoz, Miguel; Cavassini, Matthias; Beerenwinkel, Niko; Ciuffi, Angela
2018-04-24
Despite effective treatment, HIV can persist in latent reservoirs, which represent a major obstacle toward HIV eradication. Targeting and reactivating latent cells is challenging due to the heterogeneous nature of HIV-infected cells. Here, we used a primary model of HIV latency and single-cell RNA sequencing to characterize transcriptional heterogeneity during HIV latency and reactivation. Our analysis identified transcriptional programs leading to successful reactivation of HIV expression. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Lin, Yang; Lewallen, Eric A; Camilleri, Emily T; Bonin, Carolina A; Jones, Dakota L; Dudakovic, Amel; Galeano-Garces, Catalina; Wang, Wei; Karperien, Marcel J; Larson, Annalise N; Dahm, Diane L; Stuart, Michael J; Levy, Bruce A; Smith, Jay; Ryssman, Daniel B; Westendorf, Jennifer J; Im, Hee-Jeong; van Wijnen, Andre J; Riester, Scott M; Krych, Aaron J
2016-11-01
Preservation of osteochondral allografts used for transplantation is critical to ensure favorable outcomes for patients after surgical treatment of cartilage defects. To study the biological effects of protocols currently used for cartilage storage, we investigated differences in gene expression between stored allograft cartilage and fresh cartilage from living donors using high throughput molecular screening strategies. We applied next generation RNA sequencing (RNA-seq) and real-time reverse transcription quantitative polymerase chain reaction (RT-qPCR) to assess genome-wide differences in mRNA expression between stored allograft cartilage and fresh cartilage tissue from living donors. Gene ontology analysis was used to characterize biological pathways associated with differentially expressed genes. Our studies establish reduced levels of mRNAs encoding cartilage related extracellular matrix (ECM) proteins (i.e., COL1A1, COL2A1, COL10A1, ACAN, DCN, HAPLN1, TNC, and COMP) in stored cartilage. These changes occur concomitantly with increased expression of "early response genes" that encode transcription factors mediating stress/cytoprotective responses (i.e., EGR1, EGR2, EGR3, MYC, FOS, FOSB, FOSL1, FOSL2, JUN, JUNB, and JUND). The elevated expression of "early response genes" and reduced levels of ECM-related mRNAs in stored cartilage allografts suggests that tissue viability may be maintained by a cytoprotective program that reduces cell metabolic activity. These findings have potential implications for future studies focused on quality assessment and clinical optimization of osteochondral allografts used for cartilage transplantation. © 2016 Orthopaedic Research Society. Published by Wiley Periodicals, Inc. J Orthop Res 34:1950-1959, 2016.
Cheng, Chia-Yang; Chu, Chia-Han; Hsu, Hung-Wei; Hsu, Fang-Rong; Tang, Chung Yi; Wang, Wen-Ching; Kung, Hsing-Jien; Chang, Pei-Ching
2014-01-01
Post-translational modification (PTM) of transcription factors and chromatin remodelling proteins is recognized as a major mechanism by which transcriptional regulation occurs. Chromatin immunoprecipitation (ChIP) in combination with high-throughput sequencing (ChIP-seq) is being applied as a gold standard when studying the genome-wide binding sites of transcription factors (TFs). This has greatly improved our understanding of protein-DNA interactions on a genome-wide scale. However, current ChIP-seq peak calling tools are not sufficiently sensitive and are unable to simultaneously identify post-translationally modified TFs based on ChIP-seq analysis; this is largely due to the widespread presence of multiple modified TFs. Using SUMO-1 modification as an example, we describe here an improved approach that allows the simultaneous identification of the particular genomic binding regions of all TFs with SUMO-1 modification. Traditional peak calling methods are inadequate when identifying multiple TF binding sites that involve long genomic regions and therefore we designed a ChIP-seq processing pipeline for the detection of peaks via a combinatorial fusion method. Then, we annotated the peaks with known transcription factor binding sites (TFBS) using the Transfac Matrix Database (v7.0), which predicts potential SUMOylated TFs. Next, the peak calling result was further analyzed based on promoter proximity, TFBS annotation and a literature review, and was validated by ChIP-real-time quantitative PCR (qPCR) and ChIP-reChIP real-time qPCR. The results show clearly that SUMOylated TFs can be pinpointed using our pipeline. A methodology is presented that analyzes SUMO-1 ChIP-seq patterns and predicts related TFs. Our analysis uses three peak calling tools. The fusion of these different tools increases the precision of the peak calling results. The TFBS annotation method is able to predict potential SUMOylated TFs. 
Here, we offer a new approach that enhances ChIP-seq data analysis and allows the identification of multiple SUMOylated TF binding sites simultaneously, which can then be utilized for other functional PTM binding site prediction in future.
Mascarenhas, Roshan; Pietrzak, Maciej; Smith, Ryan M; Webb, Amy; Wang, Danxin; Papp, Audrey C; Pinsonneault, Julia K; Seweryn, Michal; Rempala, Grzegorz; Sadee, Wolfgang
2015-01-01
mRNA translation into proteins is highly regulated, but the role of mRNA isoforms, noncoding RNAs (ncRNAs), and genetic variants remains poorly understood. mRNA levels on polysomes have been shown to correlate well with expressed protein levels, pointing to polysomal loading as a critical factor. To study regulation and genetic factors of protein translation we measured levels and allelic ratios of mRNAs and ncRNAs (including microRNAs) in lymphoblast cell lines (LCL) and in polysomal fractions. We first used targeted assays to measure polysomal loading of mRNA alleles, confirming reported genetic effects on translation of OPRM1 and NAT1, and detecting no effect of rs1045642 (3435C>T) in ABCB1 (MDR1) on polysomal loading while supporting previous results showing increased mRNA turnover of the 3435T allele. Use of high-throughput sequencing of complete transcript profiles (RNA-Seq) in three LCLs revealed significant differences in polysomal loading of individual RNA classes and isoforms. Correlated polysomal distribution between protein-coding and non-coding RNAs suggests interactions between them. Allele-selective polysome recruitment revealed strong genetic influence for multiple RNAs, attributable either to differential expression of RNA isoforms or to differential loading onto polysomes, the latter defining a direct genetic effect on translation. Genes identified by different allelic RNA ratios between cytosol and polysomes were enriched with published expression quantitative trait loci (eQTLs) affecting RNA functions, and associations with clinical phenotypes. Polysomal RNA-Seq combined with allelic ratio analysis provides a powerful approach to study polysomal RNA recruitment and regulatory variants affecting protein translation.
Singh, Anil Kumar; Sharma, Vishal; Pal, Awadhesh Kumar; Acharya, Vishal; Ahuja, Paramvir Singh
2013-08-01
NAC [no apical meristem (NAM), Arabidopsis thaliana transcription activation factor (ATAF1/2) and cup-shaped cotyledon (CUC2)] proteins belong to one of the largest plant-specific transcription factor (TF) families and play important roles in plant development processes, response to biotic and abiotic cues and hormone signalling. Our genome-wide analysis identified 110 StNAC genes in potato encoding 136 proteins, including 14 membrane-bound TFs. The physical map positions of StNAC genes on 12 potato chromosomes were non-random, and 40 genes were found to be distributed in 16 clusters. The StNAC proteins were phylogenetically clustered into 12 subgroups. Phylogenetic analysis of StNACs along with their Arabidopsis and rice counterparts divided these proteins into 18 subgroups. Our comparative analysis has also identified 36 putative TNAC proteins, which appear to be restricted to the Solanaceae family. In silico expression analysis, using Illumina RNA-seq transcriptome data, revealed tissue-specific, biotic, abiotic stress and hormone-responsive expression profiles of StNAC genes. Several StNAC genes, including StNAC072 and StNAC101, which are orthologs of the known stress-responsive Arabidopsis RESPONSIVE TO DEHYDRATION 26 (RD26), were identified as highly responsive to abiotic stress. Quantitative real-time polymerase chain reaction analysis largely corroborated the expression profile of StNAC genes as revealed by the RNA-seq data. Taken together, this analysis points to putative functions of several StNAC TFs and provides a blueprint for their functional characterization and utilization in potato improvement.
Sõber, Siim; Rull, Kristiina; Reiman, Mario; Ilisson, Piret; Mattila, Pirkko; Laan, Maris
2016-01-01
Recurrent pregnancy loss (RPL) concerns ~3% of couples aiming at childbirth. In the current study, transcriptomes and miRNomes of 1st trimester placental chorionic villi were analysed for 2 RPL cases (≥6 miscarriages) and normal, but electively terminated pregnancies (ETP; n = 8). Sequencing was performed on the Illumina HiSeq 2000 platform. Differential expression analyses detected 51 (27%) transcripts with increased and 138 (73%) with decreased expression in RPL compared to ETP (DESeq: FDR P < 0.1 and DESeq2: <0.05). RPL samples had substantially decreased transcript levels of histones, regulatory RNAs and genes involved in telomere, spliceosome, ribosomal, mitochondrial and intra-cellular signalling functions. Downregulated expression of HIST1H1B and HIST1H4A (Wilcoxon test, fc≤0.372, P≤9.37 × 10⁻⁴) was validated in an extended sample by quantitative PCR (RPL, n = 14; ETP, n = 24). Several upregulated genes are linked to placental function and pregnancy complications: ATF4, C3, PHLDA2, GPX4, ICAM1, SLC16A2. Analysis of the miRNA-Seq dataset identified no large disturbances in RPL samples. Notably, nearly 2/3 of differentially expressed genes have binding sites for E2F transcription factors, coordinating mammalian endocycle and placental development. For a conceptus destined for miscarriage, the E2F TF-family represents a potential key coordinator in reprogramming the placental genome towards gradually stopping the maintenance of basic nuclear and cellular functions. PMID:27929073
Larsen, N; Brøsted Werner, B; Jespersen, L
2016-08-01
Milk acidification and metabolic activity of the starter cultures are affected by oxygen; however, molecular factors related to the redox changes are poorly defined. The objective of the study was to investigate transcriptional responses in Lactococcus lactis subsp. cremoris CHCCO2 grown in milk to the shifts of oxygen and redox potential (Eh7). Transcriptomic studies were performed with the use of Illumina HiSeq 2000 mRNA sequencing and validated by the real-time quantitative PCR. In total 105 differentially expressed genes were assigned functional gene names. Most of the differentially expressed genes were detected during aerobic reduction phase. Upregulated genes were implicated in lactose utilization, glycogen biosynthesis, amino sugar metabolism, oxidation-reduction, pyrimidine biosynthesis and DNA integration processes. Genes of purine nucleotide biosynthesis and genes encoding amino acid, multidrug resistance and ion ABC transporters were mostly downregulated, while oligopeptide transporter genes were reduced during oxygen depletion and induced at minimum Eh7. Understanding of gene responses in starter cultures to the changes of oxidation-reduction state is important for the better control and reproducibility of dairy fermentations. We applied mRNA sequencing by Illumina HiSeq 2000 to investigate gene expression profile in a dairy strain of Lactococcus lactis subsp. cremoris during milk acidification. Novelty of this study lies in linking transcriptional responses to oxygen depletion and the changes of redox potential with the fermentation kinetics and clarification of molecular factors specifically expressed in milk which might be essential for bacterial performance and the final quality of cheeses. © 2016 The Society for Applied Microbiology.
Irla, Marta; Neshat, Armin; Brautaset, Trygve; Rückert, Christian; Kalinowski, Jörn; Wendisch, Volker F
2015-02-14
Bacillus methanolicus MGA3 is a thermophilic, facultative ribulose monophosphate (RuMP) cycle methylotroph. Together with its ability to produce high yields of amino acids, the relevance of this microorganism as a promising candidate for biotechnological applications is evident. The B. methanolicus MGA3 genome consists of a 3,337,035 nucleotides (nt) circular chromosome, the 19,174 nt plasmid pBM19 and the 68,999 nt plasmid pBM69. 3,218 protein-coding regions were annotated on the chromosome, 22 on pBM19 and 82 on pBM69. In the present study, the RNA-seq approach was used to comprehensively investigate the transcriptome of B. methanolicus MGA3 in order to improve the genome annotation, identify novel transcripts, analyze conserved sequence motifs involved in gene expression and reveal operon structures. For this aim, two different cDNA library preparation methods were applied: one which allows characterization of the whole transcriptome and another which includes enrichment of primary transcript 5'-ends. Analysis of the primary transcriptome data enabled the detection of 2,167 putative transcription start sites (TSSs) which were categorized into 1,642 TSSs located in the upstream region (5'-UTR) of known protein-coding genes and 525 TSSs of novel antisense, intragenic, or intergenic transcripts. Firstly, 14 wrongly annotated translation start sites (TLSs) were corrected based on primary transcriptome data. Further investigation of the identified 5'-UTRs resulted in the detailed characterization of their length distribution and the detection of 75 hitherto unknown cis-regulatory RNA elements. Moreover, the exact TSSs positions were utilized to define conserved sequence motifs for translation start sites, ribosome binding sites and promoters in B. methanolicus MGA3. Based on the whole transcriptome data set, novel transcripts, operon structures and mRNA abundances were determined. 
The analysis of the operon structures revealed that almost half of the genes are transcribed monocistronically (940), whereas 1,164 genes are organized in 381 operons. Several of the genes related to methylotrophy had highly abundant transcripts. The extensive insights into the transcriptional landscape of B. methanolicus MGA3, gained in this study, represent a valuable foundation for further comparative quantitative transcriptome analyses and possibly also for the development of molecular biology tools which at present are very limited for this organism.
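The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. This is only a consistency sketch: the variable names are illustrative, and the choice of denominator for "almost half" (genes assigned to either the monocistronic or the operon class, rather than all annotated genes) is an assumption, since the abstract does not state it.

```python
# Figures quoted in the abstract
monocistronic_genes = 940
operon_genes = 1164
operons = 381
tss_upstream_of_genes = 1642
tss_novel_transcripts = 525

# Assumed denominator: genes assigned to one of the two classes
classified_genes = monocistronic_genes + operon_genes
monocistronic_fraction = monocistronic_genes / classified_genes
mean_operon_size = operon_genes / operons
total_tss = tss_upstream_of_genes + tss_novel_transcripts

print(f"classified genes: {classified_genes}")
print(f"monocistronic fraction: {monocistronic_fraction:.1%}")
print(f"mean genes per operon: {mean_operon_size:.2f}")
print(f"total TSSs: {total_tss}")
```

Under that assumption the two TSS categories sum to the reported 2,167, and the monocistronic share comes out somewhat under one half, which fits the wording "almost half".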
Transcriptome profile of a bovine respiratory disease pathogen: Mannheimia haemolytica PHL213
2012-01-01
Background: Computational methods for structural gene annotation have propelled gene discovery but face certain drawbacks with regards to prokaryotic genome annotation. Identification of transcriptional start sites, demarcating overlapping gene boundaries, and identifying regulatory elements such as small RNA are not accurate using these approaches. In this study, we re-visit the structural annotation of Mannheimia haemolytica PHL213, a bovine respiratory disease pathogen. M. haemolytica is one of the causative agents of bovine respiratory disease that results in about $3 billion annual losses to the cattle industry. We used RNA-Seq and analyzed the data using freely-available computational methods and resources. The aim was to identify previously unannotated regions of the genome using RNA-Seq based expression profile to complement the existing annotation of this pathogen. Results: Using the Illumina Genome Analyzer, we generated 9,055,826 reads (average length ~76 bp) and aligned them to the reference genome using Bowtie. The transcribed regions were analyzed using SAMTOOLS and custom Perl scripts in conjunction with BLAST searches and available gene annotation information. The single nucleotide resolution map enabled the identification of 14 novel protein coding regions as well as 44 potential novel sRNA. The basal transcription profile revealed that 2,506 of the 2,837 annotated regions were expressed in vitro, at 95.25% coverage, representing all broad functional gene categories in the genome. The expression profile also helped identify 518 potential operon structures involving 1,086 co-expressed pairs. We also identified 11 proteins with mutated/alternate start codons. Conclusions: The application of RNA-Seq based transcriptome profiling to structural gene annotation helped correct existing annotation errors and identify potential novel protein coding regions and sRNA.
We used computational tools to predict regulatory elements such as promoters and terminators associated with the novel expressed regions for further characterization of these novel functional elements. Our study complements the existing structural annotation of Mannheimia haemolytica PHL213 based on experimental evidence. Given the role of sRNA in virulence gene regulation and stress response, potential novel sRNA described in this study can form the framework for future studies to determine the role of sRNA, if any, in M. haemolytica pathogenesis. PMID:23046475
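The idea of deriving transcribed regions from a single-nucleotide-resolution coverage map can be sketched as follows. The study itself used SAMtools and custom Perl scripts that are not shown in the abstract; the Python functions, thresholds and toy depth vector below are hypothetical stand-ins meant only to show the principle.

```python
def transcribed_regions(depth, min_depth=1, min_length=1):
    """Return half-open (start, end) intervals of consecutive positions
    whose per-base read depth is at least min_depth."""
    regions = []
    start = None
    for pos, d in enumerate(depth):
        if d >= min_depth:
            if start is None:
                start = pos              # a covered run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None                 # the run ended at pos
    if start is not None and len(depth) - start >= min_length:
        regions.append((start, len(depth)))
    return regions

def fraction_covered(depth, start, end, min_depth=1):
    """Fraction of bases in [start, end) with depth >= min_depth."""
    if end <= start:
        raise ValueError("end must be greater than start")
    window = depth[start:end]
    return sum(1 for d in window if d >= min_depth) / (end - start)

# Toy per-base depth vector (hypothetical)
depth = [0, 3, 4, 0, 0, 2, 2, 2]
print(transcribed_regions(depth, min_depth=2))
print(fraction_covered(depth, 0, 4))
```

Regions that pass the depth filter but fall outside existing annotation would be the candidates for novel coding regions or sRNA; the per-feature covered fraction is one way to express a coverage criterion for calling an annotated region expressed.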
Sarkar, Soumyadev; Chakravorty, Somnath; Mukherjee, Avishek; Bhattacharya, Debanjana; Bhattacharya, Semantee; Gachhui, Ratan
2018-03-01
Nitrogen is a key nutrient for all cell forms. Most organisms respond to nitrogen scarcity by slowing down their growth rate. On the contrary, our previous studies have shown that Papiliotrema laurentii strain RY1 has a robust growth under nitrogen starvation. To understand the global regulation that leads to such an extraordinary response, we undertook a de novo approach for transcriptome analysis of the yeast. Close to 33 million sequence reads of high quality for nitrogen limited and enriched condition were generated using Illumina NextSeq500. Trinity analysis and clustered transcripts annotation of the reads produced 17,611 unigenes, out of which 14,157 could be annotated. Gene Ontology term analysis generated 44.92% cellular component terms, 39.81% molecular function terms and 15.24% biological process terms. The most over represented pathways in general were translation, carbohydrate metabolism, amino acid metabolism, general metabolism, folding, sorting, degradation followed by transport and catabolism, nucleotide metabolism, replication and repair, transcription and lipid metabolism. A total of 4256 Single Sequence Repeats were identified. Differential gene expression analysis detected 996 P-significant transcripts to reveal transmembrane transport, lipid homeostasis, fatty acid catabolism and translation as the enriched terms which could be essential for Papiliotrema laurentii strain RY1 to adapt during nitrogen deprivation. Transcriptome data was validated by quantitative real-time PCR analysis of twelve transcripts. To the best of our knowledge, this is the first report of Papiliotrema laurentii strain RY1 transcriptome which would play a pivotal role in understanding the biochemistry of the yeast under acute nitrogen stress and this study would be encouraging to initiate extensive investigations into this Papiliotrema system. Copyright © 2017 Elsevier B.V. All rights reserved.
Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes.
Ackermann, Amanda M; Wang, Zhiping; Schug, Jonathan; Naji, Ali; Kaestner, Klaus H
2016-03-01
Although glucagon-secreting α-cells and insulin-secreting β-cells have opposing functions in regulating plasma glucose levels, the two cell types share a common developmental origin and exhibit overlapping transcriptomes and epigenomes. Notably, destruction of β-cells can stimulate repopulation via transdifferentiation of α-cells, at least in mice, suggesting plasticity between these cell fates. Furthermore, dysfunction of both α- and β-cells contributes to the pathophysiology of type 1 and type 2 diabetes, and β-cell de-differentiation has been proposed to contribute to type 2 diabetes. Our objective was to delineate the molecular properties that maintain islet cell type specification yet allow for cellular plasticity. We hypothesized that correlating cell type-specific transcriptomes with an atlas of open chromatin will identify novel genes and transcriptional regulatory elements such as enhancers involved in α- and β-cell specification and plasticity. We sorted human α- and β-cells and performed the "Assay for Transposase-Accessible Chromatin with high throughput sequencing" (ATAC-seq) and mRNA-seq, followed by integrative analysis to identify cell type-selective gene regulatory regions. We identified numerous transcripts with either α-cell- or β-cell-selective expression and discovered the cell type-selective open chromatin regions that correlate with these gene activation patterns. We confirmed cell type-selective expression on the protein level for two of the top hits from our screen. The "group specific protein" (GC; or vitamin D binding protein) was restricted to α-cells, while CHODL (chondrolectin) immunoreactivity was only present in β-cells. Furthermore, α-cell- and β-cell-selective ATAC-seq peaks were identified to overlap with known binding sites for islet transcription factors, as well as with single nucleotide polymorphisms (SNPs) previously identified as risk loci for type 2 diabetes. 
We have determined the genetic landscape of human α- and β-cells based on chromatin accessibility and transcript levels, which allowed for detection of novel α- and β-cell signature genes not previously known to be expressed in islets. Using fine-mapping of open chromatin, we have identified thousands of potential cis-regulatory elements that operate in an endocrine cell type-specific fashion.
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore
2017-01-01
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659
Gao, Yuan; He, Xiaoli; Wu, Bin; Long, Qiliang; Shao, Tianwei; Wang, Zi; Wei, Jianhe; Li, Yong; Ding, Wanlong
2016-01-01
Panax ginseng C. A. Meyer is a highly valued medicinal plant. Cylindrocarpon destructans is a destructive pathogen that causes root rot and significantly reduces the quality and yield of P. ginseng. However, an efficient method to control root rot remains unavailable because of insufficient understanding of the molecular mechanism underlying C. destructans-P. ginseng interaction. In this study, C. destructans-induced transcriptomes at different time points were investigated using RNA sequencing (RNA-Seq). De novo assembly produced 73,335 unigenes for the P. ginseng transcriptome after C. destructans infection, in which 3,839 unigenes were up-regulated. Notably, the abundance of the up-regulated unigenes sharply increased at 0.5 d postinoculation to provide effector-triggered immunity. In total, 24 of 26 randomly selected unigenes could be validated using quantitative reverse transcription (qRT)-PCR. Gene ontology enrichment analysis of these unigenes showed that "defense response to fungus", "defense response" and "response to stress" were enriched. In addition, differentially expressed transcription factors involved in the hormone signaling pathways after C. destructans infection were identified. Finally, differentially expressed unigenes involved in reactive oxygen species and the ginsenoside biosynthetic pathway during C. destructans infection were identified. To our knowledge, this study is the first to report on the dynamic transcriptome triggered by C. destructans. These results improve our understanding of disease resistance in P. ginseng and provide a useful resource for quick detection of induced markers in P. ginseng before the comprehensive outbreak of this disease caused by C. destructans.
Kusko, Rebecca L; Brothers, John F; Tedrow, John; Pandit, Kusum; Huleihel, Luai; Perdomo, Catalina; Liu, Gang; Juan-Guardela, Brenda; Kass, Daniel; Zhang, Sherry; Lenburg, Marc; Martinez, Fernando; Quackenbush, John; Sciurba, Frank; Limper, Andrew; Geraci, Mark; Yang, Ivana; Schwartz, David A; Beane, Jennifer; Spira, Avrum; Kaminski, Naftali
2016-10-15
Despite shared environmental exposures, idiopathic pulmonary fibrosis (IPF) and chronic obstructive pulmonary disease are usually studied in isolation, and the presence of shared molecular mechanisms is unknown. We applied an integrative genomic approach to identify convergent transcriptomic pathways in emphysema and IPF. We defined the transcriptional repertoire of chronic obstructive pulmonary disease, IPF, or normal histology lungs using RNA-seq (n = 87). Genes increased in both emphysema and IPF relative to control were enriched for the p53/hypoxia pathway, a finding confirmed in an independent cohort using both gene expression arrays and the nCounter Analysis System (n = 193). Immunohistochemistry confirmed overexpression of HIF1A, MDM2, and NFKBIB members of this pathway in tissues from patients with emphysema or IPF. Using reads aligned across splice junctions, we determined that alternative splicing of p53/hypoxia pathway-associated molecules NUMB and PDGFA occurred more frequently in IPF or emphysema compared with control and validated these findings by quantitative polymerase chain reaction and the nCounter Analysis System on an independent sample set (n = 193). Finally, by integrating parallel microRNA and mRNA-Seq data on the same samples, we identified MIR96 as a key novel regulatory hub in the p53/hypoxia gene-expression network and confirmed that modulation of MIR96 in vitro recapitulates the disease-associated gene-expression network. Our results suggest convergent transcriptional regulatory hubs in diseases as varied phenotypically as chronic obstructive pulmonary disease and IPF and suggest that these hubs may represent shared key responses of the lung to environmental stresses.
Toedebusch, Ryan G; Roberts, Michael D; Wells, Kevin D; Company, Joseph M; Kanosky, Kayla M; Padilla, Jaume; Jenkins, Nathan T; Perfield, James W; Ibdah, Jamal A; Booth, Frank W; Rector, R Scott
2014-05-15
To better understand the impact of childhood obesity on intra-abdominal adipose tissue phenotype, a complete transcriptomic analysis using deep RNA-sequencing (RNA-seq) was performed on omental adipose tissue (OMAT) obtained from lean and Western diet-induced obese juvenile Ossabaw swine. Obese animals had 88% greater body mass, 49% greater body fat content, and a 60% increase in OMAT adipocyte area (all P < 0.05) compared with lean pigs. RNA-seq revealed a 37% increase in the total transcript number in the OMAT of obese pigs. Ingenuity Pathway Analysis showed transcripts in obese OMAT were primarily enriched in the following categories: 1) development, 2) cellular function and maintenance, and 3) connective tissue development and function, while transcripts associated with RNA posttranslational modification, lipid metabolism, and small molecule biochemistry were reduced. DAVID and Gene Ontology analyses showed that many of the classically recognized gene pathways associated with adipose tissue dysfunction in obese adults including hypoxia, inflammation, angiogenesis were not altered in OMAT in our model. The current study indicates that obesity in juvenile Ossabaw swine is characterized by increases in overall OMAT transcript number and provides novel data describing early transcriptomic alterations that occur in response to excess caloric intake in visceral adipose tissue in a pig model of childhood obesity. PMID:24642759
Pflueger, Dorothee; Sboner, Andrea; Storz, Martina; Roth, Jasmine; Compérat, Eva; Bruder, Elisabeth; Rubin, Mark A; Schraml, Peter; Moch, Holger
2013-11-01
TFE3 translocation renal cell carcinoma (tRCC) is defined by chromosomal translocations involving the TFE3 transcription factor at chromosome Xp11.2. Genetically proven TFE3 tRCCs have a broad histologic spectrum with features overlapping those of other renal tumor subtypes. In this study, we aimed to characterize RCC with TFE3 protein expression. Using next-generation whole transcriptome sequencing (RNA-Seq) as a discovery tool, we analyzed fusion transcripts, gene expression profile, and somatic mutations in frozen tissue of one TFE3 tRCC. By applying a computational analysis developed to call chimeric RNA molecules from paired-end RNA-Seq data, we confirmed the known TFE3 translocation. Its fusion partner SFPQ has already been described as a fusion partner in tRCCs. In addition, an RNA read-through chimera between TMED6 and COG8 as well as MET and KDR (VEGFR2) point mutations were identified. An EGFR mutation, but no chromosomal rearrangements, was identified in a control group of five clear cell RCCs (ccRCCs). The TFE3 tRCC could be clearly distinguished from the ccRCCs by RNA-Seq gene expression measurements using a previously reported tRCC gene signature. In validation experiments using reverse transcription-PCR, TMED6-COG8 chimera expression was significantly higher in nine TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in 24 ccRCCs (P < .001) and 22 papillary RCCs (P < .05-.07). Immunohistochemical analysis of selected genes from the tRCC gene signature showed significantly higher eukaryotic translation elongation factor 1 alpha 2 (EEF1A2) and Contactin 3 (CNTN3) expression in 16 TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in over 200 ccRCCs (P < .0001, both).
Qi, Lei; Yue, Lei; Feng, Deqin; Qi, Fengxia; Li, Jie; Dong, Xiuzhu
2017-07-07
Unlike stable RNAs that require processing for maturation, prokaryotic cellular mRNAs generally follow an 'all-or-none' pattern. Herein, we used 5′ monophosphate transcript sequencing (5′P-seq), which specifically captures the 5′-end of processed transcripts, to map the genome-wide RNA processing sites (PSSs) in a methanogenic archaeon. Following statistical analysis and stringent filtration, we identified 1429 PSSs, among which 23.5% and 5.4% were located in the 5′ untranslated region (uPSS) and intergenic region (iPSS), respectively. A predominant uridine downstream of PSSs served as a processing signature. Remarkably, 5′P-seq detected overrepresented uPSS and iPSS in the polycistronic operons encoding ribosomal proteins, the majority of which were upstream of and proximal to ribosome binding sites, suggesting a regulatory role of processing in translation initiation. The processed transcripts showed increased stability and translation efficiency. Particularly, processing within the tricistronic transcript of rplA-rplJ-rplL enhanced the translation of rplL, which can provide a driving force for the 1:4 stoichiometry of L10 to L12 in the ribosome. Growth-associated mRNA processing intensities were also correlated with the cellular ribosomal protein levels, thereby suggesting that mRNA processing is involved in tuning growth-dependent ribosome synthesis. In conclusion, our findings suggest that mRNA processing-mediated post-transcriptional regulation is a potential mechanism of ribosomal protein synthesis and stoichiometry. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Tan, Jean-Marie; Payne, Elizabeth J.; Lin, Lynlee L.; Sinnya, Sudipta; Raphael, Anthony P.; Lambie, Duncan; Frazer, Ian H.; Dinger, Marcel E.; Soyer, H. Peter
2017-01-01
Identification of appropriate reference genes (RGs) is critical to accurate data interpretation in quantitative real-time PCR (qPCR) experiments. In this study, we have utilised next generation RNA sequencing (RNA-seq) to analyse the transcriptome of a panel of non-melanoma skin cancer lesions, identifying genes that are consistently expressed across all samples. Genes encoding ribosomal proteins were amongst the most stable in this dataset. Validation of this RNA-seq data was examined using qPCR to confirm the suitability of a set of highly stable genes for use as qPCR RGs. These genes will provide a valuable resource for the normalisation of qPCR data for the analysis of non-melanoma skin cancer. PMID:28852586
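One simple way to screen RNA-seq data for consistently expressed reference-gene candidates is to rank genes by their coefficient of variation across samples. The abstract does not say which stability measure the authors used, so the metric, the gene labels and the expression values in this sketch are all assumptions chosen for illustration.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean.
    Assumes a positive mean (e.g. normalised expression values)."""
    return statistics.stdev(values) / statistics.fmean(values)

def rank_by_stability(expression):
    """Gene names ordered from most stable (lowest CV) to least stable."""
    return sorted(expression, key=lambda gene: coefficient_of_variation(expression[gene]))

# Hypothetical normalised expression of three genes across four samples
expression = {
    "GENE_A": [100, 102, 98, 101],
    "GENE_B": [50, 200, 20, 120],
    "GENE_C": [10, 12, 9, 11],
}
ranking = rank_by_stability(expression)
print(ranking)
```

Genes at the top of such a ranking would still need qPCR confirmation, as the abstract describes, because stability in sequencing data does not guarantee stable amplification.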
Spinelli, Roberta; Pirola, Alessandra; Redaelli, Sara; Sharma, Nitesh; Raman, Hima; Valletta, Simona; Magistroni, Vera; Piazza, Rocco; Gambacorti-Passerini, Carlo
2013-01-01
Point mutations in intronic regions near mRNA splice junctions can affect the splicing process. To identify novel splicing variants from exome sequencing data, we developed a bioinformatics splice-site prediction procedure to analyze next-generation sequencing (NGS) data (SpliceFinder). SpliceFinder integrates two functional annotation tools for NGS, ANNOVAR and MutationTaster and two canonical splice site prediction programs for single mutation analysis, SSPNN and NetGene2. By SpliceFinder, we identified somatic mutations affecting RNA splicing in a colon cancer sample, in eight atypical chronic myeloid leukemia (aCML), and eight CML patients. A novel homozygous splicing mutation was found in APC (NM_000038.4:c.1312+5G>A) and six heterozygous in GNAQ (NM_002072.2:c.735+1C>T), ABCC3 (NM_003786.3:c.1783-1G>A), KLHDC1 (NM_172193.1:c.568-2A>G), HOOK1 (NM_015888.4:c.1662-1G>A), SMAD9 (NM_001127217.2:c.1004-1C>T), and DNAH9 (NM_001372.3:c.10242+5G>A). Integrating whole-exome and RNA sequencing in aCML and CML, we assessed the phenotypic effect of mutations on mRNA splicing for GNAQ, ABCC3, HOOK1. In ABCC3 and HOOK1, RNA-Seq showed the presence of aberrant transcripts with activation of a cryptic splice site or intron retention, validated by the reverse transcription-polymerase chain reaction (RT-PCR) in the case of HOOK1. In GNAQ, RNA-Seq showed 22% of wild-type transcript and 78% of mRNA skipping exon 5, resulting in a 4–6 frameshift fusion confirmed by RT-PCR. The pipeline can be useful to identify intronic variants affecting RNA sequence by complementing conventional exome analysis. PMID:24498620
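The intronic variants listed above are written in HGVS coding notation, where the number after "+" or "-" gives the distance into the intron from the flanking exon. The sketch below parses that notation and flags the canonical +1/+2 and -1/-2 positions. It is not SpliceFinder's code: the regular expression handles only single-base intronic substitutions in the form quoted in the abstract, and all names are hypothetical.

```python
import re
from typing import NamedTuple

class IntronicVariant(NamedTuple):
    transcript: str
    exon_position: int   # coding position of the flanking exon base
    offset: int          # signed distance into the intron (+ donor side, - acceptor side)
    ref: str
    alt: str

# Simplified pattern (an assumption): RefSeq-style accession, c. position, offset, substitution
_PATTERN = re.compile(
    r"^(?P<tx>[A-Z]{2}_\d+\.\d+):c\.(?P<pos>\d+)(?P<off>[+-]\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

def parse_intronic(hgvs):
    """Parse a string such as 'NM_000038.4:c.1312+5G>A'."""
    match = _PATTERN.match(hgvs)
    if match is None:
        raise ValueError(f"not a simple intronic substitution: {hgvs!r}")
    return IntronicVariant(
        match["tx"], int(match["pos"]), int(match["off"]), match["ref"], match["alt"]
    )

def hits_canonical_site(variant):
    """True for the +1/+2 (donor) and -1/-2 (acceptor) intronic positions."""
    return 1 <= abs(variant.offset) <= 2

for text in ("NM_000038.4:c.1312+5G>A", "NM_003786.3:c.1783-1G>A"):
    variant = parse_intronic(text)
    print(text, variant.offset, hits_canonical_site(variant))
```

Variants such as the +5 changes in APC and DNAH9 fall outside the canonical dinucleotides, which is the kind of case where dedicated splice-site predictors are needed rather than a position rule.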
Gonzalez, Sergio; Clavijo, Bernardo; Rivarola, Máximo; Moreno, Patricio; Fernandez, Paula; Dopazo, Joaquín; Paniego, Norma
2017-02-22
In recent years, applications based on massively parallel RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., those without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species. We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with next-generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results. ATGC transcriptomics gives non-expert computer users and small research groups access to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of the GNU public license at http://atgcinta.sourceforge.net.
RnaSeqSampleSize: real data based sample size estimation for RNA sequencing.
Zhao, Shilin; Li, Chung-I; Guo, Yan; Sheng, Quanhu; Shyr, Yu
2018-05-30
One of the most important and often neglected components of a successful RNA sequencing (RNA-Seq) experiment is sample size estimation. A few negative binomial model-based methods have been developed to estimate sample size based on the parameters of a single gene. However, thousands of genes are quantified and tested for differential expression simultaneously in RNA-Seq experiments. Thus, additional issues should be carefully addressed, including the false discovery rate for multiple statistical tests and the widely distributed read counts and dispersions of different genes. To solve these issues, we developed a sample size and power estimation method named RnaSeqSampleSize, based on the distributions of gene average read counts and dispersions estimated from real RNA-seq data. Datasets from previous, similar experiments such as the Cancer Genome Atlas (TCGA) can be used as a point of reference. Read counts and their dispersions were estimated from the reference's distribution; using that information, we estimated and summarized the power and sample size. RnaSeqSampleSize is implemented in the R language and can be installed from the Bioconductor website. A user-friendly graphical web interface is provided at http://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/. RnaSeqSampleSize provides a convenient and powerful way to estimate power and sample size for an RNA-Seq experiment. It is also equipped with several unique features, including estimation for genes or pathways of interest, power curve visualization, and parameter optimization.
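To make the single-gene starting point concrete, the sketch below uses a common delta-method normal approximation for negative binomial counts, in which the variance of a log mean estimate is roughly (1/mean + dispersion)/n. This is not the RnaSeqSampleSize algorithm and does not use its R API; the function name, defaults and example parameters are assumptions, and multiple testing is handled only crudely by letting the caller pass a stricter alpha.

```python
import math
from statistics import NormalDist

def samples_per_group(mean_count, dispersion, fold_change, alpha=0.05, power=0.8):
    """Approximate samples per group needed to detect `fold_change` for one gene.

    Normal approximation on the log scale:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * (1/mean + dispersion) / ln(fold_change)^2
    """
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance_term = 1.0 / mean_count + dispersion
    n = 2 * (z_alpha + z_power) ** 2 * variance_term / math.log(fold_change) ** 2
    return math.ceil(n)

# Hypothetical gene: mean count 100, dispersion 0.1, two-fold change
print(samples_per_group(100, 0.1, 2.0))
# A stricter per-gene alpha, a crude stand-in for FDR control across many genes
print(samples_per_group(100, 0.1, 2.0, alpha=0.001))
```

Because read counts and dispersions vary widely between genes, a single (mean, dispersion) pair gives only one point; the approach described in the abstract instead draws these parameters from distributions estimated on real data.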
Ray, Pradipta; Torck, Andrew; Quigley, Lilyana; Wangzhou, Andi; Neiman, Matthew; Rao, Chandranshu; Lam, Tiffany; Kim, Ji-Young; Kim, Tae Hoon; Zhang, Michael Q; Dussor, Gregory; Price, Theodore J
2018-03-20
Molecular neurobiological insight into human nervous tissues is needed to generate next generation therapeutics for neurological disorders like chronic pain. We obtained human Dorsal Root Ganglia (DRG) samples from organ donors and performed RNA-sequencing (RNA-seq) to study the human DRG (hDRG) transcriptional landscape, systematically comparing it with publicly available data from a variety of human and orthologous mouse tissues, including mouse DRG (mDRG). We characterized the hDRG transcriptional profile in terms of tissue-restricted gene co-expression patterns and putative transcriptional regulators, and formulated an information-theoretic framework to quantify DRG enrichment. Relevant gene families and pathways were also analyzed, including transcription factors (TFs), G-protein coupled receptors (GPCRs) and ion channels. Our analyses reveal a hDRG-enriched protein-coding gene set (∼140), some of which have not been described in the context of DRG or pain signaling. A majority of these show conserved enrichment in mDRG, and were mined for known drug-gene product interactions. Conserved enrichment of the vast majority of TFs suggests that the mDRG is a faithful model system for studying hDRGs, due to evolutionarily conserved regulatory programs. Comparison of hDRG and tibial nerve transcriptomes suggests trafficking of neuronal mRNA to axons in adult hDRG, and is consistent with studies of axonal transport in rodent sensory neurons. We present our work as an online, searchable repository (https://www.utdallas.edu/bbs/painneurosciencelab/sensoryomics/drgtxome), creating a resource for the community. Our analyses provide insight into DRG biology for guiding development of novel therapeutics, and a blueprint for cross-species transcriptomic analyses.
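The abstract above does not spell out its information-theoretic framework, so the following is only an illustrative sketch of one common way to score tissue restriction: the Shannon entropy of a gene's expression across tissues. The function name, the tissue labels and the expression values are hypothetical and are not taken from the paper; low entropy means expression is concentrated in few tissues.

```python
import math

def expression_entropy(expr_by_tissue):
    """Shannon entropy (bits) of a gene's expression distribution across tissues.

    expr_by_tissue maps a tissue name to a non-negative expression value.
    0 bits means the gene is expressed in a single tissue; log2(n) bits means
    it is expressed uniformly across all n tissues.
    """
    total = sum(expr_by_tissue.values())
    if total <= 0:
        raise ValueError("gene has no expression in any tissue")
    entropy = 0.0
    for value in expr_by_tissue.values():
        if value > 0:
            p = value / total          # fraction of total expression in this tissue
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical values, for illustration only.
ubiquitous = {"DRG": 1.0, "brain": 1.0, "liver": 1.0, "heart": 1.0}
restricted = {"DRG": 8.0, "brain": 0.0, "liver": 0.0, "heart": 0.0}
print(expression_entropy(ubiquitous), expression_entropy(restricted))
```

A score like this only measures how concentrated expression is; identifying which tissue carries the signal needs an additional step, such as checking the tissue with the largest share.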
Wu, Liang; Zhang, Xiaolong; Zhao, Zhikun; Wang, Ling; Li, Bo; Li, Guibo; Dean, Michael; Yu, Qichao; Wang, Yanhui; Lin, Xinxin; Rao, Weijian; Mei, Zhanlong; Li, Yang; Jiang, Runze; Yang, Huan; Li, Fuqiang; Xie, Guoyun; Xu, Liqin; Wu, Kui; Zhang, Jie; Chen, Jianghao; Wang, Ting; Kristiansen, Karsten; Zhang, Xiuqing; Li, Yingrui; Yang, Huanming; Wang, Jian; Hou, Yong; Xu, Xun
2015-01-01
Viral infection causes multiple forms of human cancer, and HPV infection is the primary factor in cervical carcinomas. Recent single-cell RNA-seq studies highlight the tumor heterogeneity present in most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line. We developed a new high throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells and 40 of them were randomly selected to perform single-cell RNA sequencing. Based on these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in gene expression, alternative splicing and fusions. Furthermore, we identified a high diversity of HPV-18 expression and splicing at the single-cell level. By co-expression analysis we identified 283 E6, E7 co-regulated genes, including CDC25, PCNA, PLK4, BUB1B and IRF1 known to interact with HPV viral proteins. Our results reveal the heterogeneity of a virus-infected cell line. It not only provides a transcriptome characterization of HeLa S3 cells at the single cell level, but is a demonstration of the power of single cell RNA-seq analysis of virally infected cells and cancers.
Goldie, Belinda J; Fitzsimmons, Chantel; Weidenhofer, Judith; Atkins, Joshua R; Wang, Dan O; Cairns, Murray J
2017-01-01
While the cytoplasmic function of microRNA (miRNA) as post-transcriptional regulators of mRNA has been the subject of significant research effort, their activity in the nucleus is less well characterized. Here we use a human neuronal cell model to show that some mature miRNA are preferentially enriched in the nucleus. These molecules were predominantly primate-specific and contained a sequence motif with homology to the consensus MAZ transcription factor binding element. Precursor miRNA containing this motif were shown to have affinity for MAZ protein in nuclear extract. We then used Ago1/2 RIP-Seq to explore nuclear miRNA-associated mRNA targets. Interestingly, the genes for Ago2-associated transcripts were also significantly enriched with MAZ binding sites and neural function, whereas Ago1-transcripts were associated with general metabolic processes and localized with SC35 spliceosomes. These findings suggest the MAZ transcription factor is associated with miRNA in the nucleus and may influence the regulation of neuronal development through Ago2-associated miRNA induced silencing complexes. The MAZ transcription factor may therefore be important for organizing higher order integration of transcriptional and post-transcriptional processes in primate neurons.
ExpEdit: a webserver to explore human RNA editing in RNA-Seq experiments.
Picardi, Ernesto; D'Antonio, Mattia; Carrabino, Danilo; Castrignanò, Tiziana; Pesole, Graziano
2011-05-01
ExpEdit is a web application for assessing RNA editing in humans at known or user-specified sites supported by transcript data obtained by RNA-Seq experiments. Mapping data (in SAM/BAM format) or sequence reads directly [in FASTQ/short read archive (SRA) format] can be provided as input to carry out a comparative analysis against a large collection of known editing sites collected in the DARNED database as well as other user-provided potentially edited positions. Results are shown as dynamic tables containing University of California, Santa Cruz (UCSC) links for a quick examination of the genomic context. ExpEdit is freely available on the web at http://www.caspur.it/ExpEdit/.
Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas
2018-01-01
ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. PMID:29149270
Valenzuela-Miranda, Diego; Nuñez-Acuña, Gustavo; Valenzuela-Muñoz, Valentina; Asgari, Sassan; Gallardo-Escárate, Cristian
2015-01-25
Despite the increasing evidence of the importance of microRNAs (miRNAs) in the regulation of multiple biological processes, the molecular bases supporting this regulation are still barely understood in crustaceans. Therefore, the molecular characterization and transcriptome modulation of the miRNA biogenesis pathway were evaluated in the salmon louse Caligus rogercresseyi, an ectoparasite that constitutes one of the biggest concerns for the salmonid aquaculture industry. Hence, RNA-Seq analysis was conducted from six different developmental stages, and also after bioassays with delousing drugs Deltamethrin and Azamethiphos using adult individuals. In silico analysis evidenced 24 putative genes involved in the miRNA pathway such as biogenesis, transport, maturation and miRNA-target interaction. Moreover, 243 putative single nucleotide polymorphisms (SNPs) were identified, 15 of which showed non-synonymous mutations. RNA-Seq analysis revealed that CCR4-Not complex subunit 3 (CNOT3) was upregulated at earlier developmental stages (nauplius I-II and copepodid), and also after the exposure to Azamethiphos, but not to Deltamethrin. In contrast, the subunit 7 (CNOT7) showed an inverse expression pattern. Different Argonaute transcripts were associated with chalimus and adult stages, revealing specific expression patterns in response to antiparasitic drugs. Our results suggest novel insights into the regulatory network of the post-transcriptional gene regulation in C. rogercresseyi mediated by miRNAs, evidencing a putative role during the ontogeny and drug response. Copyright © 2014 Elsevier B.V. All rights reserved.
FEELnc: a tool for long non-coding RNA annotation and its application to the dog transcriptome.
Wucher, Valentin; Legeai, Fabrice; Hédan, Benoît; Rizk, Guillaume; Lagoutte, Lætitia; Leeb, Tosso; Jagannathan, Vidhya; Cadieu, Edouard; David, Audrey; Lohi, Hannes; Cirera, Susanna; Fredholm, Merete; Botherel, Nadine; Leegwater, Peter A J; Le Béguec, Céline; Fieten, Hille; Johnson, Jeremy; Alföldi, Jessica; André, Catherine; Lindblad-Toh, Kerstin; Hitte, Christophe; Derrien, Thomas
2017-05-05
Whole transcriptome sequencing (RNA-seq) has become a standard for cataloguing and monitoring RNA populations. One of the main bottlenecks, however, is to correctly identify the different classes of RNAs among the plethora of reconstructed transcripts, particularly those that will be translated (mRNAs) from the class of long non-coding RNAs (lncRNAs). Here, we present FEELnc (FlExible Extraction of LncRNAs), an alignment-free program that accurately annotates lncRNAs based on a Random Forest model trained with general features such as multi k-mer frequencies and relaxed open reading frames. Benchmarking versus five state-of-the-art tools shows that FEELnc achieves similar or better classification performance on GENCODE and NONCODE data sets. The program also provides specific modules that enable the user to fine-tune classification accuracy, to formalize the annotation of lncRNA classes and to identify lncRNAs even in the absence of a training set of non-coding RNAs. We used FEELnc on a real data set comprising 20 canine RNA-seq samples produced by the European LUPA consortium to substantially expand the canine genome annotation to include 10 374 novel lncRNAs and 58 640 mRNA transcripts. FEELnc moves beyond conventional coding potential classifiers by providing a standardized and complete solution for annotating lncRNAs and is freely available at https://github.com/tderrien/FEELnc. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
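As a rough illustration of what "multi k-mer frequencies" can look like as classifier features, the sketch below computes relative k-mer frequencies for several values of k and merges them into one feature dictionary. It is not FEELnc's implementation; the function names and the choice of k = 1, 2, 3 are assumptions made for the example.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k):
    """Relative frequency of every possible A/C/G/T k-mer in seq.

    k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in all_kmers)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}

def multi_kmer_features(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequency profiles for several k into one feature dict."""
    features = {}
    for k in ks:
        features.update(kmer_frequencies(seq, k))  # keys of different length never collide
    return features

features = multi_kmer_features("ACGTACGTTTGA")
print(len(features))  # 4 + 16 + 64 = 84 features
```

A feature vector of this kind, together with ORF-derived features, is the sort of input a Random Forest classifier could be trained on.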
SLAM-seq defines direct gene-regulatory functions of the BRD4-MYC axis.
Muhar, Matthias; Ebert, Anja; Neumann, Tobias; Umkehrer, Christian; Jude, Julian; Wieshofer, Corinna; Rescheneder, Philipp; Lipp, Jesse J; Herzog, Veronika A; Reichholf, Brian; Cisneros, David A; Hoffmann, Thomas; Schlapansky, Moritz F; Bhat, Pooja; von Haeseler, Arndt; Köcher, Thomas; Obenauf, Anna C; Popow, Johannes; Ameres, Stefan L; Zuber, Johannes
2018-05-18
Defining direct targets of transcription factors and regulatory pathways is key to understanding their roles in physiology and disease. We combined SLAM-seq [thiol(SH)-linked alkylation for the metabolic sequencing of RNA], a method for direct quantification of newly synthesized messenger RNAs (mRNAs), with pharmacological and chemical-genetic perturbation in order to define regulatory functions of two transcriptional hubs in cancer, BRD4 and MYC, and to interrogate direct responses to BET bromodomain inhibitors (BETis). We found that BRD4 acts as general coactivator of RNA polymerase II-dependent transcription, which is broadly repressed upon high-dose BETi treatment. At doses triggering selective effects in leukemia, BETis deregulate a small set of hypersensitive targets including MYC. In contrast to BRD4, MYC primarily acts as a selective transcriptional activator controlling metabolic processes such as ribosome biogenesis and de novo purine synthesis. Our study establishes a simple and scalable strategy to identify direct transcriptional targets of any gene or pathway. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
RNA Polymerase II Regulates Topoisomerase 1 Activity to Favor Efficient Transcription.
Baranello, Laura; Wojtowicz, Damian; Cui, Kairong; Devaiah, Ballachanda N; Chung, Hye-Jung; Chan-Salis, Ka Yim; Guha, Rajarshi; Wilson, Kelli; Zhang, Xiaohu; Zhang, Hongliang; Piotrowski, Jason; Thomas, Craig J; Singer, Dinah S; Pugh, B Franklin; Pommier, Yves; Przytycka, Teresa M; Kouzine, Fedor; Lewis, Brian A; Zhao, Keji; Levens, David
2016-04-07
We report a mechanism through which the transcription machinery directly controls topoisomerase 1 (TOP1) activity to adjust DNA topology throughout the transcription cycle. By comparing TOP1 occupancy using chromatin immunoprecipitation sequencing (ChIP-seq) versus TOP1 activity using topoisomerase 1 sequencing (TOP1-seq), a method reported here to map catalytically engaged TOP1, TOP1 bound at promoters was discovered to become fully active only after pause-release. This transition coupled the phosphorylation of the carboxyl-terminal-domain (CTD) of RNA polymerase II (RNAPII) with stimulation of TOP1 above its basal rate, enhancing its processivity. TOP1 stimulation is strongly dependent on the kinase activity of BRD4, a protein that phosphorylates Ser2-CTD and regulates RNAPII pause-release. Thus the coordinated action of BRD4 and TOP1 overcame the torsional stress opposing transcription as RNAPII commenced elongation but preserved negative supercoiling that assists promoter melting at start sites. This nexus between transcription and DNA topology promises to elicit new strategies to intercept pathological gene expression. Copyright © 2016 Elsevier Inc. All rights reserved.
HALO--a Java framework for precise transcript half-life determination.
Friedel, Caroline C; Kaufmann, Stefanie; Dölken, Lars; Zimmer, Ralf
2010-05-01
Recent improvements in experimental technologies now allow measurements of de novo transcription and/or RNA decay at whole transcriptome level and determination of precise transcript half-lives. Such transcript half-lives provide important insights into the regulation of biological processes and the relative contributions of RNA decay and de novo transcription to differential gene expression. In this article, we present HALO (Half-life Organizer), the first software for the precise determination of transcript half-lives from measurements of RNA de novo transcription or decay determined with microarrays or RNA-seq. In addition, methods for quality control, filtering and normalization are supplied. HALO provides a graphical user interface, command-line tools and a well-documented Java application programming interface (API). Thus, it can be used both by biologists to determine transcript half-lives fast and reliably with the provided user interfaces as well as software developers integrating transcript half-life analysis into other gene expression profiling pipelines. Source code, executables and documentation are available at http://www.bio.ifi.lmu.de/software/halo.
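To make the idea of a transcript half-life concrete, here is a minimal sketch that assumes simple first-order (exponential) decay, N(t) = N0 * exp(-lambda * t), so that t_half = ln(2) / lambda. This is a textbook relation used for illustration; it is not presented as HALO's actual estimation procedure, and the function name and inputs are hypothetical.

```python
import math

def half_life_from_ratio(remaining_fraction, elapsed):
    """Half-life under first-order decay.

    remaining_fraction: fraction of the pre-existing RNA still present after
                        `elapsed` time units (must lie strictly between 0 and 1).
    elapsed:            time since the start of the measurement (any unit).
    Returns the half-life in the same unit as `elapsed`.
    """
    if not 0.0 < remaining_fraction < 1.0:
        raise ValueError("remaining_fraction must be strictly between 0 and 1")
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    # N(t)/N0 = exp(-lambda * t)  =>  t_half = -t * ln(2) / ln(N(t)/N0)
    return -elapsed * math.log(2) / math.log(remaining_fraction)

# If a quarter of the transcript remains after 60 minutes, two half-lives have passed.
print(half_life_from_ratio(0.25, 60))
```

In practice the ratio comes from noisy measurements, which is why quality control, filtering and normalization matter before a half-life is computed.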
Inferring Molecular Processes Heterogeneity from Transcriptional Data.
Gogolewski, Krzysztof; Wronowska, Weronika; Lech, Agnieszka; Lesyng, Bogdan; Gambin, Anna
2017-01-01
RNA microarrays and RNA-seq are nowadays standard technologies to study the transcriptional activity of cells. Most studies focus on tracking transcriptional changes caused by specific experimental conditions. Information referring to genes up- and downregulation is evaluated by analyzing the behaviour of a relatively large population of cells and averaging its properties. However, even assuming perfect sample homogeneity, different subpopulations of cells can exhibit diverse transcriptomic profiles, as they may follow different regulatory/signaling pathways. The purpose of this study is to provide a novel methodological scheme to account for possible internal, functional heterogeneity in homogeneous cell lines, including cancer ones. We propose a novel computational method to infer the proportion between subpopulations of cells that manifest various functional behaviour in a given sample. Our method was validated using two datasets from RNA microarray experiments. Both experiments aimed to examine cell viability in specific experimental conditions. The presented methodology can be easily extended to RNA-seq data as well as other molecular processes. Moreover, it complements standard tools to indicate most important networks from transcriptomic data and in particular could be useful in the analysis of cancer cell lines affected by biologically active compounds or drugs. PMID:29362714
Hong, Jungeui; Gresham, David
2017-11-01
Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
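The duplicate rule described above (identically mapped reads with identical UMIs) can be sketched as follows. The `Read` record and its field names are hypothetical stand-ins for parsed alignments, and the sketch uses exact UMI matching only, with no tolerance for sequencing errors in the UMI.

```python
from collections import namedtuple

# Hypothetical minimal alignment record; real pipelines would parse these from SAM/BAM.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])

def remove_pcr_duplicates(reads):
    """Keep the first read seen for each (chrom, pos, strand, UMI) combination.

    Reads that share mapping coordinates but carry different UMIs are treated as
    independent molecules and are all kept.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique

reads = [
    Read("r1", "chr1", 100, "+", "ACGTAC"),
    Read("r2", "chr1", 100, "+", "ACGTAC"),  # same position, same UMI: PCR duplicate
    Read("r3", "chr1", 100, "+", "TTGCAA"),  # same position, different UMI: kept
    Read("r4", "chr2", 500, "-", "ACGTAC"),
]
print([r.name for r in remove_pcr_duplicates(reads)])
```

Coordinate-only deduplication would have collapsed r1, r2 and r3 into one read, which is the bias the UMI is meant to avoid.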
Detection of PIWI and piRNAs in the mitochondria of mammalian cancer cells
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwon, ChangHyuk; Tak, Hyosun; Rho, Mina
2014-03-28
Highlights: • piRNA sequences were mapped to the human mitochondrial (mt) genome. • We inspected small RNA-Seq datasets from somatic cell mt subcellular fractions. • Piwi and piRNA transcripts are present in mammalian somatic cancer cell mt fractions. - Abstract: Piwi-interacting RNAs (piRNAs) are 26–31 nt small noncoding RNAs that are processed from their longer precursor transcripts by Piwi proteins. Localization of Piwi and piRNA has been reported mostly in the nucleus and cytoplasm of germ-line cells of higher eukaryotes, where it is believed that known piRNA sequences are located in repeat regions of the nuclear genome. However, localization of PIWI and piRNA in mammalian somatic cell mitochondria remains largely unknown. We identified 29 piRNA sequence alignments from various regions of the human mitochondrial genome. Twelve out of 29 piRNA sequences matched stem-loop fragment sequences of seven distinct tRNAs. We observed their actual expression in mitochondria subcellular fractions by inspecting mitochondrial-specific small RNA-Seq datasets. Of interest, the majority of the 29 piRNAs overlapped with multiple longer transcripts (expressed sequence tags) that are unique to the human mitochondrial genome. The presence of mature piRNAs in mitochondria was detected by qRT-PCR of mitochondrial subcellular RNAs. Further validation showed detection of Piwi by colocalization using anti-Piwil1 and mitochondria organelle-specific protein antibodies.
Abdelrahman, Mostafa; El-Sayed, Magdi; Sato, Shusei; Hirakawa, Hideki; Ito, Shin-ichi; Tanaka, Keisuke; Mine, Yoko; Sugiyama, Nobuo; Suzuki, Minoru; Yamauchi, Naoki
2017-01-01
The genus Allium is a rich source of steroidal saponins, and its medicinal properties have been attributed to these bioactive compounds. The saponin compounds with diverse structures play a pivotal role in Allium’s defense mechanism. Despite numerous studies on the occurrence and chemical structure of steroidal saponins, their biosynthetic pathway in Allium species is poorly understood. The monosomic addition lines (MALs) of the Japanese bunching onion (A. fistulosum, FF) with an extra chromosome from the shallot (A. cepa Aggregatum group, AA) are powerful genetic resources that enable us to understand many physiological traits of Allium. In the present study, we were able to isolate and identify Alliospiroside A saponin compound in A. fistulosum with extra chromosome 2A from shallot (FF2A) and its role in the defense mechanism against Fusarium pathogens. Furthermore, to gain molecular insight into the Allium saponin biosynthesis pathway, high-throughput RNA-Seq of the root, bulb, and leaf of AA, MALs, and FF was carried out using Illumina's HiSeq 2500 platform. An open access Allium Transcript Database (Allium TDB, http://alliumtdb.kazusa.or.jp) was generated based on RNA-Seq data. The resulting assembled transcripts were functionally annotated, revealing 50 unigenes involved in saponin biosynthesis. Differential gene expression (DGE) analyses of AA and MALs as compared with FF (as a control) revealed a strong up-regulation of the saponin downstream pathway, including cytochrome P450, glycosyltransferase, and beta-glucosidase in chromosome 2A. An understanding of the saponin compounds and biosynthesis-related genes would facilitate the development of plants with unique saponin content and, subsequently, improved disease resistance. PMID:28800607
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”
Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu
2012-01-01
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat
2016-12-22
The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies presents a shift in the scientific paradigm: scientists are now able to concurrently investigate the complex biology of a heterogeneous population of cells, one cell at a time. However, to date, there has not been a suitable computational methodology for the analysis of such an intricate deluge of data, in particular techniques that aid the identification of the unique transcriptomic differences between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared with commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks where novel interactions could be further validated in wet-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases. The novel approach reported here is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles, such as those from the study of cancer progression and treatment within a highly heterogeneous tumour.
Transcriptome analysis of hexaploid hulless oat in response to salinity stress
Wu, Bin; Hu, Yani; Huo, Pengjie; Zhang, Qian; Chen, Xin; Zhang, Zongwen
2017-01-01
Background Oat is a cereal crop of global importance used for food, feed, and forage. Understanding salinity stress tolerance mechanisms in plants is an important step towards generating crop varieties that can cope with environmental stresses. To date, little is known about the salt tolerance of oat at the molecular level. To better understand the molecular mechanisms underlying salt tolerance in oat, we investigated the transcriptomes of control and salt-treated oat using RNA-Seq. Results Using the Illumina HiSeq 4000 platform, we generated 72,291,032 and 356,891,432 reads from non-stressed control and salt-stressed oat, respectively. Assembly of 64 Gb of raw sequence data yielded 128,414 putative unique transcripts with an average length of 1,189 bp. Analysis of the assembled unigenes from the salt-stressed and control libraries indicated that about 65,000 unigenes were differentially expressed at different stages. Functional annotation showed that ABC transporters, plant hormone signal transduction, plant-pathogen interactions, starch and sucrose metabolism, arginine and proline metabolism, and other secondary metabolite pathways were enriched under salt stress. Based on the RPKM values of assembled unigenes, 24 differentially expressed genes under salt stress were selected for quantitative RT-PCR validation, which successfully confirmed the results of RNA-Seq. Furthermore, we identified 18,039 simple sequence repeats, which may help further elucidate salt tolerance mechanisms in oat. Conclusions Our global survey of transcriptome profiles of oat plants in response to salt stress provides useful insights into the molecular mechanisms underlying salt tolerance in this crop. These findings also represent a rich resource for further analysis of salt tolerance and for breeding oat with improved salt tolerance through the use of salt-related genes. PMID:28192458
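The abstract above selects genes by RPKM without spelling the measure out. As a minimal sketch, RPKM (reads per kilobase of transcript per million mapped reads) can be computed as below; the function name and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical unigene: 500 reads on a 1,189 bp transcript in a library
# with 20 million mapped reads (illustrative values only).
example_rpkm = rpkm(500, 1189, 20_000_000)
```

Because RPKM divides by both transcript length and library size, values are comparable across unigenes of different lengths and across libraries of different depths.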
Transcript abundance on its own cannot be used to infer fluxes in central metabolism
Schwender, Jorg; Konig, Christina; Klapperstuck, Matthias; ...
2014-11-28
An attempt has been made to define the extent to which metabolic flux in central plant metabolism is reflected by changes in the transcriptome and metabolome, based on an analysis of in vitro cultured immature embryos of two oilseed rape (Brassica napus) accessions which contrast for seed lipid accumulation. Metabolic flux analysis (MFA) was used to constrain a flux balance metabolic model which included 671 biochemical and transport reactions within the central metabolism. This highly confident flux information was eventually used for comparative analysis of flux vs. transcript (metabolite). Metabolite profiling succeeded in identifying 79 intermediates within the central metabolism, some of which differed quantitatively between the two accessions and displayed a significant shift corresponding to flux. An RNA-Seq based transcriptome analysis revealed a large number of genes which were differentially transcribed in the two accessions, including some enzymes/proteins active in major metabolic pathways. With a few exceptions, differential activity in the major pathways (glycolysis, TCA cycle, amino acid, and fatty acid synthesis) was not reflected in contrasting abundances of the relevant transcripts. The conclusion was that transcript abundance on its own cannot be used to infer metabolic activity/fluxes in central plant metabolism. Lastly, this limitation needs to be borne in mind in evaluating transcriptome data and designing metabolic engineering experiments.
Hoople, Gordon D; Richards, Andrew; Wu, Yan; Pisano, Albert P; Zhang, Kun
2018-03-26
The ability to amplify and sequence either DNA or RNA from small starting samples has only been achieved in the last five years. Unfortunately, the standard protocols for generating genomic or transcriptomic libraries are incompatible and researchers must choose whether to sequence DNA or RNA for a particular sample. Gel-seq solves this problem by enabling researchers to simultaneously prepare libraries for both DNA and RNA starting with 100 - 1000 cells using a simple hydrogel device. This paper presents a detailed approach for the fabrication of the device as well as the biological protocol to generate paired libraries. We designed Gel-seq so that it could be easily implemented by other researchers; many genetics labs already have the necessary equipment to reproduce the Gel-seq device fabrication. Our protocol employs commonly-used kits for both whole-transcript amplification (WTA) and library preparation, which are also likely to be familiar to researchers already versed in generating genomic and transcriptomic libraries. Our approach allows researchers to bring to bear the power of both DNA and RNA sequencing on a single sample without splitting and with negligible added cost.
MicroRNAs associated with muscle growth and fillet quality in rainbow trout
USDA-ARS's Scientific Manuscript database
Selection for improved muscle growth and quality phenotypes requires understanding of post-transcriptional gene-regulation mechanisms. To investigate the role of microRNAs in muscle post-transcriptional gene regulation, RNA-seq was used to identify differential expression in microRNAs and SNPs in microR...
USDA-ARS's Scientific Manuscript database
StuA, first discovered in Aspergillus nidulans and a member of the APSES class of transcription factors, regulates several essential developmental stages in fungi such as virulence, sporulation and toxin production in phytopathogenic fungi. Fusarium verticillioides (Fv), a maize phytopathogen, produ...
RNA-Seq Reveals an Integrated Immune Response in Nucleated Erythrocytes
Morera, Davinia; Roher, Nerea; Ribas, Laia; Balasch, Joan Carles; Doñate, Carmen; Callol, Agnes; Boltaña, Sebastian; Roberts, Steven; Goetz, Giles; Goetz, Frederick W.; MacKenzie, Simon A.
2011-01-01
Background Throughout the primary literature and within textbooks, the erythrocyte has been tacitly accepted to have maintained a unique physiological role; namely gas transport and exchange. In non-mammalian vertebrates, nucleated erythrocytes are present in circulation throughout the life cycle and a fragmented series of observations in mammals support a potential role in non-respiratory biological processes. We hypothesised that nucleated erythrocytes could actively participate via ligand-induced transcriptional re-programming in the immune response. Methodology/Principal Findings Nucleated erythrocytes from both fish and birds express and regulate specific pattern recognition receptor (PRR) mRNAs and, thus, are capable of specific pathogen associated molecular pattern (PAMP) detection that is central to the innate immune response. In vitro challenge with diverse PAMPs led to de novo specific mRNA synthesis of both receptors and response factors including interferon-alpha (IFNα) that exhibit a stimulus-specific polysomal shift supporting active translation. RNA-Seq analysis of the PAMP (Poly (I∶C), polyinosinic∶polycytidylic acid)-erythrocyte response uncovered diverse cohorts of differentially expressed mRNA transcripts related to multiple physiological systems including the endocrine, reproductive and immune. Moreover, erythrocyte-derived conditioned mediums induced a type-1 interferon response in macrophages thus supporting an integrative role for the erythrocytes in the immune response. Conclusions/Significance We demonstrate that nucleated erythrocytes in non-mammalian vertebrates spanning significant phylogenetic distance participate in the immune response. RNA-Seq studies highlight a mRNA repertoire that suggests a previously unrecognized integrative role for the erythrocytes in other physiological systems. PMID:22046430
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei
2017-05-19
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and the molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has presented a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes, which could play important regulatory roles in translation and other biological processes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Oh, Chun-do; Lu, Yue; Liang, Shoudan; Mori-Akiyama, Yuko; Chen, Di; de Crombrugghe, Benoit; Yasuda, Hideyo
2014-01-01
The transcription factor SOX9 plays an essential role in determining the fate of several cell types and is a master factor in regulation of chondrocyte development. Our aim was to determine which genes in the genome of chondrocytes are either directly or indirectly controlled by SOX9. We used RNA-Seq to identify genes whose expression levels were affected by SOX9 and used SOX9 ChIP-Seq to identify those genes that harbor SOX9-interaction sites. For RNA-Seq, the RNA expression profile of primary Sox9flox/flox mouse chondrocytes infected with Ad-CMV-Cre was compared with that of the same cells infected with a control adenovirus. Analysis of RNA-Seq data indicated that, when the levels of Sox9 mRNA were decreased more than 8-fold by infection with Ad-CMV-Cre, 196 genes showed a decrease in expression of at least 4-fold. These included many cartilage extracellular matrix (ECM) genes and a number of genes for ECM modification enzymes (transferases), membrane receptors, transporters, and others. In ChIP-Seq, 75% of the SOX9-interaction sites had a canonical inverted repeat motif within 100 bp of the top of the peak. SOX9-interaction sites were found in 55% of the genes whose expression was decreased more than 8-fold in SOX9-depleted cells and in somewhat fewer of the genes whose expression was reduced more than 4-fold, suggesting that these are direct targets of SOX9. The combination of RNA-Seq and ChIP-Seq has provided a fuller understanding of the SOX9-controlled genetic program of chondrocytes.
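The integration step described above (genes strongly decreased on SOX9 depletion, then checked for a SOX9-interaction site) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the gene names, expression values, pseudocount and helper names are all hypothetical.

```python
import math


def log2_fold_change(control: float, treated: float, pseudocount: float = 1.0) -> float:
    """log2(treated / control), with a pseudocount to avoid division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def strongly_decreased(expression: dict, min_fold: float = 4.0) -> set:
    """Genes whose expression dropped by at least `min_fold` after depletion.

    `expression` maps gene -> (control_level, depleted_level).
    """
    threshold = -math.log2(min_fold)
    return {
        gene
        for gene, (control, depleted) in expression.items()
        if log2_fold_change(control, depleted) <= threshold
    }


def fraction_with_site(genes: set, genes_with_site: set) -> float:
    """Share of `genes` that also carry a ChIP-Seq interaction site."""
    if not genes:
        return 0.0
    return len(genes & genes_with_site) / len(genes)


# Hypothetical expression levels (control, SOX9-depleted).
expression = {
    "geneA": (799.0, 99.0),   # 8-fold decrease
    "geneB": (399.0, 199.0),  # 2-fold decrease
    "geneC": (159.0, 39.0),   # 4-fold decrease
}
chip_targets = {"geneA", "geneB"}  # hypothetical genes with a SOX9 site

down = strongly_decreased(expression, min_fold=4.0)
share = fraction_with_site(down, chip_targets)
```

Raising `min_fold` to 8 restricts the set to the most strongly affected genes, which mirrors the two cutoffs the abstract compares.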
Chokeshaiusaha, Kaj; Puthier, Denis; Nguyen, Catherine; Sananmuang, Thanida
2018-06-01
Trimethylation of histone 3 (H3) at the 4th lysine of the N-terminus (H3K4me3) in gene promoter regions is a universal marker of active genes specific to a cell lineage. In contrast, coexistence of trimethylation at the 27th lysine (H3K27me3) at the same loci, the bivalent H3K4me3/H3K27me3 state, is known to suspend gene transcription in germ cells and can also be inherited by the developed stem cell. For galline species, a thorough example of H3K4me3 and H3K27me3 ChIP-seq analysis has not yet been provided. We therefore designed and demonstrated such procedures using ChIP-seq and mRNA-seq data of chicken follicular mesenchymal cells and male germ cells. An analytical workflow was designed and is provided in this study. ChIP-seq and RNA-seq datasets of follicular mesenchymal cells and male germ cells were acquired and preprocessed. Peak calling by Model-based Analysis of ChIP-seq 2 (MACS2) was performed to identify H3K4me3- or H3K27me3-enriched regions (fold-change ≥ 2, FDR ≤ 0.01) in gene promoter regions. Integrative Genomics Viewer was used to explore the cellular retinoic acid binding protein 1 (CRABP1), growth differentiation factor 10 (GDF10), and gremlin 1 (GREM1) genes. The results indicated that follicular mesenchymal cells and germ cells shared several unique gene promoter regions enriched with H3K4me3 (5,704 peaks) and also unique regions of bivalent H3K4me3/H3K27me3 shared between all cell types and germ cells (1,909 peaks). Subsequent observation of the follicular mesenchyme-specific genes CRABP1, GDF10, and GREM1 correctly revealed vigorous transcription of these genes in follicular mesenchymal cells. As expected, the bivalent H3K4me3/H3K27me3 pattern was manifested in the promoter regions of these genes in germ cells, and thus suspended their transcription. According to the results, an example of chicken H3K4me3/H3K27me3 ChIP-seq data analysis was successfully demonstrated in this study. The methodology provided should be useful for galline ChIP-seq data analysis in the future.
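As a rough sketch of the filtering step in this workflow, the snippet below keeps peaks that pass the stated thresholds (fold-change ≥ 2, FDR ≤ 0.01) and then flags H3K4me3 peaks that overlap an H3K27me3 peak as candidate bivalent regions. The `Peak` record, the coordinates and the overlap rule are assumptions made for illustration; this is not a parser for any peak caller's output format.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    fold_change: float
    fdr: float


def enriched(peaks: List[Peak], min_fold: float = 2.0, max_fdr: float = 0.01) -> List[Peak]:
    """Keep peaks passing the fold-change and FDR thresholds."""
    return [p for p in peaks if p.fold_change >= min_fold and p.fdr <= max_fdr]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def bivalent(k4_peaks: List[Peak], k27_peaks: List[Peak]) -> List[Peak]:
    """H3K4me3 peaks that overlap any H3K27me3 peak."""
    return [a for a in k4_peaks if any(overlaps(a, b) for b in k27_peaks)]


# Hypothetical peaks for the two marks.
k4 = enriched([
    Peak("chr1", 100, 600, 5.2, 0.001),
    Peak("chr1", 5000, 5400, 1.4, 0.0005),  # fails fold-change threshold
    Peak("chr2", 200, 900, 3.1, 0.004),
])
k27 = enriched([
    Peak("chr1", 450, 1200, 2.8, 0.002),
    Peak("chr2", 2000, 2600, 4.0, 0.2),     # fails FDR threshold
])
bivalent_regions = bivalent(k4, k27)
```

In a real analysis the peaks would additionally be restricted to annotated promoter regions, which this sketch leaves out.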
Hoshino, Tatsuhiko; Inagaki, Fumio
2017-01-01
Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. To obtain the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to a copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated by single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is determined only during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of 16S rRNA genes of each target phylotype can be estimated by Poisson statistics by counting the random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10³ to 5.0 × 10⁴ copies of the 16S rRNA gene. Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA simultaneously with NGS sequencing. With this new experimental scheme in microbial ecology, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
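The abstract does not state the exact estimator, so the following is only a sketch of one common way to apply Poisson statistics to random-tag counts. It assumes each template molecule receives one tag drawn uniformly from N possible tags, so that the expected number of distinct tags observed is N(1 − e^(−n/N)) for n molecules; inverting gives n ≈ −N ln(1 − U/N), where U is the number of distinct tags observed. The tag length of 8 nucleotides used below is hypothetical.

```python
import math


def estimate_copies(unique_tags_observed: int, tag_space: int) -> float:
    """Estimate the number of labeled molecules from distinct random tags.

    Assumes uniform, independent tag assignment (an illustrative assumption):
    n is approximately -N * ln(1 - U / N).
    """
    if not 0 <= unique_tags_observed < tag_space:
        raise ValueError("observed tags must be in [0, tag_space)")
    return -tag_space * math.log1p(-unique_tags_observed / tag_space)


# Hypothetical random tag of 8 nucleotides -> 4**8 possible tags.
TAG_SPACE = 4 ** 8
estimated = estimate_copies(1000, TAG_SPACE)
```

When the observed tag count is small relative to the tag space, the correction is minor (the estimate is only slightly above the raw count); it grows as the tag space approaches saturation, which is why the tag must be long enough for the expected copy number.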
Large-scale prediction of ADAR-mediated effective human A-to-I RNA editing.
Yao, Li; Wang, Heming; Song, Yuanyuan; Dai, Zhen; Yu, Hao; Yin, Ming; Wang, Dongxu; Yang, Xin; Wang, Jinlin; Wang, Tiedong; Cao, Nan; Zhu, Jimin; Shen, Xizhong; Song, Guangqi; Zhao, Yicheng
2017-08-10
Adenosine-to-inosine (A-to-I) editing by adenosine deaminase acting on the RNA (ADAR) proteins is one of the most frequent modifications during post- and co-transcription. To facilitate the assignment of biological functions to specific editing sites, we designed an automatic online platform to annotate A-to-I RNA editing sites in pre-mRNA splicing signals, microRNAs (miRNAs) and miRNA target untranslated regions (3' UTRs) from human (Homo sapiens) high-throughput sequencing data and predict their effects based on large-scale bioinformatic analysis. After analysing a large number of previously reported RNA editing events and RNA high-throughput sequencing data from normal human tissues, >60 000 potentially effective RNA editing events on functional genes were found. The RNA Editing Plus platform is available for free at https://www.rnaeditplus.org/, and we believe our platform governing multiple optimized methods will improve further studies of A-to-I-induced editing post-transcriptional regulation. © The Author 2017. Published by Oxford University Press. All rights reserved.
Wallace, Andrew D.; Hodgson, Ernest; Roe, R. Michael
2017-01-01
While the synthesis and use of new chemical compounds is at an all-time high, the study of their potential impact on human health is quickly falling behind, and new methods are needed to assess their impact. We chose to examine the effects of two common environmental chemicals, the insect repellent N,N-diethyl-m-toluamide (DEET) and the insecticide fluocyanobenpyrazole (fipronil), on transcript levels of long non-protein coding RNAs (lncRNAs) in primary human hepatocytes using a global RNA-Seq approach. While lncRNAs are believed to play a critical role in numerous important biological processes, many still remain uncharacterized, and their functions and modes of action remain largely unclear, especially in relation to environmental chemicals. RNA-Seq showed that 100 µM DEET significantly increased transcript levels for 2 lncRNAs and lowered transcript levels for 18 lncRNAs, while fipronil at 10 µM increased transcript levels for 76 lncRNAs and decreased levels for 193 lncRNAs. A mixture of 100 µM DEET and 10 µM fipronil increased transcript levels for 75 lncRNAs and lowered transcript levels for 258 lncRNAs. This indicates a more-than-additive effect on lncRNA transcript expression when the two chemicals were presented in combination versus each chemical alone. Differentially expressed lncRNA genes were mapped to chromosomes, analyzed by proximity to neighboring protein-coding genes, and functionally characterized via gene ontology and molecular mapping algorithms. While further testing is required to assess the organismal impact of changes in transcript levels, this initial analysis links several of the dysregulated lncRNAs to processes and pathways critical to proper cellular function, such as the innate and adaptive immune response and the p53 signaling pathway. PMID:28991164
Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets
Macosko, Evan Z.; Basu, Anindita; Satija, Rahul; Nemesh, James; Shekhar, Karthik; Goldman, Melissa; Tirosh, Itay; Bialas, Allison R.; Kamitaki, Nolan; Martersteck, Emily M.; Trombetta, John J.; Weitz, David A.; Sanes, Joshua R.; Shalek, Alex K.; Regev, Aviv; McCarroll, Steven A.
2015-01-01
Summary Cells, the basic units of biological structure and function, vary broadly in type and state. Single-cell genomics can characterize cell identity and function, but limitations of ease and scale have prevented its broad application. Here we describe Drop-Seq, a strategy for quickly profiling thousands of individual cells by separating them into nanoliter-sized aqueous droplets, associating a different barcode with each cell’s RNAs, and sequencing them all together. Drop-Seq analyzes mRNA transcripts from thousands of individual cells simultaneously while remembering transcripts’ cell of origin. We analyzed transcriptomes from 44,808 mouse retinal cells and identified 39 transcriptionally distinct cell populations, creating a molecular atlas of gene expression for known retinal cell classes and novel candidate cell subtypes. Drop-Seq will accelerate biological discovery by enabling routine transcriptional profiling at single-cell resolution. PMID:26000488
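The idea of sequencing all cells together while "remembering transcripts' cell of origin" comes down to grouping reads by their cell barcode. A minimal sketch of that grouping step is shown below; the barcodes, gene names and input layout are hypothetical, and real pipelines also handle barcode sequencing errors and molecule-level deduplication, which are not shown here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_by_cell(reads: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Build per-cell gene counts from (cell_barcode, gene) pairs."""
    profiles: Dict[str, Counter] = defaultdict(Counter)
    for barcode, gene in reads:
        profiles[barcode][gene] += 1
    return dict(profiles)


# Hypothetical pooled reads, each already tagged with its cell barcode.
reads = [
    ("AAAC", "geneX"),
    ("AAAC", "geneX"),
    ("AAAC", "geneY"),
    ("GGTT", "geneX"),
]
profiles = count_by_cell(reads)
```

Each key of `profiles` then corresponds to one cell's expression profile, which is the unit that downstream clustering into cell populations operates on.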
Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model.
Sun, Xiaoxiao; Dalpiaz, David; Wu, Di; Liu, Jun S; Zhong, Wenxuan; Ma, Ping
2016-08-26
Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory network. However, most of the available methods treat gene expressions at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score. Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates the NBMM identifies biologically relevant genes which are well justified by gene ontology analysis. The proposed method is powerful and efficient to detect biologically relevant DE genes and gene sets in time course RNA-Seq data.
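The abstract does not give the model's likelihood, so the snippet below is not the authors' NBMM (it has no random effects and no penalized fitting). It only sketches the negative binomial building block that such count models rest on, in the mean/dispersion parameterization where the variance is mean + dispersion × mean². The function name and example parameters are hypothetical.

```python
import math


def nb_log_pmf(y: int, mean: float, dispersion: float) -> float:
    """Log-probability of count y under a negative binomial distribution.

    Parameterized so that E[Y] = mean and Var[Y] = mean + dispersion * mean**2.
    """
    if y < 0 or mean <= 0 or dispersion <= 0:
        raise ValueError("y must be >= 0; mean and dispersion must be > 0")
    size = 1.0 / dispersion
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mean))
        + y * math.log(mean / (size + mean))
    )


# Hypothetical gene with mean count 10 and dispersion 0.5.
probabilities = [math.exp(nb_log_pmf(y, 10.0, 0.5)) for y in range(2001)]
total = sum(probabilities)
expected = sum(y * p for y, p in enumerate(probabilities))
variance = sum((y - expected) ** 2 * p for y, p in enumerate(probabilities))
```

The extra `dispersion * mean**2` term is what lets the model absorb the overdispersion typical of RNA-Seq counts, which a Poisson model (variance equal to the mean) cannot.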
RNA-Seq of Arabidopsis Pollen Uncovers Novel Transcription and Alternative Splicing
Loraine, Ann E.; McCormick, Sheila; Estrada, April; Patel, Ketan; Qin, Peng
2013-01-01
Pollen grains of Arabidopsis (Arabidopsis thaliana) contain two haploid sperm cells enclosed in a haploid vegetative cell. Upon germination, the vegetative cell extrudes a pollen tube that carries the sperm to an ovule for fertilization. Knowing the identity, relative abundance, and splicing patterns of pollen transcripts will improve our understanding of pollen and allow investigation of tissue-specific splicing in plants. Most Arabidopsis pollen transcriptome studies have used the ATH1 microarray, which does not assay splice variants and lacks specific probe sets for many genes. To investigate the pollen transcriptome, we performed high-throughput sequencing (RNA-Seq) of Arabidopsis pollen and seedlings for comparison. Gene expression was more diverse in seedling, and genes involved in cell wall biogenesis were highly expressed in pollen. RNA-Seq detected at least 4,172 protein-coding genes expressed in pollen, including 289 assayed only by nonspecific probe sets. Additional exons and previously unannotated 5′ and 3′ untranslated regions for pollen-expressed genes were revealed. We detected regions in the genome not previously annotated as expressed; 14 were tested and 12 were confirmed by polymerase chain reaction. Gapped read alignments revealed 1,908 high-confidence new splicing events supported by 10 or more spliced read alignments. Alternative splicing patterns in pollen and seedling were highly correlated. For most alternatively spliced genes, the ratio of variants in pollen and seedling was similar, except for some encoding proteins involved in RNA splicing. This study highlights the robustness of splicing patterns in plants and the importance of ongoing annotation and visualization of RNA-Seq data using interactive tools such as Integrated Genome Browser. PMID:23590974
Juranic Lisnic, Vanda; Babic Cac, Marina; Lisnic, Berislav; Trsan, Tihana; Mefferd, Adam; Das Mukhopadhyay, Chitrangada; Cook, Charles H.; Jonjic, Stipan; Trgovcich, Joanne
2013-01-01
Major gaps in our knowledge of pathogen genes and how these gene products interact with host gene products to cause disease represent a major obstacle to progress in vaccine and antiviral drug development for the herpesviruses. To begin to bridge these gaps, we conducted a dual analysis of Murine Cytomegalovirus (MCMV) and host cell transcriptomes during lytic infection. We analyzed the MCMV transcriptome during lytic infection using both classical cDNA cloning and sequencing of viral transcripts and next generation sequencing of transcripts (RNA-Seq). We also investigated the host transcriptome using RNA-Seq combined with differential gene expression analysis, biological pathway analysis, and gene ontology analysis. We identify numerous novel spliced and unspliced transcripts of MCMV. Unexpectedly, the most abundantly transcribed viral genes are of unknown function. We found that the most abundant viral transcript, recently identified as a noncoding RNA regulating cellular microRNAs, also codes for a novel protein. To our knowledge, this is the first viral transcript that functions both as a noncoding RNA and an mRNA. We also report that lytic infection elicits a profound cellular response in fibroblasts. Highly upregulated and induced host genes included those involved in inflammation and immunity, but also many unexpected transcription factors and host genes related to development and differentiation. Many top downregulated and repressed genes are associated with functions whose roles in infection are obscure, including host long intergenic noncoding RNAs, antisense RNAs or small nucleolar RNAs. Correspondingly, many differentially expressed genes cluster in biological pathways that may shed new light on cytomegalovirus pathogenesis. Together, these findings provide new insights into the molecular warfare at the virus-host interface and suggest new areas of research to advance the understanding and treatment of cytomegalovirus-associated diseases. PMID:24086132
Sabeh, Michael; Duceppe, Marc-Olivier; St-Arnaud, Marc; Mimee, Benjamin
2018-01-01
Relative gene expression analyses by qRT-PCR (quantitative reverse transcription PCR) require an internal control to normalize the expression data of genes of interest and eliminate the unwanted variation introduced by sample preparation. A perfect reference gene should have a constant expression level under all the experimental conditions. However, the same few housekeeping genes selected from the literature or successfully used in previous unrelated experiments are often routinely used in new conditions without proper validation of their stability across treatments. The advent of RNA-Seq and the availability of public datasets for numerous organisms are opening the way to finding better reference genes for expression studies. Globodera rostochiensis is a plant-parasitic nematode that is particularly yield-limiting for potato. The aim of our study was to identify a reliable set of reference genes to study G. rostochiensis gene expression. Gene expression levels from an RNA-Seq database were used to identify putative reference genes and were validated with qRT-PCR analysis. Three genes, GR, PMP-3, and aaRS, were found to be very stable within the experimental conditions of this study and are proposed as reference genes for future work.
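The abstract does not say which stability measure was used to shortlist candidates from the RNA-Seq data. As one simple possibility, candidates can be ranked by the coefficient of variation (standard deviation divided by mean) of their expression across conditions, with the lowest values being the most stable. The gene labels and expression values below are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean expression is zero; CV is undefined")
    return statistics.pstdev(values) / mean


def rank_by_stability(expression: Dict[str, Sequence[float]]) -> List[str]:
    """Gene names sorted from most stable (lowest CV) to least stable."""
    return sorted(expression, key=lambda gene: coefficient_of_variation(expression[gene]))


# Hypothetical normalized expression across four conditions.
expression = {
    "candidate_1": [100.0, 102.0, 98.0, 101.0],
    "candidate_2": [50.0, 150.0, 20.0, 300.0],
    "candidate_3": [10.0, 12.0, 9.0, 11.0],
}
ranking = rank_by_stability(expression)
```

A shortlist produced this way would still need the qRT-PCR validation the abstract describes, since low variation in sequencing data does not guarantee stable amplification.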
Nishtala, Sneha; Neelamraju, Yaseswini; Janga, Sarath Chandra
2016-05-10
RNA-binding proteins (RBPs) are pivotal in orchestrating several steps in the metabolism of RNA in eukaryotes, thereby controlling an extensive network of RBP-RNA interactions. Here, we employed CLIP (cross-linking immunoprecipitation)-seq datasets for 60 human RBPs and RIP-ChIP (RNP immunoprecipitation-microarray) data for 69 yeast RBPs to construct a network of genome-wide RBP-target RNA interactions for each RBP. We show in humans that the majority (~78%) of the RBPs are strongly associated with their target transcripts at the transcript level, while ~95% of the studied RBPs were also found to be strongly associated with the expression levels of target transcripts when protein expression levels of RBPs were employed. At the transcript level, RBP-RNA interaction data for the yeast genome exhibited a strong association for 63% of the RBPs, confirming the association to be conserved across large phylogenetic distances. Analysis to uncover the features contributing to these associations revealed the number of target transcripts and the length of the selected protein-coding transcript of an RBP to be significant at the transcript level, while the intensity of the CLIP signal, the number of RNA-binding domains, and the location of the binding site on the transcript were significant at the protein level. Our analysis will contribute to improved modelling and prediction of post-transcriptional networks.
A novel small RNA S042 increases acid tolerance in Lactococcus lactis F44.
Wu, Hao; Song, Shunyi; Tian, Kairen; Zhou, Dandan; Wang, Binbin; Liu, Jiaheng; Zhu, Hongji; Qiao, Jianjun
2018-06-07
Lactococcus lactis, a gram-positive bacterium, encounters various environmental stresses, especially acid stress, during fermentation. Small RNAs (sRNAs) that serve as regulators at the post-transcriptional level play important roles in the acid stress response. Here, a novel sRNA, S042, was identified by RNA-Seq, RT-PCR and Northern blot. The transcription level of s042 was upregulated 2.29-fold under acid stress, as measured by quantitative RT-PCR (qRT-PCR) analysis. An acid tolerance assay showed that overexpressing s042 increased the survival rate of L. lactis F44, whereas deleting s042 significantly reduced viability under acidic conditions. Moreover, the targets were predicted by online software and four genes were chosen as candidates. Among them, argR (arginine regulator) and accD (acetyl-CoA carboxylase carboxyl transferase subunit beta) were validated to be direct targets activated by S042 through a reporter fusion assay. The regulatory mechanism between S042 and its targets was further investigated through bioinformatics and qRT-PCR. This study served to highlight the role of the novel sRNA S042 in acid resistance of L. lactis and provided new insights into the response mechanism of acid stress. Copyright © 2018 Elsevier Inc. All rights reserved.
Influenza Virus Mounts a Two-Pronged Attack on Host RNA Polymerase II Transcription.
Bauer, David L V; Tellier, Michael; Martínez-Alonso, Mónica; Nojima, Takayuki; Proudfoot, Nick J; Murphy, Shona; Fodor, Ervin
2018-05-15
Influenza virus intimately associates with host RNA polymerase II (Pol II) and mRNA processing machinery. Here, we use mammalian native elongating transcript sequencing (mNET-seq) to examine Pol II behavior during viral infection. We show that influenza virus executes a two-pronged attack on host transcription. First, viral infection causes decreased Pol II gene occupancy downstream of transcription start sites. Second, virus-induced cellular stress leads to a catastrophic failure of Pol II termination at poly(A) sites, with transcription often continuing for tens of kilobases. Defective Pol II termination occurs independently of the ability of the viral NS1 protein to interfere with host mRNA processing. Instead, this termination defect is a common effect of diverse cellular stresses and underlies the production of previously reported downstream-of-gene transcripts (DoGs). Our work has implications for understanding not only host-virus interactions but also fundamental aspects of mammalian transcription. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.
Assessment of stem cell differentiation based on genome-wide expression profiles.
Godoy, Patricio; Schmidt-Heck, Wolfgang; Hellwig, Birte; Nell, Patrick; Feuerborn, David; Rahnenführer, Jörg; Kattler, Kathrin; Walter, Jörn; Blüthgen, Nils; Hengstler, Jan G
2018-07-05
In recent years, protocols have been established to differentiate stem and precursor cells into more mature cell types. However, progress in this field has been hampered by difficulties in assessing the differentiation status of stem cell-derived cells in an unbiased manner. Here, we present an analysis pipeline based on published data and methods to quantify the degree of differentiation and to identify transcriptional control factors explaining differences from the intended target cells or tissues. The pipeline requires RNA-Seq or gene array data of the stem cell starting population, derived 'mature' cells and primary target cells or tissue. It consists of a principal component analysis to represent global expression changes and to identify possible problems of the dataset that require special attention, such as batch effects; clustering techniques to identify gene groups with similar features; over-representation analysis to characterize biological motifs and transcriptional control factors of the identified gene clusters; and metagenes as well as gene regulatory networks for quantitative cell-type assessment and identification of influential transcription factors. Possibilities and limitations of the analysis pipeline are illustrated using the example of human embryonic stem cells and human induced pluripotent cells used to generate 'hepatocyte-like cells'. The pipeline quantifies the degree of incomplete differentiation as well as remaining stemness and identifies unwanted features, such as colon- and fibroblast-associated gene clusters that are absent in real hepatocytes but typically induced by currently available differentiation protocols. Finally, transcription factors responsible for incomplete and unwanted differentiation are identified. The proposed method is widely applicable and allows an unbiased and quantitative assessment of stem cell-derived cells. This article is part of the theme issue 'Designer human tissue: coming to a lab near you'.
© 2018 The Author(s).
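The over-representation analysis step named in the abstract above can be illustrated with a minimal sketch. It assumes the common hypergeometric formulation (the abstract does not say which test the authors used), and the function name, parameter names and toy numbers are hypothetical, chosen only for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population   -- number of genes measured in total
    annotated    -- genes in the population carrying the annotation
    cluster_size -- genes in the cluster being tested
    hits         -- annotated genes observed in the cluster
    """
    max_hits = min(annotated, cluster_size)
    total = comb(population, cluster_size)
    # Count the ways to draw a cluster with at least `hits` annotated genes.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(hits, max_hits + 1)
    )
    return favourable / total


# Toy numbers (hypothetical): 10 genes, 3 annotated, a cluster of 3 genes.
p_all_three = enrichment_p_value(10, 3, 3, 3)  # every cluster gene is annotated
p_any = enrichment_p_value(10, 3, 3, 0)        # condition is always satisfied
```

A small p-value suggests that the annotation occurs in the cluster more often than random sampling would explain; in practice such values would also be corrected for multiple testing across annotations.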
An RNA-Seq-based reference transcriptome for Citrus.
Terol, Javier; Tadeo, Francisco; Ventimilla, Daniel; Talon, Manuel
2016-03-01
Previous RNA-Seq studies in citrus have been focused on physiological processes relevant to fruit quality and productivity of the major species, especially sweet orange. Less attention has been paid to vegetative or reproductive tissues, while most Citrus species have never been analysed. In this work, we characterized the transcriptome of vegetative and reproductive tissues from 12 Citrus species from all main phylogenetic groups. Our aims were to acquire a complete view of the citrus transcriptome landscape, to improve previous functional annotations and to obtain genetic markers associated with genes of agronomic interest. 28 samples were used for RNA-Seq analysis, obtained from 12 Citrus species: C. medica, C. aurantifolia, C. limon, C. bergamia, C. clementina, C. deliciosa, C. reshni, C. maxima, C. paradisi, C. aurantium, C. sinensis and Poncirus trifoliata. Four different organs were analysed: root, phloem, leaf and flower. A total of 3421 million Illumina reads were produced and mapped against the reference C. clementina genome sequence. Transcript discovery pipeline revealed 3326 new genes, the number of genes with alternative splicing was increased to 19,739, and a total of 73,797 transcripts were identified. Differential expression studies between the four tissues showed that gene expression is overall related to the physiological function of the specific organs above any other variable. Variants discovery analysis revealed the presence of indels and SNPs in genes associated with fruit quality and productivity. Pivotal pathways in citrus such as those of flavonoids, flavonols, ethylene and auxin were also analysed in detail. © 2015 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Ma, Deyou; Yang, Hongsheng; Sun, Lina; Chen, Muyan
2014-01-01
Sea cucumbers Apostichopus japonicus are one of the most important aquaculture species in China. Their normal body color is black to fit their surroundings. Wild albinos are rare and hard to breed. To understand the differences between albino and normal (control) sea cucumbers at the transcriptional level, we sequenced the transcriptomes in their body-wall tissues using RNA-Seq high-throughput sequencing. Approximately 4.876 million (M) and 4.884 M 200-nucleotide-long cDNA reads were produced in the cDNA libraries derived from the body walls of albino and control samples, respectively. A total of 9 561 (46.89%) putative genes were identified from among the RNA-Seq reads in both libraries. After filtering, 837 significantly differentially regulated genes were identified in the albino library compared with the control library, and 3.6% of the differentially expressed genes (DEGs) were found to have changed more than five-fold. The expression levels of 10 DEGs were checked by real-time PCR and the results were in full accord with the RNA-Seq expression trends, although the amplitude of the differences in expression levels was lower in all cases. A series of pathways were significantly enriched for the DEGs. These pathways were closely related to phagocytosis, the complement and coagulation cascades, apoptosis-related diseases, cytokine-cytokine receptor interaction, and cell adhesion. The differences in gene expression and enriched pathways between the albino and control sea cucumbers offer control targets for cultivating excellent albino A. japonicus strains in the future.
Enyeart, Peter J; Mohr, Georg; Ellington, Andrew D; Lambowitz, Alan M
2014-01-13
Mobile group II introns are bacterial retrotransposons that combine the activities of an autocatalytic intron RNA (a ribozyme) and an intron-encoded reverse transcriptase to insert site-specifically into DNA. They recognize DNA target sites largely by base pairing of sequences within the intron RNA and achieve high DNA target specificity by using the ribozyme active site to couple correct base pairing to RNA-catalyzed intron integration. Algorithms have been developed to program the DNA target site specificity of several mobile group II introns, allowing them to be made into 'targetrons.' Targetrons function for gene targeting in a wide variety of bacteria and typically integrate at efficiencies high enough to be screened easily by colony PCR, without the need for selectable markers. Targetrons have found wide application in microbiological research, enabling gene targeting and genetic engineering of bacteria that had been intractable to other methods. Recently, a thermostable targetron has been developed for use in bacterial thermophiles, and new methods have been developed for using targetrons to position recombinase recognition sites, enabling large-scale genome-editing operations, such as deletions, inversions, insertions, and 'cut-and-pastes' (that is, translocation of large DNA segments), in a wide range of bacteria at high efficiency. Using targetrons in eukaryotes presents challenges due to the difficulties of nuclear localization and sub-optimal magnesium concentrations, although supplementation with magnesium can increase integration efficiency, and directed evolution is being employed to overcome these barriers. Finally, spurred by new methods for expressing group II intron reverse transcriptases that yield large amounts of highly active protein, thermostable group II intron reverse transcriptases from bacterial thermophiles are being used as research tools for a variety of applications, including qRT-PCR and next-generation RNA sequencing (RNA-seq). 
The high processivity and fidelity of group II intron reverse transcriptases along with their novel template-switching activity, which can directly link RNA-seq adaptor sequences to cDNAs during reverse transcription, open new approaches for RNA-seq and the identification and profiling of non-coding RNAs, with potentially wide applications in research and biotechnology.
Transcriptional profile of sweet orange in response to chitosan and salicylic acid.
Coqueiro, Danila Souza Oliveira; de Souza, Alessandra Alves; Takita, Marco Aurélio; Rodrigues, Carolina Munari; Kishi, Luciano Takeshi; Machado, Marcos Antonio
2015-04-12
Resistance inducers have been used in annual crops as an alternative for disease control. Woody perennial fruit trees, such as those of the citrus species, are candidates for treatment with resistance inducers, such as salicylic acid (SA) and chitosan (CHI). However, the mechanisms involved in resistance induced by elicitors in citrus are currently poorly understood. In the present manuscript, we report information regarding the transcriptional changes observed in sweet orange in response to exogenous applications of SA and CHI using RNA-seq technology. More genes were induced by SA treatment than by CHI treatment. In total, 1,425 differentially expressed genes (DEGs) were identified following treatment with SA, including the important genes WRKY50, PR2, and PR9, which are known to participate in the salicylic acid signaling pathway, and genes involved in ethylene/jasmonic acid biosynthesis (ACS12, AP2 domain-containing transcription factor, and OPR3). In addition, SA treatment promoted the induction of a subset of genes involved in several metabolic processes, such as redox states and secondary metabolism, which are associated with biotic stress. For CHI treatment, there were 640 DEGs, many of them involved in secondary metabolism. For both SA and CHI treatments, the auxin pathway genes were repressed, but SA treatment promoted induction in the ethylene and jasmonic acid pathway genes, in addition to repressing the abscisic acid pathway genes. Chitosan treatment altered some hormone metabolism pathways. The DEGs were validated by quantitative Real-Time PCR (qRT-PCR), and the results were consistent with the RNA-seq data, with a high correlation between the two analyses. We expanded the available information regarding induced defense by elicitors in a species of Citrus that is susceptible to various diseases and identified the molecular mechanisms by which this defense might be mediated.
Zong, Shan; Deng, Shuyun; Chen, Kenian; Wu, Jia Qian
2014-11-11
Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSC self-renewal and differentiation is important for the application of HSCs in research and clinical uses. However, it is not possible to obtain large quantities of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study. RNA-sequencing (RNA-Seq) has been increasingly used to replace microarrays for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment. In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo.
Chen, Kenian; Wu, Jia Qian
2014-01-01
PMID:25407807
Todd, Shawn; Boyd, Victoria; Tachedjian, Mary; Klein, Reuben; Shiell, Brian; Dearnley, Megan; McAuley, Alexander J.; Woon, Amanda P.; Purcell, Anthony W.; Marsh, Glenn A.; Baker, Michelle L.
2017-01-01
ABSTRACT Ebolavirus and Marburgvirus comprise two genera of negative-sense single-stranded RNA viruses that cause severe hemorrhagic fevers in humans. Despite considerable research efforts, the molecular events following Ebola virus (EBOV) infection are poorly understood. With the view of identifying host factors that underpin EBOV pathogenesis, we compared the transcriptomes of EBOV-infected human, pig, and bat kidney cells using a transcriptome sequencing (RNA-seq) approach. Despite a significant difference in viral transcription/replication between the cell lines, all cells responded to EBOV infection through a robust induction of extracellular growth factors. Furthermore, a significant upregulation of activator protein 1 (AP1) transcription factor complex members FOS and JUN was observed in permissive cell lines. Functional studies focusing on human cells showed that EBOV infection induces protein expression, phosphorylation, and nuclear accumulation of JUN and, to a lesser degree, FOS. Using a luciferase-based reporter, we show that EBOV infection induces AP1 transactivation activity within human cells at 48 and 72 h postinfection. Finally, we show that JUN knockdown decreases EBOV-induced host gene expression. Taken together, our study highlights the role of AP1 in promoting the host gene expression profile that defines EBOV pathogenesis. IMPORTANCE Many questions remain about the molecular events that underpin filovirus pathophysiology. The rational design of new intervention strategies, such as postexposure therapeutics, will be significantly enhanced through an in-depth understanding of these molecular events. We believe that new insights into the molecular pathogenesis of EBOV may be possible by examining the transcriptomic response of taxonomically diverse cell lines (derived from human, pig, and bat). We first identified the responsive pathways using an RNA-seq-based transcriptomics approach.
Further functional and computational analysis focusing on human cells highlighted an important role for the AP1 transcription factor in mediating the transcriptional response to EBOV infection. Our study sheds new light on how host transcription factors respond to and promote the transcriptional landscape that follows viral infection. PMID:28931675
In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features.
Ding, Yiliang; Tang, Yin; Kwok, Chun Kit; Zhang, Yu; Bevilacqua, Philip C; Assmann, Sarah M
2014-01-30
RNA structure has critical roles in processes ranging from ligand sensing to the regulation of translation, polyadenylation and splicing. However, a lack of genome-wide in vivo RNA structural data has limited our understanding of how RNA structure regulates gene expression in living cells. Here we present a high-throughput, genome-wide in vivo RNA structure probing method, structure-seq, in which dimethyl sulphate methylation of unprotected adenines and cytosines is identified by next-generation sequencing. Application of this method to Arabidopsis thaliana seedlings yielded the first in vivo genome-wide RNA structure map at nucleotide resolution for any organism, with quantitative structural information across more than 10,000 transcripts. Our analysis reveals a three-nucleotide periodic repeat pattern in the structure of coding regions, as well as a less-structured region immediately upstream of the start codon, and shows that these features are strongly correlated with translation efficiency. We also find patterns of strong and weak secondary structure at sites of alternative polyadenylation, as well as strong secondary structure at 5' splice sites that correlates with unspliced events. Notably, in vivo structures of messenger RNAs annotated for stress responses are poorly predicted in silico, whereas mRNA structures of genes related to cell function maintenance are well predicted. Global comparison of several structural features between these two categories shows that the mRNAs associated with stress responses tend to have more single-strandedness, longer maximal loop length and higher free energy per nucleotide, features that may allow these RNAs to undergo conformational changes in response to environmental conditions. Structure-seq allows the RNA structurome and its biological roles to be interrogated on a genome-wide scale and should be applicable to any organism.
Huang, Yanhua; Cui, Xin; Cen, Huifang; Wang, Kehua; Zhang, Yunwei
2018-04-10
Intracellular Na+(K+)/H+ antiporters (NHXs) have pivotal functions in regulating plant growth, development, and resistance to a range of stresses. To gain insight into the molecular events underlying their actions in switchgrass (Panicum virgatum L.), we analyzed transcriptomic changes between PvNHX1-overexpression transgenic lines and wild-type (WT) plants using RNA sequencing (RNA-seq) technology. The comparison of transcriptomic data from the WT and transgenic plants revealed a large number of differentially expressed genes (DEGs) in the latter. Gene ontology (GO) and KEGG pathway analyses showed that these DEGs were associated with a wide range of functions, and participated in many biological processes. For example, we found that PvNHX1 had an important role in plant growth through its regulation of photosynthetic activity and cell expansion. In addition, PvNHX1 regulated K+ homeostasis, cell expansion and pollen development, indicating that it has unique and specific roles in flower development. We also found that transgenic switchgrass exhibited a higher level of transcription of defense-related genes, especially those involved in disease resistance. We showed that PvNHX1 had an important role in plant growth and development through its regulation of photosynthetic activity, cell expansion, K+ homeostasis, and pollen development. Additionally, PvNHX1 overexpression activated a complex signal transduction network in response to various biotic and abiotic stresses. In relation to plant growth, development, and defense responses, PvNHX1 also had a vital regulatory role in the formation of a series of plant hormones and transcription factors (TFs). The reliability of the RNA-seq data was confirmed by quantitative real-time PCR. Our data provide a valuable foundation for further research into the molecular mechanisms and physiological roles of NHXs in plants.
Tani, Hidenori; Imamachi, Naoto; Salam, Kazi Abdus; Mizutani, Rena; Ijiri, Kenichi; Irie, Takuma; Yada, Tetsushi; Suzuki, Yutaka; Akimitsu, Nobuyoshi
2012-01-01
UPF1 eliminates aberrant mRNAs harboring premature termination codons, and regulates the steady-state levels of normal physiological mRNAs. Although genome-wide studies of UPF1 targets have been performed, previous studies did not distinguish indirect UPF1 targets because they could not determine UPF1-dependent changes in RNA stability. Here, we measured the decay rates of the whole transcriptome in UPF1-depleted HeLa cells using BRIC-seq, an inhibitor-free method for directly measuring RNA stability. We determined the half-lives and expression levels of 9,229 transcripts. A total of 785 transcripts were stabilized in UPF1-depleted cells. Among these, the expression levels of 76 transcripts were increased, but those of the other 709 transcripts were not altered. RNA immunoprecipitation showed UPF1 bound to the stabilized transcripts, suggesting that UPF1 directly degrades the 709 transcripts. Many UPF1 targets in this study were newly identified. This study clearly demonstrates that direct determination of RNA stability is a powerful approach for identifying targets of RNA degradation factors. PMID:23064114
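The relation between a decay rate and a half-life, as used in the abstract above, can be sketched as follows. The sketch assumes simple first-order decay, so that the remaining fraction is exp(-k t) and the half-life is ln(2)/k; the abstract does not state which decay model the authors fitted. The function name and the time-course values are hypothetical.

```python
import math


def half_life_from_timecourse(times_h, remaining_fraction):
    """Fit ln(fraction) = -k * t by least squares through the origin.

    Returns (k, t_half), where k is the decay rate per hour and
    t_half = ln(2) / k is the half-life in hours.
    All fractions must be greater than zero.
    """
    numerator = sum(t * math.log(f) for t, f in zip(times_h, remaining_fraction))
    denominator = sum(t * t for t in times_h)
    k = -numerator / denominator
    return k, math.log(2) / k


# Hypothetical time course: the labelled RNA halves every 2 hours.
times = [0.0, 2.0, 4.0, 8.0]
fractions = [1.0, 0.5, 0.25, 0.0625]
k, t_half = half_life_from_timecourse(times, fractions)
```

Under this model, a transcript counts as stabilized when its fitted half-life in the depleted cells is longer than in control cells; the threshold for calling such a change would have to come from the study itself.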
Jones, Christopher J.; Newsom, David; Kelly, Benjamin; Irie, Yasuhiko; Jennings, Laura K.; Xu, Binjie; Limoli, Dominique H.; Harrison, Joe J.; Parsek, Matthew R.; White, Peter; Wozniak, Daniel J.
2014-01-01
The transcription factor AmrZ regulates genes important for P. aeruginosa virulence, including type IV pili, extracellular polysaccharides, and the flagellum; however, the global effect of AmrZ on gene expression remains unknown, and therefore, AmrZ may directly regulate many additional genes that are crucial for infection. Compared to the wild type strain, a ΔamrZ mutant exhibits a rugose colony phenotype, which is commonly observed in variants that accumulate the intracellular second messenger cyclic diguanylate (c-di-GMP). Cyclic di-GMP is produced by diguanylate cyclases (DGC) and degraded by phosphodiesterases (PDE). We hypothesized that AmrZ limits the intracellular accumulation of c-di-GMP through transcriptional repression of gene(s) encoding a DGC. In support of this, we observed elevated c-di-GMP in the ΔamrZ mutant compared to the wild type strain. Consistent with other strains that accumulate c-di-GMP, when grown as a biofilm, the ΔamrZ mutant formed larger microcolonies than the wild-type strain. This enhanced biofilm formation was abrogated by expression of a PDE. To identify potential target DGCs, a ChIP-Seq was performed and identified regions of the genome that are bound by AmrZ. RNA-Seq experiments revealed the entire AmrZ regulon, and characterized AmrZ as an activator or repressor at each binding site. We identified an AmrZ-repressed DGC-encoding gene (PA4843) from this cohort, which we named AmrZ dependent cyclase A (adcA). PAO1 overexpressing adcA accumulates 29-fold more c-di-GMP than the wild type strain, confirming the cyclase activity of AdcA. In biofilm reactors, a ΔamrZ ΔadcA double mutant formed smaller microcolonies than the single ΔamrZ mutant, indicating adcA is responsible for the hyper biofilm phenotype of the ΔamrZ mutant. This study combined the techniques of ChIP-Seq and RNA-Seq to define the comprehensive regulon of a bifunctional transcriptional regulator. 
Moreover, we identified a c-di-GMP mediated mechanism for AmrZ regulation of biofilm formation and chronicity. PMID:24603766
Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv
2010-01-01
RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462
Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick
2018-01-04
ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Wang, Jiawei; Cao, Li; Yang, Xiaowen; Wu, Qingmin; Lu, Lin; Wang, Zhen
2018-05-07
The objective of this study was to comprehensively identify the target genes regulated by the RNA polymerase-binding transcription factor DksA in Escherichia coli, and to clarify the role of DksA in multi-drug resistance. A clinical E. coli strain, E8, was selected to construct the dksA gene deletion mutant by using the Red recombination system. The minimum inhibitory concentrations (MICs) of 12 antibiotics in the E8ΔdksA (mutant) were markedly lower than those in the wild-type strain, E8. Genes differentially expressed in the wild-type and dksA mutant were detected using RNA-Seq and were validated by performing quantitative real-time PCR (qRT-PCR). In total, 168 differentially expressed genes were identified in E8ΔdksA, including 81 up-regulated and 87 down-regulated genes. Many of the genes identified are involved in metabolism, two-component systems, transcriptional regulators, and transport/membrane proteins. Interestingly, genes encoding the transcriptional regulator, MarR, which is known to repress the multiple drug resistance operon, marRAB; MdfA, a transport protein that exhibits multidrug efflux activities; oligopeptide transport system proteins OppA and OppD were among those differentially expressed, and could potentially contribute to the increased drug susceptibility of E8ΔdksA. In conclusion, DksA plays an important role in the multi-drug resistance of this E. coli strain, and directly or indirectly regulates the expression of several genes related to antibiotic resistance. Copyright © 2018. Published by Elsevier B.V.
Cantu, Dario; Pearce, Stephen P; Distelfeld, Assaf; Christiansen, Michael W; Uauy, Cristobal; Akhunov, Eduard; Fahima, Tzion; Dubcovsky, Jorge
2011-10-07
Increasing the nutrient concentration of wheat grains is important to ameliorate nutritional deficiencies in many parts of the world. Proteins and nutrients in the wheat grain are largely derived from the remobilization of degraded leaf molecules during monocarpic senescence. The down-regulation of the NAC transcription factor Grain Protein Content (GPC) in transgenic wheat plants delays senescence (>3 weeks) and reduces the concentration of protein, Zn and Fe in the grain (>30%), linking senescence and nutrient remobilization. Based on the early and rapid up-regulation of GPC in wheat flag leaves after anthesis, we hypothesized that this transcription factor is an early regulator of monocarpic senescence. To test this hypothesis, we used high-throughput mRNA-seq technologies to characterize the effect of the GPC down-regulation on the wheat flag-leaf transcriptome 12 days after anthesis. At this early stage of senescence GPC transcript levels are significantly lower in transgenic GPC-RNAi plants than in the wild type, but there are still no visible phenotypic differences between genotypes. We generated 1.4 million 454 reads from early senescing flag leaves (average ~350 nt) and assembled 1.2 million into 30,497 contigs that were used as a reference to map 145 million Illumina reads from three wild type and four GPC-RNAi plants. Following normalization and statistical testing, we identified a set of 691 genes differentially regulated by GPC (431 ≥ 2-fold change). Transcript level ratios between transgenic and wild type plants showed a high correlation (R = 0.83) between qRT-PCR and Illumina results, providing independent validation of the mRNA-seq approach. A set of differentially expressed genes was analyzed across an early senescence time-course. Monocarpic senescence is an active process characterized by large-scale changes in gene expression which begins considerably before the appearance of visual symptoms of senescence.
The mRNA-seq approach used here was able to detect small differences in transcript levels during the early stages of senescence. This resulted in an extensive list of GPC-regulated genes, which includes transporters, hormone regulated genes, and transcription factors. These GPC-regulated genes, particularly those up-regulated during senescence, provide valuable entry points to dissect the early stages of monocarpic senescence and nutrient remobilization in wheat.
Kamber, Tim; Buchmann, Jan P; Pothier, Joël F; Smits, Theo H M; Wicker, Thomas; Duffy, Brion
2016-02-17
The molecular basis of resistance and susceptibility of host plants to fire blight, a major disease threat to pome fruit production globally, is largely unknown. RNA-sequencing data from challenged and mock-inoculated flowers were analyzed to assess the susceptible response of apple to the fire blight pathogen Erwinia amylovora. In presence of the pathogen 1,080 transcripts were differentially expressed at 48 h post inoculation. These included putative disease resistance, stress, pathogen related, general metabolic, and phytohormone related genes. Reads, mapped to regions on the apple genome where no genes were assigned, were used to identify potential novel genes and open reading frames. To identify transcripts specifically expressed in response to E. amylovora, RT-PCRs were conducted and compared to the expression patterns of the fire blight biocontrol agent Pantoea vagans strain C9-1, another apple pathogen Pseudomonas syringae pv. papulans, and mock inoculated apple flowers. This led to the identification of a peroxidase superfamily gene that was lower expressed in response to E. amylovora suggesting a potential role in the susceptibility response. Overall, this study provides the first transcriptional profile by RNA-seq of the host plant during fire blight disease and insights into the response of susceptible apple plants to E. amylovora.
Kamber, Tim; Buchmann, Jan P.; Pothier, Joël F.; Smits, Theo H. M.; Wicker, Thomas; Duffy, Brion
2016-01-01
The molecular basis of resistance and susceptibility of host plants to fire blight, a major disease threat to pome fruit production globally, is largely unknown. RNA-sequencing data from challenged and mock-inoculated flowers were analyzed to assess the susceptible response of apple to the fire blight pathogen Erwinia amylovora. In presence of the pathogen 1,080 transcripts were differentially expressed at 48 h post inoculation. These included putative disease resistance, stress, pathogen related, general metabolic, and phytohormone related genes. Reads, mapped to regions on the apple genome where no genes were assigned, were used to identify potential novel genes and open reading frames. To identify transcripts specifically expressed in response to E. amylovora, RT-PCRs were conducted and compared to the expression patterns of the fire blight biocontrol agent Pantoea vagans strain C9-1, another apple pathogen Pseudomonas syringae pv. papulans, and mock inoculated apple flowers. This led to the identification of a peroxidase superfamily gene that was lower expressed in response to E. amylovora suggesting a potential role in the susceptibility response. Overall, this study provides the first transcriptional profile by RNA-seq of the host plant during fire blight disease and insights into the response of susceptible apple plants to E. amylovora. PMID:26883568
Identification and characterization of long non-coding RNAs in rainbow trout eggs
USDA-ARS's Scientific Manuscript database
Long non-coding RNAs (lncRNAs) are in general considered as a diverse class of transcripts longer than 200 nucleotides that structurally resemble mRNAs but do not encode proteins. Recent advances in RNA sequencing (RNA-Seq) and bioinformatics methods have provided an opportunity to identify and ana...
Bayesian Correlation Analysis for Sequence Count Data
Lau, Nelson; Perkins, Theodore J.
2016-01-01
Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449
Du, Minmin; Zhao, Jiuhai; Tzeng, David T W; Liu, Yuanyuan; Deng, Lei; Yang, Tianxia; Zhai, Qingzhe; Wu, Fangming; Huang, Zhuo; Zhou, Ming; Wang, Qiaomei; Chen, Qian; Zhong, Silin; Li, Chang-Bao; Li, Chuanyou
2017-08-01
The hormone jasmonate (JA), which functions in plant immunity, regulates resistance to pathogen infection and insect attack through triggering genome-wide transcriptional reprogramming in plants. We show that the basic helix-loop-helix transcription factor (TF) MYC2 in tomato (Solanum lycopersicum) acts downstream of the JA receptor to orchestrate JA-mediated activation of both the wounding and pathogen responses. Using chromatin immunoprecipitation sequencing (ChIP-seq) coupled with RNA sequencing (RNA-seq) assays, we identified 655 MYC2-targeted JA-responsive genes. These genes are highly enriched in Gene Ontology categories related to TFs and the early response to JA, indicating that MYC2 functions at a high hierarchical level to regulate JA-mediated gene transcription. We also identified a group of MYC2-targeted TFs (MTFs) that may directly regulate the JA-induced transcription of late defense genes. Our findings suggest that MYC2 and its downstream MTFs form a hierarchical transcriptional cascade during JA-mediated plant immunity that initiates and amplifies transcriptional output. As proof of concept, we showed that during plant resistance to the necrotrophic pathogen Botrytis cinerea, MYC2 and the MTF JA2-Like form a transcription module that preferentially regulates wounding-responsive genes, whereas MYC2 and the MTF ETHYLENE RESPONSE FACTOR.C3 form a transcription module that preferentially regulates pathogen-responsive genes. © 2017 American Society of Plant Biologists. All rights reserved.
Liu, Yuanyuan; Deng, Lei; Wu, Fangming; Huang, Zhuo; Zhou, Ming; Chen, Qian; Zhong, Silin
2017-01-01
The hormone jasmonate (JA), which functions in plant immunity, regulates resistance to pathogen infection and insect attack through triggering genome-wide transcriptional reprogramming in plants. We show that the basic helix-loop-helix transcription factor (TF) MYC2 in tomato (Solanum lycopersicum) acts downstream of the JA receptor to orchestrate JA-mediated activation of both the wounding and pathogen responses. Using chromatin immunoprecipitation sequencing (ChIP-seq) coupled with RNA sequencing (RNA-seq) assays, we identified 655 MYC2-targeted JA-responsive genes. These genes are highly enriched in Gene Ontology categories related to TFs and the early response to JA, indicating that MYC2 functions at a high hierarchical level to regulate JA-mediated gene transcription. We also identified a group of MYC2-targeted TFs (MTFs) that may directly regulate the JA-induced transcription of late defense genes. Our findings suggest that MYC2 and its downstream MTFs form a hierarchical transcriptional cascade during JA-mediated plant immunity that initiates and amplifies transcriptional output. As proof of concept, we showed that during plant resistance to the necrotrophic pathogen Botrytis cinerea, MYC2 and the MTF JA2-Like form a transcription module that preferentially regulates wounding-responsive genes, whereas MYC2 and the MTF ETHYLENE RESPONSE FACTOR.C3 form a transcription module that preferentially regulates pathogen-responsive genes. PMID:28733419
Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M
2017-12-06
While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can process a large amount of RNA-Seq data, and accuracy comparable to or lower than that of memory-intensive algorithms that can only construct small assemblies. Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
Rapid Recovery Gene Downregulation during Excess-Light Stress and Recovery in Arabidopsis.
Crisp, Peter A; Ganguly, Diep R; Smith, Aaron B; Murray, Kevin D; Estavillo, Gonzalo M; Searle, Iain; Ford, Ethan; Bogdanović, Ozren; Lister, Ryan; Borevitz, Justin O; Eichten, Steven R; Pogson, Barry J
2017-08-01
Stress recovery may prove to be a promising approach to increase plant performance and, theoretically, mRNA instability may facilitate faster recovery. Transcriptome (RNA-seq, qPCR, sRNA-seq, and PARE) and methylome profiling during repeated excess-light stress and recovery was performed at intervals as short as 3 min. We demonstrate that 87% of the stress-upregulated mRNAs analyzed exhibit very rapid recovery. For instance, HSP101 abundance declined 2-fold every 5.1 min. We term this phenomenon rapid recovery gene downregulation (RRGD), whereby mRNA abundance rapidly decreases promoting transcriptome resetting. Decay constants (k) were modeled using two strategies, linear and nonlinear least squares regressions, with the latter accounting for both transcription and degradation. This revealed extremely short half-lives ranging from 2.7 to 60.0 min for 222 genes. Ribosome footprinting using degradome data demonstrated RRGD loci undergo cotranslational decay and identified changes in the ribosome stalling index during stress and recovery. However, small RNAs and 5'-3' RNA decay were not essential for recovery of the transcripts examined, nor were any of the six excess light-associated methylome changes. We observed recovery-specific gene expression networks upon return to favorable conditions and six transcriptional memory types. In summary, rapid transcriptome resetting is reported in the context of active recovery and cellular memory. © 2017 American Society of Plant Biologists. All rights reserved.
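The link between the reported half-lives and the decay constants k can be illustrated with a short sketch. It assumes pure first-order decay, A(t) = A0·exp(−kt), which corresponds only to the degradation-only (linear) fitting strategy; the function names and the synthetic time points are illustrative assumptions and are not taken from the study.

```python
import math

def decay_constant_from_half_life(t_half_min):
    """First-order decay constant k (per minute) for a given half-life."""
    return math.log(2) / t_half_min

def fit_decay_constant(times_min, abundances):
    """Estimate k by ordinary least squares on ln(abundance) versus time.

    Assumes pure first-order decay (no ongoing transcription), so that
    ln A(t) = ln A0 - k * t and the slope of the fitted line equals -k.
    """
    n = len(times_min)
    logs = [math.log(a) for a in abundances]
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return -sxy / sxx

# Half-life of 5.1 min, the value reported above for HSP101.
k = decay_constant_from_half_life(5.1)

# Synthetic, noise-free abundances sampled every 3 min (hypothetical data).
times = [0, 3, 6, 9, 12]
abund = [100.0 * math.exp(-k * t) for t in times]

k_fit = fit_decay_constant(times, abund)
half_life_fit = math.log(2) / k_fit
print(f"k = {k:.4f} per min, fitted half-life = {half_life_fit:.2f} min")
```

A nonlinear fit that also models ongoing transcription, as the abstract describes for the second strategy, would need an additional synthesis term and is not attempted here.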
Rapid Recovery Gene Downregulation during Excess-Light Stress and Recovery in Arabidopsis
Estavillo, Gonzalo M.
2017-01-01
Stress recovery may prove to be a promising approach to increase plant performance and, theoretically, mRNA instability may facilitate faster recovery. Transcriptome (RNA-seq, qPCR, sRNA-seq, and PARE) and methylome profiling during repeated excess-light stress and recovery was performed at intervals as short as 3 min. We demonstrate that 87% of the stress-upregulated mRNAs analyzed exhibit very rapid recovery. For instance, HSP101 abundance declined 2-fold every 5.1 min. We term this phenomenon rapid recovery gene downregulation (RRGD), whereby mRNA abundance rapidly decreases promoting transcriptome resetting. Decay constants (k) were modeled using two strategies, linear and nonlinear least squares regressions, with the latter accounting for both transcription and degradation. This revealed extremely short half-lives ranging from 2.7 to 60.0 min for 222 genes. Ribosome footprinting using degradome data demonstrated RRGD loci undergo cotranslational decay and identified changes in the ribosome stalling index during stress and recovery. However, small RNAs and 5ʹ-3ʹ RNA decay were not essential for recovery of the transcripts examined, nor were any of the six excess light-associated methylome changes. We observed recovery-specific gene expression networks upon return to favorable conditions and six transcriptional memory types. In summary, rapid transcriptome resetting is reported in the context of active recovery and cellular memory. PMID:28705956
Hennessy, Rosanna C; Glaring, Mikkel A; Olsson, Stefan; Stougaard, Peter
2017-08-10
Few studies to date report the transcriptional response of biocontrol bacteria toward phytopathogens. In order to gain insights into the potential mechanism underlying the antagonism of the antimicrobial producing strain P. fluorescens In5 against the phytopathogens Rhizoctonia solani and Pythium aphanidermatum, global RNA sequencing was performed. Differential gene expression profiling of P. fluorescens In5 in response to either R. solani or P. aphanidermatum was investigated using transcriptome sequencing (RNA-seq). Total RNA was isolated from single bacterial cultures of P. fluorescens In5 or bacterial cultures in dual-culture for 48 h with each pathogen in biological triplicates. RNA-seq libraries were constructed following a default Illumina stranded RNA protocol including rRNA depletion and were sequenced 2 × 100 bases on Illumina HiSeq generating approximately 10 million reads per sample. No significant changes in global gene expression were recorded during dual-culture of P. fluorescens In5 with either of the two pathogens; rather, each pathogen appeared to induce expression of a specific set of genes. A particularly strong transcriptional response to R. solani was observed, and notably several genes possibly associated with secondary metabolite detoxification and metabolism were highly upregulated in response to the fungus. A total of 23 genes were significantly upregulated and seven genes were significantly downregulated, each with at least a threefold change in expression level, in response to R. solani compared to the no fungus control. In contrast, only one gene was significantly upregulated over threefold and three transcripts were significantly downregulated over threefold in response to P. aphanidermatum. Genes known to be involved in synthesis of secondary metabolites, e.g. non-ribosomal synthetases and hydrogen cyanide, were not differentially expressed at the time points studied.
This study demonstrates that genes possibly involved in metabolite detoxification are highly upregulated in P. fluorescens In5 when co-cultured with plant pathogens and in particular the fungus R. solani. This highlights the importance of studying microbe-microbe interactions to gain a better understanding of how different systems function in vitro and ultimately in natural systems where biocontrol agents can be used for the sustainable management of plant diseases.
Revealing the missing expressed genes beyond the human reference genome by RNA-Seq.
Chen, Geng; Li, Ruiyuan; Shi, Leming; Qi, Junyi; Hu, Pengzhan; Luo, Jian; Liu, Mingyao; Shi, Tieliu
2011-12-02
A complete and accurate human reference genome is important for functional genomics research. Therefore, the incomplete reference genome and individual specific sequences have significant effects on various studies. We used two RNA-Seq datasets from human brain tissues and 10 mixed cell lines to investigate the completeness of human reference genome. First, we demonstrated that in previously identified ~5 Mb Asian and ~5 Mb African novel sequences that are absent from the human reference genome of NCBI build 36, ~211 kb and ~201 kb of them could be transcribed, respectively. Our results suggest that many of those transcribed regions are not specific to Asian and African, but also present in Caucasian. Then, we found that the expressions of 104 RefSeq genes that are unalignable to NCBI build 37 in brain and cell lines are higher than 0.1 RPKM. 55 of them are conserved across human, chimpanzee and macaque, suggesting that there are still a significant number of functional human genes absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes and EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the human reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. The comparative approach between reference genome and other related human genomes based on the transcriptome provides an alternative way to refine the human reference genome.
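The 0.1 RPKM expression cutoff mentioned above can be made concrete with a minimal sketch of the standard RPKM definition (reads per kilobase of transcript per million mapped reads). The gene names, counts, lengths and library size below are hypothetical, and the helper names are illustrative rather than part of the study's pipeline.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def expressed_genes(counts, lengths, total_mapped_reads, threshold=0.1):
    """Return the gene IDs whose RPKM is strictly above the threshold."""
    return [
        gene
        for gene, count in counts.items()
        if rpkm(count, lengths[gene], total_mapped_reads) > threshold
    ]

# Hypothetical read counts and gene lengths for three genes.
counts = {"geneA": 40, "geneB": 1, "geneC": 0}
lengths = {"geneA": 2000, "geneB": 5000, "geneC": 1500}
total_reads = 20_000_000

above_cutoff = expressed_genes(counts, lengths, total_reads)
print(above_cutoff)
```

With these made-up numbers only geneA passes the cutoff, since geneB falls at 0.01 RPKM and geneC has no reads.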
Yu, Ying; Zhao, Chen; Su, Zhenqiang; Wang, Charles; Fuscoe, James C; Tong, Weida; Shi, Leming
2014-01-01
The rat is used extensively by the pharmaceutical, regulatory, and academic communities for safety assessment of drugs and chemicals and for studying human diseases; however, its transcriptome has not been well studied. As part of the SEQC (i.e., MAQC-III) consortium efforts, a comprehensive RNA-Seq data set was constructed using 320 RNA samples isolated from 10 organs (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, and testes or uterus) from both sexes of Fischer 344 rats across four ages (2-, 6-, 21-, and 104-week-old) with four biological replicates for each of the 80 sample groups (organ-sex-age). With the Ribo-Zero rRNA removal and Illumina RNA-Seq protocols, 41 million 50 bp single-end reads were generated per sample, yielding a total of 13.4 billion reads. This data set could be used to identify and validate new rat genes and transcripts, develop a more comprehensive rat transcriptome annotation system, identify novel gene regulatory networks related to tissue specific gene expression and development, and discover genes responsible for disease and drug toxicity and efficacy.
Divergent transcription is associated with promoters of transcriptional regulators
2013-01-01
Background Divergent transcription is a wide-spread phenomenon in mammals. For instance, short bidirectional transcripts are a hallmark of active promoters, while longer transcripts can be detected antisense from active genes in conditions where the RNA degradation machinery is inhibited. Moreover, many described long non-coding RNAs (lncRNAs) are transcribed antisense from coding gene promoters. However, the general significance of divergent lncRNA/mRNA gene pair transcription is still poorly understood. Here, we used strand-specific RNA-seq with high sequencing depth to thoroughly identify antisense transcripts from coding gene promoters in primary mouse tissues. Results We found that a substantial fraction of coding-gene promoters sustain divergent transcription of long non-coding RNA (lncRNA)/mRNA gene pairs. Strikingly, upstream antisense transcription is significantly associated with genes related to transcriptional regulation and development. Their promoters share several characteristics with those of transcriptional developmental genes, including very large CpG islands, high degree of conservation and epigenetic regulation in ES cells. In-depth analysis revealed a unique GC skew profile at these promoter regions, while the associated coding genes were found to have large first exons, two genomic features that might enforce bidirectional transcription. Finally, genes associated with antisense transcription harbor specific H3K79me2 epigenetic marking and RNA polymerase II enrichment profiles linked to an intensified rate of early transcriptional elongation. Conclusions We concluded that promoters of a class of transcription regulators are characterized by a specialized transcriptional control mechanism, which is directly coupled to relaxed bidirectional transcription. PMID:24365181
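The GC skew profile mentioned above can be illustrated with a brief sketch using the conventional definition, (G − C) / (G + C), computed in sliding windows. The window and step sizes and the toy sequence are assumptions chosen for illustration; the abstract does not state which parameters the authors used.

```python
def gc_skew(seq):
    """GC skew of a sequence: (G - C) / (G + C); 0.0 if there is no G or C."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)

def gc_skew_profile(seq, window=100, step=50):
    """Sliding-window GC skew as a list of (window_start, skew) pairs."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Toy sequence: a G-rich stretch followed by a C-rich stretch.
profile = gc_skew_profile("GGGGCCCC", window=4, step=4)
print(profile)
```

A positive value marks an excess of G over C on the strand being read, and a negative value the reverse.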
Mulligan, Megan K; Mozhui, Khyobeni; Pandey, Ashutosh K; Smith, Maren L; Gong, Suzhen; Ingels, Jesse; Miles, Michael F; Lopez, Marcelo F; Lu, Lu; Williams, Robert W
2017-02-01
Genetic factors that influence the transition from initial drinking to dependence remain enigmatic. Recent studies have leveraged chronic intermittent ethanol (CIE) paradigms to measure changes in brain gene expression in a single strain at 0, 8, 72 h, and even 7 days following CIE. We extend these findings using LCM RNA-seq to profile expression in 11 brain regions in two inbred strains - C57BL/6J (B6) and DBA/2J (D2) - 72 h following multiple cycles of ethanol self-administration and CIE. Linear models identified differential expression based on treatment, region, strain, or interactions with treatment. Nearly 40% of genes showed a robust effect (FDR < 0.01) of region, and hippocampus CA1, cortex, bed nucleus stria terminalis, and nucleus accumbens core had the highest number of differentially expressed genes after treatment. Another 8% of differentially expressed genes demonstrated a robust effect of strain. As expected, based on similar studies in B6, treatment had a much smaller impact on expression; only 72 genes (p < 0.01) are modulated by treatment (independent of region or strain). Strikingly, many more genes (415) show a strain-specific and largely opposite response to treatment and are enriched in processes related to RNA metabolism, transcription factor activity, and mitochondrial function. Over 3 times as many changes in gene expression were detected in D2 compared to B6, and weighted gene co-expression network analysis (WGCNA) module comparison identified more modules enriched for treatment effects in D2. Substantial strain differences exist in the temporal pattern of transcriptional neuroadaptation to CIE, and these may drive individual differences in risk of addiction following excessive alcohol consumption. Copyright © 2016 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela
Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Genome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genome and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar transcripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3’-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellar genomes, which is 1/10th as much genetic diversity as is found in the nucleus.
Expanding frontiers in plant transcriptomics in aid of functional genomics and molecular breeding.
Agarwal, Pinky; Parida, Swarup K; Mahto, Arunima; Das, Sweta; Mathew, Iny Elizebeth; Malik, Naveen; Tyagi, Akhilesh K
2014-12-01
The transcript pool of a plant part, under any given condition, is a collection of mRNAs that will pave the way for a biochemical reaction of the plant to stimuli. Over the past decades, transcriptome study has advanced from Northern blotting to RNA sequencing (RNA-seq), through other techniques, of which real-time quantitative polymerase chain reaction (PCR) and microarray are the most significant ones. The questions being addressed by such studies have also matured from a solitary process to expression atlas and marker-assisted genetic enhancement. Not only genes and their networks involved in various developmental processes of plant parts have been elucidated, but also stress tolerant genes have been highlighted. The transcriptome of a plant with altered expression of a target gene has given information about the downstream genes. Marker information has been used for breeding improved varieties. Fortunately, the data generated by transcriptome analysis has been made freely available for ample utilization and comparison. The review discusses this wide variety of transcriptome data being generated in plants, which includes developmental stages, abiotic and biotic stress, effect of altered gene expression, as well as comparative transcriptomics, with a special emphasis on microarray and RNA-seq. Such data can be used to determine the regulatory gene networks, which can subsequently be utilized for generating improved plant varieties. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Li, Dong; Zuo, Qisheng; Lian, Chao; Zhang, Lei; Shi, Qingqing; Zhang, Zhentao; Wang, Yingjie; Ahmed, Mahmoud F; Tang, Beibei; Xiao, Tianrong; Zhang, Yani; Li, Bichun
2015-08-01
We explored the regulatory mechanism of protein metabolism during the differentiation process of chicken male germ cells and provide a basis for improving the induction system of embryonic stem cell differentiation to male germ cells in vitro. We sequenced the transcriptome of embryonic stem cells, primordial germ cells, and spermatogonial stem cells with RNA sequencing (RNA-Seq), applied bioinformatics analysis methods, and assayed the key genes by quantitative reverse transcription PCR (qRT-PCR). We found 16 amino acid metabolic pathways enriched during the differentiation of embryonic stem cells to primordial germ cells and 15 amino acid metabolic pathways enriched during the differentiation of primordial germ cells to spermatogonial stem cells. Three pathways, the arginine-proline, tyrosine, and tryptophan metabolic pathways, were significantly enriched across the whole differentiation process of embryonic stem cells to spermatogonial stem cells. Moreover, for these three pathways, we screened key genes such as NOS2, ADC, FAH, and IDO. qRT-PCR results showed that the expression trends of these genes matched those observed by RNA-Seq. Our findings showed that the three pathways and these key genes play an important role in the differentiation process of embryonic stem cells to male germ cells. These results provide basic information for improving the induction system of embryonic stem cell differentiation to male germ cells in vitro.
Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie
2018-01-01
Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices (translation ratio, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, freeing biologists from complex searching, analysis and comparison of huge sequencing datasets without needing local computational power. PMID:29106630
Hackett, Justin B; Lu, Yan
2017-05-04
In land plants, plastid and mitochondrial RNAs are subject to post-transcriptional C-to-U RNA editing. T-DNA insertions in the ORGANELLE RNA RECOGNITION MOTIF PROTEIN6 gene resulted in reduced photosystem II (PSII) activity and smaller plant and leaf sizes. Exon coverage analysis of the ORRM6 gene showed that orrm6-1 and orrm6-2 are loss-of-function mutants. Compared to other ORRM proteins, ORRM6 affects a relatively small number of RNA editing sites. Sanger sequencing of reverse transcription-PCR products of plastid transcripts revealed 2 plastid RNA editing sites that are substantially affected in the orrm6 mutants: psbF-C77 and accD-C794. The psbF gene encodes the β subunit of cytochrome b559, an essential component of PSII. The accD gene encodes the β subunit of acetyl-CoA carboxylase, a protein required in plastid fatty acid biosynthesis. Whole-transcriptome RNA-seq demonstrated that editing at psbF-C77 is nearly absent and the editing extent at accD-C794 was significantly reduced. Gene set enrichment pathway analysis showed that expression of multiple gene sets involved in photosynthesis, especially photosynthetic electron transport, is significantly upregulated in both orrm6 mutants. The upregulation could be a mechanism to compensate for the reduced PSII electron transport rate in the orrm6 mutants. These results further demonstrated that Organelle RNA Recognition Motif protein ORRM6 is required in editing of specific RNAs in the Arabidopsis (Arabidopsis thaliana) plastid.
Rong-Mullins, Xiaoqing; Ayers, Michael C.; Summers, Mahmoud; Gallagher, Jennifer E. G.
2017-01-01
Cellular metabolism can change the potency of a chemical’s tumorigenicity. 4-nitroquinoline-1-oxide (4NQO) is a tumorigenic drug widely used on animal models for cancer research. Polymorphisms of the transcription factor Yrr1 confer different levels of resistance to 4NQO in Saccharomyces cerevisiae. To study how different Yrr1 alleles regulate gene expression leading to resistance, transcriptomes of three isogenic S. cerevisiae strains carrying different Yrr1 alleles were profiled via RNA sequencing (RNA-Seq) and chromatin immunoprecipitation coupled with sequencing (ChIP-Seq) in the presence and absence of 4NQO. In response to 4NQO, all alleles of Yrr1 drove the expression of SNQ2 (a multidrug transporter), which was highest in the presence of 4NQO resistance-conferring alleles, and overexpression of SNQ2 alone was sufficient to overcome 4NQO-sensitive growth. Using shape metrics to refine the ChIP-Seq peaks, Yrr1 strongly associated with three loci including SNQ2. In addition to a known Yrr1 target SNG1, Yrr1 also bound upstream of RPL35B; however, overexpression of these genes did not confer 4NQO resistance. RNA-Seq data also implicated nucleotide synthesis pathways including the de novo purine pathway, and the ribonuclease reductase pathways were downregulated in response to 4NQO. Conversion of a 4NQO-sensitive allele to a 4NQO-resistant allele by a single point mutation mimicked the 4NQO-resistant allele in phenotype, and while the 4NQO resistant allele increased the expression of the ADE genes in the de novo purine biosynthetic pathway, the mutant Yrr1 increased expression of ADE genes even in the absence of 4NQO. These same ADE genes were only increased in the wild-type alleles in the presence of 4NQO, indicating that the point mutation activated Yrr1 to upregulate a pathway normally only activated in response to stress. The various Yrr1 alleles also influenced growth on different carbon sources by altering the function of the mitochondria. 
Hence, the complement to 4NQO resistance was poor growth on nonfermentable carbon sources, which in turn varied depending on the allele of Yrr1 expressed in the isogenic yeast. The oxidation state of the yeast affected the 4NQO toxicity by altering the reactive oxygen species (ROS) generated by cellular metabolism. The integration of RNA-Seq and ChIP-Seq elucidated how Yrr1 regulates global gene transcription in response to 4NQO and how various Yrr1 alleles confer differential resistance to 4NQO. This study provides guidance for further investigation into how Yrr1 regulates cellular responses to 4NQO, as well as transcriptomic resources for further analysis of transcription factor variation on carbon source utilization. PMID:29208650
Kang, Eun Yong; Martin, Lisa J.; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J.; Shifman, Sagiv; Eskin, Eleazar
2016-01-01
The study of the genetics of gene expression is of considerable importance to understanding the nature of common, complex diseases. The most widely applied approach to identifying relationships between genetic variation and gene expression is the expression quantitative trait loci (eQTL) approach. Here, we increased the computational power of eQTL with an alternative and complementary approach based on analyzing allele specific expression (ASE). We designed a novel analytical method to identify cis-acting regulatory variants based on genome sequencing and measurements of ASE from RNA-sequencing (RNA-seq) data. We evaluated the power and resolution of our method using simulated data. We then applied the method to map regulatory variants affecting gene expression in lymphoblastoid cell lines (LCLs) from 77 unrelated northern and western European individuals (CEU), which were part of the HapMap project. A total of 2309 SNPs were identified as being associated with ASE patterns. The SNPs associated with ASE were enriched within promoter regions and were significantly more likely to signal strong evidence for a regulatory role. Finally, among the candidate regulatory SNPs, we identified 108 SNPs that were previously associated with human immune diseases. With further improvements in quantifying ASE from RNA-seq, the application of our method to other datasets is expected to accelerate our understanding of the biological basis of common diseases. PMID:27765809
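As a rough illustration of what "measuring ASE" can mean at a single heterozygous site, the sketch below tests whether reads carrying the reference and alternative alleles depart from the 50:50 split expected without allele-specific expression. This is a generic exact binomial test, not the analytical method of the study above; the function name `ase_binomial_pvalue` and the read counts are hypothetical.

```python
from math import comb


def ase_binomial_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial p-value for allelic imbalance.

    Null hypothesis (an assumption of this sketch): each read is equally
    likely to come from either allele, i.e. p = 0.5.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    # Under p = 0.5 every outcome i has probability comb(n, i) / 2**n,
    # so outcomes can be ranked by their integer binomial coefficients.
    observed = comb(n, ref_reads)
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n


if __name__ == "__main__":
    # Hypothetical read counts at one heterozygous SNP.
    print(ase_binomial_pvalue(5, 5))
    print(ase_binomial_pvalue(0, 10))
```

Real ASE analyses typically also have to handle mapping bias and overdispersion, which this sketch ignores.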
Kayal, Ehsan; Bentlage, Bastian; Collins, Allen G
2016-09-01
In most animals, the mitochondrial genome is characterized by its small size, organization into a single circular molecule, and a relative conservation of the number of encoded genes. In box jellyfish (Cubozoa, Cnidaria), the mitochondrial genome is organized into 8 linear mito-chromosomes harboring between one and 4 genes each, including 2 extra protein-coding genes: mt-polB and orf314. Such an organization challenges the traditional view of mitochondrial DNA (mtDNA) expression in animals. In this study, we investigate the pattern of mitochondrial gene expression in the box jellyfish Alatina alata, as well as several key nuclear-encoded molecular pathways involved in the processing of mitochondrial gene transcription. Read coverage of DNA-seq data is relatively uniform for all 8 mito-chromosomes, suggesting that each mito-chromosome is present in equimolar proportion in the mitochondrion. Comparison of DNA and RNA-seq based assemblies indicates that mito-chromosomes are transcribed into individual transcripts in which the beginning and ending are highly conserved. Expression levels for mt-polB and orf314 are similar to those of other mitochondrial-encoded genes, which provides further evidence for them having functional roles in the mitochondrion. Survey of the transcriptome suggests recognition of the mitochondrial tRNA-Met by the cytoplasmic aminoacyl-tRNA synthetase counterpart and C-to-U editing of the cytoplasmic tRNA-Trp after import into the mitochondrion. Moreover, several mitochondrial ribosomal proteins appear to be lost. This study represents the first survey of mitochondrial gene expression of the linear multi-chromosomal mtDNA in box jellyfish (Cubozoa). Future exploration of small RNAs and the proteome of the mitochondrion will test the hypotheses presented herein.
Holmstrom, Sam R; Deering, Tye; Swift, Galvin H; Poelwijk, Frank J; Mangelsdorf, David J; Kliewer, Steven A; MacDonald, Raymond J
2011-08-15
We have determined the cistrome and transcriptome for the nuclear receptor liver receptor homolog-1 (LRH-1) in exocrine pancreas. Chromatin immunoprecipitation (ChIP)-seq and RNA-seq analyses reveal that LRH-1 directly induces expression of genes encoding digestive enzymes and secretory and mitochondrial proteins. LRH-1 cooperates with the pancreas transcription factor 1-L complex (PTF1-L) in regulating exocrine pancreas-specific gene expression. Elimination of LRH-1 in adult mice reduced the concentration of several lipases and proteases in pancreatic fluid and impaired pancreatic fluid secretion in response to cholecystokinin. Thus, LRH-1 is a key regulator of the exocrine pancreas-specific transcriptional network required for the production and secretion of pancreatic fluid.
Charfeddine, Mariam; Saïdi, Mohamed Najib; Charfeddine, Safa; Hammami, Asma; Gargouri Bouzid, Radhia
2015-04-01
The ERF transcription factors belong to the AP2/ERF superfamily, one of the largest transcription factor families in plants. They play important roles in plant development processes, as well as in the response to biotic, abiotic, and hormone signaling. In the present study, 155 putative ERF transcription factor genes were identified from the potato (Solanum tuberosum) genome database, and compared with those from Arabidopsis thaliana. The StERF proteins are divided into ten phylogenetic groups. Expression analyses of five StERFs were carried out by semi-quantitative RT-PCR and compared with published RNA-seq data. These latter analyses were used to distinguish tissue-specific, biotic, and abiotic stress genes as well as hormone-responsive StERF genes. The results are of interest to better understand the role of the AP2/ERF genes in response to diverse types of stress in potatoes. A comprehensive analysis of the physiological functions and biological roles of the ERF family genes in S. tuberosum is required to understand crop stress tolerance mechanisms.
A Herpesviral Immediate Early Protein Promotes Transcription Elongation of Viral Transcripts.
Fox, Hannah L; Dembowski, Jill A; DeLuca, Neal A
2017-06-13
Herpes simplex virus 1 (HSV-1) genes are transcribed by cellular RNA polymerase II (RNA Pol II). While four viral immediate early proteins (ICP4, ICP0, ICP27, and ICP22) function in some capacity in viral transcription, the mechanism by which ICP22 functions remains unclear. We observed that the FACT complex (comprised of SSRP1 and Spt16) was relocalized in infected cells as a function of ICP22. ICP22 was also required for the association of FACT and the transcription elongation factors SPT5 and SPT6 with viral genomes. We further demonstrated that the FACT complex interacts with ICP22 throughout infection. We therefore hypothesized that ICP22 recruits cellular transcription elongation factors to viral genomes for efficient transcription elongation of viral genes. We reevaluated the phenotype of an ICP22 mutant virus by determining the abundance of all viral mRNAs throughout infection by transcriptome sequencing (RNA-seq). The accumulation of almost all viral mRNAs late in infection was reduced compared to the wild type, regardless of kinetic class. Using chromatin immunoprecipitation sequencing (ChIP-seq), we mapped the location of RNA Pol II on viral genes and found that RNA Pol II levels on the bodies of viral genes were reduced in the ICP22 mutant compared to wild-type virus. In contrast, the association of RNA Pol II with transcription start sites in the mutant was not reduced. Taken together, our results indicate that ICP22 plays a role in recruiting elongation factors like the FACT complex to the HSV-1 genome to allow for efficient viral transcription elongation late in viral infection and ultimately infectious virion production. IMPORTANCE HSV-1 interacts with many cellular proteins throughout productive infection. Here, we demonstrate the interaction of a viral protein, ICP22, with a subset of cellular proteins known to be involved in transcription elongation. 
We determined that ICP22 is required to recruit the FACT complex and other transcription elongation factors to viral genomes and that in the absence of ICP22 viral transcription is globally reduced late in productive infection, due to an elongation defect. This insight defines a fundamental role of ICP22 in HSV-1 infection and elucidates the involvement of cellular factors in HSV-1 transcription. Copyright © 2017 Fox et al.
USDA-ARS's Scientific Manuscript database
The homeodomain leucine zipper (HD-Zip) transcription factor family is one of the largest plant specific superfamilies, and includes genes with roles in modulation of plant growth and response to environmental stresses. Many HD-Zip genes are well characterized in Arabidopsis (Arabidopsis thaliana), ...
Hartono, Stella R; Malapert, Amélie; Legros, Pénélope; Bernard, Pascal; Chédin, Frédéric; Vanoosthuyse, Vincent
2018-02-02
R-loops, which result from the formation of stable DNA:RNA hybrids, can both threaten genome integrity and act as physiological regulators of gene expression and chromatin patterning. To characterize R-loops in fission yeast, we used the S9.6 antibody-based DRIPc-seq method to sequence the RNA strand of R-loops and obtain strand-specific R-loop maps at near nucleotide resolution. Surprisingly, preliminary DRIPc-seq experiments identified mostly RNase H-resistant but exosome-sensitive RNAs that mapped to both DNA strands and resembled RNA:RNA hybrids (dsRNAs), suggesting that dsRNAs form widely in fission yeast. We confirmed in vitro that S9.6 can immuno-precipitate dsRNAs and provide evidence that dsRNAs can interfere with its binding to R-loops. dsRNA elimination by RNase III treatment prior to DRIPc-seq allowed the genome-wide and strand-specific identification of genuine R-loops that responded in vivo to RNase H levels and displayed classical features associated with R-loop formation. We also found that most transcripts whose levels were altered by in vivo manipulation of RNase H levels did not form detectable R-loops, suggesting that prolonged manipulation of R-loop levels could indirectly alter the transcriptome. We discuss the implications of our work in the design of experimental strategies to probe R-loop functions. Copyright © 2017 Elsevier Ltd. All rights reserved.
Comparative Transcriptomic Analyses of Vegetable and Grain Pea (Pisum sativum L.) Seed Development
Liu, Na; Zhang, Guwen; Xu, Shengchun; Mao, Weihua; Hu, Qizan; Gong, Yaming
2015-01-01
Understanding the molecular mechanisms regulating pea seed developmental process is extremely important for pea breeding. In this study, we used high-throughput RNA-Seq and bioinformatics analyses to examine the changes in gene expression during seed development in vegetable pea and grain pea, and compare the gene expression profiles of these two pea types. RNA-Seq generated 18.7 G of raw data, which were then de novo assembled into 77,273 unigenes with a mean length of 930 bp. Our results illustrate that transcriptional control during pea seed development is a highly coordinated process. There were 459 and 801 genes differentially expressed at early and late seed maturation stages between vegetable pea and grain pea, respectively. Soluble sugar and starch metabolism related genes were significantly activated during the development of pea seeds coinciding with the onset of accumulation of sugar and starch in the seeds. A comparative analysis of genes involved in sugar and starch biosynthesis in vegetable pea (high seed soluble sugar and low starch) and grain pea (high seed starch and low soluble sugar) revealed that differential expression of related genes at late development stages results in a negative correlation between soluble sugar and starch biosynthetic flux in vegetable and grain pea seeds. RNA-Seq data was validated by using real-time quantitative RT-PCR analysis for 30 randomly selected genes. To our knowledge, this work represents the first report of seed development transcriptomics in pea. The obtained results provide a foundation to support future efforts to unravel the underlying mechanisms that control the developmental biology of pea seeds, and serve as a valuable resource for improving pea breeding. PMID:26635856
Li, Kong-Qing; Xu, Xiao-Yong; Huang, Xiao-San
2016-01-01
Drought is a major abiotic stress that affects plant growth, development and productivity. Pear is one of the most important deciduous fruit trees in the world, but the mechanisms of drought tolerance in this plant are still unclear. To better understand the molecular basis of the drought stress response, RNA-seq was performed on samples collected before and after dehydration in Pyrus betulaefolia. In total, 19,532 differentially expressed genes (DEGs) were identified. These genes were annotated into 144 Gene Ontology (GO) terms and 18 clusters of orthologous groups (COG) involved in 129 Kyoto Encyclopedia of Genes and Genomes (KEGG) defined pathways. These DEGs comprised 49 (26 up-regulated, 23 down-regulated), 248 (166 up-regulated, 82 down-regulated), 3483 (1295 up-regulated, 2188 down-regulated), 1455 (1065 up-regulated, 390 down-regulated) genes from the 1 h, 3 h and 6 h dehydration-treated samples and the 24 h recovery samples, respectively. RNA-seq was validated by analyzing the expression patterns of 16 randomly selected DEGs by quantitative real-time PCR. Photosynthesis, signal transduction, innate immune response, protein phosphorylation, response to water, response to biotic stimulus, and plant hormone signal transduction were the most significantly enriched GO categories amongst the DEGs. A total of 637 transcription factors were shown to be dehydration responsive. In addition, a number of genes involved in the metabolism and signaling of hormones were significantly affected by the dehydration stress. This dataset provides valuable information regarding the Pyrus betulaefolia transcriptome changes in response to dehydration and may promote identification and functional analysis of potential genes that could be used for improving drought tolerance via genetic engineering of non-model, but economically-important, perennial species.
Bearson, Shawn M. D; Brunelle, Brian W; Bayles, Darrell O; Lee, In Soo; Kich, Jalusa D
2017-01-01
Purpose: Non-host-adapted Salmonella serovars, including the common human food-borne pathogen Salmonella enterica serovar Typhimurium (S. Typhimurium), are opportunistic pathogens that can colonize food-producing animals without causing overt disease. Interventions against Salmonella are needed to enhance food safety, protect animal health and allow the differentiation of infected from vaccinated animals (DIVA). Methodology: An attenuated S. Typhimurium DIVA vaccine (BBS 866) was characterized for the protection of pigs following challenge with virulent S. Typhimurium. The porcine transcriptional response to BBS 866 vaccination was evaluated. RNA-Seq analysis was used to compare gene expression between BBS 866 and its parent; phenotypic assays were performed to confirm transcriptional differences observed between the strains. Results: Vaccination significantly reduced fever and interferon-gamma (IFNγ) levels in swine challenged with virulent S. Typhimurium compared to mock-vaccinated pigs. Salmonella faecal shedding and gastrointestinal tissue colonization were significantly lower in vaccinated swine. RNA-Seq analysis comparing BBS 866 to its parental S. Typhimurium strain demonstrated reduced expression of the genes involved in cellular invasion and bacterial motility; decreased invasion of porcine-derived IPEC-J2 cells and swimming motility for the vaccine strain was consistent with the RNA-Seq analysis. Numerous membrane proteins were differentially expressed, which was an anticipated gene expression pattern due to the targeted deletion of several regulatory genes in the vaccine strain. RNA-Seq analysis indicated that genes involved in the porcine immune and inflammatory response were differentially regulated at 2 days post-vaccination compared to pre-vaccination. Conclusion: Evaluation of the S. Typhimurium DIVA vaccine indicates that vaccination will provide both swine health and food safety benefits. PMID:28516860
Manteniotis, Stavros; Lehmann, Ramona; Flegel, Caroline; Vogel, Felix; Hofreuter, Adrian; Schreiner, Benjamin S. P.; Altmüller, Janine; Becker, Christian; Schöbel, Nicole; Hatt, Hanns; Gisselmann, Günter
2013-01-01
The specific functions of sensory systems depend on the tissue-specific expression of genes that code for molecular sensor proteins that are necessary for stimulus detection and membrane signaling. Using the Next Generation Sequencing technique (RNA-Seq), we analyzed the complete transcriptome of the trigeminal ganglia (TG) and dorsal root ganglia (DRG) of adult mice. Focusing on genes with an expression level higher than 1 FPKM (fragments per kilobase of transcript per million mapped reads), we detected the expression of 12984 genes in the TG and 13195 in the DRG. To analyze the specific gene expression patterns of the peripheral neuronal tissues, we compared their gene expression profiles with that of the liver, brain, olfactory epithelium, and skeletal muscle. The transcriptome data of the TG and DRG were scanned for virtually all known G-protein-coupled receptors (GPCRs) as well as for ion channels. The expression profile was ranked with regard to the level and specificity for the TG. In total, we detected 106 non-olfactory GPCRs and 33 ion channels that had not been previously described as expressed in the TG. To validate the RNA-Seq data, in situ hybridization experiments were performed for several of the newly detected transcripts. To identify differences in expression profiles between the sensory ganglia, the RNA-Seq data of the TG and DRG were compared. Among the differentially expressed genes (> 1 FPKM), 65 and 117 were expressed at least 10-fold higher in the TG and DRG, respectively. Our transcriptome analysis allows a comprehensive overview of all ion channels and G protein-coupled receptors that are expressed in trigeminal ganglia and provides additional approaches for the investigation of trigeminal sensing as well as for the physiological and pathophysiological mechanisms of pain. PMID:24260241
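The FPKM unit used above (fragments per kilobase of transcript per million mapped reads) can be made concrete with a short sketch. The formula is the standard definition of FPKM; the function names, the example counts, and the way the "> 1 FPKM" cutoff is applied are illustrative assumptions, not the authors' pipeline.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # fragments / (length in kb) / (library size in millions)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_expressed(value: float, threshold: float = 1.0) -> bool:
    """Apply an expression cutoff such as the 1 FPKM threshold mentioned above."""
    return value > threshold


if __name__ == "__main__":
    # Hypothetical gene: 100 fragments on a 2,000 bp transcript
    # in a library of 10 million mapped fragments.
    value = fpkm(100, 2000, 10_000_000)
    print(value, is_expressed(value))
```

Because FPKM normalizes by both transcript length and library size, the same raw count gives a lower value for a longer transcript or a deeper library.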
Jakomin, Marcello; Chessa, Daniela; Bäumler, Andreas J; Casadesús, Josep
2008-11-01
DNA adenine methylase (dam) mutants of Salmonella enterica serovar Typhimurium grown under laboratory conditions express the std fimbrial operon, which is tightly repressed in the wild type. Here, we show that uncontrolled production of Std fimbriae in S. enterica serovar Typhimurium dam mutants contributes to attenuation in mice, as indicated by the observation that an stdA dam strain is more competitive than a dam strain upon oral infection. Dam methylation appears to regulate std transcription, rather than std mRNA stability or turnover. A genetic screen for std regulators showed that the GATC-binding protein SeqA directly or indirectly represses std expression, while the poorly characterized yifA gene product serves as an std activator. YifA encodes a putative LysR-like protein and has been renamed HdfR, like its Escherichia coli homolog. Activation of std expression by HdfR is observed only in dam and seqA backgrounds. These data suggest that HdfR directly or indirectly activates std transcription. Since SeqA is unable to bind nonmethylated DNA, it is possible that std operon derepression in dam and seqA mutants may result from unconstrained HdfR-mediated activation of std transcription. Derepression of std in dam and seqA mutants of S. enterica occurs in only a fraction of the bacterial population, suggesting the occurrence of either bistable expression or phase variation.
Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.
Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel
2013-09-01
RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
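The clustering and annotation-comparison steps described above can be sketched in a few lines. This is only a toy illustration of the general idea (merge overlapping read intervals, then keep clusters that overlap no annotated gene); it is not the DETR'PROK implementation, it ignores strand, and all names and coordinates are hypothetical.

```python
def merge_intervals(intervals, max_gap=0):
    """Cluster read intervals (start, end) that overlap or lie within max_gap bases."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + max_gap:
            # Extend the current cluster to cover this read.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(cluster) for cluster in merged]


def unannotated_clusters(clusters, genes):
    """Keep clusters that overlap no annotated gene interval."""
    return [
        (c_start, c_end)
        for c_start, c_end in clusters
        if not any(c_start <= g_end and g_start <= c_end for g_start, g_end in genes)
    ]


if __name__ == "__main__":
    # Hypothetical mapped reads and one annotated gene on a single strand.
    reads = [(1, 50), (40, 90), (200, 250), (500, 560)]
    genes = [(1, 100)]
    clusters = merge_intervals(reads)
    print(clusters)
    print(unannotated_clusters(clusters, genes))
```

Classifying the remaining clusters as 5' UTRs, sRNAs or asRNAs would additionally require strand information and the distance to neighbouring genes.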
Cánovas, Angela; Reverter, Antonio; DeAtley, Kasey L.; Ashley, Ryan L.; Colgrave, Michelle L.; Fortes, Marina R. S.; Islas-Trejo, Alma; Lehnert, Sigrid; Porto-Neto, Laercio; Rincón, Gonzalo; Silver, Gail A.; Snelling, Warren M.; Medrano, Juan F.; Thomas, Milton G.
2014-01-01
Puberty is a complex physiological event by which animals mature into an adult capable of sexual reproduction. In order to enhance our understanding of the genes and regulatory pathways and networks involved in puberty, we characterized the transcriptome of five reproductive tissues (i.e. hypothalamus, pituitary gland, ovary, uterus, and endometrium) as well as tissues known to be relevant to growth and metabolism needed to achieve puberty (i.e., longissimus dorsi muscle, adipose, and liver). These tissues were collected from pre- and post-pubertal Brangus heifers (3/8 Brahman; Bos indicus x 5/8 Angus; Bos taurus) derived from a population of cattle used to identify quantitative trait loci associated with fertility traits (i.e., age of first observed corpus luteum (ACL), first service conception (FSC), and heifer pregnancy (HPG)). In order to exploit the power of complementary omics analyses, pre- and post-puberty co-expression gene networks were constructed by combining the results from genome-wide association studies (GWAS), RNA-Seq, and bovine transcription factors. Eight tissues among pre-pubertal and post-pubertal Brangus heifers revealed 1,515 differentially expressed and 943 tissue-specific genes within the 17,832 genes confirmed by RNA-Seq analysis. The hypothalamus experienced the most notable up-regulation of genes via puberty (i.e., 204 out of 275 genes). Combining the results of GWAS and RNA-Seq, we identified 25 loci containing a single nucleotide polymorphism (SNP) associated with ACL, FSC, and (or) HPG. Seventeen of these SNP were within a gene and 13 of the genes were expressed in uterus or endometrium. Multi-tissue omics analyses revealed 2,450 co-expressed genes relative to puberty. The pre-pubertal network had 372,861 connections whereas the post-pubertal network had 328,357 connections. A sub-network from this process revealed key transcriptional regulators (i.e., PITX2, FOXA1, DACH2, PROP1, SIX6, etc.). 
Results from these multi-tissue omics analyses improve understanding of the number of genes and their complex interactions for puberty in cattle. PMID:25048735
Jain, Shalu; Chittem, Kishore; Brueggeman, Robert; Osorno, Juan M; Richards, Jonathan; Nelson, Berlin D
2016-01-01
Soybean cyst nematode (SCN; Heterodera glycines Ichinohe) reproduces on the roots of common bean (Phaseolus vulgaris L.) and can cause reductions in plant growth and seed yield. The molecular changes in common bean roots caused by SCN infection are unknown. Identification of genetic factors associated with SCN resistance could help in development of improved bean varieties with high SCN resistance. Gene expression profiling was conducted on common bean roots infected by SCN HG type 0 using next generation RNA sequencing technology. Two pinto bean genotypes, PI533561 and GTS-900, resistant and susceptible to SCN infection, respectively, were used as RNA sources eight days post inoculation. Total reads generated ranged between ~ 3.2 and 5.7 million per library and were mapped to the common bean reference genome. Approximately 70-90% of filtered RNA-seq reads uniquely mapped to the reference genome. In the inoculated roots of resistant genotype PI533561, a total of 353 genes were differentially expressed with 154 up-regulated genes and 199 down-regulated genes when compared to the transcriptome of non- inoculated roots. On the other hand, 990 genes were differentially expressed in SCN-inoculated roots of susceptible genotype GTS-900 with 406 up-regulated and 584 down-regulated genes when compared to non-inoculated roots. Genes encoding nucleotide-binding site leucine-rich repeat resistance (NLR) proteins, WRKY transcription factors, pathogenesis-related (PR) proteins and heat shock proteins involved in diverse biological processes were differentially expressed in both resistant and susceptible genotypes. Overall, suppression of the photosystem was observed in both the responses. Furthermore, RNA-seq results were validated through quantitative real time PCR. This is the first report describing genes/transcripts involved in SCN-common bean interaction and the results will have important implications for further characterization of SCN resistance genes in common bean.
PMID:27441552
Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq.
Macaulay, Iain C; Teng, Mabel J; Haerty, Wilfried; Kumar, Parveen; Ponting, Chris P; Voet, Thierry
2016-11-01
Parallel sequencing of a single cell's genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ∼3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.
Shankar, Jata; Cerqueira, Gustavo C; Wortman, Jennifer R; Clemons, Karl V; Stevens, David A
2018-03-02
With the increasing numbers of immunocompromised hosts, Aspergillus fumigatus emerges as a lethal opportunistic fungal pathogen. Understanding innate and acquired immunity responses of the host is important for a better therapeutic strategy to deal with aspergillosis patients. To determine the transcriptome in the kidneys in aspergillosis, we employed RNA-Seq to obtain single 76-base reads of whole-genome transcripts of murine kidneys on a temporal basis (days 0 (uninfected), 1, 2, 3 and 8) during invasive aspergillosis. A total of 6284 transcripts were downregulated, and 5602 were upregulated compared to baseline expression. Gene ontology enrichment analysis identified genes involved in innate and adaptive immune response, as well as iron binding and homeostasis, among others. Our results showed activation of pathogen recognition receptors, e.g., β-defensins, C-type lectins (e.g., dectin-1), Toll-like receptors (TLR-2, TLR-3, TLR-8, TLR-9 and TLR-13), as well as Ptx-3 and C-reactive protein among the soluble receptors. Upregulated transcripts encoding various differentiating cytokines and effector proinflammatory cytokines, as well as those encoding chemokines and chemokine receptors, revealed Th-1 and Th-17-type immune responses. These studies form a basic dataset for experimental prioritization, including other target organs, to determine the global response of the host against Aspergillus infection.
Tabassum, Rubina; Sivadas, Ambily; Agrawal, Vartika; Tian, Haozheng; Arafat, Dalia; Gibson, Greg
2015-08-13
Personalized medicine is predicated on the notion that individual biochemical and genomic profiles are relatively constant in times of good health and to some extent predictive of disease or therapeutic response. We report a pilot study quantifying gene expression and methylation profile consistency over time, addressing the reasons for individual uniqueness, and its relation to N = 1 phenotypes. Whole blood samples from four African American women, four Caucasian women, and four Caucasian men drawn from the Atlanta Center for Health Discovery and Well Being study at three successive 6-month intervals were profiled by RNA-Seq, miRNA-Seq, and Illumina Methylation 450 K arrays. Standard regression approaches were used to evaluate the proportion of variance for each type of omic measure among individuals, and to quantify correlations among measures and with clinical attributes related to wellness. Longitudinal omic profiles were in general highly consistent over time, with an average of 67 % variance in transcript abundance, 42 % in CpG methylation level (but 88 % for the most differentiated CpG per gene), and 50 % in miRNA abundance among individuals, which are all comparable to 74 % variance among individuals for 74 clinical traits. One third of the variance could be attributed to differential blood cell type abundance, which was also fairly stable over time, and a lesser amount to expression quantitative trait loci (eQTL) effects. Seven conserved axes of covariance that capture diverse aspects of immune function explained over half of the variance. These axes also explained a considerable proportion of individually extreme transcript abundance, namely approximately 100 genes that were significantly up-regulated or down-regulated in each person and were in some cases enriched for relevant gene activities that plausibly associate with clinical attributes. 
A similar fraction of genes had individually divergent methylation levels, but these did not overlap with the transcripts, and fewer than 20 % of genes had significantly correlated methylation and gene expression. People express an "omic personality" consisting of peripheral blood transcriptional and epigenetic profiles that are constant over the course of a year and reflect various types of immune activity. Baseline genomic profiles can provide a window into the molecular basis of traits that might be useful for explaining medical conditions or guiding personalized health decisions.
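The "proportion of variance among individuals" reported above can be illustrated with a minimal sketch. It assumes the simplest possible regression, a one-way model with individual identity as the only factor, so the proportion is the between-individual sum of squares divided by the total sum of squares. The function name and the input layout are hypothetical and are not taken from the study, which may have used a different model.

```python
def variance_among_individuals(samples):
    """Return SS_between / SS_total for repeated measurements per individual.

    `samples` maps an individual ID to a list of measurements of one trait
    (for example, one transcript's abundance at three time points).
    This one-way decomposition is an illustrative assumption.
    """
    values = [v for measurements in samples.values() for v in measurements]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("all measurements are identical; proportion undefined")
    # Each individual contributes n * (individual mean - grand mean)^2.
    ss_between = sum(
        len(m) * ((sum(m) / len(m)) - grand_mean) ** 2
        for m in samples.values()
    )
    return ss_between / ss_total
```

A value near 1 means a trait differs between people but is stable within a person over time; a value near 0 means within-person fluctuation dominates.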
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and two stochastic versions of the expectation-maximization algorithm are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility in choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org.
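As a rough illustration of model-based clustering of count data with expectation-maximization, the sketch below fits a multinomial mixture to gene-by-treatment counts. This is a simplified stand-in and not the MBCluster.Seq algorithm: it assumes that, conditional on each gene's total count, the counts across treatments follow a cluster-specific profile of proportions. The function name `em_cluster`, the deterministic seeding, and the pseudocounts are all illustrative assumptions.

```python
import math

def em_cluster(counts, k, n_iter=50):
    """Cluster genes by expression profile using EM on a multinomial mixture.

    counts[g][j] is the read count of gene g under treatment j.
    Returns (labels, profiles), where profiles[c][j] is the fitted
    proportion of a cluster-c gene's reads falling in treatment j.
    """
    n_genes, n_treat = len(counts), len(counts[0])
    # Assumed initialization: seed profiles from k evenly spaced genes.
    seeds = [counts[(i * n_genes) // k] for i in range(k)]
    profiles = [[(y + 1.0) / (sum(row) + n_treat) for y in row] for row in seeds]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each gene.
        resp = []
        for row in counts:
            logp = [
                math.log(weights[c])
                + sum(y * math.log(profiles[c][j]) for j, y in enumerate(row))
                for c in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update mixing weights and cluster profiles.
        for c in range(k):
            weights[c] = max(sum(r[c] for r in resp) / n_genes, 1e-12)
            mass = [
                sum(resp[g][c] * counts[g][j] for g in range(n_genes)) + 1e-6
                for j in range(n_treat)
            ]
            norm = sum(mass)
            profiles[c] = [m / norm for m in mass]
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return labels, profiles
```

Working with profiles rather than raw counts means two genes with the same pattern across treatments but different overall abundance can land in the same cluster, which is the usual goal when grouping genes by expression profile.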
Hu, Yanhui; Sopko, Richelle; Foos, Marianna; Kelley, Colleen; Flockhart, Ian; Ammeux, Noemie; Wang, Xiaowei; Perkins, Lizabeth; Perrimon, Norbert; Mohr, Stephanie E.
2013-01-01
The evaluation of specific endogenous transcript levels is important for understanding transcriptional regulation. More specifically, it is useful for independent confirmation of results obtained by the use of microarray analysis or RNA-seq and for evaluating RNA interference (RNAi)-mediated gene knockdown. Designing specific and effective primers for high-quality, moderate-throughput evaluation of transcript levels, i.e., quantitative, real-time PCR (qPCR), is nontrivial. To meet community needs, predefined qPCR primer pairs for mammalian genes have been designed and sequences made available, e.g., via PrimerBank. In this work, we adapted and refined the algorithms used for the mammalian PrimerBank to design 45,417 primer pairs for 13,860 Drosophila melanogaster genes, with three or more primer pairs per gene. We experimentally validated primer pairs for ~300 randomly selected genes expressed in early Drosophila embryos, using SYBR Green-based qPCR and sequence analysis of products derived from conventional PCR. All relevant information, including primer sequences, isoform specificity, spatial transcript targeting, and any available validation results and/or user feedback, is available from an online database (www.flyrnai.org/flyprimerbank). At FlyPrimerBank, researchers can retrieve primer information for fly genes either one gene at a time or in batch mode. Importantly, we included the overlap of each predicted amplified sequence with RNAi reagents from several public resources, making it possible for researchers to choose primers suitable for knockdown evaluation of RNAi reagents (i.e., to avoid amplification of the RNAi reagent itself). We demonstrate the utility of this resource for validation of RNAi reagents in vivo. PMID:23893746
Wu, Dong-Dong; Ye, Ling-Qun; Li, Yan; Sun, Yan-Bo; Shao, Yi; Chen, Chunyan; Zhu, Zhu; Zhong, Li; Wang, Lu; Irwin, David M; Zhang, Yong E; Zhang, Ya-Ping
2015-08-01
Next-generation RNA sequencing has been successfully used for identification of transcript assembly, evaluation of gene expression levels, and detection of post-transcriptional modifications. Despite these large-scale studies, additional comprehensive RNA-seq data from different subregions of the human brain are required to fully evaluate the evolutionary patterns experienced by the human brain transcriptome. Here, we provide a total of 6.5 billion RNA-seq reads from different subregions of the human brain. A significant correlation was observed between the levels of alternative splicing and RNA editing, which might be explained by a competition between the molecular machineries responsible for the splicing and editing of RNA. Young human protein-coding genes demonstrate biased expression to the neocortical and non-neocortical regions during evolution on the lineage leading to humans. We also found that a significantly greater number of young human protein-coding genes are expressed in the putamen, a tissue that was also observed to have the highest level of RNA-editing activity. The putamen, which previously received little attention, plays an important role in cognitive ability, and our data suggest a potential contribution of the putamen to human evolution. © The Author (2015). Published by Oxford University Press on behalf of Journal of Molecular Cell Biology, IBCB, SIBS, CAS. All rights reserved.
Feliu, Neus; Kohonen, Pekka; Ji, Jie; Zhang, Yuning; Karlsson, Hanna L; Palmberg, Lena; Nyström, Andreas; Fadeel, Bengt
2015-01-27
Gene expression profiling has developed rapidly in recent years with the advent of deep sequencing technologies such as RNA sequencing (RNA Seq) and could be harnessed to predict and define mechanisms of toxicity of chemicals and nanomaterials. However, the full potential of these technologies in (nano)toxicology is yet to be realized. Here, we show that systems biology approaches can uncover mechanisms underlying cellular responses to nanomaterials. Using RNA Seq and computational approaches, we found that cationic poly(amidoamine) dendrimers (PAMAM-NH2) are capable of triggering down-regulation of cell-cycle-related genes in primary human bronchial epithelial cells at doses that do not elicit acute cytotoxicity, as demonstrated using conventional cell viability assays, while gene transcription was not affected by neutral PAMAM-OH dendrimers. The PAMAMs were internalized in an active manner by lung cells and localized mainly in lysosomes; amine-terminated dendrimers were internalized more efficiently when compared to the hydroxyl-terminated dendrimers. Upstream regulator analysis implicated NF-κB as a putative transcriptional regulator, and subsequent cell-based assays confirmed that PAMAM-NH2 caused NF-κB-dependent cell cycle arrest. However, PAMAM-NH2 did not affect cell cycle progression in the human A549 adenocarcinoma cell line. These results demonstrate the feasibility of applying systems biology approaches to predict cellular responses to nanomaterials and highlight the importance of using relevant (primary) cell models.
Pine, P Scott; Munro, Sarah A; Parsons, Jerod R; McDaniel, Jennifer; Lucas, Anne Bergstrom; Lozach, Jean; Myers, Timothy G; Su, Qin; Jacobs-Helber, Sarah M; Salit, Marc
2016-06-24
Highly multiplexed assays for quantitation of RNA transcripts are being used in many areas of biology and medicine. Using data generated by these transcriptomic assays requires measurement assurance with appropriate controls. Methods to prototype and evaluate multiple RNA controls were developed as part of the External RNA Controls Consortium (ERCC) assessment process. These approaches included a modified Latin square design to provide a broad dynamic range of relative abundance with known differences between four complex pools of ERCC RNA transcripts spiked into a human liver total RNA background. ERCC pools were analyzed on four different microarray platforms: Agilent 1- and 2-color, Illumina bead, and NIAID lab-made spotted microarrays; and two different second-generation sequencing platforms: the Life Technologies 5500xl and the Illumina HiSeq 2500. Individual ERCC controls were assessed for reproducible performance in signal response to concentration among the platforms. Most demonstrated linear behavior if they were not located near one of the extremes of the dynamic range. Performance issues with any individual ERCC transcript could be attributed to detection limitations, platform-specific target probe issues, or potential mixing errors. Collectively, these pools of spike-in RNA controls were evaluated for suitability as surrogates for endogenous transcripts to interrogate the performance of the RNA measurement process of each platform. The controls were useful for establishing the dynamic range of the assay, as well as delineating the useable region of that range where differential expression measurements, expressed as ratios, would be expected to be accurate. The modified Latin square design presented here uses a composite testing scheme for the evaluation of multiple performance characteristics: linear performance of individual controls, signal response within dynamic range pools of controls, and ratio detection between pairs of dynamic range pools. 
This compact design provides an economical sample format for the evaluation of multiple external RNA controls within a single experiment per platform. These results indicate that well-designed pools of RNA controls, spiked into samples, provide measurement assurance for endogenous gene expression studies.
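The check for "linear behavior" in signal response to concentration can be sketched as an ordinary least-squares fit on log2-transformed values, where a slope near 1 indicates proportional response. This is only an illustrative assumption about how such linearity might be assessed; the function name and inputs are hypothetical and do not reproduce the ERCC analysis itself.

```python
import math

def fit_log_log(concentrations, signals):
    """Fit log2(signal) = slope * log2(concentration) + intercept.

    Both inputs must be positive and the concentrations must not all be
    equal. Returns (slope, intercept).
    """
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(s) for s in signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Controls near the extremes of the dynamic range would be expected to pull the slope away from 1, which matches the abstract's observation that linearity held mainly away from those extremes.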
Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.
Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X
2017-12-05
Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.
Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud.
Yang, Andrian; Troup, Michael; Lin, Peijie; Ho, Joshua W K
2017-03-01
Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable parallelization of existing RNA-seq processing pipelines using big data technologies of Apache Hadoop and Apache Spark for performing massively parallel analysis of large scale transcriptomic data. Using two public scRNA-seq datasets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6-145.4 times faster using Falco than running on a highly optimized standalone computer. Falco also allows users to utilize low-cost spot instances of Amazon Web Services, providing a ∼65% reduction in cost of analysis. Falco is available via a GNU General Public License at https://github.com/VCCRI/Falco/. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Missing data and technical variability in single-cell RNA-sequencing experiments.
Hicks, Stephanie C; Townes, F William; Teng, Mingxiang; Irizarry, Rafael A
2017-11-06
Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq), required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used of these technologies, and numerous publications are based on data produced with it. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically driven (genes not expressing RNA at the time of measurement) or technically driven (genes expressing RNA, but not at a sufficient level to be detected by sequencing technology). Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results.
Finally, we demonstrate and discuss how batch effects and confounded experiments can intensify the problem. © The Author 2017. Published by Oxford University Press.
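The two quantities discussed above, the proportion of zeros per cell and the proportion of zeros per gene, are straightforward to compute from a count matrix. The sketch below is illustrative only; the function names and the gene-by-cell list layout are assumptions and do not come from the study.

```python
def zero_fraction_per_cell(matrix):
    """Fraction of genes with a zero count in each cell.

    matrix[g][c] is the count of gene g in cell c.
    """
    n_genes = len(matrix)
    n_cells = len(matrix[0])
    return [
        sum(1 for g in range(n_genes) if matrix[g][c] == 0) / n_genes
        for c in range(n_cells)
    ]

def zero_fraction_per_gene(matrix):
    """Fraction of cells in which each gene has a zero count."""
    return [sum(1 for count in row if count == 0) / len(row) for row in matrix]
```

Comparing the per-cell fractions across batches, or plotting the per-gene fractions against mean expression, is one simple way to look for the technical patterns the abstract describes.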
2014-01-01
Background: Sm proteins are multimeric RNA-binding factors, found in all three domains of life. Eukaryotic Sm proteins, together with their associated RNAs, form small ribonucleoprotein (RNP) complexes important in multiple aspects of gene regulation. Comprehensive knowledge of the RNA components of Sm RNPs is critical for understanding their functions. Results: We developed a multi-targeting RNA-immunoprecipitation sequencing (RIP-seq) strategy to reliably identify Sm-associated RNAs from Drosophila ovaries and cultured human cells. Using this method, we discovered three major categories of Sm-associated transcripts: small nuclear (sn)RNAs, small Cajal body (sca)RNAs and mRNAs. Additional RIP-PCR analysis showed both ubiquitous and tissue-specific interactions. We provide evidence that the mRNA-Sm interactions are mediated by snRNPs, and that one of the mechanisms of interaction is via base pairing. Moreover, the Sm-associated mRNAs are mature, indicating a splicing-independent function for Sm RNPs. Conclusions: This study represents the first comprehensive analysis of eukaryotic Sm-containing RNPs, and provides a basis for additional functional analyses of Sm proteins and their associated snRNPs outside of the context of pre-mRNA splicing. Our findings expand the repertoire of eukaryotic Sm-containing RNPs and suggest new functions for snRNPs in mRNA metabolism. PMID:24393626
Rodríguez-Esteban, Gustavo; González-Sastre, Alejandro; Rojo-Laguna, José Ignacio; Saló, Emili; Abril, Josep F
2015-05-08
The freshwater planarian Schmidtea mediterranea is recognised as a valuable model for research into adult stem cells and regeneration. With the advent of high-throughput sequencing technologies, it has become feasible to undertake detailed transcriptional analysis of its unique stem cell population, the neoblasts. Nonetheless, a reliable reference for this type of study is still lacking. Taking advantage of digital gene expression (DGE) sequencing technology, we compare all the available transcriptomes for S. mediterranea and improve their annotation. These results are accessible via the web for the community of researchers. Using the quantitative nature of DGE, we describe the transcriptional profile of neoblasts and present 42 new neoblast genes, including several cancer-related genes and transcription factors. Furthermore, we describe in detail the Smed-meis-like gene and the three Nuclear Factor Y subunits Smed-nf-YA, Smed-nf-YB-2 and Smed-nf-YC. DGE is a valuable tool for gene discovery, quantification and annotation. The application of DGE in S. mediterranea confirms the planarian stem cells or neoblasts as a complex population of pluripotent and multipotent cells regulated by a mixture of transcription factors and cancer-related genes.
Nevil, Markus; Bondra, Eliana R.; Schulz, Katharine N.; Kaplan, Tommy; Harrison, Melissa M.
2017-01-01
It has been suggested that transcription factor binding is temporally dynamic, and that changes in binding determine transcriptional output. Nonetheless, this model is based on relatively few examples in which transcription factor binding has been assayed at multiple developmental stages. The essential transcription factor Grainy head (Grh) is conserved from fungi to humans, and controls epithelial development and barrier formation in numerous tissues. Drosophila melanogaster, which possesses a single grainy head (grh) gene, provides an excellent system to study this conserved factor. To determine whether temporally distinct binding events allow Grh to control cell fate specification in different tissue types, we used a combination of ChIP-seq and RNA-seq to elucidate the gene regulatory network controlled by Grh during four stages of embryonic development (spanning stages 5–17) and in larval tissue. Contrary to expectations, we discovered that Grh remains bound to at least 1146 genomic loci over days of development. In contrast to this stable DNA occupancy, the subset of genes whose expression is regulated by Grh varies. Grh transitions from functioning primarily as a transcriptional repressor early in development to functioning predominantly as an activator later. Our data reveal that Grh binds to target genes well before the Grh-dependent transcriptional program commences, suggesting it sets the stage for subsequent recruitment of additional factors that execute stage-specific Grh functions. PMID:28007888
Comprehensive comparative analysis of 5'-end RNA-sequencing methods.
Adiconis, Xian; Haber, Adam L; Simmons, Sean K; Levy Moonshine, Ami; Ji, Zhe; Busby, Michele A; Shi, Xi; Jacques, Justin; Lancaster, Madeline A; Pan, Jen Q; Regev, Aviv; Levin, Joshua Z
2018-06-04
Specialized RNA-seq methods are required to identify the 5' ends of transcripts, which are critical for studies of gene regulation, but these methods have not been systematically benchmarked. We directly compared six such methods, including the performance of five methods on a single human cellular RNA sample and a new spike-in RNA assay that helps circumvent challenges resulting from uncertainties in annotation and RNA processing. We found that the 'cap analysis of gene expression' (CAGE) method performed best for mRNA and that most of its unannotated peaks were supported by evidence from other genomic methods. We applied CAGE to eight brain-related samples and determined sample-specific transcription start site (TSS) usage, as well as a transcriptome-wide shift in TSS usage between fetal and adult brain.
Transcriptional bursting is intrinsically caused by interplay between RNA polymerases on DNA
NASA Astrophysics Data System (ADS)
Fujita, Keisuke; Iwaki, Mitsuhiro; Yanagida, Toshio
2016-12-01
Cell-to-cell variability plays a critical role in cellular responses and decision-making in a population, and transcriptional bursting has been broadly studied by experimental and theoretical approaches as the potential source of cell-to-cell variability. Although molecular mechanisms of transcriptional bursting have been proposed, there is little consensus. An unsolved key question is whether transcriptional bursting is intertwined with many transcriptional regulatory factors or is an intrinsic characteristic of RNA polymerase on DNA. Here we design an in vitro single-molecule measurement system to analyse the kinetics of transcriptional bursting. The results indicate that transcriptional bursting is caused by interplay between RNA polymerases on DNA. The kinetics of in vitro transcriptional bursting is quantitatively consistent with the gene-nonspecific kinetics previously observed in noisy gene expression in vivo. Our kinetic analysis based on a cellular automaton model confirms that arrest and rescue by trailing RNA polymerase intrinsically causes transcriptional bursting.