Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data.
Yu, Ke; Gong, Binsheng; Lee, Mikyung; Liu, Zhichao; Xu, Joshua; Perkins, Roger; Tong, Weida
2014-09-15
Toxicogenomics (TGx) seeks to elucidate underlying molecular mechanisms by exploring gene expression profiles in response to toxic substances. Recently, RNA-Seq has increasingly been regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from complex TGx data. If read counts are treated as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to the word-by-document matrix used in text mining. Topic modeling, which aims to discover latent structures in text corpora, could therefore help explore RNA-Seq based TGx data. In this study, topic modeling was applied to a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated with compounds sharing the same modes of action (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted by gene enrichment analysis with respect to MoAs and then confirmed by compound and pathway associations mined from the literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding an accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discover functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
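The count-to-document analogy can be made concrete with a small sketch. It assumes each sample is a `{gene: read_count}` dictionary, turns it into a bag of repeated gene tokens, fits LDA by collapsed Gibbs sampling, and compares samples by the cosine similarity of their topic proportions. The helper names, the `scale` knob, and the priors `alpha`/`beta` are illustrative choices, not the authors' pipeline or parameters.

```python
import math
import random
from collections import Counter


def profile_to_document(counts, scale=1.0):
    """Turn a {gene: read_count} profile into a bag of gene "words".

    Each gene token is repeated round(count * scale) times. `scale` is a
    hypothetical knob for shrinking very large counts; it is not from the study.
    """
    doc = []
    for gene, count in sorted(counts.items()):
        doc.extend([gene] * int(round(count * scale)))
    return doc


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    n_vocab = len({w for d in docs for w in d})
    ndk = [[0] * n_topics for _ in docs]          # document-topic counts
    nkw = [Counter() for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                           # tokens per topic
    z = []                                        # topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z_i = t | other assignments), unnormalised.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    # Posterior-mean topic proportions (theta) for each document.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def cosine_similarity(u, v):
    """Similarity of two samples' topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

For real RNA-Seq libraries the documents would be very long, so in practice one would down-scale counts or use an optimized LDA library; the pure-Python sampler is only meant to show the mechanics. Samples whose `theta` vectors have high cosine similarity would fall into the same cluster.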
Xu, Joshua; Gong, Binsheng; Wu, Leihong; Thakkar, Shraddha; Hong, Huixiao; Tong, Weida
2016-03-15
Studies on gene expression in response to therapy have led to the discovery of pharmacogenomic biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has been widely adopted in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus among various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, whereas measurements of absolute expression levels were not. As part of the quality evaluation, several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offered more utility than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. 
Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.
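Cross-platform concordance of differential expression calls can be summarised in several ways. A minimal sketch of one simple option, assuming each platform's DE calls are given as a dictionary of gene to direction (+1 up, -1 down): report the overlap of the two DE lists relative to the smaller list, and the fraction of shared genes that change in the same direction. This is an illustrative metric, not necessarily the one used in the SEQC toxicogenomics study.

```python
def deg_concordance(calls_a, calls_b):
    """Compare DE calls from two platforms.

    calls_a, calls_b: dict gene -> +1 (up) or -1 (down), containing only the
    genes called DE on that platform.
    Returns (overlap, direction_agreement):
      overlap = shared DE genes / size of the smaller DE list
      direction_agreement = fraction of shared genes changing in the same direction
    """
    shared = set(calls_a) & set(calls_b)
    if not shared:
        return 0.0, 0.0
    overlap = len(shared) / min(len(calls_a), len(calls_b))
    agree = sum(1 for g in shared if calls_a[g] == calls_b[g]) / len(shared)
    return overlap, agree
```

Computing these numbers separately per chemical treatment, or per transcript-abundance bin, is one way to see the kind of treatment and abundance dependence the review describes.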
Chen, Yunshun; Lun, Aaron T L; Smyth, Gordon K
2016-01-01
In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package, and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.
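Before statistical testing, workflows of this kind usually convert counts to counts per million (CPM) and drop genes that are too lowly expressed to test. The sketch below shows the idea in Python, assuming a samples-by-genes list of raw counts and raw library sizes. It is a simplified analogue of what edgeR's `cpm()` and `filterByExpr()` do, not their implementation: edgeR uses normalised effective library sizes and a more elaborate filtering rule, and the `min_cpm`/`min_samples` thresholds here are hypothetical.

```python
def cpm(counts):
    """Counts per million for each sample.

    counts: list of samples, each a list of gene counts in the same gene order.
    Uses the raw library size (sum of counts), with no normalisation factors.
    """
    out = []
    for sample in counts:
        lib_size = sum(sample)
        out.append([c / lib_size * 1e6 for c in sample])
    return out


def keep_expressed(counts, min_cpm=1.0, min_samples=2):
    """Keep a gene if its CPM is >= min_cpm in at least min_samples samples."""
    cpms = cpm(counts)
    n_genes = len(counts[0])
    keep = []
    for g in range(n_genes):
        n_above = sum(1 for s in cpms if s[g] >= min_cpm)
        keep.append(n_above >= min_samples)
    return keep
```

A common rule of thumb is to set `min_samples` to the size of the smallest experimental group, so that a gene expressed in only one condition is still retained.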
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) has great potential for measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq has enabled characterisation of the transcript sequence diversity of functionally relevant T cell subsets and identification of the full-length T cell receptor (TCRαβ), which defines specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output, affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, using 1,305 single cells from 8 publicly available scRNA-seq datasets and simulation-based analyses. Short read lengths (<50 bp) identified an increased number of unique genes, but these profiles featured higher technical variability than profiles from longer reads. TCRαβ reconstruction succeeded for 6 datasets (81% to 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
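One common way to simulate lower sequencing depth from existing data is binomial thinning: each read is kept independently with a fixed probability. The sketch below assumes a per-cell list of gene read counts; the study's exact simulation procedure is not described in the abstract, so this is an illustrative stand-in, and it does not model read-length effects.

```python
import random


def downsample_counts(counts, fraction, seed=0):
    """Binomially thin read counts to mimic a lower sequencing depth.

    Each of the c reads for a gene is kept independently with probability
    `fraction`, so the expected thinned count is c * fraction.
    """
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]


def n_detected(counts, min_reads=1):
    """Number of genes with at least `min_reads` reads (genes 'detected')."""
    return sum(1 for c in counts if c >= min_reads)
```

Repeating `n_detected(downsample_counts(cell, f))` over a range of fractions `f` gives a simple saturation curve for one cell.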
Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.
Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias
2015-06-25
Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are expressed in this malignancy. We also find that RNA-seq provides much more detailed information than microarrays on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performance reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect model performance. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
Dysregulated microRNA Activity in Shwachman-Diamond Syndrome
2016-09-01
... define transcriptional signatures of bone marrow failure in SDS using single cell RNA-seq of patient cells. We will analyze these datasets to test the ... microRNA expression profiles from HSPCs to be overlaid onto mRNA profiles. Subject terms: single cell RNA-seq; bone marrow failure; hematopoiesis ... myelopoiesis; targeted RNA-seq.
Rai, Muhammad Farooq; Tycksen, Eric D; Sandell, Linda J; Brophy, Robert H
2018-01-01
Microarrays and RNA-seq are at the forefront of high-throughput transcriptome analyses. Since these methodologies are based on different principles, there are concerns about the concordance of data between the two techniques. The concordance of RNA-seq and microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed in clinically derived ligament tissues. To demonstrate the concordance between RNA-seq and microarrays and to assess potential benefits of RNA-seq over microarrays, we assessed differences in transcript expression in anterior cruciate ligament (ACL) tissues based on time from injury. ACL remnants were collected from patients with an ACL tear at the time of ACL reconstruction. RNA prepared from torn ACL remnants was subjected to Agilent microarrays (N = 24) and RNA-seq (N = 8). The correlations of biological replicates in RNA-seq and microarray data were similar (0.98 vs. 0.97), demonstrating that each platform has high internal reproducibility. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarray values were moderate. The cross-platform concordance for differentially expressed transcripts or enriched pathways was linearly correlated (r = 0.64). RNA-seq was superior in detecting low-abundance transcripts and differentiating biologically critical isoforms. Additional independent validation of transcript expression was undertaken using microfluidic PCR for selected genes. PCR data showed 100% concordance (in expression pattern) with RNA-seq and microarray data. These findings demonstrate that RNA-seq has advantages over microarrays for transcriptome profiling of ligament tissues when available and affordable. Furthermore, these findings are likely transferable to other musculoskeletal tissues where tissue collection is challenging and cells are in low abundance. © 2017 Orthopaedic Research Society. 
Published by Wiley Periodicals, Inc. J Orthop Res 36:484-497, 2018.
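The comparison of RNA-seq values against the geometric mean of several microarrays can be sketched as follows. It assumes one RNA-seq value per gene and a list of arrays with positive per-gene intensities in the same gene order; correlating on the log2 scale with Pearson's r is an assumption of this sketch, since the abstract does not state the transformation or correlation type used.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rnaseq_vs_array_consensus(rnaseq, arrays):
    """Correlate RNA-seq values with the per-gene geometric mean of several arrays.

    rnaseq: list of positive per-gene values; arrays: list of arrays, each a list
    of positive per-gene intensities in the same gene order. Both sides are
    log2-transformed before correlating (an assumption of this sketch).
    """
    consensus = [geometric_mean(gene_vals) for gene_vals in zip(*arrays)]
    return pearson([math.log2(v) for v in rnaseq], [math.log2(v) for v in consensus])
```

Averaging across arrays damps array-specific noise, which is one plausible reason the consensus correlates better with RNA-seq than any single array does.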
An Integrated Approach for RNA-seq Data Normalization.
Yang, Shengping; Mercante, Donald E; Zhang, Kun; Fang, Zhide
2016-01-01
DNA copy number alteration is common in many cancers. Studies have shown that insertion or deletion of DNA sequences can directly alter gene expression, and that DNA copy number and gene expression are significantly correlated. Data normalization is a critical step in the analysis of gene expression data generated by RNA-seq technology. Successful normalization reduces or removes unwanted nonbiological variation in the data while keeping meaningful information intact. However, to our knowledge, no attempt has been made to adjust for variation due to DNA copy number changes in RNA-seq data normalization. In this article, we propose an integrated approach to RNA-seq data normalization. Comparisons show that the proposed normalization can improve power for downstream detection of differentially expressed genes and generate more biologically meaningful results in gene profiling. In addition, our findings show that, owing to the effects of copy number changes, some housekeeping genes are not always suitable internal controls for studying gene expression. By using DNA copy number information, the integrated approach reduces noise from both biological and nonbiological sources in RNA-seq data, thus increasing the accuracy of gene profiling.
2013-01-01
Background: High-throughput RNA sequencing (RNA-seq) offers unprecedented power to capture the real dynamics of gene expression. Experimental designs with extensive biological replication present a unique opportunity to exploit this feature and distinguish expression profiles with higher resolution. RNA-seq data analysis methods so far have been mostly applied to data sets with few replicates and their default settings try to provide the best performance under this constraint. These methods are based on two well-known count data distributions: the Poisson and the negative binomial. The way to properly calibrate them with large RNA-seq data sets is not trivial for the non-expert bioinformatics user. Results: Here we show that expression profiles produced by extensively-replicated RNA-seq experiments lead to a rich diversity of count data distributions beyond the Poisson and the negative binomial, such as Poisson-Inverse Gaussian or Pólya-Aeppli, which can be captured by a more general family of count data distributions called the Poisson-Tweedie. The flexibility of the Poisson-Tweedie family enables a direct fitting of emerging features of large expression profiles, such as heavy-tails or zero-inflation, without the need to alter a single configuration parameter. We provide a software package for R called tweeDEseq implementing a new test for differential expression based on the Poisson-Tweedie family. Using simulations on synthetic and real RNA-seq data we show that tweeDEseq yields P-values that are equally or more accurate than competing methods under different configuration parameters. By surveying the tiny fraction of sex-specific gene expression changes in human lymphoblastoid cell lines, we also show that tweeDEseq accurately detects differentially expressed genes in a real large RNA-seq data set with improved performance and reproducibility over the previously compared methodologies. 
Finally, we compared the results with those obtained from microarrays in order to check for reproducibility. Conclusions: RNA-seq data with many replicates leads to a handful of count data distributions which can be accurately estimated with the statistical model illustrated in this paper. This method provides a better fit to the underlying biological variability; this may be critical when comparing groups of RNA-seq samples with markedly different count data distributions. The tweeDEseq package forms part of the Bioconductor project and it is available for download at http://www.bioconductor.org. PMID:23965047
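With many replicates, simple diagnostics already show departures from the Poisson model. The sketch below, for one gene's counts across replicates, computes the dispersion index (sample variance over mean; about 1 under a Poisson model, above 1 when overdispersed) and compares the observed fraction of zeros with the Poisson expectation exp(-mean), where an excess of zeros hints at zero-inflation. These are generic checks, not the Poisson-Tweedie test implemented in tweeDEseq.

```python
import math


def dispersion_summary(replicate_counts):
    """Summarise one gene's counts across many replicates.

    Returns (dispersion_index, observed_zero_fraction, poisson_zero_fraction).
    Requires at least two replicates and a non-zero mean.
    """
    n = len(replicate_counts)
    mean = sum(replicate_counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; dispersion index undefined")
    var = sum((c - mean) ** 2 for c in replicate_counts) / (n - 1)
    observed_zero = sum(1 for c in replicate_counts if c == 0) / n
    expected_zero = math.exp(-mean)  # P(X = 0) for Poisson(mean)
    return var / mean, observed_zero, expected_zero
```

A gene with half its replicates at zero and the rest at a moderate count would show both a dispersion index well above 1 and far more zeros than a Poisson model predicts, the kind of profile the abstract says the Poisson-Tweedie family is designed to fit.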
Chakraborty, Sutirtha
2018-05-26
RNA-Seq technology has transformed gene expression profiling by generating read count data that measure transcript abundance for each queried gene across multiple experimental subjects. On the downside, however, technical artefacts and hidden biological profiles of the samples generate a wide variety of latent effects that can distort the actual transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables, leading to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (implemented in the R package 'SVAPLSseq') to correct for traces of extraneous variability in RNA-Seq data. A novel and thorough comparative analysis of the PLS-based method against other popular approaches for latent variable correction in RNA-Seq is presented. Overall, the method achieves a substantially improved estimation of the hidden effect signatures in the RNA-Seq transcriptome expression landscape compared with other available techniques. Copyright © 2017. Published by Elsevier Inc.
Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.
2016-01-01
Next-generation RNA sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding genes and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling whole-cell RNAs, exosomal RNAs, and miRNAs, and for related procedures such as HITS-CLIP and ribosome profiling. PMID:26554030
Zheng, Chao; Zhao, Lei; Wang, Yu; Shen, Jiazhi; Zhang, Yinfei; Jia, Sisi; Li, Yusheng; Ding, Zhaotang
2015-01-01
Tea [Camellia sinensis (L) O. Kuntze, Theaceae] is one of the most popular non-alcoholic beverages worldwide. Cold stress is one of the most severe abiotic stresses limiting tea plants’ growth, survival and geographical distribution. However, the genetic regulatory network and signaling pathways involved in cold stress responses in tea plants remain largely unknown. Using RNA-Seq, DGE and sRNA-Seq technologies, we performed an integrative analysis of miRNA and mRNA expression profiles and their regulatory network in tea plants under chilling (4℃) and freezing (-5℃) stress. Differentially expressed (DE) miRNA and mRNA profiles were obtained based on fold change analysis; miRNAs and their target mRNAs showed both coherent and incoherent relationships in the regulatory network. Furthermore, we compared several key pathways (e.g., ‘Photosynthesis’), GO terms (e.g., ‘response to karrikin’) and transcription factors (TFs, e.g., DREB1b/CBF1) that were identified as involved in the early chilling and/or freezing response of tea plants. Intriguingly, we found that karrikins, a new group of plant growth regulators, and β-primeverosidase (BPR), a key enzyme functionally relevant to the formation of tea aroma, might play an important role in both the early chilling and freezing responses of tea plants. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-Seq and sRNA-Seq analysis. This is the first study to simultaneously profile the expression patterns of both miRNAs and mRNAs on a genome-wide scale to elucidate the molecular mechanisms of early responses of tea plants to cold stress. In addition to providing deeper insight into the cold resistance characteristics of tea plants, we provide a useful case study for analysing mRNA/miRNA expression profiles of non-model plant species using next-generation sequencing technology. PMID:25901577
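Fold-change screening of miRNA-target pairs can be sketched as below. The pseudocount, the |log2FC| threshold, and the labelling rule are assumptions: "coherent" is taken here to mean that the miRNA and its target change in opposite directions (consistent with miRNA-mediated repression) and "incoherent" that they change in the same direction. The abstract does not define these terms, so treat the rule as hypothetical.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 of (treated + pseudocount) / (control + pseudocount)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify_pair(mirna_lfc, target_lfc, threshold=1.0):
    """Label a miRNA-target pair from the two log2 fold changes.

    Hypothetical rule: pairs where either |log2FC| is below `threshold` are
    'not DE'; opposite directions are 'coherent'; same direction 'incoherent'.
    """
    if abs(mirna_lfc) < threshold or abs(target_lfc) < threshold:
        return "not DE"
    return "coherent" if mirna_lfc * target_lfc < 0 else "incoherent"
```

The pseudocount keeps the ratio finite when a miRNA or mRNA has zero reads in one condition, which is common for low-abundance small RNAs.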
Circular RNA profile in gliomas revealed by identification tool UROBORUS.
Song, Xiaofeng; Zhang, Naibo; Han, Ping; Moon, Byoung-San; Lai, Rose K; Wang, Kai; Lu, Wange
2016-05-19
Recent evidence suggests that many endogenous circular RNAs (circRNAs) may play roles in biological processes. However, the expression patterns and functions of circRNAs in human diseases are not well understood. Computationally identifying circRNAs from total RNA-seq data is a primary step in studying their expression patterns and biological roles. In this work, we developed a computational pipeline named UROBORUS to detect circRNAs in total RNA-seq data. By applying UROBORUS to RNA-seq data from 46 gliomas and normal brain samples, we detected thousands of circRNAs supported by at least two reads, and 24 of 27 randomly selected circRNAs were successfully validated experimentally. UROBORUS is an efficient tool that can detect circRNAs with low expression levels in total RNA-seq data without RNase R treatment. CircRNA expression profiling revealed more than 476 circular RNAs differentially expressed between control brain tissues and gliomas. By also examining parental gene expression, we found that circRNAs and their parental genes have diverse expression patterns in gliomas and control brain tissues. This study establishes an efficient and sensitive approach for predicting circRNAs using total RNA-seq data. The UROBORUS pipeline can be accessed freely for non-commercial purposes at http://uroborus.openbioinformatics.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
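The signal that circRNA detectors look for is a back-spliced read: the read's 3' part aligns upstream of its 5' part on the genome. The sketch below assumes split reads are already given as two aligned segments in read order, oriented to the + strand, and applies the "at least two supporting reads" cut-off from the abstract. It illustrates the back-splice signature only and is not UROBORUS's algorithm; the `Segment` type and function names are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def back_splice_junction(first, second):
    """Return (chrom, circ_start, circ_end) if the split read is back-spliced.

    Assumes both segments come from one read, in read order, oriented to the
    + strand. In a linear splice the 3' segment lies downstream of the 5'
    segment; in a back-splice it starts upstream of it.
    """
    if first.chrom != second.chrom:
        return None
    if second.start < first.start:
        return (first.chrom, second.start, first.end)
    return None


def call_circrnas(split_reads, min_reads=2):
    """Count back-splice junctions and keep those with >= min_reads reads."""
    support = Counter()
    for first, second in split_reads:
        junction = back_splice_junction(first, second)
        if junction is not None:
            support[junction] += 1
    return {j: n for j, n in support.items() if n >= min_reads}
```

Real pipelines must also handle the minus strand, mapping ambiguity and paralogs, which is where most of the engineering effort goes.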
SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.
Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B
2016-02-04
Many tools exist for the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. Despite the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use, bacterially oriented workflow applications that combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use case, and integrating the tools into a coherent analysis pipeline is not trivial, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. 
This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.
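Adapter removal, the first step of the workflow described above, can be sketched in a few lines. The function removes a 3' adapter that appears either in full inside the read or partially at its very end, using exact matches only. The default adapter string (a commonly used Illumina adapter prefix) and `min_overlap` are assumptions of this sketch; SPARTA delegates trimming to an existing tool, and real trimmers also tolerate mismatches and trim on base quality.

```python
def trim_3prime_adapter(read, adapter="AGATCGGAAGAGC", min_overlap=3):
    """Remove a 3' adapter (full, or partial at the read end); exact matches only.

    Scans left to right and cuts at the first position where either the full
    adapter starts, or the remaining suffix (>= min_overlap bases) is a prefix
    of the adapter.
    """
    for i in range(len(read)):
        suffix = read[i:]
        if suffix.startswith(adapter):
            return read[:i]
        if len(suffix) >= min_overlap and adapter.startswith(suffix):
            return read[:i]
    return read
```

Requiring a minimum overlap avoids trimming one or two bases that match the adapter start purely by chance.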
Kim, Kyu-Tae; Lee, Hye Won; Lee, Hae-Ock; Kim, Sang Cheol; Seo, Yun Jee; Chung, Woosung; Eum, Hye Hyeon; Nam, Do-Hyun; Kim, Junhyong; Joo, Kyeung Min; Park, Woong-Yang
2015-06-19
Intra-tumoral genetic and functional heterogeneity correlates with clinical prognosis in cancer. However, the mechanisms by which intra-tumoral heterogeneity impacts therapeutic outcome remain poorly understood. RNA sequencing (RNA-seq) of single tumor cells can provide comprehensive information about gene expression and single-nucleotide variations in individual tumor cells, which may allow heterogeneous tumor cell functional responses to be translated into customized anti-cancer treatments. We isolated 34 patient-derived xenograft (PDX) tumor cells from a lung adenocarcinoma patient tumor xenograft. Individual tumor cells were subjected to single cell RNA-seq for gene expression profiling and expressed mutation profiling. Fifty tumor-specific single-nucleotide variations, including KRAS(G12D), were observed to be heterogeneous in individual PDX cells. Semi-supervised clustering, based on KRAS(G12D) mutant expression and a risk score representing expression of 69 lung adenocarcinoma-prognostic genes, classified PDX cells into four groups. PDX cells that survived in vitro anti-cancer drug treatment displayed transcriptome signatures consistent with the group characterized by KRAS(G12D) and a low risk score. Single-cell RNA-seq on viable PDX cells identified a candidate tumor cell subgroup associated with anti-cancer drug resistance. Thus, single-cell RNA-seq is a powerful approach for identifying unique tumor cell-specific gene expression profiles, which could facilitate the development of optimized clinical anti-cancer strategies.
Lu, Jun; Bushel, Pierre R.
2013-01-01
RNA sequencing (RNA-Seq) allows the identification of novel exon-exon junctions and the quantification of gene expression levels. We show that RNA-Seq data can also be used to detect utilization of alternative polyadenylation (APA) in 3′ untranslated regions (3′ UTRs), which are known to play a critical role in regulating mRNA stability, cellular localization and translation efficiency. Given the dynamic nature of APA, it is desirable to examine APA on a sample-by-sample basis. We used a Poisson hidden Markov model (PHMM) of RNA-Seq data to identify potential APA in human liver and brain cortex tissues leading to shortened 3′ UTRs. Over three hundred transcripts with shortened 3′ UTRs were detected with sensitivity >75% and specificity >60%. Tissue-specific 3′ UTR shortening was observed for 32 genes with a q-value ≤ 0.1. Compared with alternative isoforms detected by Cufflinks or MISO, our PHMM method agreed on over 100 transcripts with shortened 3′ UTRs. Given the increasing use of RNA-Seq for gene expression profiling, using a PHMM to investigate sample-specific 3′ UTR shortening could be an added benefit of this emerging technology. PMID:23845781
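The intuition behind a Poisson HMM for 3′ UTR shortening is that per-position read coverage drops sharply after a proximal polyadenylation site. A minimal sketch, assuming a two-state model (state 0 = high coverage, state 1 = low coverage) with fixed, hypothetical Poisson means and transition probabilities, decodes the most likely state path with the Viterbi algorithm and reports the first high-to-low switch. The published method estimates its parameters from data; this sketch does not.

```python
import math


def poisson_logpmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_poisson(coverage, lambdas=(20.0, 2.0), stay=0.95):
    """Most likely state path under an HMM with Poisson emissions.

    lambdas: hypothetical Poisson mean per state; stay: probability of
    remaining in the same state between adjacent positions.
    """
    n_states = len(lambdas)
    log_stay = math.log(stay)
    log_move = math.log((1 - stay) / (n_states - 1))
    log_start = math.log(1.0 / n_states)
    # score[s] = best log-probability of any path ending in state s
    score = [log_start + poisson_logpmf(coverage[0], lam) for lam in lambdas]
    back = []
    for k in coverage[1:]:
        new_score, pointers = [], []
        for s, lam in enumerate(lambdas):
            cands = [score[p] + (log_stay if p == s else log_move) for p in range(n_states)]
            best = max(range(n_states), key=lambda p: cands[p])
            new_score.append(cands[best] + poisson_logpmf(k, lam))
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def first_drop(path):
    """Index of the first high-to-low (0 -> 1) switch, or None."""
    for i in range(1, len(path)):
        if path[i - 1] == 0 and path[i] == 1:
            return i
    return None
```

Running it sample by sample gives a per-sample candidate shortening point, in line with the abstract's emphasis on examining APA on a sample-by-sample basis.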
Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K
2016-01-01
RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.
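Gene ontology enrichment of a DE gene list is commonly assessed with a one-sided hypergeometric (Fisher's exact) test for over-representation. The sketch below computes that tail probability directly with `math.comb` (Python 3.8+). It is a generic illustration rather than the specific Bioconductor function used in the chapter, and it does not cover competitive gene set tests, which work differently.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes, n in the GO term, N DE genes).

    k = number of DE genes annotated to the term. math.comb returns 0 when the
    lower argument exceeds the upper, so impossible configurations contribute 0.
    """
    total = comb(M, N)
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total
```

For example, `hypergeom_sf(k=30, M=15000, n=200, N=500)` gives the probability of seeing at least 30 annotated genes among 500 DE genes by chance when 200 of 15,000 tested genes carry the term; such p-values are then adjusted for the number of terms tested.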
2014-01-01
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for these platforms as well as for qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10 Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. PMID:25150838
Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model.
Sun, Xiaoxiao; Dalpiaz, David; Wu, Di; S Liu, Jun; Zhong, Wenxuan; Ma, Ping
2016-08-26
Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory networks. However, most of the available methods treat gene expression levels at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score. Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates that the NBMM identifies biologically relevant genes whose relevance is supported by gene ontology analysis. The proposed method is powerful and efficient at detecting biologically relevant DE genes and gene sets in time course RNA-Seq data.
Transcriptional profiling of murine osteoblast differentiation based on RNA-seq expression analyses.
Khayal, Layal Abo; Grünhagen, Johannes; Provazník, Ivo; Mundlos, Stefan; Kornak, Uwe; Robinson, Peter N; Ott, Claus-Eric
2018-04-11
Osteoblastic differentiation is a multistep process characterized by osteogenic induction of mesenchymal stem cells, which then differentiate into proliferative pre-osteoblasts that produce copious amounts of extracellular matrix, followed by stiffening of the extracellular matrix, and matrix mineralization by hydroxylapatite deposition. Although these processes have been well characterized biologically, a detailed transcriptional analysis of murine primary calvaria osteoblast differentiation based on RNA sequencing (RNA-seq) analyses has not previously been reported. Here, we used RNA-seq to obtain expression values of 29,148 genes at four time points as murine primary calvaria osteoblasts differentiate in vitro until the onset of mineralization was clearly detectable by microscopic inspection. Expression of marker genes confirmed osteogenic differentiation. We explored differential expression of 1386 protein-coding genes using unsupervised clustering and GO analyses. One hundred differentially expressed lncRNAs were investigated by co-expression with protein-coding genes that are localized within the same topologically associated domain. Additionally, we monitored expression of 237 genes that are silent or active at distinct time points and compared differential exon usage. Our data represent an in-depth profiling of murine primary calvaria osteoblast differentiation by RNA-seq and contribute to our understanding of genetic regulation of this key process in osteoblast biology. Copyright © 2018 Elsevier Inc. All rights reserved.
Zhao, Shanrong; Prenger, Kurt; Smith, Lance
2013-01-01
RNA-Seq is becoming a promising replacement for microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost-effective, open-source tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of the box to process Illumina RNA-Seq datasets. PMID:25937948
Mori, Yoshifumi; Chung, Ung-Il; Tanaka, Sakae; Saito, Taku
2014-01-01
Superficial zone (SFZ) cells, which are morphologically and functionally distinct from chondrocytes in deeper zones, play important roles in the maintenance of articular cartilage. Here, we established an easy and reliable method for performing laser microdissection (LMD) on cryosections of mature rat articular cartilage using an adhesive membrane. We further examined gene expression profiles in the SFZ and the deeper zones of articular cartilage by performing RNA sequencing (RNA-seq). We validated sample collection methods, RNA amplification and the RNA-seq data using real-time RT-PCR. The combined data provide comprehensive information regarding genes specifically expressed in the SFZ or deeper zones, as well as a useful protocol for expression analysis of microsamples of hard tissues.
Predicting survival times for neuroblastoma patients using RNA-seq expression profiles.
Grimes, Tyler; Walker, Alejandro R; Datta, Susmita; Datta, Somnath
2018-05-30
Neuroblastoma is the most common tumor of early childhood and is notorious for its high variability in clinical presentation. Accurate prognosis has remained a challenge for many patients. In this study, expression profiles from RNA-sequencing are used to predict survival times directly. Several models are investigated using various annotation levels of expression profiles (genes, transcripts, and introns), and an ensemble predictor is proposed as a heuristic for combining these different profiles. The use of RNA-seq data is shown to improve accuracy in comparison to using clinical data alone for predicting overall survival times. Furthermore, clinically high-risk patients can be subclassified based on their predicted overall survival times. In this effort, the best performing model was the elastic net using both transcripts and introns together. This model separated patients into two groups with 2-year overall survival rates of 0.40±0.11 (n=22) versus 0.80±0.05 (n=68). The ensemble approach gave similar results, with groups 0.42±0.10 (n=25) versus 0.82±0.05 (n=65). This suggests that the ensemble is able to effectively combine the individual RNA-seq datasets. Using predicted survival times based on RNA-seq data can provide improved prognosis by subclassifying clinically high-risk neuroblastoma patients. This article was reviewed by Subharup Guha and Isabel Nepomuceno.
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore
2017-01-01
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and the molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has posed a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659
Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi
2018-02-12
Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Reyes, Juan M; Chitwood, James L; Ross, Pablo J
2015-02-01
Molecular changes occurring during mammalian oocyte maturation are partly regulated by cytoplasmic polyadenylation (CP) and affect oocyte quality, yet the extent of CP activity during oocyte maturation remains unknown. Single bovine oocyte RNA sequencing (RNA-Seq) was performed to examine changes in transcript abundance during in vitro oocyte maturation in cattle. Polyadenylated RNA from individual germinal-vesicle and metaphase-II oocytes was amplified and processed for Illumina sequencing, producing approximately 30 million reads per replicate for each sample type. A total of 10,494 genes were found to be expressed, of which 2,455 were differentially expressed (adjusted P < 0.05 and fold change >2) between stages, with 503 and 1,952 genes respectively increasing and decreasing in abundance. Differentially expressed genes with complete 3'-untranslated-region sequence (279 increasing and 918 decreasing in polyadenylated transcript abundance) were examined for the presence, position, and distribution of motifs mediating CP, revealing enrichment (85%) and lack thereof (18%) in up- and down-regulated genes, respectively. Examination of total and polyadenylated RNA abundance by quantitative PCR validated these RNA-Seq findings. The observed increases in polyadenylated transcript abundance within the RNA-Seq data are likely due to CP, providing novel insight into targeted transcripts and resultant differential gene expression profiles that contribute to oocyte maturation. © 2015 Wiley Periodicals, Inc.
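The differential-expression call above combines an adjusted P-value cutoff (< 0.05) with a fold-change cutoff (> 2) in either direction. The abstract does not name the adjustment, so the Python sketch below assumes Benjamini-Hochberg false discovery rate adjustment and treats fold changes as ratios of later stage to earlier stage; the gene names, P-values and fold changes are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_de(genes, pvalues, fold_changes, alpha=0.05, min_fc=2.0):
    """Split genes into increasing / decreasing sets.

    fold_changes are ratios (later stage / earlier stage); a decrease of more
    than min_fc-fold is a ratio below 1 / min_fc.
    """
    adjusted = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, q, fc in zip(genes, adjusted, fold_changes):
        if q >= alpha:
            continue
        if fc > min_fc:
            up.append(gene)
        elif fc < 1.0 / min_fc:
            down.append(gene)
    return up, down


genes = ["g1", "g2", "g3", "g4"]
pvalues = [0.001, 0.01, 0.03, 0.5]
fold_changes = [3.0, 0.25, 1.5, 4.0]
up, down = call_de(genes, pvalues, fold_changes)
# g1 is up, g2 is down; g3 fails the fold-change cut, g4 the FDR cut
```

Requiring both cutoffs excludes genes that are statistically significant but change only slightly (g3) as well as large but unreliable changes (g4).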
Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.
2011-01-01
Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R²) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. 
Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623
Manteniotis, Stavros; Lehmann, Ramona; Flegel, Caroline; Vogel, Felix; Hofreuter, Adrian; Schreiner, Benjamin S. P.; Altmüller, Janine; Becker, Christian; Schöbel, Nicole; Hatt, Hanns; Gisselmann, Günter
2013-01-01
The specific functions of sensory systems depend on the tissue-specific expression of genes that code for molecular sensor proteins that are necessary for stimulus detection and membrane signaling. Using the Next Generation Sequencing technique (RNA-Seq), we analyzed the complete transcriptome of the trigeminal ganglia (TG) and dorsal root ganglia (DRG) of adult mice. Focusing on genes with an expression level higher than 1 FPKM (fragments per kilobase of transcript per million mapped reads), we detected the expression of 12984 genes in the TG and 13195 in the DRG. To analyze the specific gene expression patterns of the peripheral neuronal tissues, we compared their gene expression profiles with that of the liver, brain, olfactory epithelium, and skeletal muscle. The transcriptome data of the TG and DRG were scanned for virtually all known G-protein-coupled receptors (GPCRs) as well as for ion channels. The expression profile was ranked with regard to the level and specificity for the TG. In total, we detected 106 non-olfactory GPCRs and 33 ion channels that had not been previously described as expressed in the TG. To validate the RNA-Seq data, in situ hybridization experiments were performed for several of the newly detected transcripts. To identify differences in expression profiles between the sensory ganglia, the RNA-Seq data of the TG and DRG were compared. Among the differentially expressed genes (> 1 FPKM), 65 and 117 were expressed at least 10-fold higher in the TG and DRG, respectively. Our transcriptome analysis allows a comprehensive overview of all ion channels and G protein-coupled receptors that are expressed in trigeminal ganglia and provides additional approaches for the investigation of trigeminal sensing as well as for the physiological and pathophysiological mechanisms of pain. PMID:24260241
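The 1 FPKM expression cutoff used above normalizes a fragment count by transcript length (in kilobases) and by sequencing depth (in millions of mapped fragments). The Python sketch below applies that definition to two hypothetical genes in a sample with 10 million mapped fragments; it uses the plain transcript length, whereas quantification tools typically use an effective length.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1e3
    millions_mapped = total_mapped_fragments / 1e6
    return fragments / (kilobases * millions_mapped)


# Hypothetical sample with 10 million mapped fragments
total = 10_000_000
genes = {"geneX": (500, 2_000), "geneY": (5, 10_000)}  # (fragments, length in bp)
values = {g: fpkm(f, length, total) for g, (f, length) in genes.items()}
# Apply the same "expressed above 1 FPKM" threshold as in the abstract
expressed = [g for g, v in values.items() if v > 1.0]
# geneX has 25 FPKM and passes; geneY has 0.05 FPKM and does not
```

Dividing by length matters because a long transcript yields more fragments than a short one at the same molar abundance.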
Early Detection of NSCLC Using Stromal Markers in Peripheral Blood
2016-09-01
circulating myeloid cells, flow cytometry, RNA-sequencing, expression profiling. 3. ACCOMPLISHMENTS: What were the major goals of the project...Subtask 2: Flow cytometry sorting of circulating myeloid cells. Subtask 3: RNA-Sequencing Subtask 4: RNA-seq data analysis Subtask 5: Feasible RT-PCR...accomplished the patient recruitment, flow cytometry sorting of circulating myeloid cells, RNA-sequencing of the samples. During the RNA-seq data analysis, we
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei
2017-05-19
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.
Paulson, Joseph N; Chen, Cho-Yi; Lopes-Ramos, Camila M; Kuijjer, Marieke L; Platig, John; Sonawane, Abhijeet R; Fagny, Maud; Glass, Kimberly; Quackenbush, John
2017-10-03
Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data, which are critical first steps for any subsequent analysis. We find that analysis of large RNA-Seq data sets requires both careful quality control and accounting for the sparsity that arises from the heterogeneity intrinsic to multi-group studies. We developed Yet Another RNA Normalization software pipeline (YARN), which includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets, and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project. An R package instantiating YARN is available at http://bioconductor.org/packages/yarn .
Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome
Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José
2016-01-01
RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude. PMID:27377755
Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome.
Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José
2016-07-05
Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J
2015-09-03
RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
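Grouping reads by the set of transcripts they map to, as described above, is also the basis of a more generic expectation-maximization (EM) scheme for resolving multi-mapping reads. The Python sketch below shows that generic EM over such read groups, assuming hypothetical transcripts `T1` and `T2` of equal length; it does not implement EMSAR's mappability-based segmentation or its joint Poisson model.

```python
def em_read_assignment(eq_classes, lengths, n_iter=200):
    """Expected number of reads per transcript, by expectation-maximization.

    eq_classes -- dict: frozenset of compatible transcripts -> number of reads
    lengths    -- dict: transcript -> (effective) length in bases
    """
    transcripts = sorted(lengths)
    total = sum(eq_classes.values())
    reads = {t: total / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(n_iter):
        new_reads = {t: 0.0 for t in transcripts}
        for members, count in eq_classes.items():
            # E-step: share this group's reads in proportion to reads / length
            weights = {t: reads[t] / lengths[t] for t in members}
            norm = sum(weights.values())
            for t in members:
                new_reads[t] += count * weights[t] / norm
        # M-step: the expected counts become the next estimate
        reads = new_reads
    return reads


eq_classes = {
    frozenset({"T1"}): 30,        # reads unique to T1
    frozenset({"T2"}): 10,        # reads unique to T2
    frozenset({"T1", "T2"}): 60,  # reads compatible with both isoforms
}
lengths = {"T1": 1000, "T2": 1000}
estimate = em_read_assignment(eq_classes, lengths)
# converges to about 75 reads for T1 and 25 for T2
```

The 60 shared reads end up split 3:1, the same ratio as the uniquely mapped reads, so every mapped read contributes to the estimate instead of being discarded.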
Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq
Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim
2014-01-01
The advent of both microarrays and massively parallel sequencing has revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.
Single-nucleus RNA-seq of differentiating human myoblasts reveals the extent of fate heterogeneity
Zeng, Weihua; Jiang, Shan; Kong, Xiangduo; El-Ali, Nicole; Ball, Alexander R.; Ma, Christopher I-Hsing; Hashimoto, Naohiro; Yokomori, Kyoko; Mortazavi, Ali
2016-01-01
Myoblasts are precursor skeletal muscle cells that differentiate into fused, multinucleated myotubes. Current single-cell microfluidic methods are not optimized for capturing very large, multinucleated cells such as myotubes. To circumvent the problem, we performed single-nucleus transcriptome analysis. Using immortalized human myoblasts, we performed RNA-seq analysis of single cells (scRNA-seq) and single nuclei (snRNA-seq) and found them comparable, with a distinct enrichment for long non-coding RNAs (lncRNAs) in snRNA-seq. We then compared snRNA-seq of myoblasts before and after differentiation. We observed the presence of mononucleated cells (MNCs) that remained unfused; these were analyzed separately from multinucleated myotubes. We found that while the transcriptome profiles of myoblast and myotube nuclei are relatively homogeneous, MNC nuclei exhibited significant heterogeneity, with the majority of them adopting a distinct mesenchymal state. Primary transcripts for microRNAs (miRNAs) that participate in skeletal muscle differentiation were among the most differentially expressed lncRNAs, which we validated using NanoString. Our study demonstrates that snRNA-seq provides reliable transcriptome quantification for cells that are otherwise not amenable to current single-cell platforms. Our results further indicate that snRNA-seq has a unique advantage in capturing nucleus-enriched lncRNAs and miRNA precursors that are useful in mapping and monitoring differential miRNA expression during cellular differentiation. PMID:27566152
Li, Yong-Fang; Mahalingam, Ramamurthy; Sunkar, Ramanjulu
2017-01-01
Alteration of gene expression is an essential mechanism that allows plants to respond and adapt to adverse environmental conditions. Transcriptome and proteome analyses in plants exposed to abiotic stresses revealed that protein levels are not correlated with the changes in corresponding mRNAs, indicating that regulation at the translational level is another major mechanism of gene expression control. Analysis of the translatome, which refers to all mRNAs associated with ribosomes, thus has the potential to bridge the gap between transcriptome and proteome. Polysomal RNA profiling and the recently developed ribosome profiling (Ribo-seq) are the two main methods for translatome analysis at the global level. Here, we describe the classical procedure for polysomal RNA isolation by sucrose gradient ultracentrifugation followed by high-throughput RNA-seq to identify genes regulated at the translational level. Polysomal RNA can be further used for a variety of downstream applications including Northern blot analysis, qRT-PCR, RNase protection assay, and microarray-based gene expression profiling.
Analytical workflow profiling gene expression in murine macrophages
Nixon, Scott E.; González-Peña, Dianelys; Lawson, Marcus A.; McCusker, Robert H.; Hernandez, Alvaro G.; O’Connor, Jason C.; Dantzer, Robert; Kelley, Keith W.
2015-01-01
Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution), then tested for differential transcript expression (False Discovery Rate-adjusted p-value < 0.05). Enrichment of functional categories utilized the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. Enrichment of 92 categories from Gene Ontology Biological Processes and Molecular Functions, and KEGG pathways were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides a context to the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge with high throughput RNA-Seq. PMID:25708305
Gluck, Christian; Min, Sangwon; Oyelakin, Akinsola; Smalley, Kirsten; Sinha, Satrajit; Romano, Rose-Anne
2016-11-16
Mouse models have served a valuable role in deciphering various facets of Salivary Gland (SG) biology, from normal developmental programs to diseased states. To facilitate such studies, gene expression profiling maps have been generated for various stages of SG organogenesis. However, these prior studies fall short of capturing the transcriptional complexity due to the limited scope of gene-centric microarray-based technology. Compared to microarrays, RNA-sequencing (RNA-seq) offers unbiased detection of novel transcripts, a broader dynamic range, and high specificity and sensitivity for detection of genes, transcripts, and differential gene expression. Although RNA-seq data, particularly under the auspices of the ENCODE project, have covered a large number of biological specimens, studies on the SG have been lacking. To better appreciate the wide spectrum of gene expression profiles, we isolated RNA from mouse submandibular salivary glands at different embryonic and adult stages. In parallel, we processed RNA-seq data for 24 organs and tissues obtained from the mouse ENCODE consortium and calculated the average gene expression values. To identify molecular players and pathways likely to be relevant for SG biology, we performed functional gene enrichment analysis, network construction and hierarchical clustering of the RNA-seq datasets obtained from different stages of SG development and maturation, and from other mouse organs and tissues. Our bioinformatics-based data analysis not only reaffirmed known modulators of SG morphogenesis but also revealed novel transcription factors and signaling pathways unique to mouse SG biology and function. Finally, we demonstrated that the unique SG gene signature obtained from our mouse studies is also well conserved and can demarcate features of the human SG transcriptome that distinguish it from other tissues.
Our RNA-seq-based Atlas has revealed a high-resolution cartographic view of the dynamic transcriptomic landscape of the mouse SG at various stages. These RNA-seq datasets will complement pre-existing microarray-based datasets, including the Salivary Gland Molecular Anatomy Project, by offering a broader systems-biology perspective rather than the classical gene-centric view. Ultimately, such resources will provide a useful toolkit for understanding how the diverse cell populations of the SG are organized and controlled during development and differentiation.
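The abstract mentions hierarchical clustering of SG stages and other tissues but gives no linkage or distance. The sketch below is a minimal, dependency-free illustration under two assumptions: average linkage, and 1 minus Pearson correlation as the distance between expression profiles. The sample names and values in the usage example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering with average linkage on 1 - Pearson r.

    profiles: dict sample name -> expression values in a shared gene order.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = sorted(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all cross-cluster sample pairs.
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


profiles = {
    "SG_E14": [1.0, 2.0, 3.0, 4.0],    # hypothetical embryonic SG profile
    "SG_adult": [1.1, 2.1, 2.9, 4.2],  # hypothetical adult SG profile
    "liver": [4.0, 3.0, 2.0, 1.0],     # hypothetical non-SG tissue
}
merges = average_linkage(profiles)
```

With these toy profiles the two SG samples merge first, and the tissue joins last. A real analysis would more likely use a library routine such as SciPy's hierarchical clustering.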
Xie, Rangjin; Zhang, Jin; Ma, Yanyan; Pan, Xiaoting; Dong, Cuicui; Pang, Shaoping; He, Shaolan; Deng, Lie; Yi, Shilai; Zheng, Yongqiang; Lv, Qiang
2017-02-06
Citrus is one of the most economically important fruit crops around the world. Drought and salinity stresses adversely affect its productivity and fruit quality. However, the genetic regulatory networks and signaling pathways involved in drought and salinity responses remain to be elucidated. With RNA-seq and sRNA-seq, an integrative analysis of miRNA and mRNA expression profiles and their regulatory networks was conducted using citrus roots subjected to dehydration and salt treatment. Differentially expressed (DE) mRNA and miRNA profiles were obtained by fold change analysis, and the relationships between miRNAs and target mRNAs in the regulatory networks were found to be both coherent and incoherent. GO enrichment analysis revealed that some crucial biological processes related to signal transduction (e.g., 'MAPK cascade'), hormone-mediated signaling pathways (e.g., 'abscisic acid-activated signaling pathway'), reactive oxygen species (ROS) metabolic process (e.g., 'hydrogen peroxide catabolic process') and transcription factors (e.g., 'MYB, ZFP and bZIP') were involved in the response to dehydration and/or salt treatment. The molecular players in response to dehydration and salt treatment partially overlapped. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-seq and sRNA-seq analysis. This study provides new insights into the molecular mechanisms by which citrus roots respond to dehydration and salt treatment. PMID:28165059
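The abstract calls DE mRNAs and miRNAs by fold change analysis but states neither a cutoff nor a normalization. The sketch below illustrates that kind of fold change call under three assumptions: inputs are already-normalized expression values, a pseudocount of 1 is used, and the threshold is two-fold (|log2 FC| ≥ 1). The helper and gene names are hypothetical.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2((treated + pc) / (control + pc)) for genes present in both dicts."""
    return {
        gene: math.log2((treated[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treated
    }


def split_by_fold_change(control, treated, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return (up, down) gene lists passing an assumed |log2 fold change| cutoff."""
    lfc = log2_fold_changes(control, treated, pseudocount)
    up = sorted(g for g, v in lfc.items() if v >= min_abs_log2fc)
    down = sorted(g for g, v in lfc.items() if v <= -min_abs_log2fc)
    return up, down
```

The pseudocount keeps genes with zero counts in one condition from producing infinite ratios. Its exact value mainly affects lowly expressed genes.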
Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.
Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S
2012-01-01
RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra-high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts, including novel ones, can be quantified digitally. Although extremely promising, RNA-Seq poses challenges for data analysis because of the massive amounts of data it generates and the substantial biases and uncertainty in short read alignment. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expression, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as the between-base dependence that affect the read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.
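To make "base-specific variation" and "between-base dependence" concrete, the sketch below simulates the kind of data a Poisson mixed-effects model describes. It does not fit POME. The log rate at base j is mu + b_j, where mu plays the role of the expression parameter. Modeling the random effects b_j as a stationary AR(1) process is an assumption chosen to create correlation between neighboring bases, and both helper names are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def simulate_base_coverage(length, mu, sigma, rho, seed=0):
    """Simulate per-base read counts for one transcript.

    log(lambda_j) = mu + b_j, where b_j follows a stationary AR(1) process
    (|rho| < 1) with marginal variance sigma**2, so neighboring bases are
    correlated and every base has its own random effect.
    """
    rng = random.Random(seed)
    innovation_sd = sigma * math.sqrt(1.0 - rho ** 2)
    b = rng.gauss(0.0, sigma)  # draw the first effect from the stationary law
    counts = []
    for j in range(length):
        if j > 0:
            b = rho * b + rng.gauss(0.0, innovation_sd)
        counts.append(sample_poisson(rng, math.exp(mu + b)))
    return counts
```

One consequence is visible here. Because E[exp(b_j)] = exp(sigma²/2) for Gaussian b_j, the plain average of the per-base counts targets exp(mu + sigma²/2), not exp(mu). The base effects therefore leak into an averaging-based estimate.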
Awazu, Akinori; Tanabe, Takahiro; Kamitani, Mari; Tezuka, Ayumi; Nagano, Atsushi J
2018-05-29
Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution, although the physiological basis of this assumption remains unclear. In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21-27 replicates), and the characteristics of gene-dependent empirical probability density function (ePDF) profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, the ePDFs of gene expression levels fell into several types, classified as Gaussian, power law-like with a long tail, or intermediate. These ePDF profiles were well fitted by a Gauss-power mixing distribution function derived from a simple model of a stochastic transcriptional network containing a feedback loop. The fitting function suggested that gene expression levels with long-tailed ePDFs would be strongly influenced by feedback regulation. Furthermore, the features of gene expression levels correlate with gene function: the levels of essential genes tend to follow a Gaussian-like ePDF, while those of genes encoding nucleic acid-binding proteins and transcription factors exhibit long-tailed ePDFs.
High-throughput full-length single-cell mRNA-seq of rare cells.
Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X
2017-01-01
Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods limited to the analysis of small volumes of single-cell suspensions with cell densities on the order of 10⁷ per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data are typically used only for gene expression analysis, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells) and mutational profile (H1650 versus H1975 cells), in a single sequencing run. 
This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.
Prakash, Celine; Haeseler, Arndt Von
2017-03-01
RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment. PMID:27661099
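The idea that enumerating fragmentation patterns yields a non-uniform expected coverage can be shown with a much simpler toy model than the authors' enumeration of maximal placements. In the sketch below, fragments of fixed length F are tiled end to end from a start offset in {0, ..., F - 1}, every offset is equally likely, and only complete fragments lying inside the transcript of length T are counted. This tiling model and the helper name are assumptions made for illustration.

```python
def expected_fragment_coverage(T, F):
    """Expected per-position coverage under a simplified tiling model.

    Assumption: fragments of length F are laid end to end starting at an
    offset s in {0, ..., F - 1}, each offset equally likely, and only
    complete fragments lying inside the transcript of length T are kept.
    """
    if not 0 < F <= T:
        raise ValueError("require 0 < F <= T")
    hits = [0] * T
    for offset in range(F):
        n_full = (T - offset) // F  # complete fragments after this offset
        for pos in range(offset, offset + n_full * F):
            hits[pos] += 1
    return [h / F for h in hits]
```

For T = 10 and F = 4, the profile is 0.25, 0.5, 0.75 at each end and 1.0 in the middle. Coverage falls off toward both transcript ends, and the size of this edge effect depends on the ratio F/T.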
Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.
Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D
2015-07-30
RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by comparing matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were shared between unamplified and matched amplified libraries in all comparisons, and that gene density was similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significantly differentially expressed genes (P < 0.05) were identified in all matched samples, in each comparison they represent less than 0.15% of all shared genes. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. 
This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove challenging.
Oh, Chun-do; Lu, Yue; Liang, Shoudan; Mori-Akiyama, Yuko; Chen, Di; de Crombrugghe, Benoit; Yasuda, Hideyo
2014-01-01
The transcription factor SOX9 plays an essential role in determining the fate of several cell types and is a master factor in regulation of chondrocyte development. Our aim was to determine which genes in the genome of chondrocytes are either directly or indirectly controlled by SOX9. We used RNA-Seq to identify genes whose expression levels were affected by SOX9 and used SOX9 ChIP-Seq to identify those genes that harbor SOX9-interaction sites. For RNA-Seq, the RNA expression profile of primary Sox9flox/flox mouse chondrocytes infected with Ad-CMV-Cre was compared with that of the same cells infected with a control adenovirus. Analysis of RNA-Seq data indicated that, when the levels of Sox9 mRNA were decreased more than 8-fold by infection with Ad-CMV-Cre, 196 genes showed a decrease in expression of at least 4-fold. These included many cartilage extracellular matrix (ECM) genes and a number of genes for ECM modification enzymes (transferases), membrane receptors, transporters, and others. In ChIP-Seq, 75% of the SOX9-interaction sites had a canonical inverted repeat motif within 100 bp of the top of the peak. SOX9-interaction sites were found in 55% of the genes whose expression was decreased more than 8-fold in SOX9-depleted cells and in somewhat fewer of the genes whose expression was reduced more than 4-fold, suggesting that these are direct targets of SOX9. The combination of RNA-Seq and ChIP-Seq has provided a fuller understanding of the SOX9-controlled genetic program of chondrocytes.
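The abstract identifies putative direct SOX9 targets by intersecting two lists: genes whose expression dropped at least 4-fold (or more than 8-fold) after Sox9 deletion, and genes with a SOX9 ChIP-Seq interaction site. The sketch below shows that intersection. It assumes fold decreases are already computed as control expression divided by SOX9-depleted expression, and that ChIP-Seq peaks have already been assigned to genes, a step the abstract does not detail. The helper and gene names are hypothetical.

```python
def putative_direct_targets(fold_decrease, chip_genes, min_fold=4.0):
    """Intersect strongly down-regulated genes with genes carrying ChIP-Seq sites.

    fold_decrease: gene -> control expression / SOX9-depleted expression.
    chip_genes: iterable of genes with at least one SOX9-interaction site.
    Returns (sorted putative direct targets,
             fraction of down-regulated genes that carry a site).
    """
    down = {g for g, fold in fold_decrease.items() if fold >= min_fold}
    direct = sorted(down & set(chip_genes))
    fraction = len(direct) / len(down) if down else 0.0
    return direct, fraction


folds = {"geneA": 12.0, "geneB": 9.0, "geneC": 5.0, "geneD": 1.5}  # hypothetical
chip = {"geneA", "geneB", "geneD"}                                  # hypothetical
```

Raising `min_fold` from 4 to 8 in this toy example raises the fraction of down-regulated genes with a site from 2/3 to 1. This matches the direction of the abstract's observation, though the numbers here are invented.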
Zhang, Qun; Hu, Huan; Liu, Hongda; Jin, Jiajia; Zhu, Peiyuan; Wang, Shujun; Shen, Kaikai; Hu, Yangbo; Li, Zhou; Zhan, Ping; Zhu, Suhua; Fan, Hang; Zhang, Jianya; Lv, Tangfeng; Song, Yong
2018-05-29
Platelets are implicated as key players in the metastatic dissemination of tumor cells. Previous evidence demonstrated that platelets retain physiologically active cytoplasmic RNAs, splicing pre-mRNA to mRNA and translating it into functional proteins in response to external stimulation. Recently, several studies have characterized the platelet gene profiles of healthy or diseased individuals with RNA sequencing (RNA-Seq), leading to new insights into the mechanisms underlying disease pathogenesis. In this study, we performed RNA-seq in platelets from 7 healthy individuals and 15 non-small cell lung cancer (NSCLC) patients. Our data revealed a subset of near-universal differentially expressed gene (DEG) profiles in platelets of metastatic NSCLC compared to healthy individuals, including 626 up-regulated RNAs (mRNAs and ncRNAs) and 1497 down-regulated genes. The significantly over-expressed genes showed enrichment in focal adhesion, platelet activation, gap junction and adherens junction pathways. The DEGs also included previously reported tumor-related genes such as PDGFR, VEGF, and EGF, verifying the consistency and significance of platelet RNA-Seq in oncology studies. We also validated several up-regulated DEGs involved in tumor cell-induced platelet aggregation (TCIPA) and tumorigenesis. Additionally, transcriptomic comparisons of NSCLC subgroups were conducted. Between non-metastatic and metastatic NSCLC patients, 526 platelet DEGs were identified with the most altered expression. The subgroup analysis between lung adenocarcinoma and lung squamous cell carcinoma demonstrated the diagnostic potential of platelet RNA-Seq for distinguishing tumor histological types. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
Bowman, Megan J.; Park, Wonkeun; Bauer, Philip J.; Udall, Joshua A.; Page, Justin T.; Raney, Joshua; Scheffler, Brian E.; Jones, Don. C.; Campbell, B. Todd
2013-01-01
An RNA-Seq experiment was performed using field-grown, well-watered and naturally rain-fed cotton plants to identify differentially expressed transcripts under water-deficit stress. Our work constitutes the first application of the newly published diploid D5 Gossypium raimondii sequence to RNA-seq transcriptome analysis of tetraploid AD1 upland cotton. A total of 1,530 transcripts were differentially expressed between well-watered and water-deficit stressed root tissues, in patterns that confirm the accuracy of this technique for future studies in cotton genomics. Additionally, putative sequence-based genome localization of differentially expressed transcripts detected A2 genome-specific gene expression under water-deficit stress. These data will facilitate efforts to understand the complex responses governing transcriptomic regulatory mechanisms and to identify candidate genes that may benefit applied plant breeding programs. PMID:24324815
Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.
2011-01-01
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452
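The authors' method is a hierarchical Bayesian model that pools information across loci, and it is not reproduced here. As a simpler point of comparison, the sketch below runs a per-locus exact two-sided binomial test of the reference-allele read fraction against 0.5, the usual frequentist baseline for allele-specific expression. The helper names are hypothetical, and the test ignores the cross-locus sharing and FDR control the abstract describes.

```python
import math


def binom_pmf(k, n, p=0.5):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def allelic_imbalance_test(ref_count, alt_count, p=0.5):
    """Return (reference allele fraction, two-sided exact binomial p-value)."""
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("no allele-informative reads")
    observed = binom_pmf(ref_count, n, p)
    # Sum the probabilities of all outcomes no more likely than the observed
    # one; the small relative tolerance guards against floating-point ties.
    pvalue = sum(
        binom_pmf(k, n, p) for k in range(n + 1)
        if binom_pmf(k, n, p) <= observed * (1 + 1e-7)
    )
    return ref_count / n, min(1.0, pvalue)
```

For 10 reference and 0 alternative reads, the p-value is 2/1024 ≈ 0.002. A hierarchical model would instead shrink such per-locus estimates toward a genome-wide distribution.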
Bai, Juan; Zhu, Ying; Dong, Ying
2018-06-01
Obesity is known to induce pathological changes in the gut, and diets rich in complex carbohydrates that resist digestion in the small bowel can alter large bowel ecology. The purpose of this study was to identify the effects of bitter melon powder (BMP) on the global gene expression pattern in the colon mucosa of obese rats. Obese rats were fed a high-fat diet and treated with or without BMP for 8 weeks. Genome-wide expression profiles of the colon mucosa were determined by RNA sequencing (RNA-Seq) at the end of the experiment. A total of 87 genes were identified as differentially expressed (DE) between these two groups (fold change > 1.2). These results were further validated by quantitative RT-PCR, confirming the high reliability of the RNA-Seq data. Interestingly, DE genes implicated in inflammation and lipid metabolism were downregulated by BMP in the colon. Network analysis of genes and the top 15 KEGG pathways showed that PRKCβ (protein kinase C beta) and Pla2g2a (phospholipase A2 group IIA) strongly interacted with surrounding pathways and genes. The results revealed that BMP supplementation could remodel key colon functions by altering the transcriptomic profile in obese rats.
Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.
Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin
2011-03-24
The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. 
This information will provide a basis for future molecular research involving the sheep as a model organism. PMID:21435219
Vukmirovic, Milica; Herazo-Maya, Jose D; Blackmon, John; Skodric-Trifunovic, Vesna; Jovanovic, Dragana; Pavlovic, Sonja; Stojsic, Jelena; Zeljkovic, Vesna; Yan, Xiting; Homer, Robert; Stefanovic, Branko; Kaminski, Naftali
2017-01-12
Idiopathic Pulmonary Fibrosis (IPF) is a lethal lung disease of unknown etiology. A major limitation in transcriptomic profiling of lung tissue in IPF has been a dependence on snap-frozen fresh tissues (FF). In this project we sought to determine whether genome-scale transcript profiling using RNA Sequencing (RNA-Seq) could be applied to archived Formalin-Fixed Paraffin-Embedded (FFPE) IPF tissues. We isolated total RNA from 7 IPF and 5 control FFPE lung tissues and performed 50 base pair paired-end sequencing on an Illumina HiSeq 2000. TopHat2 was used to map sequencing reads to the human genome. On average, ~62 million reads (53.4% of ~116 million reads) were mapped per sample. A total of 4,131 genes were differentially expressed between IPF and controls (1,920 increased and 2,211 decreased; FDR < 0.05). We compared our results to differentially expressed genes calculated from a previously published dataset generated from FF tissues analyzed on Agilent microarrays (GSE47460). The overlap of differentially expressed genes was very high (760 increased and 1,413 decreased, FDR < 0.05). Only 92 differentially expressed genes changed in opposite directions. Pathway enrichment analysis performed using MetaCore confirmed numerous IPF-relevant genes and pathways, including extracellular remodeling, TGF-beta, and WNT. Gene network analysis of MMP7, a highly differentially expressed gene in both datasets, revealed the same canonical pathways and gene network candidates in RNA-Seq and microarray data. For validation by NanoString nCounter® we selected 35 genes that had a fold change of 2 in at least one dataset (10 discordant, 10 significantly differentially expressed in one dataset only, and 15 concordant genes). High concordance of fold change and FDR was observed for each sample type (FF vs FFPE) with both microarrays (r = 0.92) and RNA-Seq (r = 0.90), and the number of discordant genes was reduced to four. 
Our results demonstrate that RNA sequencing of RNA obtained from archived FFPE lung tissues is feasible. The results obtained from FFPE tissue are highly comparable to FF tissues. The ability to perform RNA-Seq on archived FFPE IPF tissues should greatly enhance the availability of tissue biopsies for research in IPF.
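The cross-platform comparison in this abstract sorts genes that are significant in both datasets into concordant increases, concordant decreases, and genes that changed in opposite directions. The sketch below shows that bookkeeping. It assumes each input is a dictionary of log2 fold changes restricted to genes already called significant (for example at FDR < 0.05) in that dataset. The helper name and the toy values are hypothetical.

```python
def compare_de_results(de_a, de_b):
    """Split genes significant in both datasets by direction of change.

    de_a, de_b: gene -> log2 fold change, restricted to significant genes.
    Returns (concordant_up, concordant_down, discordant), each sorted.
    """
    shared = set(de_a) & set(de_b)
    concordant_up = sorted(g for g in shared if de_a[g] > 0 and de_b[g] > 0)
    concordant_down = sorted(g for g in shared if de_a[g] < 0 and de_b[g] < 0)
    discordant = sorted(g for g in shared if de_a[g] * de_b[g] < 0)
    return concordant_up, concordant_down, discordant


rnaseq_ffpe = {"g1": 2.0, "g2": -1.5, "g3": 1.2, "g4": 0.8}  # hypothetical
array_ff = {"g1": 1.1, "g2": -0.9, "g3": -0.7, "g5": 2.2}    # hypothetical
```

Genes significant in only one dataset (g4 and g5 here) fall outside all three lists. This corresponds to the abstract's separate "significant in one dataset only" category.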
miR-MaGiC improves quantification accuracy for small RNA-seq.
Russell, Pamela H; Vestal, Brian; Shi, Wen; Rudra, Pratyaydipta D; Dowell, Robin; Radcliffe, Richard; Saba, Laura; Kechris, Katerina
2018-05-15
Many tools have been developed to profile microRNA (miRNA) expression from small RNA-seq data. These tools must contend with several issues: the small size of miRNAs, the small number of unique miRNAs, the fact that similar miRNAs can be transcribed from multiple loci, and the presence of miRNA isoforms known as isomiRs. Methods failing to address these issues can return misleading information. We propose a novel quantification method designed to address these concerns. We present miR-MaGiC, a novel miRNA quantification method, implemented as a cross-platform tool in Java. miR-MaGiC performs stringent mapping to a core region of each miRNA and defines a meaningful set of target miRNA sequences by collapsing the miRNA space to "functional groups". We hypothesize that these two features, mapping stringency and collapsing, provide better quantification of a more meaningful unit (i.e., miRNA family). We test miR-MaGiC and several published methods on 210 small RNA-seq libraries, evaluating each method's ability to accurately reflect global miRNA expression profiles. We define accuracy as total counts close to the total number of input reads originating from miRNAs. We find that miR-MaGiC, which incorporates both stringency and collapsing, provides the most accurate counts.
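One reason collapsing can improve the abstract's accuracy measure (total counts close to the number of reads truly from miRNAs) is that a read mapping to several similar miRNAs gets counted once per miRNA when counting is per sequence. The sketch below contrasts per-miRNA counting with counting once per group after collapsing. It assumes reads are already mapped, with each read represented as the set of miRNAs it hits. It does not model miR-MaGiC's stringent core-region mapping, and all names, including the miRNA and family labels, are hypothetical.

```python
from collections import defaultdict


def count_per_mirna(read_hits):
    """Count each read once for every miRNA it maps to (multi-mappers inflate totals)."""
    counts = defaultdict(int)
    for hits in read_hits:
        for mirna in hits:
            counts[mirna] += 1
    return dict(counts)


def count_per_group(read_hits, mirna_to_group):
    """Collapse miRNAs to groups first, then count each read once per group hit."""
    counts = defaultdict(int)
    for hits in read_hits:
        for group in {mirna_to_group.get(m, m) for m in hits}:
            counts[group] += 1
    return dict(counts)


def total_count_ratio(counts, n_mirna_reads):
    """Total counts relative to reads truly from miRNAs; 1.0 means no over- or under-counting."""
    return sum(counts.values()) / n_mirna_reads


reads = [{"miR-x1", "miR-x2"}, {"miR-x1"}, {"miR-y"}]  # hypothetical mapped reads
groups = {"miR-x1": "fam-x", "miR-x2": "fam-x", "miR-y": "fam-y"}
```

In this toy case, per-miRNA counting reports 4 counts for 3 reads (ratio 4/3), while group-level counting reports exactly 3. A read hitting miRNAs from two different groups would still be counted twice, which this simplification leaves unresolved.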
Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich
2015-12-16
Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications caused by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, the Matthews correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods indicate that AmpliSeq is a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variation. Our study indicates that AmpliSeq excels in the areas where RNA-seq is limited for gene expression quantification. Thus, AmpliSeq is a highly sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.
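Three of the agreement metrics named in this abstract can be written out in a few lines. The sketch below assumes that log2 fold changes are already paired gene by gene. It also assumes that differential-expression calls have been reduced to booleans by some cutoff; the |log2FC| >= 1 cutoff in the usage lines is hypothetical. It computes Pearson's r, RMSD, and the Matthews correlation coefficient, and omits ROC analysis.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between log2 fold changes from two platforms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmsd(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square deviation between paired measurements."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def mcc(truth: Sequence[bool], calls: Sequence[bool]) -> float:
    """Matthews correlation coefficient for binary differential-expression calls."""
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy log2 fold changes for five genes from a reference and a test platform.
ref = [2.1, -1.3, 0.4, 3.0, -0.2]
test = [1.8, -1.0, 0.6, 2.7, -0.5]
print(pearson_r(ref, test), rmsd(ref, test))
print(mcc([abs(v) >= 1 for v in ref], [abs(v) >= 1 for v in test]))
```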
Buschmann, Dominik; Haberberger, Anna; Kirchner, Benedikt; Spornraft, Melanie; Riedmaier, Irmgard; Schelling, Gustav; Pfaffl, Michael W.
2016-01-01
Small RNA-Seq has emerged as a powerful tool in transcriptomics, gene expression profiling and biomarker discovery. Sequencing cell-free nucleic acids, particularly microRNA (miRNA), from liquid biopsies additionally provides exciting possibilities for molecular diagnostics and might help establish disease-specific biomarker signatures. The complexity of the small RNA-Seq workflow, however, poses challenges and introduces biases that researchers need to be aware of in order to generate high-quality data. Rigorous standardization and extensive validation are required to guarantee reliability, reproducibility and comparability of research findings. Hypotheses based on flawed experimental conditions can be inconsistent and even misleading. Comparable to the well-established MIQE guidelines for qPCR experiments, this work aims to establish guidelines for experimental design, pre-analytical sample processing, and the standardization of library preparation and sequencing reactions, and to facilitate data analysis. We highlight bottlenecks in small RNA-Seq experiments, point out the importance of stringent quality control and validation, and provide a primer for differential expression analysis and biomarker discovery. Following our recommendations will encourage better sequencing practice, increase experimental transparency and lead to more reproducible small RNA-Seq results. This will ultimately enhance the validity of biomarker signatures and allow reliable and robust clinical predictions. PMID:27317696
Pankievicz, V C S; Camilios-Neto, D; Bonato, P; Balsanelli, E; Tadra-Sfeir, M Z; Faoro, H; Chubatsu, L S; Donatti, L; Wajnberg, G; Passetti, F; Monteiro, R A; Pedrosa, F O; Souza, E M
2016-04-01
Herbaspirillum seropedicae is a diazotrophic and endophytic bacterium that associates with economically important grasses, promoting plant growth and increasing productivity. To identify genes related to the bacterial ability to colonize plants, wheat seedlings growing hydroponically in Hoagland's medium were inoculated with H. seropedicae and incubated for 3 days. Total RNA from the bacteria present on the root surface and in the plant medium was purified, depleted of rRNA and used for RNA-seq profiling. RT-qPCR analyses were conducted to confirm the regulation of selected genes. Comparison of the RNA profiles of root-attached and planktonic bacteria revealed extensive metabolic adaptations to the epiphytic lifestyle. These adaptations include expression of specific adhesins and cell wall remodeling to attach to the root. Additionally, the metabolism was adapted to the microxic environment and nitrogen-fixation genes were expressed. Polyhydroxybutyrate (PHB) synthesis was activated, and PHB granules were stored as observed by microscopy. Genes related to plant growth promotion, such as those for auxin production, were expressed. Many ABC transporter genes were regulated in the bacteria attached to the roots. The results provide new insights into the adaptation of H. seropedicae to the interaction with the plant.
Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits.
Gong, Haibiao; Do, Devin; Ramakrishnan, Ramesh
2018-01-01
Single-cell mRNA-seq is a valuable tool to dissect expression profiles and to understand the regulatory network of genes. Microfluidics is well suited for single-cell analysis owing both to the small volume of its reaction chambers and to the ease of automation. Here we describe the workflow of single-cell mRNA-seq using the C1 IFC, which can isolate and process up to 96 cells. Both the on-chip procedure (lysis, reverse transcription, and preamplification PCR) and the off-chip sequencing library preparation protocol are described. The workflow generates full-length mRNA information, which is more valuable than 3'-end counting methods for many applications.
Zouari, Inès; Salvioli, Alessandra; Chialva, Matteo; Novero, Mara; Miozzi, Laura; Tenore, Gian Carlo; Bagnaresi, Paolo; Bonfante, Paola
2014-03-21
Tomato (Solanum lycopersicum) establishes a beneficial symbiosis with arbuscular mycorrhizal (AM) fungi. The formation of the mycorrhizal association in the roots leads to plant-wide modulation of gene expression. To understand the systemic effect of the fungal symbiosis on the tomato fruit, we used RNA-Seq to perform global transcriptome profiling on Moneymaker tomato fruits at the turning ripening stage. Fruits were collected at 55 days after flowering, from plants colonized with Funneliformis mosseae and from control plants, which were fertilized to avoid responses related to nutrient deficiency. Transcriptome analysis identified 712 genes that are differentially expressed in fruits from mycorrhizal and control plants. Gene Ontology (GO) enrichment analysis of these genes showed 81 overrepresented functional GO classes. Up-regulated GO classes include photosynthesis, stress response, transport, amino acid synthesis and carbohydrate metabolism functions, suggesting a general impact of fungal symbiosis on primary metabolism and, particularly, on mineral nutrition. Down-regulated GO classes include cell wall, metabolism and ethylene response pathways. Quantitative RT-PCR validated the RNA-Seq results for 12 of 14 genes when tested at three fruit ripening stages (mature green, breaker and turning). Quantification of fruit nutraceutical and mineral contents produced values consistent with the expression changes observed by RNA-Seq analysis. This RNA-Seq profiling produced a novel data set that explores the intersection of mycorrhization and fruit development. We found that the fruits of mycorrhizal plants show two transcriptomic "signatures": genes characteristic of a climacteric fleshy fruit, and genes characteristic of mycorrhizal status, like phosphate and sulphate transporters.
Moreover, mycorrhizal plants under low nutrient conditions produce fruits with a nutrient content similar to those from non-mycorrhizal plants under high nutrient conditions, indicating that AM fungi can help replace exogenous fertilizer for fruit crops.
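The abstract does not say which statistic underlies its GO enrichment. A one-sided hypergeometric test is a common choice, so the sketch below assumes it. Only the 712 differentially expressed genes come from the abstract. The genome size, GO class size, and overlap in the usage line are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M genes, n in the GO class, N drawn).

    N is the number of differentially expressed genes and k is how many of
    them fall in the GO class; exact integer arithmetic avoids rounding.
    """
    top = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, top + 1))
    return hits / comb(M, N)


# Hypothetical: 20,000 annotated genes, 150 in one GO class, 712 DE genes,
# 15 of which fall in that class (about 5.3 would be expected by chance).
print(hypergeom_sf(15, 20000, 150, 712))
```

Testing many GO classes at once would also call for a multiple-testing correction, which this sketch leaves out.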
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization (EM) algorithm and two stochastic variants of the EM algorithm are described. In addition, a likelihood-based initialization strategy is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility in choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods, such as the K-means algorithm and hierarchical clustering methods, that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
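The sketch below illustrates EM-based clustering of count data with a simplified Poisson mixture. In this model, a gene's expected count in sample j is its total count times a cluster profile p_kj. It is not the MBCluster.Seq implementation. It ignores library-size normalization and uses plain EM only, with no stochastic variants. It also replaces the paper's likelihood-based initialization with a farthest-point heuristic. The toy counts are invented.

```python
import math
from typing import List, Tuple


def _proportions(row: List[int]) -> List[float]:
    """Expression profile of one gene across samples, with a pseudocount of 1."""
    total = sum(row) + len(row)
    return [(x + 1.0) / total for x in row]


def _sq_dist(p: List[float], q: List[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def poisson_mixture_em(counts: List[List[int]], k: int, n_iter: int = 50
                       ) -> Tuple[List[int], List[List[float]], List[float]]:
    """Cluster genes (rows) by their count profiles across samples (columns).

    Simplified model: y_gj ~ Poisson(w_g * p_kj), with w_g the gene's total
    count and sum_j p_kj = 1 (a multinomial mixture given gene totals).
    """
    n_genes, n_samples = len(counts), len(counts[0])
    props = [_proportions(r) for r in counts]
    # Farthest-point initialization (a simple heuristic, not the paper's
    # likelihood-based strategy): start from gene 0, then repeatedly add the
    # gene whose profile is farthest from every chosen center.
    centers = [0]
    while len(centers) < k:
        centers.append(max(range(n_genes),
                           key=lambda g: min(_sq_dist(props[g], props[c]) for c in centers)))
    profiles = [props[c][:] for c in centers]
    weights = [1.0 / k] * k
    resp = [[0.0] * k for _ in range(n_genes)]
    for _ in range(n_iter):
        # E-step: posterior cluster membership of each gene (log-sum-exp).
        for g, row in enumerate(counts):
            logs = [math.log(weights[c]) +
                    sum(y * math.log(p) for y, p in zip(row, profiles[c]))
                    for c in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            z = sum(exps)
            resp[g] = [e / z for e in exps]
        # M-step: mixing weights and count-weighted cluster profiles.
        for c in range(k):
            r = [resp[g][c] for g in range(n_genes)]
            weights[c] = max(sum(r) / n_genes, 1e-12)
            num = [sum(r[g] * counts[g][j] for g in range(n_genes)) + 1e-8
                   for j in range(n_samples)]
            s = sum(num)
            profiles[c] = [v / s for v in num]
    labels = [max(range(k), key=lambda c: resp[g][c]) for g in range(n_genes)]
    return labels, profiles, weights


# Toy counts: three genes high in samples 1-2, three high in samples 3-4.
toy = [[100, 90, 5, 3], [200, 210, 10, 8], [50, 45, 2, 1],
       [4, 6, 120, 110], [2, 3, 60, 70], [10, 8, 300, 280]]
labels, profiles, weights = poisson_mixture_em(toy, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each gene's total is factored out, so genes with the same shape but very different expression levels fall in the same cluster. That is usually the intent when grouping genes by expression profile.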
Spliced synthetic genes as internal controls in RNA sequencing experiments.
Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R
2016-09-01
RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
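The abstract says sequins supply scaling factors for between-sample normalization but does not give the formula. One plausible approach, assumed here, borrows the DESeq median-of-ratios idea and computes each sample's size factor from spike-in counts only. The spike-in names and counts below are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def spikein_size_factors(spike_counts: Dict[str, List[int]]) -> List[float]:
    """Per-sample size factors from spike-in counts (median-of-ratios).

    spike_counts maps a spike-in name to its read counts across samples.
    Spike-ins with a zero count in any sample are skipped; each sample needs
    at least one usable spike-in, otherwise median() raises an error.
    """
    n_samples = len(next(iter(spike_counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for counts in spike_counts.values():
        if min(counts) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in counts) / n_samples
        for j, c in enumerate(counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts for three spike-ins in two libraries.
factors = spikein_size_factors({"SPIKE_1": [10, 20], "SPIKE_2": [100, 200],
                                "SPIKE_3": [50, 100]})
print(factors)
```

Dividing each library's gene counts by its factor puts the samples on a common scale. Because the factors use only the spike-ins, they are not affected by global shifts in the sample's own transcripts.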
Steindorff, Andrei Stecca; Ramada, Marcelo Henrique Soller; Coelho, Alexandre Siqueira Guedes; Miller, Robert Neil Gerard; Pappas, Georgios Joannis; Ulhoa, Cirano José; Noronha, Eliane Ferreira
2014-03-18
Strains of Trichoderma harzianum are well known for their biocontrol activity against plant pathogens. However, few studies have examined the role of T. harzianum as a biological control agent against Sclerotinia sclerotiorum, a pathogen involved in several crop diseases around the world. In this study, we used RNA-seq and quantitative real-time PCR (RT-qPCR) to explore changes in T. harzianum gene expression during growth on S. sclerotiorum cell walls (SSCW) or glucose. RT-qPCR was also used to examine genes potentially involved in biocontrol during confrontation between T. harzianum and S. sclerotiorum. Data obtained from six RNA-seq libraries were aligned onto the T. harzianum CBS 226.95 reference genome and compared after annotation using the Blast2GO suite. A total of 297 differentially expressed genes were found in mycelia grown for 12, 24 and 36 h under the two conditions: supplemented with glucose or SSCW. Functional annotation of these genes identified diverse biological processes and molecular functions required during T. harzianum growth on SSCW or glucose. We identified various genes of biotechnological value encoding proteins with functions such as transport, hydrolytic activity, adherence, appressorium development and pathogenesis. To validate the expression profile, RT-qPCR was performed on 20 randomly chosen genes. RT-qPCR expression profiles were in complete agreement with the RNA-seq data for 17 of the genes evaluated; the other three showed differences at one or two growth times. During the confrontation assay, some genes were up-regulated during and after contact, as also observed in the presence of SSCW, which is commonly used as a model to mimic this interaction. The present study is the first to use RNA-seq to identify differentially expressed genes in T. harzianum strain TR274 in response to the phytopathogenic fungus S. sclerotiorum.
It provides insights into the mechanisms of gene expression involved in the mycoparasitism of T. harzianum against S. sclerotiorum. The RNA-seq data presented will facilitate improvement of the annotation of gene models in the draft T. harzianum genome and provide important information regarding the transcriptome during this interaction.
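RT-qPCR validation like the 20-gene check described here is often computed with the 2^-ΔΔCt method of Livak and Schmittgen. The abstract does not state the authors' exact procedure, so treat this as an assumed sketch. The Ct values and the RNA-seq log2 fold change are hypothetical. The direction-agreement test is one simple way to decide whether a gene is "in agreement".

```python
import math


def ddct_fold_change(ct_target_test: float, ct_ref_test: float,
                     ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Relative expression by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    dct_test = ct_target_test - ct_ref_test    # normalize to the reference gene
    dct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2.0 ** -(dct_test - dct_ctrl)


def same_direction(qpcr_fold_change: float, rnaseq_log2fc: float) -> bool:
    """True when RT-qPCR and RNA-seq agree on up- vs down-regulation."""
    return (math.log2(qpcr_fold_change) > 0) == (rnaseq_log2fc > 0)


# Hypothetical Ct values: target gene on SSCW (test) vs glucose (control).
fc = ddct_fold_change(22.0, 18.0, 25.0, 18.0)
print(fc)                          # 8.0
print(same_direction(fc, 2.6))     # True
```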
RNA-Seq and molecular docking reveal multi-level pesticide resistance in the bed bug
2012-01-01
Background Bed bugs (Cimex lectularius) are hematophagous nocturnal parasites of humans that have attained high impact status due to their worldwide resurgence. The sudden and rampant resurgence of C. lectularius has been attributed to numerous factors including frequent international travel, narrower pest management practices, and insecticide resistance. Results We performed a next-generation RNA sequencing (RNA-Seq) experiment to find differentially expressed genes between pesticide-resistant (PR) and pesticide-susceptible (PS) strains of C. lectularius. A reference transcriptome database of 51,492 expressed sequence tags (ESTs) was created by combining the databases derived from de novo assembled mRNA-Seq tags (30,404 ESTs) and our previous 454 pyrosequenced database (21,088 ESTs). The two-way GLMseq analysis revealed ~15,000 highly significant differentially expressed ESTs between the PR and PS strains. Among the top 5,000 differentially expressed ESTs, 109 putative defense genes (cuticular proteins, cytochrome P450s, antioxidant genes, ABC transporters, glutathione S-transferases, carboxylesterases and acetylcholinesterase) involved in penetration resistance and metabolic resistance were identified. Tissue- and development-specific expression of P450 CYP3 clan members showed high mRNA levels in the cuticle, Malpighian tubules, and midgut, and in early instar nymphs, respectively. Lastly, molecular modeling and docking of a candidate cytochrome P450 (CYP397A1V2) revealed the flexibility of the deduced protein to metabolize a broad range of insecticide substrates including DDT, deltamethrin, permethrin, and imidacloprid. Conclusions We developed significant molecular resources for C. lectularius putatively involved in metabolic resistance as well as those participating in other modes of insecticide resistance. RNA-Seq profiles of PR strains combined with tissue-specific profiles and molecular docking revealed multi-level insecticide resistance in C. lectularius.
Future research using RNA interference (RNAi) against the identified metabolic targets, such as cytochrome P450s and cuticular proteins, could lay the foundation for a better understanding of the genetic basis of insecticide resistance in C. lectularius. PMID:22226239
2010-01-01
Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein-coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097
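The abstract does not specify the unit used for digital expression profiling across tissues. Transcripts per million (TPM) is a common length-normalized measure, and the sketch below assumes it. The contig counts and lengths are hypothetical.

```python
from typing import List


def tpm(counts: List[int], lengths_bp: List[int]) -> List[float]:
    """Transcripts per million for one library (e.g. one tissue)."""
    # Reads per kilobase of contig, then rescale so the library sums to 1e6.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    scale = sum(rpk)
    return [r / scale * 1e6 for r in rpk]


# Hypothetical contigs: read counts in one xylem library and contig lengths.
print(tpm([10, 20, 30], [1000, 2000, 500]))  # [125000.0, 125000.0, 750000.0]
```

Computing TPM separately for each tissue library gives values that sum to one million per tissue. This makes the relative abundance of a contig comparable across tissues.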
2012-01-01
Background During sexual development, filamentous ascomycetes form complex, three-dimensional fruiting bodies for the protection and dispersal of sexual spores. Fruiting bodies contain a number of cell types not found in vegetative mycelium, and these morphological differences are thought to be mediated by changes in gene expression. However, little is known about the spatial distribution of gene expression in fungal development. Here, we used laser microdissection (LM) and RNA-seq to determine gene expression patterns in young fruiting bodies (protoperithecia) and non-reproductive mycelia of the ascomycete Sordaria macrospora. Results Quantitative analysis showed major differences in the gene expression patterns between protoperithecia and total mycelium. Among the genes strongly up-regulated in protoperithecia were the pheromone precursor genes ppg1 and ppg2. The up-regulation was confirmed by fluorescence microscopy of egfp expression under the control of ppg1 regulatory sequences. RNA-seq analysis of protoperithecia from the sterile mutant pro1 showed that many genes that are differentially regulated in these structures are under the genetic control of the transcription factor PRO1. Conclusions We have generated transcriptional profiles of young fungal sexual structures using a combination of LM and RNA-seq. This provided high spatial resolution and sensitivity, and yielded a detailed picture of gene expression during development. Our data revealed significant differences in gene expression between protoperithecia and non-reproductive mycelia, and showed that the transcription factor PRO1 is involved in the regulation of many genes expressed specifically in sexual structures. The LM/RNA-seq approach will also be relevant to other eukaryotic systems in which multicellular development is investigated. PMID:23016559
Sinicropi, Dominick; Qu, Kunbin; Collin, Francois; Crager, Michael; Liu, Mei-Lan; Pelham, Robert J; Pho, Mylan; Dei Rossi, Andrew; Jeong, Jennie; Scott, Aaron; Ambannavar, Ranjana; Zheng, Christina; Mena, Raul; Esteban, Jose; Stephans, James; Morlan, John; Baker, Joffre
2012-01-01
RNA biomarkers discovered by RT-PCR-based gene expression profiling of archival formalin-fixed paraffin-embedded (FFPE) tissue form the basis for widely used clinical diagnostic tests; however, RT-PCR is practically constrained in the number of transcripts that can be interrogated. We have developed and optimized RNA-Seq library chemistry as well as bioinformatics and biostatistical methods for whole transcriptome profiling from FFPE tissue. The chemistry accommodates low RNA inputs and sample multiplexing. These methods both enable rediscovery of RNA biomarkers for disease recurrence risk that were previously identified by RT-PCR analysis of a cohort of 136 patients, and also identify a high percentage of recurrence risk markers that were previously discovered using DNA microarrays in a separate cohort of patients, evidence that this RNA-Seq technology has sufficient precision and sensitivity for biomarker discovery. More than two thousand RNAs are strongly associated with breast cancer recurrence risk in the 136 patient cohort (FDR <10%). Many of these are intronic RNAs for which corresponding exons are not also associated with disease recurrence. A number of the RNAs associated with recurrence risk belong to novel RNA networks. It will be important to test the validity of these novel associations in whole transcriptome RNA-Seq screens of other breast cancer cohorts. PMID:22808097
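The recurrence-risk associations above are reported at FDR < 10%. The abstract does not name the multiple-testing procedure. The Benjamini-Hochberg adjustment below is therefore an assumed, commonly used example, not the authors' method. The p-values are toy data.

```python
from typing import List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values for four RNAs; keep those with adjusted value below 0.10.
pvals = [0.01, 0.04, 0.03, 0.2]
q = bh_adjust(pvals)
print([i for i, v in enumerate(q) if v < 0.10])  # [0, 1, 2]
```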
Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E
2015-01-01
Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report the genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high-throughput studies in blueberry.
Ayyappan, Vasudevan; Kalavacharla, Venu; Thimmapuram, Jyothi; Bhide, Ketaki P; Sripathi, Venkateswara R; Smolinski, Tomasz G; Manoharan, Muthusamy; Thurston, Yaqoob; Todd, Antonette; Kingham, Bruce
2015-01-01
Histone modifications such as methylation and acetylation play a significant role in controlling gene expression in unstressed and stressed plants. Genome-wide analysis of such stress-responsive modifications and genes in non-model crops is limited. We report the genome-wide profiling of histone methylation (H3K9me2) and acetylation (H4K12ac) in common bean (Phaseolus vulgaris L.) under rust (Uromyces appendiculatus) stress using two high-throughput approaches, chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq). ChIP-Seq analysis revealed 1,235 histone methylation-responsive and 556 acetylation-responsive genes in common bean leaves treated with the rust pathogen at 0, 12 and 84 hours after inoculation (hai), while RNA-Seq analysis identified 145 and 1,763 genes differentially expressed between mock-inoculated and inoculated plants. The combined ChIP-Seq and RNA-Seq analyses identified some key defense-responsive genes (calmodulin, cytochrome p450, chitinase, DNA Pol II, and LRR) and transcription factors (WRKY, bZIP, MYB, HSFB3, GRAS, NAC, and NMRA) in the bean-rust interaction. Differential methylation and acetylation affected a large proportion of stress-responsive genes, including resistance (R) proteins, detoxifying enzymes, and genes involved in ion flux and cell death. The genes identified were functionally classified using Gene Ontology (GO) and EuKaryotic Orthologous Groups (KOGs). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified a putative pathway with ten key genes involved in plant-pathogen interactions. This first report of an integrated analysis of histone modifications and gene expression in the bean-rust interaction provides a comprehensive resource for other epigenomic regulation studies in non-model species under stress. PMID:26167691
Xia, Bin; Zou, Yang; Xu, Zhiling; Lv, Yonggang
2017-11-01
Low-intensity pulsed ultrasound (LIPUS) is a noninvasive technique that has been shown to affect cell proliferation, migration, and differentiation and to promote the regeneration of damaged peripheral nerves. Our previous studies showed that LIPUS can significantly promote the neural differentiation of induced pluripotent stem cell-derived neural crest stem cells (iPSCs-NCSCs) and enhance the repair of transected rat sciatic nerve. To further explore the mechanisms underlying LIPUS treatment of iPSCs-NCSCs, this study reports gene expression profiling of iPSCs-NCSCs before and after LIPUS treatment using RNA sequencing (RNA-Seq). LIPUS treatment significantly changed the expression of 76 genes in iPSCs-NCSCs cultured in a serum-free neural induction medium and of 21 genes in iPSCs-NCSCs cultured in a neuronal differentiation medium. The differentially expressed genes are related to angiogenesis, nervous system activity and function, and other cellular activities. The RNA-Seq results were further verified by quantitative real-time reverse transcriptase polymerase chain reaction (qRT-PCR), and a high correlation was observed between the qRT-PCR and RNA-Seq results. This study presents new information on the global gene expression patterns of iPSCs-NCSCs after LIPUS treatment and may expand the understanding of the complex molecular mechanisms of LIPUS treatment of iPSCs-NCSCs. © 2017 International Union of Biochemistry and Molecular Biology, Inc.
A Single-Cell Approach to the Elusive Latent Human Cytomegalovirus Transcriptome.
Goodrum, Felicia; McWeeney, Shannon
2018-06-12
Herpesvirus latency has been difficult to understand molecularly due to low levels of viral genomes and gene expression. In the case of the betaherpesvirus human cytomegalovirus (HCMV), this is further complicated by the heterogeneity inherent to hematopoietic subpopulations harboring genomes and, as a consequence, the various patterns of infection that simultaneously exist in a host, ranging from latent to lytic. Single-cell RNA sequencing (scRNA-seq) offers tremendous potential for measuring the gene expression profiles of heterogeneous cell populations in a wide range of applications, including studies of cancer, immunology, and infectious disease. A recent study by Shnayder et al. (mBio 9:e00013-18, 2018, https://doi.org/10.1128/mBio.00013-18) used scRNA-seq to define transcriptomic characteristics of HCMV latency. They conclude that latency-associated gene expression is similar to the late lytic viral program but at lower levels of expression. The study highlights the numerous challenges, from the definition of latency to the analysis of scRNA-seq data, that exist in defining a latent transcriptome. Copyright © 2018 Goodrum and McWeeney.
eQTL Mapping Using RNA-seq Data
Hu, Yijuan
2012-01-01
As RNA-seq replaces gene expression microarrays for assessing genome-wide transcript abundance, expression quantitative trait locus (eQTL) studies using RNA-seq have emerged. RNA-seq offers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it generates unprecedentedly rich data for studying RNA-isoform expression. In this paper, we review current methods for eQTL mapping using ASE and discuss some future directions. We also review existing work that uses RNA-seq data to study RNA-isoform expression and discuss the gaps between this work and isoform-specific eQTL mapping. PMID:23667399
Transcriptome profile of a bovine respiratory disease pathogen: Mannheimia haemolytica PHL213
2012-01-01
Background Computational methods for structural gene annotation have propelled gene discovery but have certain drawbacks for prokaryotic genome annotation: they cannot accurately identify transcriptional start sites, demarcate overlapping gene boundaries, or detect regulatory elements such as small RNAs (sRNAs). In this study, we revisit the structural annotation of Mannheimia haemolytica PHL213. M. haemolytica is one of the causative agents of bovine respiratory disease, which causes about $3 billion in annual losses to the cattle industry. We used RNA-Seq and analyzed the data using freely available computational methods and resources. The aim was to identify previously unannotated regions of the genome using an RNA-Seq-based expression profile to complement the existing annotation of this pathogen. Results Using the Illumina Genome Analyzer, we generated 9,055,826 reads (average length ~76 bp) and aligned them to the reference genome using Bowtie. The transcribed regions were analyzed using SAMtools and custom Perl scripts in conjunction with BLAST searches and available gene annotation information. The single-nucleotide-resolution map enabled the identification of 14 novel protein-coding regions as well as 44 potential novel sRNAs. The basal transcription profile revealed that 2,506 of the 2,837 annotated regions were expressed in vitro, at 95.25% coverage, representing all broad functional gene categories in the genome. The expression profile also helped identify 518 potential operon structures involving 1,086 co-expressed pairs. We also identified 11 proteins with mutated/alternate start codons. Conclusions The application of RNA-Seq-based transcriptome profiling to structural gene annotation helped correct existing annotation errors and identify potential novel protein-coding regions and sRNAs.
We used computational tools to predict regulatory elements such as promoters and terminators associated with the novel expressed regions for further characterization of these novel functional elements. Our study complements the existing structural annotation of Mannheimia haemolytica PHL213 based on experimental evidence. Given the role of sRNAs in virulence gene regulation and stress response, the potential novel sRNAs described in this study can form the framework for future studies to determine the role of sRNAs, if any, in M. haemolytica pathogenesis. PMID:23046475
Lee, Bradford W.; Kumar, Virender B.; Biswas, Pooja; Ko, Audrey C.; Alameddine, Ramzi M.; Granet, David B.; Ayyagari, Radha; Kikkawa, Don O.; Korn, Bobby S.
2018-01-01
Objective: This study utilized Next Generation Sequencing (NGS) to identify differentially expressed transcripts in orbital adipose tissue from patients with active Thyroid Eye Disease (TED) versus healthy controls. Method: This prospective, case-control study enrolled three patients with severe, active thyroid eye disease undergoing orbital decompression, and three healthy controls undergoing routine eyelid surgery with removal of orbital fat. RNA Sequencing (RNA-Seq) was performed on freshly obtained orbital adipose tissue from study patients to analyze the transcriptome. Bioinformatics analysis was performed to determine pathways and processes enriched for the differential expression profile. Quantitative Reverse Transcriptase-Polymerase Chain Reaction (qRT-PCR) was performed to validate the differential expression of selected genes identified by RNA-Seq. Results: RNA-Seq identified 328 differentially expressed genes associated with active thyroid eye disease, many of which were responsible for mediating inflammation, cytokine signaling, adipogenesis, IGF-1 signaling, and glycosaminoglycan binding. The IL-5 and chemokine signaling pathways were highly enriched, and very-low-density-lipoprotein receptor activity and statin medications were implicated as having a potential role in TED. Conclusion: This study is the first to use RNA-Seq technology to elucidate differential gene expression associated with active, severe TED. This study suggests a transcriptional basis for the role of statins in modulating differentially expressed genes that mediate the pathogenesis of thyroid eye disease. Furthermore, the identification of genes with altered levels of expression in active, severe TED may inform the molecular pathways central to this clinical phenotype and guide the development of novel therapeutic agents. PMID:29760827
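The abstract does not say which enrichment test was used for the pathway analysis. A common choice is a one-sided hypergeometric (over-representation) test, and the sketch below assumes that test; the function name and the gene counts in it are hypothetical, not values from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_pathway: int,
                           n_de: int, n_overlap: int) -> float:
    """One-sided P(X >= n_overlap) where X ~ Hypergeometric.

    n_universe: number of background genes tested
    n_pathway:  background genes annotated to the pathway
    n_de:       differentially expressed (DE) genes
    n_overlap:  DE genes that are also in the pathway
    """
    total = comb(n_universe, n_de)
    upper = min(n_pathway, n_de)
    # Sum the probabilities of observing n_overlap or more pathway genes
    # among n_de genes drawn without replacement from the background.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(hypergeom_enrichment_p(20000, 100, 328, 10))
```

With a hypothetical background of 20,000 genes, a 100-gene pathway and 328 differentially expressed genes, only about 1.6 overlapping genes would be expected by chance, so an overlap of 10 gives a very small P-value. When many pathways are tested, these P-values would still need multiple-testing correction.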
Lessons from single-cell transcriptome analysis of oxygen-sensing cells.
Zhou, Ting; Matsunami, Hiroaki
2018-05-01
The advent of single-cell RNA sequencing (RNA-Seq) technology has enabled transcriptome profiling of individual cells. Comprehensive gene expression analysis at the single-cell level has proven effective in characterizing the most fundamental aspects of cellular function and identity. This unbiased approach is revolutionary for identifying key molecules in small and/or heterogeneous tissues, such as those containing oxygen-sensing cells. Here, we review the major methods of current single-cell RNA-Seq technology. We discuss how this technology has advanced the understanding of oxygen-sensing glomus cells in the carotid body and helped uncover novel oxygen-sensing cells and mechanisms in the mouse olfactory system. We conclude by providing our perspective on future single-cell RNA-Seq research on oxygen-sensing cells.
Circular RNA expression profiles and features in human tissues: a study using RNA-seq data.
Xu, Tianyi; Wu, Jing; Han, Ping; Zhao, Zhongming; Song, Xiaofeng
2017-10-03
Circular RNA (circRNA) is a type of noncoding RNA that forms a covalently closed continuous loop. Like long noncoding RNA (lncRNA), circRNA can act as a microRNA (miRNA) 'sponge' to regulate gene expression, and its abnormal expression is associated with diseases such as atherosclerosis, nervous system disorders and cancer. So far, there have been no systematic studies of circRNA abundance and expression profiles in human adult and fetal tissues. We explored circRNA expression profiles using RNA-seq data for six adult and fetal normal tissues (colon, heart, kidney, liver, lung, and stomach) and four normal gland tissues (adrenal gland, mammary gland, pancreas, and thyroid gland). A total of 8120, 25,933 and 14,433 circRNAs were detected with at least two supporting junction reads in adult, fetal and gland tissues, respectively. Among them, 3092, 14,241 and 6879 circRNAs were novel compared with published results. In each adult tissue type, we found at least 1000 circRNAs, of which 36.97-50.04% were tissue-specific. We report 33 circRNAs that were ubiquitously expressed in all the adult tissues examined. To further explore the potential "housekeeping" function of these circRNAs, we constructed a circRNA-miRNA-mRNA regulatory network containing 17 circRNAs, 22 miRNAs and 90 mRNAs. Furthermore, we found that both the abundance and the relative expression level of circRNAs were higher in fetal tissue than in adult tissue. The number of circRNAs in gland tissues, especially in mammary gland (9665 circRNA candidates), was higher than in other adult tissues (1160-3777). We systematically investigated circRNA expression in a variety of human adult and fetal tissues. The different expression levels of circRNAs observed in adult and fetal tissues suggest that circRNAs may function in a tissue-specific and development-specific manner. Analysis of the circRNA-miRNA-mRNA network provided potential targets of circRNAs.
The high expression of circRNAs in the mammary gland might be attributable to its rich innervation.
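The detection criterion above (at least two supporting junction reads) and the tissue-specificity percentages can be illustrated with a short sketch. It assumes that back-splice junction reads have already been extracted from alignments and keyed as (chromosome, start, end) tuples; the function names and toy data are hypothetical, and real circRNA callers apply further filters that are not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical key for a back-splice junction: (chromosome, start, end).
Junction = Tuple[str, int, int]


def call_circrnas(junction_reads: Iterable[Junction],
                  min_reads: int = 2) -> Dict[Junction, int]:
    """Keep junctions supported by at least `min_reads` reads."""
    counts = Counter(junction_reads)
    return {j: n for j, n in counts.items() if n >= min_reads}


def tissue_specific_fraction(calls: Dict[str, Set[Junction]]) -> Dict[str, float]:
    """Fraction of each tissue's circRNAs that are detected in no other tissue."""
    result = {}
    for tissue, circs in calls.items():
        others: Set[Junction] = set()
        for other, other_circs in calls.items():
            if other != tissue:
                others |= other_circs
        result[tissue] = len(circs - others) / len(circs) if circs else 0.0
    return result


if __name__ == "__main__":
    reads = [("chr1", 100, 500)] * 3 + [("chr2", 10, 90)]
    print(call_circrnas(reads))  # {('chr1', 100, 500): 3}
```

Raising `min_reads` trades sensitivity for specificity; the threshold of two reads mirrors the criterion stated in the abstract.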
A comprehensive simulation study on classification of RNA-Seq data.
Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet
2017-01-01
RNA sequencing (RNA-Seq) is a powerful technique for gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Gene-expression-based classification is an emerging, powerful approach for diagnosis, disease classification and monitoring at the molecular level, as well as for identifying potential disease markers. Most statistical methods proposed for classifying gene-expression data either assume a continuous scale (e.g., microarray data) or require a normal distribution assumption. Hence, these methods cannot be applied directly to RNA-Seq data, whose discrete counts violate both the data-structure and the distributional assumptions. However, these algorithms can be applied to RNA-Seq data with appropriate modifications. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to transform the data so that they more closely resemble microarray data and then apply microarray-based classifiers. In this study, we compared several classifiers, including PLDA with and without power transformation, NBLDA, a single support vector machine (SVM), bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters, such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and transformation method, on model performance. A comprehensive simulation study was conducted, and its results were compared with those from two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size and differential-expression rate, and decreasing the dispersion parameter and number of groups, increase classification accuracy. As in differential-expression studies, the classification of RNA-Seq data requires careful handling of overdispersion.
We conclude that power-transformed PLDA, as a count-based classifier, and RF and SVM classifiers applied to variance-stabilizing (vst) or regularized-logarithm (rlog) transformed data, as microarray-based classifiers, may be good choices for classification. An R/Bioconductor package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
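To make the count-based approach concrete, the sketch below implements a simplified Poisson linear discriminant analysis in the spirit of the PLDA classifier compared above. Each count is modelled as Poisson with mean (sample size factor) × (gene baseline) × (class-specific gene multiplier), and a test sample is assigned to the class with the highest log-likelihood plus log prior. The power transformation is represented by an optional exponent `alpha` applied to the counts. This is an illustrative sketch under stated assumptions, not MLSeq's implementation: the class name, the total-count size factors and the smoothing constant `beta` are choices made for this example.

```python
import math
from typing import Dict, Hashable, List, Sequence


class PoissonLDA:
    """Simplified Poisson LDA: X_ij ~ Poisson(s_i * g_j * d_kj)."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha = alpha  # power applied to counts (1.0 = no transform)
        self.beta = beta    # pseudo-count that shrinks d_kj toward 1

    def _power(self, row: Sequence[float]) -> List[float]:
        return [v ** self.alpha for v in row]

    def fit(self, X: Sequence[Sequence[float]],
            y: Sequence[Hashable]) -> "PoissonLDA":
        Xt = [self._power(row) for row in X]
        totals = [sum(row) for row in Xt]
        self.mean_total = sum(totals) / len(totals)
        s = [t / self.mean_total for t in totals]  # size factors
        n_genes = len(Xt[0])
        sum_s = sum(s)
        # Gene baselines g_j, pooled over all training samples.
        self.g = [sum(row[j] for row in Xt) / sum_s for j in range(n_genes)]
        self.classes = sorted(set(y))
        self.log_prior: Dict[Hashable, float] = {}
        self.d: Dict[Hashable, List[float]] = {}
        for k in self.classes:
            idx = [i for i, label in enumerate(y) if label == k]
            self.log_prior[k] = math.log(len(idx) / len(y))
            s_k = sum(s[i] for i in idx)
            # Class-specific multipliers d_kj (observed / expected, smoothed).
            self.d[k] = [
                (sum(Xt[i][j] for i in idx) + self.beta)
                / (s_k * self.g[j] + self.beta)
                for j in range(n_genes)
            ]
        return self

    def predict_one(self, x: Sequence[float]) -> Hashable:
        xt = self._power(x)
        s_star = sum(xt) / self.mean_total

        def score(k: Hashable) -> float:
            d = self.d[k]
            # Poisson log-likelihood up to terms constant across classes.
            return (sum(xj * math.log(dj) for xj, dj in zip(xt, d))
                    - s_star * sum(gj * dj for gj, dj in zip(self.g, d))
                    + self.log_prior[k])

        return max(self.classes, key=score)


if __name__ == "__main__":
    X = [[100, 10, 50], [90, 12, 55], [10, 100, 50], [12, 95, 48]]
    y = ["A", "A", "B", "B"]
    model = PoissonLDA().fit(X, y)
    print(model.predict_one([95, 11, 52]))  # A
```

On the toy matrix, a sample dominated by the first gene is assigned to class A. With real data, `alpha` would be tuned (for example, to reduce overdispersion) and size factors would usually be estimated more robustly than by total counts.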
Wang, Zichen; Ma'ayan, Avi
2016-01-01
RNA-seq analysis is becoming a standard method for global gene expression profiling. However, performing RNA-seq analysis with open, standard pipelines remains challenging for non-experts because of the large size of the raw data files and the hardware requirements of the alignment step. Here we introduce a reproducible, open-source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. The pipeline enables the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes downregulated by ZIKV, our analysis uncovers significant overlap between the upregulated genes and genes whose knockout in mice induces defects in brain morphology. This result potentially points to the molecular processes associated with the microcephaly phenotype observed in newborns of mothers infected with the virus during pregnancy. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at: http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and https://hub.docker.com/r/maayanlab/zika/.
Spatial reconstruction of single-cell gene expression data.
Satija, Rahul; Farrell, Jeffrey A; Gennert, David; Schier, Alexander F; Regev, Aviv
2015-05-01
Spatial localization is a key determinant of cellular fate and behavior, but methods for spatially resolved, transcriptome-wide gene expression profiling across complex tissues are lacking. RNA staining methods assay only a small number of transcripts, whereas single-cell RNA-seq, which measures global gene expression, separates cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos and generated a transcriptome-wide map of spatial patterning. We confirmed Seurat's accuracy using several experimental approaches, then used the strategy to identify a set of archetypal expression patterns and spatial markers. Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems.
Spatial reconstruction of single-cell gene expression
Satija, Rahul; Farrell, Jeffrey A.; Gennert, David; Schier, Alexander F.; Regev, Aviv
2015-01-01
Spatial localization is a key determinant of cellular fate and behavior, but spatial RNA assays traditionally rely on staining for a limited number of RNA species. In contrast, single-cell RNA-seq allows for deep profiling of cellular gene expression, but established methods separate cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos, inferring a transcriptome-wide map of spatial patterning. We confirmed Seurat’s accuracy using several experimental approaches, and used it to identify a set of archetypal expression patterns and spatial markers. Additionally, Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems. PMID:25867923
Nalpas, Nicolas C; Park, Stephen D E; Magee, David A; Taraktsoglou, Maria; Browne, John A; Conlon, Kevin M; Rue-Albrecht, Kévin; Killick, Kate E; Hokamp, Karsten; Lohan, Amanda J; Loftus, Brendan J; Gormley, Eamonn; Gordon, Stephen V; MacHugh, David E
2013-04-08
Mycobacterium bovis, the causative agent of bovine tuberculosis, is an intracellular pathogen that can persist inside host macrophages during infection via a diverse range of mechanisms that subvert the host immune response. In the current study, we have analysed and compared the transcriptomes of M. bovis-infected monocyte-derived macrophages (MDM) purified from six Holstein-Friesian females with the transcriptomes of non-infected control MDM from the same animals over a 24 h period using strand-specific RNA sequencing (RNA-seq). In addition, we compare gene expression profiles generated using RNA-seq with those previously generated by us using the high-density Affymetrix® GeneChip® Bovine Genome Array platform from the same MDM-extracted RNA. A mean of 7.2 million reads from each MDM sample mapped uniquely and unambiguously to single Bos taurus reference genome locations. Analysis of these mapped reads showed 2,584 genes (1,392 upregulated; 1,192 downregulated) and 757 putative natural antisense transcripts (558 upregulated; 119 downregulated) that were differentially expressed based on sense and antisense strand data, respectively (adjusted P-value ≤ 0.05). Of the differentially expressed genes, 694 were common to both the sense and antisense data sets, with the direction of expression (i.e. up- or downregulation) positively correlated for 693 genes and negatively correlated for the remaining gene. Gene ontology analysis of the differentially expressed genes revealed an enrichment of immune, apoptotic and cell signalling genes. Notably, the number of differentially expressed genes identified from RNA-seq sense strand analysis was greater than the number of differentially expressed genes detected from microarray analysis (2,584 genes versus 2,015 genes). Furthermore, our data reveal a greater dynamic range in the detection and quantification of gene transcripts for RNA-seq compared to microarray technology. 
This study highlights the value of RNA-seq in identifying novel immunomodulatory mechanisms that underlie host-mycobacterial pathogen interactions during infection, including possible complex post-transcriptional regulation of host gene expression involving antisense RNA.
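The abstract reports significance as an adjusted P-value ≤ 0.05 but does not name the adjustment. Benjamini-Hochberg false discovery rate control is a widely used option in RNA-seq differential expression analysis; the sketch below assumes that procedure, and the P-values in it are made up for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending P-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.02, 0.03, 0.005, 0.5]  # made-up per-gene P-values
    adj = benjamini_hochberg(raw)
    print([i for i, q in enumerate(adj) if q <= 0.05])  # [0, 1, 2, 3]
```

Each P-value is scaled by m/rank and then capped by the adjusted values of larger P-values, so the adjusted list stays monotone in the original ranking.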
Meier, Jan; Hovestadt, Volker; Zapatka, Marc; Pscherer, Armin; Lichter, Peter; Seiffert, Martina
2013-01-01
MicroRNAs (miRNAs) are small, single-stranded, non-coding RNAs that fine-tune protein expression by degrading and/or translationally inhibiting mRNAs. Manipulation of miRNA expression in animal models frequently results in severe phenotypes, indicating their relevance in controlling cellular functions, most likely through interactions with multiple targets. To better understand the effects of miRNA activity, genome-wide analysis of miRNA targets is required. MicroRNA profiling and transcriptome analysis upon enforced miRNA expression have frequently been used to investigate their relevance. However, these approaches often fail to identify relevant miRNA targets. Therefore, we tested the precision of RNA-interacting protein immunoprecipitation (RIP) using antibodies specific for AGO2, a core component of the RNA-induced silencing complex (RISC), followed by RNA sequencing (RIP-Seq) in a defined cellular system: HEK293T cells with stable, ectopic expression of miR-155. We identified 100 AGO2-associated mRNAs in miR-155-expressing cells, of which 67 were in silico predicted miR-155 target genes. An integrated analysis of the corresponding expression profiles indicated that these targets were regulated either by mRNA decay or by translational repression. Of the identified miR-155 targets, 17 were related to cell cycle control, suggesting their involvement in the increased proliferation of HEK293T cells observed upon miR-155 expression. Additional secondary changes in the gene expression profile were detected and might also contribute to this phenotype. Interestingly, by analyzing RIP-Seq data from HEK293T cells and two B-cell lines, we identified a recurrent disproportionate enrichment of several miRNAs, including miR-155 and miRNAs of the miR-17-92 cluster, in the AGO2-associated precipitates, suggesting discrepancies between miRNA expression and activity. PMID:23673373
RNA-Rocket: an RNA-Seq analysis resource for infectious disease research
Warren, Andrew S.; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I.; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B.; Wattam, Alice R.; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno
2015-01-01
Motivation: RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition-specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. Results: RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. Availability and implementation: RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:25573919
RNA-Rocket: an RNA-Seq analysis resource for infectious disease research.
Warren, Andrew S; Aurrecoechea, Cristina; Brunk, Brian; Desai, Prerak; Emrich, Scott; Giraldo-Calderón, Gloria I; Harb, Omar; Hix, Deborah; Lawson, Daniel; Machi, Dustin; Mao, Chunhong; McClelland, Michael; Nordberg, Eric; Shukla, Maulik; Vosshall, Leslie B; Wattam, Alice R; Will, Rebecca; Yoo, Hyun Seung; Sobral, Bruno
2015-05-01
RNA-Seq is a method for profiling transcription using high-throughput sequencing and is an important component of many research projects that wish to study transcript isoforms, condition-specific expression and transcriptional structure. The methods, tools and technologies used to perform RNA-Seq analysis continue to change, creating a bioinformatics challenge for researchers who wish to exploit these data. Resources that bring together genomic data, analysis tools, educational material and computational infrastructure can minimize the overhead required of life science researchers. RNA-Rocket is a free service that provides access to RNA-Seq and ChIP-Seq analysis tools for studying infectious diseases. The site makes available thousands of pre-indexed genomes, their annotations and the ability to stream results to the bioinformatics resources VectorBase, EuPathDB and PATRIC. The site also provides a combination of experimental data and metadata, examples of pre-computed analysis, step-by-step guides and a user interface designed to enable both novice and experienced users of RNA-Seq data. RNA-Rocket is available at rnaseq.pathogenportal.org. Source code for this project can be found at github.com/cidvbi/PathogenPortal. Contact: anwarren@vt.edu. Supplementary materials are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
RNA-Seq for gene identification and transcript profiling of three Stevia rebaudiana genotypes.
Chen, Junwen; Hou, Kai; Qin, Peng; Liu, Hongchang; Yi, Bin; Yang, Wenting; Wu, Wei
2014-07-07
Stevia (Stevia rebaudiana) is an important medicinal plant that yields diterpenoid steviol glycosides (SGs). SGs are currently used in the preparation of medicines, food products and nutraceuticals because of their sweetening property (zero calories and about 300 times sweeter than sugar). Recently, some progress has been made in understanding the biosynthesis of SGs in Stevia, but little is known about the underlying molecular mechanisms. Additionally, the genomics of Stevia, a non-model species, remains uncharacterized. The recent advent of RNA-Seq, a next-generation sequencing technology, provides an opportunity to expand the identification of Stevia genes through in-depth transcript profiling. We present a comprehensive landscape of the transcriptome profiles of three Stevia genotypes with divergent SG compositions, characterized using RNA-seq. A total of 191,590,282 high-quality reads were generated and assembled into 171,837 transcripts with an average sequence length of 969 base pairs. A total of 80,160 unigenes were annotated, and 14,211 of the unique sequences were assigned to specific metabolic pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene sequences of all enzymes known to be involved in SG synthesis were examined. A total of 143 UDP-glucosyltransferase (UGT) unigenes were identified, some of which might be involved in SG biosynthesis. The expression patterns of eight of these genes were further confirmed by RT-qPCR. RNA-seq analysis identified candidate genes encoding enzymes responsible for SG biosynthesis in Stevia, a non-model plant without a reference genome. The transcriptome data from this study yield new insights into the process of SG accumulation in Stevia. Our results demonstrate that RNA-Seq can be successfully used for gene identification and transcript profiling in a non-model species.
Shchetynsky, Klementy; Diaz-Gallo, Lina-Marcella; Folkersen, Lasse; Hensvold, Aase Haj; Catrina, Anca Irinel; Berg, Louise; Klareskog, Lars; Padyukov, Leonid
2017-02-02
Here we integrate verified signals from previous genetic association studies with gene expression and pathway analysis to discover new candidate genes and signaling networks relevant to rheumatoid arthritis (RA). RNA-sequencing (RNA-seq)-based expression analysis of 377 genes from previously verified RA-associated loci was performed in blood cells from 5 newly diagnosed, untreated patients with RA, 7 patients with treated RA and 12 healthy controls. Differentially expressed genes sharing a similar expression pattern in the treated and untreated RA subgroups were selected for pathway analysis. A set of "connector" genes derived from pathway analysis was tested for differential expression in the initial discovery cohort and validated in blood cells from 73 patients with RA and 35 healthy controls. Eleven qualifying genes were selected for pathway analysis and grouped into two evidence-based functional networks containing 29 and 27 additional connector molecules. The expression of the genes corresponding to the connector molecules was then tested in the initial RNA-seq data. Differences in the expression of ERBB2, TP53 and THOP1 were similar in treated and untreated patients with RA, and an additional nine genes were differentially expressed in at least one patient group compared with healthy controls. The ERBB2, TP53 and THOP1 expression profile was successfully replicated in RNA-seq data from peripheral blood mononuclear cells of healthy controls and untreated patients with RA in an independent collection of samples. Integration of RNA-seq data with findings from association studies, followed by pathway analysis, implicates the new candidate genes ERBB2, TP53 and THOP1 in the pathogenesis of RA.
Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq
Shepard, Peter J.; Choi, Eun-A; Lu, Jente; Flanagan, Lisa A.; Hertel, Klemens J.; Shi, Yongsheng
2011-01-01
Alternative polyadenylation (APA) of mRNAs has emerged as an important mechanism of post-transcriptional gene regulation in higher eukaryotes. Although microarrays have recently been used to characterize APA globally, they have a number of serious limitations that prevent comprehensive and highly quantitative analysis. To better characterize APA and its regulation, we developed a deep sequencing-based method called Poly(A) Site Sequencing (PAS-Seq) for quantitatively profiling RNA polyadenylation at the transcriptome level. PAS-Seq not only accurately and comprehensively identifies poly(A) junctions in mRNAs and noncoding RNAs, but also provides quantitative information on the relative abundance of polyadenylated RNAs. PAS-Seq analyses of human and mouse transcriptomes showed that 40%–50% of all expressed genes produce alternatively polyadenylated mRNAs. Furthermore, our study detected evolutionarily conserved polyadenylation of histone mRNAs and revealed novel features of mitochondrial RNA polyadenylation. Finally, PAS-Seq analyses of mouse embryonic stem (ES) cells, neural stem/progenitor (NSP) cells, and neurons not only identified more poly(A) sites than are present in the entire mouse EST database, but also detected significant changes in the global APA profile that lead to lengthening of 3′ untranslated regions (UTRs) in many mRNAs during stem cell differentiation. Together, our PAS-Seq analyses revealed a complex landscape of RNA polyadenylation in mammalian cells and the dynamic regulation of APA during stem cell differentiation. PMID:21343387
Bai, Xue; Zheng, Zhuqing; Liu, Bin; Ji, Xiaoyang; Bai, Yongsheng; Zhang, Wenguang
2016-08-22
The objective of this research was to investigate variation in gene expression in the blood transcriptome of Chinese Holstein cows associated with milk yield traits. We used RNA-seq to generate the bovine transcriptome from the blood of 23 lactating Chinese Holstein cows with extremely high and low milk yield. A total of 100 differentially expressed genes (DEGs) (p < 0.05, FDR < 0.05) were identified between the high and low groups. Gene ontology (GO) analysis demonstrated that the 100 DEGs were enriched in specific biological processes related to defense response, immune response, inflammatory response, icosanoid metabolic process, and fatty acid metabolic process (p < 0.05). KEGG pathway analysis of the 100 DEGs revealed that the most statistically significant pathway was the Toll-like receptor signaling pathway (p < 0.05). The expression levels of four selected DEGs were analyzed by qRT-PCR, and the expression patterns were consistent with the RNA-Seq results. Furthermore, alternative splicing analysis of the 100 DEGs showed that splicing patterns differed between high and low yielders: the alternative 3' splice site was the major splicing pattern in high yielders, whereas exon skipping was the major type in low yielders. This study provides a non-invasive, RNA-seq-based method for identifying DEGs associated with milk yield in cattle blood. The 100 DEGs identified between Holstein cows with extremely high and low milk yield, together with the implicated immunological pathway, are likely involved in the milk yield trait. Finally, this study allowed us to explore associations between immune traits and production traits related to milk production.
Identification of innate lymphoid cells in single-cell RNA-Seq data.
Suffiotti, Madeleine; Carmona, Santiago J; Jandus, Camilla; Gfeller, David
2017-07-01
Innate lymphoid cells (ILCs) consist of natural killer (NK) cells and non-cytotoxic ILCs, which are broadly classified into ILC1, ILC2, and ILC3 subtypes. These cells have recently emerged as important early effectors of innate immunity because of their roles in tissue homeostasis and inflammation. Over the last few years, ILCs have been extensively studied in mouse and human at the functional and molecular level, including gene expression profiling. However, sorting ILCs by flow cytometry for gene expression analysis is a delicate and time-consuming process. Here we propose and validate a novel framework for studying ILCs at the transcriptomic level using single-cell RNA-Seq data. Our approach combines unsupervised clustering and a new cell type classifier trained on mouse ILC gene expression data. We show that this approach can accurately identify different ILCs, especially ILC2 cells, in human lymphocyte single-cell RNA-Seq data. Our new model relies only on genes conserved across vertebrates, making it in principle applicable to any vertebrate species. Considering the rapid increase in throughput of single-cell RNA-Seq technology, our work provides a computational framework for studying ILC2 cells in single-cell transcriptomic data and may help explore their conservation in distant vertebrate species.
Polyester: simulating RNA-seq datasets with differential transcript expression.
Frazee, Alyssa C; Jaffe, Andrew E; Langmead, Ben; Leek, Jeffrey T
2015-09-01
Development of statistical methods for differential expression analysis of RNA sequencing (RNA-seq) data requires software tools to assess accuracy and error rate control. Since the true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be used, either by performing costly spike-in experiments or by simulating RNA-seq data. Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads exhibiting isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester are a reasonable approximation to real RNA-seq data, and standard differential expression workflows can recover the differential expression set by the user in the simulation. Polyester is freely available from Bioconductor (http://bioconductor.org/). Contact: jtleek@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
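Polyester itself is an R/Bioconductor package, and the Python sketch below does not reproduce its interface. It only illustrates the core idea of simulating replicate counts whose differential expression is known because the user sets it. The count model (negative binomial, drawn as a gamma-Poisson mixture), the shared dispersion `size`, and the function names are assumptions chosen for illustration; Polyester additionally turns counts into sequencing reads.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method.

    Runs in O(lam) time and underflows for lam above roughly 700, so it is
    only suitable for the modest means used in this illustration.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(base_means, fold_changes, n_reps, size=10.0, seed=0):
    """Return counts[group][replicate][transcript] for a two-group design.

    Group 0 uses base_means; group 1 multiplies each mean by its fold change.
    Each count is negative binomial with mean mu and variance mu + mu**2 / size,
    drawn as a gamma-Poisson mixture. All means must be > 0.
    """
    rng = random.Random(seed)
    groups = [list(base_means), [m * fc for m, fc in zip(base_means, fold_changes)]]
    counts = []
    for means in groups:
        replicates = []
        for _ in range(n_reps):
            # gamma(shape=size, scale=mu/size) has mean mu; Poisson on top gives NB
            row = [poisson(rng.gammavariate(size, mu / size), rng) for mu in means]
            replicates.append(row)
        counts.append(replicates)
    return counts


# Transcript 1 is set to be 4-fold up in group 1; the others are unchanged.
counts = simulate_counts([20, 50, 100], fold_changes=[1, 4, 1], n_reps=3, seed=1)
```

Because the fold changes are chosen up front, any differential expression method run on `counts` can be scored against a known truth, which is the point the abstract makes about simulation.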
USDA-ARS's Scientific Manuscript database
This study aimed to compare oocyte gene expression profiles and follicular fluid (FF) content from overweight/obese (OW) women and normal weight (NW) women who were undergoing fertility treatments. Using single cell transcriptomic analyses, we investigated oocyte gene expression using RNA-seq. Serum...
Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia
2015-01-01
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules on specific biological functions are poorly understood. Identifying the protein-coding genes co-expressed with lncRNAs would provide ample insight into lncRNA functions. To facilitate such efforts, we have developed Co-LncRNA, a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by the co-expressed protein-coding genes of a single lncRNA or multiple lncRNAs. Protein-coding genes co-expressed with lncRNAs were first identified in publicly available human RNA-Seq datasets, including 241 datasets covering 6560 individuals and representing 28 tissue types/cell lines. Then, the combinatorial effects of lncRNAs on given GO annotations or KEGG pathways are taken into account by simultaneously analyzing multiple lncRNAs in one or more user-selected datasets through enrichment analysis. In addition, the software provides a graphical overview of pathways modulated by lncRNAs, as well as a dedicated tool to display the networks linking lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also allows users to upload their own lncRNA and protein-coding gene expression profiles to investigate lncRNA combinatorial effects. It will be updated annually with more human RNA-Seq datasets. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for the community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020
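The abstract does not specify which enrichment test Co-LncRNA uses. As an assumption, a one-sided hypergeometric test is a common choice for asking whether the protein-coding genes co-expressed with an lncRNA overlap a GO term or KEGG pathway more than expected by chance. A minimal sketch, with a hypothetical function name:

```python
from math import comb


def hypergeom_enrichment_p(overlap, gene_set_size, selected_size, universe_size):
    """One-sided enrichment p-value P(X >= overlap).

    X is hypergeometric: `selected_size` genes (e.g. co-expressed genes) are
    drawn from `universe_size` genes, `gene_set_size` of which belong to the
    annotation (e.g. a KEGG pathway).
    """
    total = comb(universe_size, selected_size)
    upper = min(gene_set_size, selected_size)
    tail = sum(
        comb(gene_set_size, k) * comb(universe_size - gene_set_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    # Python ints are exact, so the ratio is computed from exact counts.
    return tail / total
```

For example, with a hypothetical universe of 10 genes, a 4-gene pathway and 3 co-expressed genes that all fall in the pathway, `hypergeom_enrichment_p(3, 4, 3, 10)` gives 4/120, about 0.033. When many lncRNAs or pathways are tested, the resulting p-values would normally be corrected for multiple testing.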
Bradford, James R.; Farren, Matthew; Powell, Steve J.; Runswick, Sarah; Weston, Susie L.; Brown, Helen; Delpuech, Oona; Wappett, Mark; Smith, Neil R.; Carr, T. Hedley; Dry, Jonathan R.; Gibson, Neil J.; Barry, Simon T.
2013-01-01
Pre-clinical models of tumour biology often rely on propagating human tumour cells in a mouse. To gain insight into how well these models align with human disease segments, or to investigate the effects of different therapeutics, approaches such as PCR or array-based expression profiling are often employed, despite suffering from biased transcript coverage and requiring specialist experimental protocols to separate tumour and host signals. Here, we describe a computational strategy to profile transcript expression in both the tumour and host compartments of pre-clinical xenograft models from the same RNA sample using RNA-Seq. Key to this strategy is a species-specific mapping approach that removes the need for manipulation of the RNA population, customised sequencing protocols, or prior knowledge of the species component ratio. The method performs comparably to species-specific RT-qPCR and a standard microarray platform, and it allowed us to quantify gene expression changes in both the tumour and host tissue following treatment with cediranib, a potent vascular endothelial growth factor receptor tyrosine kinase inhibitor. These changes included a reduction in multiple murine transcripts associated with endothelium or vessels and an increase in genes associated with the inflammatory response. In the human compartment, we observed a robust induction of hypoxia genes and a reduction in cell cycle associated transcripts. In conclusion, the study establishes that RNA-Seq can be applied to pre-clinical models to gain a deeper understanding of model characteristics and compound mechanism of action, and to identify both tumour and host biomarkers. PMID:23840389
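The abstract does not describe how reads are assigned to species. As a hedged illustration only, one simple rule is to align each read to both the human and the mouse reference and keep the species with the clearly better alignment score. The score inputs, the `min_margin` threshold and the function name below are all hypothetical.

```python
def assign_reads(scores, min_margin=1):
    """Assign each read to a species from two alignment scores.

    scores: dict read_id -> (human_score, mouse_score); None means the read
    did not align to that reference. A species wins only if its score beats
    the other by at least `min_margin`; otherwise the read is 'ambiguous'.
    """
    calls = {}
    for read_id, (human, mouse) in scores.items():
        if human is None and mouse is None:
            calls[read_id] = "unmapped"
        elif mouse is None or (human is not None and human - mouse >= min_margin):
            calls[read_id] = "human"    # tumour compartment
        elif human is None or mouse - human >= min_margin:
            calls[read_id] = "mouse"    # host compartment
        else:
            calls[read_id] = "ambiguous"
    return calls
```

Tallying the calls (for instance with `collections.Counter(calls.values())`) would give per-species read counts from a single RNA sample, without needing to know the species ratio beforehand.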
Single-cell genomic profiling of acute myeloid leukemia for clinical use: A pilot study
Yan, Benedict; Hu, Yongli; Ban, Kenneth H.K.; Tiang, Zenia; Ng, Christopher; Lee, Joanne; Tan, Wilson; Chiu, Lily; Tan, Tin Wee; Seah, Elaine; Ng, Chin Hin; Chng, Wee-Joo; Foo, Roger
2017-01-01
Although bulk high-throughput genomic profiling studies have significantly increased our understanding of cancer biology, there is growing awareness that bulk profiling approaches do not fully elucidate tumor heterogeneity. Single-cell genomic profiling can resolve tumor heterogeneity and may improve clinical diagnosis through the identification and characterization of putative subclonal populations. In the present study, the challenges associated with a single-cell genomic profiling workflow for clinical diagnostics were investigated. Single-cell RNA sequencing (RNA-seq) was performed on 20 cells from an acute myeloid leukemia bone marrow sample. Putative blasts were identified based on their gene expression profiles, and principal component analysis was performed to identify outlier cells. Variant calling was performed on the single-cell RNA-seq data. The present pilot study demonstrates a proof of concept for clinical single-cell genomic profiling. The recognized limitations include significant stochastic RNA loss and the relatively low throughput of the currently proposed platform. Although the results of the present study are promising, further technological advances and protocol optimization are necessary for single-cell genomic profiling to be clinically viable. PMID:28454300
A comparison of honeybee (Apis mellifera) queen, worker and drone larvae by RNA-Seq.
He, Xu-Jiang; Jiang, Wu-Jun; Zhou, Mi; Barron, Andrew B; Zeng, Zhi-Jiang
2017-11-06
Honeybees (Apis mellifera) have haplodiploid sex determination: males develop from unfertilized eggs and females develop from fertilized ones. Differences in larval food also determine the development of females. Here we compared the total somatic gene expression profiles of 2-day-old and 4-day-old drone, queen and worker larvae by RNA-Seq. A co-expression network analysis of all expressed genes showed that 2-day-old drone and worker larvae were closer to each other in gene expression profile than either was to 2-day-old queen larvae. This indicated that in young (2-day-old) larvae, environmental factors such as larval diet have a greater effect on gene expression profiles than ploidy or sex determination. Drones had the most distinct gene expression profiles at the 4-day larval stage, suggesting that haploidy or sex dramatically affects gene expression in honeybee larvae. Drone larvae showed fewer differences in gene expression between the 2-day and 4-day time points than worker and queen larvae did (598 versus 1190 and 1181, respectively), suggesting a different pattern of gene expression regulation during larval development in haploid males compared with diploid females. This study indicates that early in development the queen caste has the most distinct gene expression profile, perhaps reflecting the very rapid growth and morphological specialization of this caste compared with workers and drones. Later in development, the haploid male drones have the most distinct gene expression profile, perhaps reflecting the influence of ploidy or sex determination on gene expression. © 2017 Institute of Zoology, Chinese Academy of Sciences.
Gardeux, Vincent; David, Fabrice P. A.; Shajkofci, Adrian; Schwalie, Petra C.; Deplancke, Bart
2017-01-01
Motivation: Single-cell RNA sequencing (scRNA-seq) allows whole-transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups around the world, yet these groups often lack the expertise to handle complex scRNA-seq datasets. Results: We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data after genome alignment: from the parsing, filtering and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, ASAP lets researchers (including those lacking computational expertise) interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how ASAP can be used to simply reproduce the results of a single-cell study of 91 mouse cells involving five distinct cell types. Availability and implementation: The tool is freely available at asap.epfl.ch and R/Python scripts are available at github.com/DeplanckeLab/ASAP. Contact: bart.deplancke@epfl.ch. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28541377
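ASAP offers several filtering and normalization methods, which the abstract does not enumerate. The sketch below shows one common combination, as an assumption: drop genes detected in too few cells, scale each cell to counts per million (CPM), and apply log(1 + x). It is not ASAP's code, and `filter_and_normalize` is a hypothetical name.

```python
import math


def filter_and_normalize(counts, min_cells=2, scale=1e6):
    """Filter lowly detected genes, then log-CPM normalize each cell.

    counts: dict gene -> list of raw counts, one entry per cell.
    A gene is kept if its count is > 0 in at least `min_cells` cells.
    Library sizes are computed on the kept genes only.
    """
    kept = {g: c for g, c in counts.items() if sum(x > 0 for x in c) >= min_cells}
    if not kept:
        return {}
    n_cells = len(next(iter(kept.values())))
    lib_sizes = [sum(c[j] for c in kept.values()) for j in range(n_cells)]
    return {
        gene: [
            math.log1p(row[j] / lib_sizes[j] * scale) if lib_sizes[j] else 0.0
            for j in range(n_cells)
        ]
        for gene, row in kept.items()
    }
```

Computing library sizes after filtering is a design choice; computing them on all genes is an equally common alternative and changes the scale only slightly when few genes are removed.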
Gardeux, Vincent; David, Fabrice P A; Shajkofci, Adrian; Schwalie, Petra C; Deplancke, Bart
2017-10-01
Single-cell RNA sequencing (scRNA-seq) allows whole-transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups around the world, yet these groups often lack the expertise to handle complex scRNA-seq datasets. We developed a fully integrated, web-based platform aimed at the complete analysis of scRNA-seq data after genome alignment: from the parsing, filtering and normalization of the input count data files, to the visual representation of the data, identification of cell clusters, differentially expressed genes (including cluster-specific marker genes), and functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, ASAP lets researchers (including those lacking computational expertise) interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should conceptually be broadly applicable to any RNA-seq dataset. As a validation, we demonstrate how ASAP can be used to simply reproduce the results of a single-cell study of 91 mouse cells involving five distinct cell types. The tool is freely available at asap.epfl.ch and R/Python scripts are available at github.com/DeplanckeLab/ASAP. Contact: bart.deplancke@epfl.ch. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets
Macosko, Evan Z.; Basu, Anindita; Satija, Rahul; Nemesh, James; Shekhar, Karthik; Goldman, Melissa; Tirosh, Itay; Bialas, Allison R.; Kamitaki, Nolan; Martersteck, Emily M.; Trombetta, John J.; Weitz, David A.; Sanes, Joshua R.; Shalek, Alex K.; Regev, Aviv; McCarroll, Steven A.
2015-01-01
Cells, the basic units of biological structure and function, vary broadly in type and state. Single-cell genomics can characterize cell identity and function, but limitations of ease and scale have prevented its broad application. Here we describe Drop-Seq, a strategy for quickly profiling thousands of individual cells by separating them into nanoliter-sized aqueous droplets, associating a different barcode with each cell’s RNAs, and sequencing them all together. Drop-Seq analyzes mRNA transcripts from thousands of individual cells simultaneously while remembering transcripts’ cell of origin. We analyzed transcriptomes from 44,808 mouse retinal cells and identified 39 transcriptionally distinct cell populations, creating a molecular atlas of gene expression for known retinal cell classes and novel candidate cell subtypes. Drop-Seq will accelerate biological discovery by enabling routine transcriptional profiling at single-cell resolution. PMID:26000488
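Drop-Seq's key idea is that every read carries a barcode identifying its cell of origin, so pooled sequencing can be split back into per-cell profiles. A minimal sketch, assuming reads have already been parsed into `(cell_barcode, umi, gene)` tuples, counts distinct unique molecular identifiers (UMIs) per cell and gene. The tuple layout and the function name are hypothetical, and real pipelines also correct barcode and UMI sequencing errors.

```python
from collections import defaultdict


def digital_expression(reads):
    """Build a cell x gene count table from barcoded reads.

    reads: iterable of (cell_barcode, umi, gene) tuples.
    Returns {cell_barcode: {gene: number of distinct UMIs}}, so PCR
    duplicates of the same molecule are counted once.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    matrix = defaultdict(dict)
    for (cell, gene), seen in umis.items():
        matrix[cell][gene] = len(seen)
    return {cell: dict(genes) for cell, genes in matrix.items()}
```

Grouping by barcode is what lets a single pooled sequencing run "remember" each transcript's cell of origin.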
Analysis of miRNA expression profiles in melatonin-exposed GC-1 spg cell line.
Zhu, Xiaoling; Chen, Shuxiong; Jiang, Yanwen; Xu, Ying; Zhao, Yun; Chen, Lu; Li, Chunjin; Zhou, Xu
2018-02-05
Melatonin is an endocrine neurohormone secreted by pinealocytes in the pineal gland. It exerts diverse physiological effects, acting, for example, as a circadian rhythm regulator and an antioxidant. However, the functional importance of melatonin in regulating spermatogenesis remains unclear. The objectives of this study are to: (1) detect the effect of melatonin on miRNA expression profiles in GC-1 spg cells by miRNA deep sequencing (DeepSeq) and (2) define melatonin-affected miRNA-mRNA interactions and associated biological processes using bioinformatics analysis. GC-1 spg cells were cultured with melatonin (10⁻⁷ M) for 24 h. DeepSeq data were validated using quantitative real-time reverse transcription polymerase chain reaction (qRT-PCR). A total of 176 miRNAs were significantly differentially expressed between the two groups (fold change >2 or <0.5 and FDR <0.05), of which 171 were up-regulated and 5 were down-regulated. Gene Ontology analysis of the biological processes of the predicted target genes indicated a variety of biological functions. Pathway analysis indicated that the predicted targets were involved in cancers, apoptosis and signaling pathways such as VEGF, TNF, Ras and Notch. These results suggest that melatonin could regulate miRNA expression to exert its physiological effects in GC-1 spg cells, and they should be useful for investigating the biological function of melatonin-regulated miRNAs in spermatogenesis and testicular germ cell tumors. Copyright © 2017 Elsevier B.V. All rights reserved.
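The abstract reports thresholds of fold change > 2 or < 0.5 with FDR < 0.05 but does not say how the FDR was computed. Assuming the Benjamini-Hochberg procedure, the filter could be sketched as follows; the function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de(fold_changes, pvalues, fdr=0.05, up=2.0, down=0.5):
    """Indices whose fold change is > up or < down and whose FDR is < fdr."""
    q = benjamini_hochberg(pvalues)
    return [
        i for i, (fc, qi) in enumerate(zip(fold_changes, q))
        if (fc > up or fc < down) and qi < fdr
    ]
```

For instance, `select_de([3.0, 0.4, 2.5, 10.0], [0.01, 0.04, 0.03, 0.5])` keeps only index 0: index 1 has a large enough change, but its adjusted p-value (about 0.053) misses the cutoff.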
Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry
2018-06-25
The extent to which genes are expressed in the cell can be simplistically defined as a function of one or more environmental, lifestyle, and genetic factors. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression and is expected to provide better insight into a number of biological and biomedical questions than DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at both the gene and the alternative splicing isoform levels. However, leveraging RNA-Seq data requires the development of new data mining and analytics methods. Supervised machine learning methods are commonly used for biological data analysis and have recently gained attention for their applications to RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that isoform-level expression data are more informative for biological classification tasks than gene-level expression data. Our large-scale assessment uses multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predicting the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage of samples from cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods.
The top-performing supervised learning techniques reached near-perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.
2014-01-01
RNA sequencing measures quantitative changes in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq, our method enriches for context-specific transcripts over housekeeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers who are experienced in cell microscopy but may have minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209
Missing data and technical variability in single-cell RNA-sequencing experiments.
Hicks, Stephanie C; Townes, F William; Teng, Mingxiang; Irizarry, Rafael A
2017-11-06
Until recently, high-throughput gene expression technologies, such as RNA sequencing (RNA-seq), required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used of these technologies, and numerous publications are based on data produced with it. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike in RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically driven (genes not expressing RNA at the time of measurement) or technically driven (genes expressing RNA, but not at a level sufficient for the sequencing technology to detect). Another difference is that the proportion of genes with a reported expression level of zero varies substantially more across single cells than across RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation, by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower-expressed genes. In addition, this missing data problem is exacerbated by the fact that the technical variation differs from cell to cell. We then show how this technical cell-to-cell variability can be confused with novel biological results.
Finally, we demonstrate and discuss how batch effects and confounded experiments can intensify the problem. © The Author 2017. Published by Oxford University Press. All rights reserved.
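One simple way to check for "more zeros than expected" is to compare, per gene, the observed fraction of zero counts with the fraction a Poisson model would predict, exp(-mean). The sketch below assumes a plain Poisson with one mean per gene and ignores differences in sequencing depth between cells, so it is only a rough diagnostic and not the assessment the authors performed; the function names are hypothetical.

```python
import math


def zero_fraction_per_cell(matrix):
    """matrix: list of cells, each a list of gene counts. Fraction of zeros per cell."""
    return [sum(x == 0 for x in cell) / len(cell) for cell in matrix]


def excess_zeros_per_gene(matrix):
    """Observed minus Poisson-expected zero fraction for each gene.

    Under a Poisson model with the gene's mean count mu across cells,
    the expected fraction of zeros is exp(-mu). Positive values mean
    more zeros were observed than this simple model predicts.
    """
    n_cells = len(matrix)
    result = []
    for g in range(len(matrix[0])):
        values = [cell[g] for cell in matrix]
        mu = sum(values) / n_cells
        observed = sum(v == 0 for v in values) / n_cells
        result.append(observed - math.exp(-mu))
    return result
```

The per-cell zero fractions expose the cell-to-cell variation the abstract describes, and plotting the per-gene excess against mean expression would show whether the excess is concentrated in lowly expressed genes.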
Transformation and model choice for RNA-seq co-expression analysis.
Rau, Andrea; Maugis-Rabusseau, Cathy
2018-05-01
Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of whether and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and the number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters, and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package, coseq (co-expression of RNA-seq data), to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
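As a hedged illustration of the kind of transformation the abstract refers to, the sketch below turns each gene's library-size-scaled counts into a profile that sums to 1 across samples and applies the arcsine square-root transform; the resulting vectors could then be clustered with a Gaussian mixture model. The exact transformations and normalization used in coseq may differ, and the function names are hypothetical.

```python
import math


def expression_profiles(counts, lib_sizes):
    """Convert counts to per-gene profiles that sum to 1 across samples.

    counts: dict gene -> list of counts, one per sample.
    lib_sizes: per-sample scaling factors (e.g. library sizes).
    Genes with all-zero counts have no defined profile and are skipped.
    """
    profiles = {}
    for gene, row in counts.items():
        scaled = [c / s for c, s in zip(row, lib_sizes)]
        total = sum(scaled)
        if total > 0:
            profiles[gene] = [x / total for x in scaled]
    return profiles


def arcsine_transform(profile):
    """Apply arcsin(sqrt(p)) to each profile entry; entries must lie in [0, 1]."""
    return [math.asin(math.sqrt(p)) for p in profile]
```

Working with profiles rather than raw counts makes genes with the same expression pattern but different overall abundance comparable, and the arcsine step moves the bounded proportions closer to the Gaussian assumption of the mixture model.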
Single-cell gene expression analysis reveals diversity among human spermatogonia.
Neuhaus, N; Yoon, J; Terwort, N; Kliesch, S; Seggewiss, J; Huge, A; Voss, R; Schlatt, S; Grindberg, R V; Schöler, H R
2017-02-10
Is the molecular profile of human spermatogonia homogeneous or heterogeneous when analysed at the single-cell level? Heterogeneous expression profiles may be a key characteristic of human spermatogonia, supporting the existence of a heterogeneous stem cell population. Although many studies have sought to identify specific markers for human spermatogonia, the molecular fingerprint of these cells remains unknown. Testicular tissues from patients with spermatogonial arrest (arrest, n = 1) and with qualitatively normal spermatogenesis (normal, n = 7) were selected from a pool of 179 consecutively obtained biopsies. Gene expression analyses of cell populations and single cells (n = 105) were performed. Two OCT4-positive individual cells were selected for global transcriptional capture using shallow RNA-seq. Finally, expression of four candidate markers was assessed by immunohistochemistry. Histological analysis and blood hormone measurements for LH, FSH and testosterone were performed prior to testicular sample selection. Following enzymatic digestion of testicular tissues, differential plating and subsequent micromanipulation of individual cells were employed to enrich and isolate human spermatogonia, respectively. Endpoint analyses were qPCR analysis of cell populations and individual cells, shallow RNA-seq and immunohistochemical analyses. Unexpectedly, single-cell expression data from the arrest patient (20 cells) showed heterogeneous expression profiles. Also, in patients with normal spermatogenesis, heterogeneous expression patterns of undifferentiated (OCT4, UTF1 and MAGE A4) and differentiated (BOLL and PRM2) marker genes were observed within each spermatogonia cluster (13 clusters with 85 cells). Shallow RNA-seq analysis of individual human spermatogonia was validated, and a spermatogonia-specific heterogeneous protein expression of selected candidate markers (DDX5, TSPY1, EEF1A1 and NGN3) was demonstrated.
The heterogeneity of human spermatogonia at the RNA and protein levels is a snapshot. To further assess the functional meaning of this heterogeneity and the dynamics of stem cell populations, approaches need to be developed that facilitate the repeated analysis of individual cells. Our data suggest that heterogeneous expression profiles may be a key characteristic of human spermatogonia, supporting the model of a heterogeneous stem cell population. Future studies will assess the dynamics of spermatogonial populations in fertile and infertile patients. The RNA-seq data have been deposited in the GEO database: GSE91063. This work was supported by the Max Planck Society and the Deutsche Forschungsgemeinschaft DFG-Research Unit FOR 1041 Germ Cell Potential (grant numbers SCHO 340/7-1, SCHL394/11-2). The authors declare that there is no conflict of interest. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved.
Classifying next-generation sequencing data using a zero-inflated Poisson model.
Zhou, Yan; Wan, Xiang; Zhang, Baoxue; Tong, Tiejun
2018-04-15
With the development of high-throughput techniques, RNA sequencing (RNA-seq) is becoming increasingly popular as an alternative for gene expression analysis, such as RNA profiling and classification. Identifying which disease type a new patient has, using RNA-seq data, has been recognized as a vital problem in medical research. As RNA-seq data are discrete, statistical methods developed for classifying microarray data cannot be readily applied to RNA-seq data classification. Witten proposed Poisson linear discriminant analysis (PLDA) to classify RNA-seq data in 2011. Note, however, that count datasets from real RNA-seq or microRNA sequencing are frequently characterized by excess zeros (e.g., when the sequencing depth is insufficient, or for small RNAs 18-30 nucleotides long). A new model is therefore needed to analyze RNA-seq data with an excess of zeros. In this paper, we propose a Zero-Inflated Poisson Logistic Discriminant Analysis (ZIPLDA) for RNA-seq data with an excess of zeros. The new method assumes that the data come from a mixture of two distributions: a point mass at zero and a Poisson distribution. We then consider a logistic relation between the probability of observing zeros and both the gene mean and the sequencing depth. Simulation studies show that the proposed method performs better than, or at least as well as, existing methods in a wide range of settings. Two real datasets, a breast cancer RNA-seq dataset and a microRNA-seq dataset, are also analyzed, and the results agree with the simulations: the proposed method outperforms the existing competitors. The software is available at http://www.math.hkbu.edu.hk/~tongt. Contact: xwan@comp.hkbu.edu.hk or tongt@hkbu.edu.hk. Supplementary data are available at Bioinformatics online.
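The zero-inflated Poisson mixture described above has a closed-form probability mass function: P(0) = π + (1 − π)e^(−λ), and P(k) = (1 − π)e^(−λ)λ^k / k! for k > 0. The sketch below uses it in a discriminant rule that assigns a sample to the class with the highest log prior plus summed per-gene log-likelihood, with each class mean scaled by the sample's sequencing depth. Unlike ZIPLDA, the zero probabilities `pi` here are fixed, hypothetical inputs rather than being estimated through the logistic model, and the function names are not the authors' software.

```python
import math


def zip_logpmf(k, lam, pi):
    """Log-probability of count k under a zero-inflated Poisson.

    pi is the point-mass-at-zero weight (0 <= pi < 1); lam must be > 0.
    """
    if k == 0:
        return math.log(pi + (1 - pi) * math.exp(-lam))
    return math.log(1 - pi) + k * math.log(lam) - lam - math.lgamma(k + 1)


def classify(sample, depth, class_means, pi, priors):
    """Return the class maximizing log prior + sum of per-gene ZIP log-likelihoods.

    sample: list of counts per gene; depth: sample size factor;
    class_means[c][g]: expected count of gene g at unit depth in class c;
    pi[g]: zero-inflation probability for gene g (hypothetical, fixed).
    """
    best, best_score = None, -math.inf
    for c, means in class_means.items():
        score = math.log(priors[c])
        for k, mu, p in zip(sample, means, pi):
            score += zip_logpmf(k, depth * mu, p)
        if score > best_score:
            best, best_score = c, score
    return best
```

With hypothetical classes `{"A": [10.0, 1.0], "B": [1.0, 10.0]}`, a sample `[12, 0]` is assigned to "A"; the zero in the second gene is cheap under class A because both the Poisson and the zero-inflation component can explain it.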
Quantifying circular RNA expression from RNA-seq data using model-based framework.
Li, Musheng; Xie, Xueying; Zhou, Jing; Sheng, Mengying; Yin, Xiaofeng; Ko, Eun-A; Zhou, Tong; Gu, Wanjun
2017-07-15
Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs is largely unknown, their cell type- and tissue-specific expression implies crucial functions in many biological processes. Hence, quantifying circRNA expression from high-throughput RNA-seq data is becoming increasingly important. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification. Here, we propose a novel strategy that transforms circular transcripts into pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate the expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, expression level and the ratio of circular to linear transcripts, affected the quantification performance for circular transcripts. Compared with count-based tools, the new computational framework performed better in estimating circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. In addition, accounting for circular transcripts during expression quantification from rRNA-depleted RNA-seq data substantially increased the accuracy of linear transcript expression estimates. Our proposed strategy was implemented in a program named Sailfish-cir, which is freely available at https://github.com/zerodel/Sailfish-cir. Contact: tongz@medicine.nevada.edu or wanjun.gu@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
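The abstract does not detail how circular transcripts are made pseudo-linear. One assumption that captures the idea: a read crossing the back-splice junction reads the end of the circRNA followed by its start, so appending the first (read length − 1) bases to the end of the sequence makes every such read a contiguous substring that a linear quantifier could match. A minimal sketch with a hypothetical function name, not Sailfish-cir's code:

```python
def pseudo_linear(circ_seq, read_length):
    """Linearize a circular transcript so junction-spanning reads stay contiguous.

    Appends the first (read_length - 1) bases to the end. Assumes
    read_length <= len(circ_seq); reads that wrap around the circle more
    than once are not covered.
    """
    overlap = min(read_length - 1, len(circ_seq))
    return circ_seq + circ_seq[:overlap]
```

For example, `pseudo_linear("ACGTTGCA", 4)` returns `"ACGTTGCAACG"`, which contains the junction-spanning 4-mers `"GCAA"`, `"CAAC"` and `"AACG"`. The linearized sequence can then be indexed next to the linear isoforms, so one quantification model apportions reads between them.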
Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.
Yip, Shun H; Sham, Pak Chung; Wang, Junwen
2018-02-21
Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.
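None of the six packages is reproduced here. As a simplified, hypothetical baseline, genes can be ranked by their squared coefficient of variation (CV², variance divided by squared mean) across cells. More complete methods model how CV² depends on mean expression, because lowly expressed genes are noisier, so this ranking is only a starting point. The function name is hypothetical.

```python
import statistics


def rank_by_cv2(expr, min_mean=0.0):
    """Rank genes by squared coefficient of variation, highest first.

    expr: dict gene -> list of normalized expression values across cells
    (at least two cells). Genes whose mean is not above `min_mean` are skipped.
    """
    scores = {}
    for gene, values in expr.items():
        mean = statistics.mean(values)
        if mean > min_mean:
            scores[gene] = statistics.variance(values) / mean ** 2
    return sorted(scores, key=scores.get, reverse=True)
```

Because CV² is scale-free, a gene alternating between 0 and 20 ranks above one alternating between 5 and 15 even though both have the same mean.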
NASA Astrophysics Data System (ADS)
Ma, Deyou; Yang, Hongsheng; Sun, Lina; Chen, Muyan
2014-01-01
Sea cucumbers (Apostichopus japonicus) are among the most important aquaculture species in China. Their normal body color is black, matching their surroundings. Wild albinos are rare and hard to breed. To understand the differences between albino and normal (control) sea cucumbers at the transcriptional level, we sequenced the transcriptomes of their body-wall tissues using high-throughput RNA-Seq. Approximately 4.876 million (M) and 4.884 M 200-nucleotide-long cDNA reads were produced from the body-wall cDNA libraries of albino and control samples, respectively. A total of 9 561 (46.89%) putative genes were identified from the RNA-Seq reads in both libraries. After filtering, 837 significantly differentially expressed genes (DEGs) were identified in the albino library compared with the control library, of which 3.6% changed more than five-fold. The expression levels of 10 DEGs were checked by real-time PCR, and the results fully agreed with the RNA-Seq expression trends, although the magnitude of the expression differences was lower in all cases. A series of pathways were significantly enriched for the DEGs. These pathways were closely related to phagocytosis, the complement and coagulation cascades, apoptosis-related diseases, cytokine-cytokine receptor interaction, and cell adhesion. The differences in gene expression and enriched pathways between the albino and control sea cucumbers offer targets for cultivating superior albino A. japonicus strains in the future.
Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui
2015-07-01
High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to assess statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package, with its source code and documentation, is freely available at http://www-personal.umich.edu/~jianghui/rseqnp/. Contact: jianghui@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
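To illustrate the permutation principle that rSeqNP relies on, here is a generic two-group permutation test for a single gene. This is not rSeqNP's code or test statistic: the package uses its own non-parametric statistics and combines isoform-level information. The difference in group means, the add-one correction and the function name permutation_pvalue are assumptions chosen for simplicity.

```python
import numpy as np

def permutation_pvalue(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in mean expression.

    Hypothetical helper: group labels are shuffled n_perm times and the
    p-value is the fraction of shuffles at least as extreme as observed.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    pooled = np.concatenate([a, b])
    n_a = len(a)
    observed = abs(a.mean() - b.mean())

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)               # random relabelling of samples
        stat = abs(perm[:n_a].mean() - perm[n_a:].mean())
        if stat >= observed - 1e-12:                 # small tolerance for float round-off
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because no distributional assumption is made, the same loop works for any statistic. Only the line computing `stat` (and `observed`) would change, which is the flexibility the abstract attributes to the non-parametric approach.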
USDA-ARS's Scientific Manuscript database
Genomic and transcriptomic data on kiwifruit (Actinidia chinensis) in public databases are very limited despite its nutritional and economic value. Previously, we have constructed and sequenced nine fruit RNA-Seq libraries of A. chinensis cv. 'Hongyang' at immature, mature, and postharvest ripening...
Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia
2017-08-31
As a newly emerged research area, RNA epigenetics has recently drawn increasing attention because RNA methylation and other modifications participate in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, which are often used to study the dynamics at the epitranscriptomic layer. However, the sample size of RNA methylation experiments is usually very small because of their cost; in addition, there are usually a large number of genes whose methylation level cannot be accurately estimated because of their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Previous approaches such as the DRME model rely on a statistical test that covers only the IP samples with 2 negative binomial distributions. By contrast, QNB is based on 4 independent negative binomial distributions whose variances and means are linked by local regressions, so that the input control samples are also properly taken into account. In addition, unlike the DRME approach, which relies only on the input control sample for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, Par-CLIP and RIP-Seq.
Nascent-Seq reveals novel features of mouse circadian transcriptional regulation
Menet, Jerome S; Rodriguez, Joseph; Abruzzi, Katharine C; Rosbash, Michael
2012-01-01
A substantial fraction of the metazoan transcriptome undergoes circadian oscillations in many cells and tissues. Based on the transcription feedback loops important for circadian timekeeping, it is commonly assumed that this mRNA cycling reflects widespread transcriptional regulation. To address this issue, we directly measured the circadian dynamics of mouse liver transcription using Nascent-Seq (genome-wide sequencing of nascent RNA). Although many genes are rhythmically transcribed, many rhythmic mRNAs manifest poor transcriptional rhythms, indicating a prominent contribution of post-transcriptional regulation to circadian mRNA expression. This analysis of rhythmic transcription also showed that the rhythmic DNA binding profile of the transcription factors CLOCK and BMAL1 does not determine the transcriptional phase of most target genes. This likely reflects gene-specific collaborations of CLK:BMAL1 with other transcription factors. These insights from Nascent-Seq indicate that it should have broad applicability to many other gene expression regulatory issues. DOI: http://dx.doi.org/10.7554/eLife.00011.001 PMID:23150795
Pantazatos, Spiro P.; Huang, Yung-yu; Rosoklija, Gorazd B.; Dwork, Andrew J.; Arango, Victoria; Mann, J. John
2016-01-01
Brain gene expression profiling studies of suicide and depression using oligonucleotide microarrays have often failed to distinguish these two phenotypes. Moreover, next generation sequencing (NGS) approaches are more accurate in quantifying gene expression and can detect alternative splicing. Using RNA-seq, we examined whole-exome gene and exon expression in non-psychiatric controls (CON, N=29), DSM-IV major depressive disorder suicides (MDD-S, N=21) and MDD non-suicides (MDD, N=9) in dorsal lateral prefrontal cortex (Brodmann Area 9) of sudden-death medication-free individuals postmortem. Using small RNA-seq, we also examined miRNA expression (9 samples per group). DESeq2 identified thirty-five genes differentially expressed between groups and surviving adjustment for false discovery rate (adjusted p<0.1). In depression, altered genes include humanin-like-8 (MTRNRL8), interleukin-8 (IL8), serpin peptidase inhibitor, clade H (SERPINH1) and chemokine ligand 4 (CCL4), while exploratory gene ontology (GO) analyses revealed lower expression of immune-related pathways such as chemokine receptor activity, chemotaxis and cytokine biosynthesis, and angiogenesis and vascular development in (adjusted p<0.1). Hypothesis-driven GO analysis suggests lower expression of genes involved in oligodendrocyte differentiation, regulation of glutamatergic neurotransmission, and oxytocin receptor expression in both suicide and depression, and provisional evidence for altered DNA-dependent ATPase expression in suicide only. DEXSeq analysis identified differential exon usage in ATPase, class II, type 9B (adjusted p<0.1) in depression. Differences in miRNA expression or structural gene variants were not detected. Results lend further support for models in which deficits in microglial, endothelial (blood-brain barrier), ATPase activity and astrocytic cell functions contribute to MDD and suicide, and identify putative pathways and mechanisms for further study in these disorders.
PMID:27528462
Pantazatos, S P; Huang, Y-Y; Rosoklija, G B; Dwork, A J; Arango, V; Mann, J J
2017-05-01
Brain gene expression profiling studies of suicide and depression using oligonucleotide microarrays have often failed to distinguish these two phenotypes. Moreover, next generation sequencing approaches are more accurate in quantifying gene expression and can detect alternative splicing. Using RNA-seq, we examined whole-exome gene and exon expression in non-psychiatric controls (CON, N=29), DSM-IV major depressive disorder suicides (MDD-S, N=21) and MDD non-suicides (MDD, N=9) in the dorsal lateral prefrontal cortex (Brodmann Area 9) of sudden death medication-free individuals post mortem. Using small RNA-seq, we also examined miRNA expression (nine samples per group). DESeq2 identified 35 genes differentially expressed between groups and surviving adjustment for false discovery rate (adjusted P<0.1). In depression, altered genes include humanin-like-8 (MTRNRL8), interleukin-8 (IL8), serpin peptidase inhibitor, clade H (SERPINH1) and chemokine ligand 4 (CCL4), while exploratory gene ontology (GO) analyses revealed lower expression of immune-related pathways such as chemokine receptor activity, chemotaxis and cytokine biosynthesis, and angiogenesis and vascular development in (adjusted P<0.1). Hypothesis-driven GO analysis suggests lower expression of genes involved in oligodendrocyte differentiation, regulation of glutamatergic neurotransmission, and oxytocin receptor expression in both suicide and depression, and provisional evidence for altered DNA-dependent ATPase expression in suicide only. DEXSeq analysis identified differential exon usage in ATPase, class II, type 9B (adjusted P<0.1) in depression. Differences in miRNA expression or structural gene variants were not detected. Results lend further support for models in which deficits in microglial, endothelial (blood-brain barrier), ATPase activity and astrocytic cell functions contribute to MDD and suicide, and identify putative pathways and mechanisms for further study in these disorders.
2012-01-01
Background Planarian stem cells, or neoblasts, drive the almost unlimited regeneration capacities of freshwater planarians. Neoblasts are traditionally described by their morphological features and by the fact that they are the only proliferative cell type in asexual planarians. Therefore, they can be specifically eliminated by irradiation. Irradiation, however, is likely to induce transcriptome-wide changes in gene expression that are not associated with neoblast ablation. This has affected the accurate description of their specific transcriptomic profile. Results We introduce the use of Smed-histone-2B RNA interference (RNAi) for genetic ablation of neoblast cells in Schmidtea mediterranea as an alternative to irradiation. We characterize the rapid, neoblast-specific phenotype induced by Smed-histone-2B RNAi, resulting in neoblast ablation. We compare and triangulate RNA-seq data after using both irradiation and Smed-histone-2B RNAi over a time course as means of neoblast ablation. Our analyses show that Smed-histone-2B RNAi eliminates neoblast gene expression with high specificity and discrimination from gene expression in other cellular compartments. We compile a high confidence list of genes downregulated by both irradiation and Smed-histone-2B RNAi and validate their expression in neoblast cells. Lastly, we analyze the overall expression profile of neoblast cells. Conclusions Our list of neoblast genes parallels their morphological features and is highly enriched for nuclear components, chromatin remodeling factors, RNA splicing factors, RNA granule components and the machinery of cell division. Our data reveal that the regulation of planarian stem cells relies on posttranscriptional regulatory mechanisms and suggest that planarians are an ideal model for this understudied aspect of stem cell biology. PMID:22439894
GWIPS-viz: development of a ribo-seq genome browser
Michel, Audrey M.; Fox, Gearoid; M. Kiran, Anmol; De Bo, Christof; O’Connor, Patrick B. F.; Heaphy, Stephen M.; Mullan, James P. A.; Donohue, Claire A.; Higgins, Desmond G.; Baranov, Pavel V.
2014-01-01
We describe the development of GWIPS-viz (http://gwips.ucc.ie), an online genome browser for viewing ribosome profiling data. Ribosome profiling (ribo-seq) is a recently developed technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. It is based on the deep sequencing of ribosome-protected messenger RNA (mRNA) fragments, which allows the ribosome density along all mRNA transcripts present in the cell to be quantified. Since its inception, ribo-seq has been carried out in a number of eukaryotic and prokaryotic organisms. Owing to the increasing interest in ribo-seq, there is a pertinent demand for a dedicated ribo-seq genome browser. GWIPS-viz is based on The University of California Santa Cruz (UCSC) Genome Browser. Ribo-seq tracks, coupled with mRNA-seq tracks, are currently available for several genomes: human, mouse, zebrafish, nematode, yeast, bacteria (Escherichia coli K12, Bacillus subtilis), human cytomegalovirus and bacteriophage lambda. Our objective is to continue incorporating published ribo-seq data sets so that the wider community can readily view ribosome profiling information from multiple studies without the need to carry out computational processing. PMID:24185699
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling
Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that a single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represents an important step in the identification and analysis of RBP–lncRNA interactions and shows that these interactions may play crucial roles in cancer and genetic diseases.
2013-01-01
Background Microalgae can make a significant contribution towards meeting global renewable energy needs in both carbon-based and hydrogen (H2) biofuel. The development of energy-related products from algae could be accelerated with improvements in systems biology tools, and recent advances in sequencing technology provide a platform for enhanced transcriptomic analyses. However, these techniques are still heavily reliant upon available genomic sequence data. Chlamydomonas moewusii is a unicellular green alga capable of evolving molecular H2 under both dark and light anaerobic conditions, and has high hydrogenase activity that can be rapidly induced. However, to date, there is no systematic investigation of transcriptomic profiling during induction of H2 photoproduction in this organism. Results In this work, RNA-Seq was applied to investigate transcriptomic profiles during the dark anaerobic induction of H2 photoproduction. 156 million reads generated from 7 samples were then used for de novo assembly after data trimming. BlastX results against NCBI database and Blast2GO results were used to interpret the functions of the assembled 34,136 contigs, which were then used as the reference contigs for RNA-Seq analysis. Our results indicated that more contigs were differentially expressed during the period of early and higher H2 photoproduction, and fewer contigs were differentially expressed when H2-photoproduction rates decreased. In addition, C. moewusii and C. reinhardtii share core functional pathways, and transcripts for H2 photoproduction and anaerobic metabolite production were identified in both organisms. C. moewusii also possesses similar metabolic flexibility as C. reinhardtii, and the difference between C. moewusii and C. reinhardtii on hydrogenase expression and anaerobic fermentative pathways involved in redox balancing may explain their different profiles of hydrogenase activity and secreted anaerobic metabolites. 
Conclusions Herein, we have described a workflow using commercial software to analyze RNA-Seq data without reference genome sequence information, which can be applied to other unsequenced microorganisms. This study provided biological insights into the anaerobic fermentation and H2 photoproduction of C. moewusii, and the first transcriptomic RNA-Seq dataset of C. moewusii, generated in this study, also offers baseline data for further investigation (e.g. regulatory proteins related to the fermentative pathway discussed in this study) of this organism as an H2-photoproduction strain. PMID:23971877
Yatsu, Ryohei; Miyagawa, Shinichi; Kohno, Satomi; Parrott, Benjamin B; Yamaguchi, Katsushi; Ogino, Yukiko; Miyakawa, Hitoshi; Lowers, Russell H; Shigenobu, Shuji; Guillette, Louis J; Iguchi, Taisen
2016-01-25
The American alligator (Alligator mississippiensis) displays temperature-dependent sex determination (TSD), in which incubation temperature during embryonic development determines the sexual fate of the individual. However, the molecular mechanisms governing this process remain a mystery, including the influence of initial environmental temperature on the comprehensive gonadal gene expression patterns occurring during TSD. Our characterization of transcriptomes during alligator TSD allowed us to identify novel candidate genes involved in TSD initiation. High-throughput RNA sequencing (RNA-seq) was performed on gonads collected from A. mississippiensis embryos incubated at both a male and a female producing temperature (33.5 °C and 30 °C, respectively) in a time series during sexual development. RNA-seq yielded 375.2 million paired-end reads, which were mapped and assembled, and used to characterize differential gene expression. Changes in the transcriptome occurring as a function of both development and sexual differentiation were extensively profiled. Forty-one differentially expressed genes were detected in response to incubation at male producing temperature, and included genes such as Wnt signaling factor WNT11, histone demethylase KDM6B, and transcription factor C/EBPA. Furthermore, comparative analysis of development- and sex-dependent differential gene expression revealed 230 candidate genes involved in alligator sex determination and differentiation, and early details of the suspected male-fate commitment were profiled. We also discovered sexually dimorphic expression of uncharacterized ncRNAs and other novel elements, such as unique expression patterns of HEMGN and ARX. Twenty-five of the differentially expressed genes identified in our analysis were putative transcriptional regulators, among which were MYBL2, MYCL, and HOXC10, in addition to conventional sex differentiation genes such as SOX9, and FOXL2. 
An inferred gene regulatory network was constructed, and the gene-gene and temperature-gene interactions were predicted. Gonadal global gene expression kinetics during sex determination have been extensively profiled for the first time in a TSD species. These findings provide insights into the genetic framework underlying TSD, and expand our current understanding of the developmental fate pathways during vertebrate sex determination.
Pflueger, Dorothee; Sboner, Andrea; Storz, Martina; Roth, Jasmine; Compérat, Eva; Bruder, Elisabeth; Rubin, Mark A; Schraml, Peter; Moch, Holger
2013-11-01
TFE3 translocation renal cell carcinoma (tRCC) is defined by chromosomal translocations involving the TFE3 transcription factor at chromosome Xp11.2. Genetically proven TFE3 tRCCs have a broad histologic spectrum with features overlapping those of other renal tumor subtypes. In this study, we aimed to characterize RCCs with TFE3 protein expression. Using next-generation whole transcriptome sequencing (RNA-Seq) as a discovery tool, we analyzed fusion transcripts, gene expression profile, and somatic mutations in frozen tissue of one TFE3 tRCC. By applying a computational analysis developed to call chimeric RNA molecules from paired-end RNA-Seq data, we confirmed the known TFE3 translocation. Its fusion partner, SFPQ, has already been described as a fusion partner in tRCCs. In addition, an RNA read-through chimera between TMED6 and COG8 as well as MET and KDR (VEGFR2) point mutations were identified. An EGFR mutation, but no chromosomal rearrangements, was identified in a control group of five clear cell RCCs (ccRCCs). The TFE3 tRCC could be clearly distinguished from the ccRCCs by RNA-Seq gene expression measurements using a previously reported tRCC gene signature. In validation experiments using reverse transcription-PCR, TMED6-COG8 chimera expression was significantly higher in nine TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in 24 ccRCCs (P < .001) and 22 papillary RCCs (P < .05-.07). Immunohistochemical analysis of selected genes from the tRCC gene signature showed significantly higher eukaryotic translation elongation factor 1 alpha 2 (EEF1A2) and Contactin 3 (CNTN3) expression in 16 TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in over 200 ccRCCs (P < .0001, both).
Functional regression method for whole genome eQTL epistasis analysis with sequencing data.
Xu, Kelin; Jin, Li; Xiong, Momiao
2017-05-18
Epistasis plays an essential role in understanding regulatory mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored because of great computational challenges and limited data availability. Because of variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used in microarray-based gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) for epistasis analysis with RNA-seq data, with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position. Instead of testing the interaction of all possible pairwise SNPs, the FRGM takes a gene as the basic unit for epistasis analysis: it tests for the interaction of all possible pairs of genes and uses all accessible information to collectively test the interaction between all possible pairs of SNPs within two genomic regions. Through large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect interactions between genes than existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project.
The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 16,2361, 260 and 51, respectively, from the 350 European samples. The proposed FRGM for epistasis analysis of RNA-seq can capture isoform- and position-level information and will have broad application. Both simulations and real data analysis highlight the potential for the FRGM to be a good choice for epistatic analysis with sequencing data.
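Testing "all possible pairs of genes" with Bonferroni correction makes the per-test threshold very strict. As a hedged arithmetic sketch, the snippet computes the number of unordered gene pairs, n(n−1)/2, and the resulting threshold α/m. The gene counts and α = 0.05 are illustrative assumptions, not values reported by the study, and bonferroni_pair_threshold is a hypothetical name.

```python
def bonferroni_pair_threshold(n_genes: int, alpha: float = 0.05):
    """Return (number of unordered gene pairs, Bonferroni per-test threshold).

    Hypothetical helper: with m = n(n-1)/2 tests, each test is declared
    significant only if its p-value is below alpha / m.
    """
    n_pairs = n_genes * (n_genes - 1) // 2   # all pairs of distinct genes
    return n_pairs, alpha / n_pairs
```

For example, `bonferroni_pair_threshold(20000)` gives 199,990,000 pairs, so each pair must reach p < 0.05 / 199,990,000 (about 2.5 × 10⁻¹⁰) to count as significant.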
Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia
2015-01-01
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single lncRNA or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects on given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/. © The Author(s) 2015. Published by Oxford University Press.
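The abstract says only that pathway effects are "realized by enrichment analysis". A common way to score such enrichment is a one-sided hypergeometric test. The sketch below assumes that choice, since the tool's actual statistic is not stated here, and computes P(X ≥ k) exactly with Python's math.comb. The function and argument names are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    Assumed setup (hypothetical helper):
      N: background genes, K: genes annotated to the pathway,
      n: co-expressed genes of the query lncRNA(s),
      k: co-expressed genes that fall in the pathway.
    """
    total = comb(N, n)
    # Sum the probability of observing k or more pathway genes among n draws;
    # math.comb returns 0 when asked for more items than exist.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return upper / total
```

For instance, `hypergeom_enrichment_p(15, 20000, 200, 300)` would score 15 pathway hits among 300 co-expressed genes against a 20,000-gene background and a 200-gene pathway. In practice, p-values across many pathways would also need multiple-testing correction.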
A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages
Yu, Ying; Fuscoe, James C.; Zhao, Chen; Guo, Chao; Jia, Meiwen; Qing, Tao; Bannon, Desmond I.; Lancashire, Lee; Bao, Wenjun; Du, Tingting; Luo, Heng; Su, Zhenqiang; Jones, Wendell D.; Moland, Carrie L.; Branham, William S.; Qian, Feng; Ning, Baitang; Li, Yan; Hong, Huixiao; Guo, Lei; Mei, Nan; Shi, Tieliu; Wang, Kevin Y.; Wolfinger, Russell D.; Nikolsky, Yuri; Walker, Stephen J.; Duerksen-Hughes, Penelope; Mason, Christopher E.; Tong, Weida; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Shi, Leming; Wang, Charles
2014-01-01
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model. PMID:24510058
A Guide for Designing and Analyzing RNA-Seq Data.
Chatterjee, Aniruddha; Ahn, Antonio; Rodger, Euan J; Stockwell, Peter A; Eccles, Michael R
2018-01-01
The identity of a cell or an organism is at least in part defined by its gene expression, and therefore analyzing gene expression remains one of the most frequently performed experimental techniques in molecular biology. The development of the RNA-Sequencing (RNA-Seq) method offers an unprecedented opportunity to analyze the expression of protein-coding and noncoding RNAs and to perform de novo transcript assembly for a new species or organism. However, the planning and design of RNA-Seq experiments has important implications for addressing the desired biological question and maximizing the value of the data obtained. In addition, RNA-Seq generates a huge volume of data, and accurate analysis of these data involves several different steps and choices of tools. This can be challenging and overwhelming, especially for bench scientists. In this chapter, we describe an entire workflow for performing RNA-Seq experiments. We describe critical aspects of wet lab experiments such as RNA isolation, library preparation and the initial design of an experiment. Further, we provide a step-by-step description of the bioinformatics workflow for the different steps involved in RNA-Seq data analysis. This includes power calculations, setting up a computational environment, acquisition and processing of publicly available data if desired, quality control measures, preprocessing steps for the raw data, differential expression analysis, and data visualization. We particularly mention important considerations for each step to provide a guide for designing and analyzing RNA-Seq data.
Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput.
Gierahn, Todd M; Wadsworth, Marc H; Hughes, Travis K; Bryson, Bryan D; Butler, Andrew; Satija, Rahul; Fortune, Sarah; Love, J Christopher; Shalek, Alex K
2017-04-01
Single-cell RNA-seq can precisely resolve cellular states, but applying this method to low-input samples is challenging. Here, we present Seq-Well, a portable, low-cost platform for massively parallel single-cell RNA-seq. Barcoded mRNA capture beads and single cells are sealed in an array of subnanoliter wells using a semipermeable membrane, enabling efficient cell lysis and transcript capture. We use Seq-Well to profile thousands of primary human macrophages exposed to Mycobacterium tuberculosis.
Mining the archives: a cross-platform analysis of gene ...
Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation of nucleic acids. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues using two DNA microarray protocols and two whole transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other three methods by having the highest correlations of differentially expressed genes (DEGs) and best overlap of pathways between FRO and FFPE groups. We next tested the effect of sample time in formalin (18 hours or 3 weeks) on gene expression profiles. Hierarchical clustering of the datasets indicated that test article treatment, and not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18-hour and 3-week FFPE samples compared to FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of age in FFPE block on genomic profiles. RNA-seq analysis of 8-, 19-, and 26-year-old control blocks using the ribo-depletion protocol resulted in comparable quality metrics, inc
Wang, Haibo; Zou, Zhurong; Wang, Shasha; Gong, Ming
2013-01-01
Background Jatropha curcas L., also called the Physic nut, is an oil-rich shrub with multiple uses, including biodiesel production, and is currently exploited as a renewable energy resource in many countries. Nevertheless, because of its origin in the tropical Mid-American zone, J. curcas has an inherent but undesirable characteristic (low cold resistance) that may seriously restrict its large-scale popularization. This adaptive flaw can be genetically improved by elucidating the mechanisms underlying plant tolerance to cold temperatures. The newly developed Illumina HiSeq™ 2000 RNA-seq and Digital Gene Expression (DGE) approaches are deep, high-throughput methods for gene expression analysis at the transcriptome level; using them, we carefully investigated the gene expression profiles in response to cold stress to gain insight into the molecular mechanisms of cold response in J. curcas. Results In total, 45,251 unigenes were obtained by assembly of clean data generated by RNA-seq analysis of the J. curcas transcriptome. A total of 33,363 and 912 complete or partial coding sequences (CDSs) were determined by protein database alignments and ESTScan prediction, respectively. Among these unigenes, more than 41.52% were involved in approximately 128 known metabolic or signaling pathways, and 4,185 were possibly associated with cold resistance. DGE analysis was used to assess the changes in gene expression when exposed to cold conditions (12°C) for 12, 24, and 48 h. The results showed that 3,178 genes were significantly upregulated and 1,244 were downregulated under cold stress. These genes were then functionally annotated based on the transcriptome data from RNA-seq analysis. Conclusions This study provides a global view of transcriptome response and gene expression profiling of J. curcas in response to cold stress.
The results can help improve our current understanding of the mechanisms underlying plant cold resistance and facilitate the screening of crucial genes for genetically enhancing cold resistance in J. curcas. PMID:24349370
Visual Display of 5p-arm and 3p-arm miRNA Expression with a Mobile Application.
Pan, Chao-Yu; Kuo, Wei-Ting; Chiu, Chien-Yuan; Lin, Wen-Chang
2017-01-01
MicroRNAs (miRNAs) play important roles in human cancers. In previous studies, we demonstrated that both 5p-arm and 3p-arm mature miRNAs can be expressed from the same precursor, and we further interrogated 5p-arm and 3p-arm miRNA expression with a comprehensive arm feature annotation list. To help biologists visualize differential 5p-arm and 3p-arm miRNA expression patterns, we used a user-friendly mobile App to display The Cancer Genome Atlas (TCGA) miRNA-Seq expression information. We collected over 4,500 miRNA-Seq datasets from 15 TCGA cancer types and processed them with the 5p-arm and 3p-arm annotation analysis pipeline. For display in the RNA-Seq Viewer App, the annotated 5p-arm and 3p-arm miRNA expression information and miRNA gene loci information were converted into SQLite tables. In this application, for any given miRNA gene, the 5p-arm miRNA is shown above the chromosome ideogram and the 3p-arm miRNA below it. Users can then easily interrogate miRNAs with differential 5p-arm/3p-arm expression on their mobile devices. This study demonstrates the feasibility and utility of the RNA-Seq Viewer App for miRNA-Seq data, in addition to mRNA-Seq data visualization.
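The abstract states only that annotated 5p-arm/3p-arm expression and loci information were converted into SQLite tables; the App's schema is not given. The Python sketch below assumes a hypothetical single table `arm_expression(mirna, arm, cancer_type, expression)` and shows how such a table could be loaded and queried for one miRNA gene in one cancer type. The table name, column names, helper functions and example values are all illustrative, not the App's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one row per (miRNA gene, arm, cancer type).
SCHEMA = (
    "CREATE TABLE arm_expression ("
    " mirna TEXT NOT NULL,"
    " arm TEXT NOT NULL CHECK (arm IN ('5p', '3p')),"
    " cancer_type TEXT NOT NULL,"
    " expression REAL NOT NULL,"
    " PRIMARY KEY (mirna, arm, cancer_type))"
)


def build_arm_table(rows):
    """Load (mirna, arm, cancer_type, expression) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO arm_expression VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def arm_profile(conn, mirna, cancer_type):
    """Return {'5p': value, '3p': value} for one miRNA gene; missing arms are omitted."""
    cur = conn.execute(
        "SELECT arm, expression FROM arm_expression WHERE mirna = ? AND cancer_type = ?",
        (mirna, cancer_type),
    )
    return dict(cur.fetchall())


def dominant_arm(conn, mirna, cancer_type):
    """Return the more highly expressed arm ('5p' on ties), or None if either arm is missing."""
    profile = arm_profile(conn, mirna, cancer_type)
    if "5p" not in profile or "3p" not in profile:
        return None
    return "5p" if profile["5p"] >= profile["3p"] else "3p"


if __name__ == "__main__":
    # Placeholder values for illustration only, not TCGA data.
    conn = build_arm_table([
        ("mir-example", "5p", "TYPE_A", 120.0),
        ("mir-example", "3p", "TYPE_A", 15.0),
    ])
    print(arm_profile(conn, "mir-example", "TYPE_A"), dominant_arm(conn, "mir-example", "TYPE_A"))
```

The composite primary key keeps one value per arm and cancer type, which is what a top/bottom ideogram display needs for each miRNA gene.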
RNA-Seq Mouse Brain Regions Expression Data Analysis: Focus on ApoE Functional Network
Babenko, Vladimir N; Smagin, Dmitry A; Kudryavtseva, Natalia N
2017-09-13
ApoE expression status has been shown to be a highly specific marker of the energy metabolism rate in the brain. Together with its neighbor, Translocase of Outer Mitochondrial Membrane 40 kDa (TOMM40), which is involved in mitochondrial metabolism, ApoE lies in a genomic region that constitutes a neuroenergetic hotspot. Using RNA-Seq data from a murine model of chronic stress, we observed significant positive coordination in the expression of seven neighboring genes at the ApoE locus in five brain regions. ApoE has one of the highest absolute expression values genome-wide, implying that it may drive the altered expression of neighboring genes observed under stressful loads. Notably, we found a highly significant increase in ApoE expression in the hypothalamus of chronically aggressive (FDR < 0.007) and defeated (FDR < 0.001) mice compared with controls. Correlation analysis revealed a close association between ApoE and proopiomelanocortin (Pomc) gene expression profiles, implying a putative neuroendocrine stress-response basis for the elevated ApoE expression in this region.
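The abstract reports false discovery rate (FDR) values but does not name the correction procedure. As an assumption for illustration only, the Benjamini-Hochberg procedure is one common way to turn per-gene p-values into FDR-adjusted values; a minimal Python sketch follows. The function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    if any(not 0.0 <= p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, fdr=0.05):
    """Indices of tests whose BH-adjusted p-value is at or below the chosen FDR."""
    return [i for i, q in enumerate(bh_adjust(pvalues)) if q <= fdr]
```

Each adjusted value is the smallest p·m/rank over that test and every larger p-value, which is what makes the adjusted values non-decreasing in p.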
Hsu, Han-Hsiu; Araki, Michihiro; Mochizuki, Masao; Hori, Yoshimi; Murata, Masahiro; Kahar, Prihardi; Yoshida, Takanobu; Hasunuma, Tomohisa; Kondo, Akihiko
2017-03-02
Chinese hamster ovary (CHO) cells are the primary host used for biopharmaceutical protein production. Engineering CHO cells to produce higher amounts of biopharmaceuticals has relied heavily on empirical approaches, but recent high-throughput "omics" methods are making the process more rational. Omics analyses using gene expression or metabolite profiling make it possible to identify key genes and metabolites in antibody production. Systematic omics approaches using different types of time-series data are expected to further enhance understanding of cellular behaviours and molecular networks for the rational design of CHO cells. This study developed a systematic method for obtaining and analysing time-dependent intracellular and extracellular metabolite profiles, RNA-seq data (enzymatic mRNA levels) and cell counts from CHO cell cultures to capture an overall view of the CHO central metabolic pathway (CMP). We then calculated correlation coefficients among all the profiles and visualised the whole CMP by heatmap analysis and metabolic pathway mapping to classify genes and metabolites together. This approach provides an efficient platform to identify key genes and metabolites in CHO cell culture.
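The abstract does not say which correlation coefficient was used. Assuming Pearson correlation over matched time points, the all-against-all step that would feed such a heatmap can be sketched in Python as follows; the profile names in any input (for example a hypothetical `"gene_X_mRNA"` or `"metabolite_Y"`) are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long time-series profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(profiles):
    """All-against-all correlations; keys are (name_a, name_b) pairs."""
    names = list(profiles)
    return {(a, b): pearson(profiles[a], profiles[b]) for a in names for b in names}
```

Because Pearson correlation is scale-invariant, mRNA levels, metabolite concentrations and cell counts can be compared directly even though their units differ.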
2012-01-01
Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for detecting differential gene expression, with both high-throughput and high-resolution capabilities possible depending on the experimental design chosen. Multiplex experimental designs are now readily available; these can be used to increase the number of samples or replicates profiled at the cost of reduced sequencing depth per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify changes in power, in terms of true and false positive rates, when simulating variations in sequencing depth, biological replication and multiplex experimental design. Conclusions This work quantitatively compares contemporary analysis tools and experimental design choices for detecting differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. Regarding experimental design, this work strongly suggests that biological replicates yield greater power than library (technical) replicates or additional sequencing depth. Strikingly, sequencing depth could be reduced to as little as 15% without substantial impacts on false positive or true positive rates. PMID:22985019
Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M
2012-09-17
RNA sequencing (RNA-Seq) has emerged as a powerful approach for detecting differential gene expression, with both high-throughput and high-resolution capabilities possible depending on the experimental design chosen. Multiplex experimental designs are now readily available; these can be used to increase the number of samples or replicates profiled at the cost of reduced sequencing depth per sample. These strategies affect the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios, including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify changes in power, in terms of true and false positive rates, when simulating variations in sequencing depth, biological replication and multiplex experimental design. This work quantitatively compares contemporary analysis tools and experimental design choices for detecting differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. Regarding experimental design, this work strongly suggests that biological replicates yield greater power than library (technical) replicates or additional sequencing depth. Strikingly, sequencing depth could be reduced to as little as 15% without substantial impacts on false positive or true positive rates.
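Two ingredients of the simulation described above can be sketched in Python under stated assumptions. First, negative binomial counts are drawn as a gamma-Poisson mixture with mean `mu` and variance `mu + phi * mu**2`. Second, reduced sequencing depth is approximated by binomial thinning, which keeps each read independently with some probability (for example 0.15). This is a generic sketch: it omits the study's exponential component and does not use its fitted parameters, and `mu`, `phi` and all function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson(lam) variate; large means are split into chunks so exp(-chunk) never underflows."""
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        lam -= chunk
        limit = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > limit:  # Knuth's multiplicative method for one chunk
            k += 1
            p *= rng.random()
        total += k
    return total


def negative_binomial(mu, phi, rng):
    """Gamma-Poisson mixture: mean mu, variance mu + phi * mu**2 (phi > 0)."""
    lam = rng.gammavariate(1.0 / phi, mu * phi)  # shape 1/phi, scale mu*phi
    return poisson(lam, rng)


def thin(count, fraction, rng):
    """Binomial thinning: keep each read independently with probability `fraction`."""
    return sum(1 for _ in range(count) if rng.random() < fraction)


def simulate_gene(mu, phi, n_samples, depth_fraction=1.0, seed=0):
    """Counts for one gene across replicates, optionally down-sampled to a fraction of full depth."""
    rng = random.Random(seed)
    return [thin(negative_binomial(mu, phi, rng), depth_fraction, rng) for _ in range(n_samples)]
```

Thinning a negative binomial count with probability f gives another negative binomial with mean f·mu and the same dispersion, so down-sampling mostly reduces the Poisson (shot-noise) part of the variance while the biological part remains. This is one way to see why extra biological replicates can matter more than extra depth.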
Zhang, L; Liu, X J
2016-06-03
With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, existing expression estimation methods usually process each sample separately and ignore the fact that read distributions are consistent across samples. In this study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression from multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general non-uniformity of read distributions for all genes across multiple samples. In addition, the method adds structured sparse regularization, which not only incorporates the sparse specificity between a gene and the expression levels of its isoforms but also reduces the effect of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate the method's isoform expression estimates. Compared with other popular methods, SSRSeq reduced the variance between multiple samples and produced more accurate isoform expression estimates, and thus more meaningful biological interpretations.
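The abstract does not give enough detail to reproduce SSRSeq's structured sparse regression. As a simpler and plainly different stand-in, the sketch below shows the basic regression setup: it estimates non-negative isoform abundances `theta` from segment read counts with y ≈ A·theta, using a plain L1 (lasso) penalty solved by projected (proximal) gradient descent. The compatibility matrix `A`, the penalty `lam` and the function name are hypothetical.

```python
def estimate_isoforms(A, y, lam=0.0, n_iter=2000):
    """Minimise 0.5*||A theta - y||^2 + lam*sum(theta) subject to theta >= 0.

    A[r][j]: hypothetical contribution of isoform j to read segment r.
    y[r]: observed read count in segment r.
    """
    n_rows, n_iso = len(A), len(A[0])
    # 1/L is a safe step: the squared Frobenius norm bounds the largest eigenvalue of A^T A.
    lipschitz = sum(a * a for row in A for a in row)
    if lipschitz == 0:
        raise ValueError("A must contain a non-zero entry")
    step = 1.0 / lipschitz
    theta = [0.0] * n_iso
    for _ in range(n_iter):
        resid = [sum(A[r][j] * theta[j] for j in range(n_iso)) - y[r] for r in range(n_rows)]
        grad = [sum(A[r][j] * resid[r] for r in range(n_rows)) for j in range(n_iso)]
        # Proximal step for lam*sum(theta) on theta >= 0: shift by lam, then clip at zero.
        theta = [max(0.0, theta[j] - step * (grad[j] + lam)) for j in range(n_iso)]
    return theta
```

For example, two isoforms sharing a middle segment (`A = [[1, 0], [1, 1], [0, 1]]`) with counts `[2, 5, 3]` recover abundances close to `[2, 3]` when `lam` is 0. Larger `lam` values push weakly supported isoforms to exactly zero, which is the sparsity idea in its simplest, unstructured form.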
Kamber, Tim; Buchmann, Jan P; Pothier, Joël F; Smits, Theo H M; Wicker, Thomas; Duffy, Brion
2016-02-17
The molecular basis of resistance and susceptibility of host plants to fire blight, a major disease threat to pome fruit production globally, is largely unknown. RNA-sequencing data from challenged and mock-inoculated flowers were analyzed to assess the susceptible response of apple to the fire blight pathogen Erwinia amylovora. In the presence of the pathogen, 1,080 transcripts were differentially expressed at 48 h post inoculation. These included putative disease resistance, stress, pathogen-related, general metabolic, and phytohormone-related genes. Reads mapped to regions of the apple genome with no assigned genes were used to identify potential novel genes and open reading frames. To identify transcripts specifically expressed in response to E. amylovora, RT-PCR was performed and expression patterns were compared with those induced by the fire blight biocontrol agent Pantoea vagans strain C9-1, by another apple pathogen, Pseudomonas syringae pv. papulans, and with mock-inoculated apple flowers. This led to the identification of a peroxidase superfamily gene that was expressed at lower levels in response to E. amylovora, suggesting a potential role in the susceptibility response. Overall, this study provides the first RNA-seq transcriptional profile of the host plant during fire blight disease and insights into the response of susceptible apple plants to E. amylovora.
Kamber, Tim; Buchmann, Jan P.; Pothier, Joël F.; Smits, Theo H. M.; Wicker, Thomas; Duffy, Brion
2016-01-01
The molecular basis of resistance and susceptibility of host plants to fire blight, a major disease threat to pome fruit production globally, is largely unknown. RNA-sequencing data from challenged and mock-inoculated flowers were analyzed to assess the susceptible response of apple to the fire blight pathogen Erwinia amylovora. In the presence of the pathogen, 1,080 transcripts were differentially expressed at 48 h post inoculation. These included putative disease resistance, stress, pathogen-related, general metabolic, and phytohormone-related genes. Reads mapped to regions of the apple genome with no assigned genes were used to identify potential novel genes and open reading frames. To identify transcripts specifically expressed in response to E. amylovora, RT-PCR was performed and expression patterns were compared with those induced by the fire blight biocontrol agent Pantoea vagans strain C9-1, by another apple pathogen, Pseudomonas syringae pv. papulans, and with mock-inoculated apple flowers. This led to the identification of a peroxidase superfamily gene that was expressed at lower levels in response to E. amylovora, suggesting a potential role in the susceptibility response. Overall, this study provides the first RNA-seq transcriptional profile of the host plant during fire blight disease and insights into the response of susceptible apple plants to E. amylovora. PMID:26883568
Analysis, annotation, and profiling of the oat seed transcriptome
USDA-ARS's Scientific Manuscript database
Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...
2014-01-01
Background RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complex data generated by high-throughput sequencers. Results We developed a standalone tool with graphical user interface (GUI)-based analytic modules, called eRNA. Its parallel processing and sample management capabilities facilitate large data analyses by maximizing hardware usage and freeing users from tedious handling of sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembly, and differential expression. The module “Target screening” provides expression profiling analyses and graphical visualization. The module “Self-testing” offers directory setup, sample management, and a check for third-party package dependencies. Integration of other GUIs, including Bowtie, miRDeep2, and miRspring, extends the program’s functionality. Conclusions eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312
Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang
2014-03-05
RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complex data generated by high-throughput sequencers. We developed a standalone tool with graphical user interface (GUI)-based analytic modules, called eRNA. Its parallel processing and sample management capabilities facilitate large data analyses by maximizing hardware usage and freeing users from tedious handling of sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembly, and differential expression. The module "Target screening" provides expression profiling analyses and graphical visualization. The module "Self-testing" offers directory setup, sample management, and a check for third-party package dependencies. Integration of other GUIs, including Bowtie, miRDeep2, and miRspring, extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
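The "miRNA identification" module lists adapter removal as one of its steps, but the abstract does not describe how eRNA implements it. The sketch below is a generic exact-match 3' adapter trimmer. It assumes a hypothetical adapter sequence and a minimum overlap of 3 nt for partial adapters at the read end. Production trimmers also tolerate sequencing errors in the adapter, which this sketch does not.

```python
ADAPTER = "TGGAATTCTCGG"  # hypothetical 3' adapter, for illustration only


def trim_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Remove a 3' adapter by exact matching.

    1. If the whole adapter occurs in the read, cut at its first occurrence.
    2. Otherwise cut the longest adapter prefix (>= min_overlap nt) found at the read's 3' end.
    """
    if not adapter:
        return read
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

Checking the longest partial overlap first avoids trimming too little when the read ends in a long adapter fragment. The `min_overlap` floor stops a chance 1 to 2 nt match from shortening a genuine small-RNA insert.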
Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.
Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin
2013-09-22
High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique for studying the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. Computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes, based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to RNA-Seq transcriptome data generated for soybean root hair cells at three different developmental stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common to all three stages, and 24, 49 and 70 modules for the first, second and third stages, respectively, each containing a group of co-expressed genes and several transcription factors that collaboratively control their expression under different conditions. Eight of the 10 common regulatory modules were validated by at least two kinds of validation, such as independent DNA binding motif analysis, gene function enrichment tests, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two-hybrid experiments.
RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats.
Merrick, B Alex; Phadke, Dhiral P; Auerbach, Scott S; Mav, Deepak; Stiegelmeyer, Suzy M; Shah, Ruchir R; Tice, Raymond R
2013-01-01
Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome before the onset of histopathological lesions or tumors. We hypothesized that RNA-Seq would reveal more differentially expressed genes (DEGs) than microarray analysis, including low-copy and novel transcripts related to AFB1's carcinogenic activity, compared with feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed with DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability between RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005), compared with 626 transcripts by microarray; the difference was due to the base-pair resolution of transcripts by RNA-Seq, probe placement within transcripts, or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis of DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways involved in kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially expressed transcripts, including confirmation by PCR cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfTs) on chromosomes 1.q55 and 15.q11, overexpressed by 10- to 25-fold. Several potentially novel exons were found and exon refinements were made, including AFB1 exon-specific induction of the homologous family members Ugt1a6 and Ugt1a7c.
We find the rat transcriptome contains many previously unidentified, AFB1-responsive exons and transcripts supporting RNA-Seq's capabilities to provide new insights into AFB1-mediated gene expression leading to hepatocellular carcinoma.
Xue, Linlin; Xie, Li; Song, Xingguo; Song, Xianrang
2018-04-17
Platelets have emerged as key players in tumorigenesis and tumor progression. The tumor-educated platelet (TEP) RNA profile has the potential to diagnose non-small-cell lung cancer (NSCLC). The objective of this study was to identify potential TEP RNA biomarkers for the diagnosis of NSCLC and to explore the mechanisms underlying alterations in the TEP RNA profile. The RNA-seq datasets GSE68086 and GSE89843 were downloaded from Gene Expression Omnibus DataSets (GEO DataSets). The functional enrichment of the differentially expressed mRNAs was then analyzed with the Database for Annotation, Visualization and Integrated Discovery (DAVID). The miRNAs regulating the differential mRNAs, and the target mRNAs of these miRNAs, were identified with miRanda and miRDB. The miRNA-mRNA regulatory network was then visualized in Cytoscape. Twenty consistently altered mRNAs (2 up-regulated and 18 down-regulated) were identified from the two GSE datasets, and they were significantly enriched in several biological processes, including transport and establishment of localization. Twenty miRNAs were shared between the exosomal miRNA-seq dataset and the 229 miRNAs that regulated the 20 consistently altered mRNAs in platelets. We also analyzed 13 spliceosomal mRNAs and their miRNA predictions; 27 miRNAs were common to the 206 differential exosomal miRNAs and the 338 miRNAs that regulated the 13 distinct spliceosomal mRNAs. This study identified 20 potential TEP RNA biomarkers for NSCLC diagnosis by integrated bioinformatic analysis, and alterations in the TEP RNA profile may be related to post-transcriptional regulation and spliceosome-mediated splicing. © 2018 Wiley Periodicals, Inc.
Shi, Jiandong; Sun, Jing; Wu, Meini; Wang, Haixuan; Hu, Ningzhu; Hu, Yunzhang
2016-11-01
Hepatitis A virus (HAV), the causative agent of acute hepatitis, grows slowly without causing any cytopathic effect (CPE) and leads to a persistent infection in fibroblasts in vitro. miRNAs play a key role in viral pathogenesis and virus-host interactions. In this study, comprehensive miRNA expression profiles of HAV-infected and uninfected fibroblasts were investigated by sRNA-seq and validated by RT-qPCR. A total of 94 miRNAs were differentially expressed during HAV infection, including 11 up-regulated and 83 down-regulated miRNAs. RT-qPCR analysis showed that the expression levels of specific miRNAs were consistent with the sRNA-seq data. Target prediction analysis further revealed 729 putative target genes, including many immune-related transcripts. GO enrichment and KEGG pathway analyses of the target genes showed that various biological pathways, including the JAK-STAT cascade and the type I interferon signaling pathway, could be affected by HAV infection through the alteration of host miRNAs. The core regulatory relationships between miRNAs and their targets were revealed by a miRNA-gene network. Collectively, this study provides an overall analysis of the miRNA profile in cell culture infected with HAV. The results suggest that HAV-induced alterations in miRNA expression may be related to the establishment of persistent HAV infection, and they might provide new clues for understanding persistent HAV infection in vitro and the unique biological characteristics of HAV during infection. Copyright © 2016 Elsevier B.V. All rights reserved.
Dobon, Albor; Bunting, Daniel C E; Cabrera-Quio, Luis Enrique; Uauy, Cristobal; Saunders, Diane G O
2016-05-20
Understanding how plants and pathogens modulate gene expression during the host-pathogen interaction is key to uncovering the molecular mechanisms that regulate disease progression. Recent advances in sequencing technologies have provided new opportunities to decode the complexity of such interactions. In this study, we used an RNA-based sequencing approach (RNA-seq) to assess the global expression profiles of the wheat yellow rust pathogen Puccinia striiformis f. sp. tritici (PST) and its host during infection. We performed a detailed RNA-seq time-course for a susceptible and a resistant wheat host infected with PST. This study (i) defined the global gene expression profiles for PST and its wheat host, (ii) substantially improved the gene models for PST, (iii) evaluated the utility of several programmes for quantification of global gene expression for PST and wheat, and (iv) identified clusters of differentially expressed genes in the host and pathogen. By focusing on components of the defence response in susceptible and resistant hosts, we were able to visualise the effect of PST infection on the expression of various defence components and host immune receptors. Our data showed sequential, temporally coordinated activation and suppression of expression of a suite of immune-response regulators that varied between compatible and incompatible interactions. These findings provide the framework for a better understanding of how PST causes disease and support the idea that PST can suppress the expression of defence components in wheat to successfully colonize a susceptible host.
Evaluation of commercially available small RNASeq library preparation kits using low input RNA.
Yeri, Ashish; Courtright, Amanda; Danielson, Kirsty; Hutchins, Elizabeth; Alsop, Eric; Carlson, Elizabeth; Hsieh, Michael; Ziegler, Olivia; Das, Avash; Shah, Ravi V; Rozowsky, Joel; Das, Saumya; Van Keuren-Jensen, Kendall
2018-05-05
Evolving interest in comprehensively profiling the full range of small RNAs present in small tissue biopsies and in circulating biofluids, and in how the profile differs with disease, has brought small RNA sequencing (RNASeq) into more frequent use. However, known biases associated with small RNASeq, compounded by low RNA inputs, have been both a significant concern and a hurdle to widespread adoption. As RNASeq becomes a viable choice for discovering small RNAs in low-input samples and more labs employ it, benchmark datasets are needed to test and evaluate the performance of new sequencing protocols and operators. In a recent publication from the National Institute of Standards and Technology (Pine et al., 2018), the investigators used a commercially available set of three tissues and tested performance across labs and platforms. In this paper, we further tested low-RNA-input performance of three commonly used, commercially available small RNA library preparation kits: NEB Next, NEXTFlex, and TruSeq. We evaluated the kits at two different sites, using three different tissues (brain, liver, and placenta) with high (1 μg) and low (10 ng) RNA input from tissue samples, or 5.0, 3.0, 2.0, 1.0, 0.5, and 0.2 ml starting volumes of plasma. Because robust validation platforms for differentially expressed miRNAs have been lacking, we also compared the low-input RNASeq data with expression profiles from three different platforms (Abcam Fireplex, HTG EdgeSeq, and Qiagen miRNome). The concordance of RNASeq results with these three platforms depended on RNA expression level: the higher the expression, the better the reproducibility. The results provide an extensive analysis of small RNASeq kit performance with low RNA input, and replication of these data on three downstream technologies.
Chakraborty, Sandeep; Britton, Monica; Martínez-García, P J; Dandekar, Abhaya M
2016-03-01
Deep RNA-Seq profiling, a revolutionary method for quantifying transcript levels, often captures non-specific transcripts from other co-existing organisms despite stringent protocols. Using the recently published walnut genome sequence as a filter, we present a broad analysis of RNA-Seq-derived transcriptome profiles from twenty different tissues to extract the biodiversity and possible plant-microbe interactions in the walnut ecosystem in California. Because the residual transcripts being analyzed do not provide sufficient information to identify the exact strain, inferences are constrained to the genus level. The pathogenic oomycete Phytophthora was detected in the root through the presence of a glyceraldehyde-3-phosphate dehydrogenase. Cryptococcus, the causal agent of cryptococcosis, was found in the catkins and vegetative buds, corroborating previous work indicating that the plant surface supports the sexual cycle of this human pathogen. The RNA-Seq profile revealed several species of endophytic nitrogen-fixing Actinobacteria. Another bacterial species implicated in aerobic biodegradation of methyl tert-butyl ether (Methylibium petroleiphilum) was also found in the root. RNA encoding proteins from the pea aphid was found in the leaves and vegetative buds, while a mosquito serine protease with significant homology to a female reproductive tract protease from Drosophila mojavensis, found in the vegetative bud, suggests egg-laying activity. This comprehensive analysis of RNA-seq data also revealed detailed, tissue-specific information on ~400 transcripts encoded by the largest family of resistance (R) genes (NBS-LRR), which may explain the resistance of this walnut plant to the pathogens detected. Thus, we elucidate the biodiversity and possible plant-microbe interactions in several walnut (Juglans regia) tissues in California using deep RNA-Seq profiling.
Liao, Wei; Jordaan, Gwen; Nham, Phillipp; Phan, Ryan T; Pelegrini, Matteo; Sharma, Sanjai
2015-10-16
To identify differentially expressed and alternatively spliced RNA transcripts in chronic lymphocytic leukemia (CLL) specimens, a high-throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Ten CLL specimens and five normal peripheral blood CD19+ B cell samples were analyzed by HTS RNA-seq. Libraries were prepared with the Illumina TruSeq RNA kit and sequenced on an Illumina HiSeq 2000 system. An average of 48.5 million reads was obtained for B cells and 50.6 million reads for CLL specimens, with 10,396 and 10,448 assembled transcripts for normal B cells and primary CLL specimens, respectively. Cuffdiff analysis identified 2,091 differentially expressed genes (DEGs) between B cells and CLL specimens based on FPKM (fragments per kilobase of transcript per million mapped reads), with a false discovery rate (FDR) q < 0.05 and fold change > 2. Expression of 32 selected DEGs, up- or down-regulated in CLL according to the RNA-seq data, was also analyzed by qRT-PCR in a test cohort of CLL specimens. Although fold changes of the DEGs varied between RNA-seq and qRT-PCR, more than 90% of the analyzed genes were validated by qRT-PCR. Splicing alterations in CLL and B cells were analyzed with Multivariate Analysis of Transcript Splicing (MATS). Exon skipping was the most frequent splicing alteration in CLL specimens, with 128 significant events (P < 0.05, minimum inclusion level difference > 0.1). The RNA-seq analysis of CLL specimens identified novel DEGs and alternatively spliced genes that are potential prognostic markers and therapeutic targets. The high rate of qRT-PCR validation for a number of DEGs supports the accuracy of this analysis.
A global comparison of the transcriptomes of B cells, IGVH-unmutated CLL (U-CLL) and IGVH-mutated CLL (M-CLL) specimens by multidimensional scaling separated CLL from B cell transcriptomes, but the M-CLL and U-CLL transcriptomes were indistinguishable. Analyzing HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome-wide analyses.
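As a minimal sketch of the quantities used in this study, the snippet below computes FPKM and applies the FDR q < 0.05 and fold change > 2 cut-offs. It does not reproduce Cuffdiff's statistics: the q-value is treated as a given input, the small pseudocount is an assumption to avoid division by zero, and the gene and counts are hypothetical.

```python
def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def is_deg(fpkm_a: float, fpkm_b: float, q_value: float,
           fdr: float = 0.05, min_fold: float = 2.0) -> bool:
    """Apply the FDR and fold-change thresholds to one gene.

    q_value is assumed to come from a DE test (e.g. Cuffdiff); it is not computed here.
    """
    pseudo = 1e-9  # assumed pseudocount so that an FPKM of 0 does not divide by zero
    ratio = (fpkm_b + pseudo) / (fpkm_a + pseudo)
    fold = max(ratio, 1.0 / ratio)  # symmetric, so down-regulation is caught as well
    return q_value < fdr and fold > min_fold


# Hypothetical gene: 500 fragments on a 2 kb transcript, 50 million mapped fragments
print(fpkm(500, 2_000, 50_000_000))  # 5.0
print(is_deg(1.0, 3.0, q_value=0.01))  # True
```

Using a symmetric fold change means a gene three times lower in CLL passes the same threshold as one three times higher.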
Chen, Kun; Tsutsumi, Yuki; Yoshitake, Shuhei; Qiu, Xuchun; Xu, Hai; Hashiguchi, Yasuyuki; Honda, Masato; Tashiro, Kosuke; Nakayama, Kei; Hano, Takeshi; Suzuki, Nobuo; Hayakawa, Kazuichi; Shimasaki, Yohei; Oshima, Yuji
2017-01-01
Benzo[c]phenanthrene (BcP) is a highly toxic polycyclic aromatic hydrocarbon (PAH) found throughout the environment. In fish, it is metabolized to 3-hydroxybenzo[c]phenanthrene (3-OHBcP). In the present study, we examined the effects of 1 nM 3-OHBcP on the development and gene expression of Japanese medaka (Oryzias latipes) embryos. Embryos were nanoinjected with the chemical after fertilization. Survival, developmental stage, and heart rate of the embryos were observed, and gene expression differences were quantified by messenger RNA sequencing (mRNA-Seq). Exposure to 1 nM 3-OHBcP accelerated the development of medaka embryos on the 1st, 4th, and 6th days post fertilization (dpf) and significantly increased heart rate on the 5th dpf. The developmental differences in exposed embryos were consistent with the mRNA-Seq gene expression profiles at the 3rd dpf, which showed that the expression of 780 genes differed significantly between the solvent control and 1 nM 3-OHBcP exposure groups. The most evident expression changes in the exposure group were in genes involved in organ formation (eye, muscle, heart), energy supply (ATPase and ATP synthase), and stress response (heat shock protein genes). The acceleration of development and increased heart rate, which were consistent with the changes in mRNA expression, suggest that 3-OHBcP affects the development of medaka embryos. Observation of developmental stage and heart rate, in ovo nanoinjection, and mRNA-Seq may be efficient tools for evaluating the effects of chemicals on embryos. Copyright © 2016 Elsevier B.V. All rights reserved.
De Moro, Gianluca; Gerdol, Marco; Guarnaccia, Corrado; Mosco, Alessandro; Pallavicini, Alberto; Giulianini, Piero Giulio
2013-01-01
The crustacean hyperglycemic hormone (cHH) is a neuropeptide present in many decapods. Two chiral isomers are present simultaneously in astacid crayfish, and their specific biological functions remain poorly understood. The present study aimed to better understand the potentially different effects of each isomer on the hepatopancreatic gene expression profile in the crayfish Pontastacus leptodactylus, in the context of short-term hyperglycemia. To this end, two chemically synthesized cHH enantiomers, containing either L- or D-Phe3, were injected into the circulation of intermolt females following removal of their X organ-sinus gland complex. The effects of injecting each isomer were assessed after one hour by measuring circulating glucose levels. Changes in the hepatopancreas transcriptome were analyzed by RNA-seq. A whole-transcriptome shotgun sequence assembly provided a presumably complete hepatopancreas transcriptome of P. leptodactylus, which was then used for RNA-seq analysis of the changes in gene expression caused by each hormone isomer. One hour after injection, circulating glucose levels were much higher in response to the D-isoform than to the L-isoform. Similarly, RNA-seq analysis showed a stronger effect on gene expression following administration of D-cHH, whereas the L-isomer caused only limited alterations. These findings demonstrate a more prominent short-term effect of D-cHH on the transcription profile and shed light on the effect of the D-isomer on specific functional gene groups.
Another contribution of the study is the construction of a de novo assembly of the hepatopancreas transcriptome, consisting of 39,935 contigs, that dramatically increases the molecular information available for this species and for crustaceans in general, providing an efficient tool for studying gene expression patterns in this organ. PMID:23840318
Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications.
Van den Berge, Koen; Perraudeau, Fanny; Soneson, Charlotte; Love, Michael I; Risso, Davide; Vert, Jean-Philippe; Robinson, Mark D; Dudoit, Sandrine; Clement, Lieven
2018-02-26
Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
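The weighting idea can be sketched as follows, assuming the zero-inflated negative binomial (ZINB) parameters for a gene in a cell (NB mean mu, NB size theta, excess-zero probability pi) have already been estimated; the model fitting itself, and the step of passing the weights to a bulk DE pipeline, are not shown. A zero count receives the posterior probability that it came from the NB count component rather than from the excess-zero component; positive counts keep weight 1.

```python
def zinb_weight(count: int, mu: float, theta: float, pi: float) -> float:
    """Posterior probability that an observed count was generated by the NB component.

    mu: NB mean, theta: NB size (inverse dispersion), pi: excess-zero probability.
    All three are assumed to come from a prior ZINB fit.
    """
    if count > 0:
        return 1.0  # a positive count cannot come from the excess-zero component
    nb_zero = (theta / (theta + mu)) ** theta  # NB probability of observing a zero
    return (1.0 - pi) * nb_zero / (pi + (1.0 - pi) * nb_zero)


# A zero in a well-expressed gene is likely a dropout and is down-weighted,
# while a zero where the expected count is low keeps a weight close to 1.
print(round(zinb_weight(0, mu=100.0, theta=2.0, pi=0.5), 4))
print(round(zinb_weight(0, mu=0.1, theta=1.0, pi=0.1), 4))
```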
Hui, Raymond K; Leung, Frederick C
2015-01-01
RNA-Seq was used to unveil the transcriptional profile of DF-1 cells at the early stage of caIBDV infection. Total RNA was extracted from virus-infected cells at 0, 6 and 12 hours post infection (hpi). The RNA-Seq datasets of the respective samples mapped to 56.5-57.6% of isoforms in the reference genome Galgal4.73. In the virus-infected group, 23 isoforms showed elevated expression at 6 hpi, while 128 isoforms were up-regulated and 5 were down-regulated at 12 hpi. In addition, 10 isoforms were expressed exclusively in the virus-infected cells. Although no significant change in cytokine or interferon expression levels was detected in the first 12 hours of infection, modulation of upstream regulators was observed. In addition to previously reported regulatory factors, including EIF2AK2, MX, OAS*A, GBP7 and IFIT, IBDV infection also triggered an IFIT5-IRF1/3-RSAD5 pathway in the DF-1 cells, which potentially restricted the viral replication cycle in the early stage of infection. Overexpression of LIPA and CH25H, together with suppression of the STARD4, LSS and AACS genes, implied modulation of membrane fluidity and lipid raft arrangement in the infected cells. Alternative splicing of the EFR3 homolog A gene was also thought to be involved in lipid membrane regulation, and these cumulative responses pointed to an inhibition of viral endocytosis. Recognition of viral RNA genomes and intermediates was presumably enhanced by elevated expression of the IFIH1, DHX58 and TRIM25 genes, which are involved in detecting viral dsRNA. On the other hand, caIBDV arrested the host's apoptotic process by inducing the expression of apoptosis inhibitors, including NFKBIA/Z, TNFAIP2/3 and ITA, in the first 12 hours of infection. In conclusion, the differential expression landscape revealed by RNA-Seq provides a comprehensive picture of the molecular interactions between host cells and virus at the early stage of infection.
Spatial organization of silybin biosynthesis in milk thistle [Silybum marianum (L.) Gaertn].
Lv, Yongkun; Gao, Song; Xu, Sha; Du, Guocheng; Zhou, Jingwen; Chen, Jian
2017-12-01
Silymarin is a collection of compounds extracted from the medicinal herb milk thistle, among which silybin is the major flavonolignan. However, the biosynthesis pathway of silybin remains unclear. In this study, biomimetic reactions demonstrated that silybin can be synthesized from coniferyl alcohol and taxifolin by the action of peroxidase. Concentration profiles of silybin and its precursors, together with RNA-Seq analysis of gene expression, revealed that the amount of taxifolin and the activity of peroxidase are the limiting factors in silybin biosynthesis. Hierarchical clustering of the expression profiles of flavonoid biosynthesis pathway genes distinguished flowers from other organs. RNA-Seq revealed five candidate peroxidases potentially involved in silybin production, among which APX1 (ascorbate peroxidase 1) showed distinct peroxidase activity and the capacity to synthesize silybin. The spatial organization of silybin biosynthesis in milk thistle was elucidated, which could improve our understanding of the biosynthesis of silybin and other flavonolignans. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
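The clustering step can be illustrated with a small agglomerative clustering sketch. The distance (Euclidean) and linkage (average) are assumptions, since the abstract does not state which were used, and the organ expression profiles below are hypothetical.

```python
from itertools import combinations


def euclidean(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(profiles: dict[str, list[float]], k: int) -> list[set[str]]:
    """Merge the two closest clusters (mean pairwise distance) until k clusters remain."""
    clusters = [{name} for name in profiles]

    def dist(c1: set[str], c2: set[str]) -> float:
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: dist(clusters[pair[0]], clusters[pair[1]]))
        clusters[i] |= clusters[j]  # combinations gives i < j, so index i stays valid
        del clusters[j]
    return clusters


# Hypothetical expression of three pathway genes per organ
profiles = {"flower": [10.0, 9.0, 8.0], "leaf": [1.0, 1.0, 1.0],
            "stem": [1.0, 2.0, 1.0], "root": [2.0, 1.0, 1.0]}
print(average_linkage(profiles, k=2))
```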
Technical variations in low-input RNA-seq methodologies.
Bhargava, Vipul; Head, Steven R; Ordoukhanian, Phillip; Mercola, Mark; Subramaniam, Shankar
2014-01-14
Recent advances in RNA-seq methodologies from limiting amounts of mRNA have facilitated the characterization of rare cell types in various biological systems. So far, however, the technical variation of these methods, in particular their sensitivity when starting from reduced amounts of mRNA, has not been adequately characterized. Here, we generated sequencing libraries from limiting amounts of mRNA using three amplification-based methods, viz. Smart-seq, DP-seq and CEL-seq, and demonstrated significant technical variations in these libraries. Reduction in mRNA levels led to inefficient amplification of the majority of low to moderately expressed transcripts. Furthermore, noise in primer hybridization and/or enzyme incorporation was magnified during the amplification step, resulting in significant distortions in the fold changes of the transcripts. Consequently, the majority of the differentially expressed transcripts identified were highly expressed and/or exhibited high fold changes. High technical variation ultimately masked subtle biological differences, mandating the development of improved amplification-based strategies for quantitative transcriptomics from limiting amounts of mRNA.
Budak, Gungor; Srivastava, Rajneesh; Janga, Sarath Chandra
2017-06-01
RNA-binding proteins (RBPs) regulate gene expression in eukaryotic genomes at the post-transcriptional level by binding to their cognate RNAs. Although several variants of CLIP (crosslinking and immunoprecipitation) protocols are currently available to study the global protein-RNA interaction landscape at single-nucleotide resolution in a cell, there are currently very few tools that can facilitate understanding and dissecting the functional associations of RBPs from the resulting binding maps. Here, we present Seten, a web-based and command-line tool, which can identify and compare processes, phenotypes, and diseases associated with RBPs from condition-specific CLIP-seq profiles. Seten takes as input the BED files produced by most peak-calling algorithms, which include scores reflecting the extent of binding of an RBP on the target transcript, and provides both traditional functional enrichment and gene set enrichment results for a number of gene set collections, including BioCarta, KEGG, Reactome, Gene Ontology (GO), Human Phenotype Ontology (HPO), and MalaCards Disease Ontology, for several organisms including fruit fly, human, mouse, rat, worm, and yeast. It also provides an option to dynamically compare the associated gene sets across data sets as bubble charts, to facilitate comparative analysis. Benchmarking Seten with eCLIP data for IGF2BP1, SRSF7, and PTBP1 against the corresponding CRISPR RNA-seq data in K562 cells, as well as randomized negative controls, demonstrated that its gene set enrichment method outperforms functional enrichment, with the binding scores contributing significantly to the discovery of true annotations. Comparative performance analysis using these CRISPR control data sets revealed significantly higher precision and comparable recall to that observed using ChIP-Enrich.
Seten's web interface currently provides precomputed results for about 200 CLIP-seq data sets, and both the command-line and web interfaces can be used to analyze CLIP-seq data sets. We highlight several examples to show the utility of Seten for rapid profiling of various CLIP-seq data sets. Seten is available at http://www.iupui.edu/~sysbio/seten/. © 2017 Budak et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
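Seten's own statistics are not reproduced here. As a generic illustration of the "traditional functional enrichment" mentioned above, an over-representation test asks how likely it is to see at least k gene-set members among n RBP target genes when K of the N background genes belong to the set (a one-sided hypergeometric test). The function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided over-representation p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: background genes, K: background genes in the gene set,
    n: RBP target genes, k: target genes in the gene set.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until the final division


# Hypothetical: 10 of 100 bound targets fall in a 200-gene set out of 20,000 genes
print(enrichment_p(20_000, 200, 100, 10))
```

A gene set enrichment approach, by contrast, would also use the binding scores rather than only a target/non-target split.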
Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.
Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X
2017-12-05
Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists because they require substantial bioinformatics skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.
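Granatum's implementation is not described in the abstract; the sketch below only illustrates what two of the listed modules, gene filtering and gene-expression normalization, commonly compute. Detection-based filtering and log2(CPM + 1) scaling are assumptions chosen for illustration, and the count matrix is hypothetical.

```python
from math import log2


def filter_genes(counts: list[list[int]], min_cells: int = 2) -> list[list[int]]:
    """Keep genes (rows) detected, i.e. count > 0, in at least `min_cells` cells."""
    return [row for row in counts if sum(1 for x in row if x > 0) >= min_cells]


def log_cpm(counts: list[list[int]]) -> list[list[float]]:
    """log2(counts per million + 1), using each cell's (column's) total as library size.

    Assumes every cell has a non-zero total after filtering.
    """
    n_cells = len(counts[0])
    lib_sizes = [sum(row[c] for row in counts) for c in range(n_cells)]
    return [[log2(row[c] * 1e6 / lib_sizes[c] + 1) for c in range(n_cells)]
            for row in counts]


# Hypothetical genes x cells count matrix
counts = [[10, 0, 3], [0, 0, 1], [90, 50, 6]]
kept = filter_genes(counts, min_cells=2)  # the second gene is detected in only one cell
normalized = log_cpm(kept)
```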
Tran, Van Du T; De Coi, Niccolò; Feuermann, Marc; Schmid-Siegert, Emanuel; Băguţ, Elena-Tatiana; Mignon, Bernard; Waridel, Patrice; Peter, Corinne; Pradervand, Sylvain; Pagni, Marco; Monod, Michel
2016-01-01
ABSTRACT Dermatophytes are the most common agents of superficial mycoses in humans and animals. The aim of the present investigation was to systematically identify the extracellular, possibly secreted, proteins that are putative virulence factors and antigenic molecules of dermatophytes. A complete gene expression profile of Arthroderma benhamiae was obtained during infection of its natural host (guinea pig) using RNA sequencing (RNA-seq) technology. This profile was complemented with those of the fungus cultivated in vitro in two media containing either keratin or soy meal protein as the sole source of nitrogen and in Sabouraud medium. More than 60% of transcripts deduced from RNA-seq data differ from those previously deposited for A. benhamiae. Using these RNA-seq data along with an automatic gene annotation procedure, followed by manual curation, we produced a new annotation of the A. benhamiae genome. This annotation comprised 7,405 coding sequences (CDSs), among which only 2,662 were identical to the currently available annotation, 383 were newly identified, and 15 secreted proteins were manually corrected. The expression profile of genes encoding proteins with a signal peptide in infected guinea pigs was found to be very different from that during in vitro growth when using keratin as the substrate. In particular, the sets of the 12 most highly expressed genes encoding proteases with a signal sequence during infection and in keratin medium had only the putative vacuolar aspartic protease gene PEP2 in common. The most upregulated gene encoding a secreted protease during infection was that encoding subtilisin SUB6, which is a known major allergen in the related dermatophyte Trichophyton rubrum. IMPORTANCE Dermatophytoses (ringworm, jock itch, athlete's foot, and nail infections) are the most common fungal infections, but their virulence mechanisms are poorly understood.
Combining transcriptomic data obtained from growth under various culture conditions with data obtained during infection led to a significantly improved genome annotation. About 65% of the protein-encoding genes predicted with our protocol did not match the existing annotation for A. benhamiae. Comparing gene expression during infection of guinea pigs with keratin degradation in vitro, which is supposed to mimic the host environment, revealed the critical importance of using real in vivo conditions for investigating virulence mechanisms. The analysis of genes expressed in vivo, encoding cell surface and secreted proteins, particularly proteases, led to the identification of new allergen and virulence factor candidates. PMID:27822542
Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior
2012-01-01
Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time. PMID:22383036
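The order of the protocol's main steps can be sketched as a list of command lines, built in Python but not executed. The flags follow the commonly documented TopHat/Cufflinks usage and should be checked against each tool's manual; the function, file names and directory names are hypothetical, and merged_asm/merged.gtf is assumed to be Cuffmerge's default output location.

```python
def tuxedo_commands(samples: list[tuple[str, str, list[str]]], genome_index: str,
                    genome_fasta: str, annotation_gtf: str,
                    threads: int = 8) -> list[list[str]]:
    """Build (but do not run) TopHat -> Cufflinks -> Cuffmerge -> Cuffdiff argument lists.

    samples: (sample_name, condition, fastq_files) tuples.
    """
    t = str(threads)
    cmds = []
    for name, _condition, fastqs in samples:
        # 1. Spliced alignment of the reads, guided by the reference annotation
        cmds.append(["tophat", "-p", t, "-G", annotation_gtf,
                     "-o", f"{name}_thout", genome_index, *fastqs])
        # 2. Per-sample transcript assembly from the aligned reads
        cmds.append(["cufflinks", "-p", t, "-o", f"{name}_clout",
                     f"{name}_thout/accepted_hits.bam"])
    # 3. Merge per-sample assemblies; assemblies.txt must list each
    #    <name>_clout/transcripts.gtf, one per line (not written by this sketch)
    cmds.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                 "-p", t, "assemblies.txt"])
    # 4. Differential expression: one comma-separated group of BAMs per condition
    labels = list(dict.fromkeys(cond for _name, cond, _fq in samples))
    groups = [",".join(f"{name}_thout/accepted_hits.bam"
                       for name, cond, _fq in samples if cond == label)
              for label in labels]
    cmds.append(["cuffdiff", "-o", "diff_out", "-b", genome_fasta, "-p", t,
                 "-L", ",".join(labels), "-u", "merged_asm/merged.gtf", *groups])
    return cmds


samples = [("C1_R1", "C1", ["C1_R1_1.fq", "C1_R1_2.fq"]),
           ("C2_R1", "C2", ["C2_R1_1.fq", "C2_R1_2.fq"])]
for cmd in tuxedo_commands(samples, "genome", "genome.fa", "genes.gtf"):
    print(" ".join(cmd))
```

With the tools installed, each list could be run in order with `subprocess.run(cmd, check=True)`; keeping the commands as argument lists avoids shell quoting problems.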
Radiation-induced alternative transcripts as detected in total and polysome-bound mRNA.
Wahba, Amy; Ryan, Michael C; Shankavaram, Uma T; Camphausen, Kevin; Tofilon, Philip J
2018-01-02
Alternative splicing is a critical event in the posttranscriptional regulation of gene expression. To investigate whether this process influences radiation-induced gene expression, we defined the effects of ionizing radiation on the generation of alternative transcripts in total cellular mRNA (the transcriptome) and polysome-bound mRNA (the translatome) of the human glioblastoma stem-like cell line NSC11. For these studies, RNA-Seq profiles from control and irradiated cells were compared using the program SpliceSeq to identify transcripts and splice variations induced by radiation. Compared with the transcriptome (total RNA) of untreated cells, the radiation-induced transcriptome contained 92 splice events, suggesting that radiation induces alternative splicing. Compared with the translatome (polysome-bound RNA) of untreated cells, the radiation-induced translatome contained 280 splice events, of which only 24 overlapped with the radiation-induced transcriptome. These results suggest that radiation not only modifies alternative splicing of precursor mRNA, but also results in the selective association of existing mRNA isoforms with polysomes. Comparison of radiation-induced alternative transcripts to radiation-induced gene expression in total RNA revealed little overlap (about 3%). In contrast, in the radiation-induced translatome, about 38% of the induced alternative transcripts corresponded to genes whose expression level was affected in the translatome. This study suggests that whereas radiation induces alternative splicing, the alternative transcripts present at the time of irradiation may play a role in the radiation-induced translational control of gene expression and thus in cellular radioresponse.
Yu, Ying; Zhao, Chen; Su, Zhenqiang; Wang, Charles; Fuscoe, James C; Tong, Weida; Shi, Leming
2014-01-01
The rat is used extensively by the pharmaceutical, regulatory, and academic communities for safety assessment of drugs and chemicals and for studying human diseases; however, its transcriptome has not been well studied. As part of the SEQC (i.e., MAQC-III) consortium efforts, a comprehensive RNA-Seq data set was constructed using 320 RNA samples isolated from 10 organs (adrenal gland, brain, heart, kidney, liver, lung, muscle, spleen, thymus, and testes or uterus) from both sexes of Fischer 344 rats across four ages (2-, 6-, 21-, and 104-week-old) with four biological replicates for each of the 80 sample groups (organ-sex-age). With the Ribo-Zero rRNA removal and Illumina RNA-Seq protocols, 41 million 50 bp single-end reads were generated per sample, yielding a total of 13.4 billion reads. This data set could be used to identify and validate new rat genes and transcripts, develop a more comprehensive rat transcriptome annotation system, identify novel gene regulatory networks related to tissue specific gene expression and development, and discover genes responsible for disease and drug toxicity and efficacy.
Dose-Response Analysis of RNA-Seq Profiles in Archival ...
Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-sequencing offers a promising way to address this problem. Here we evaluated transcriptomic dose responses using RNA-sequencing in paired FFPE and frozen (FROZ) samples from two archival studies in mice, one 20 years old. Experimental treatments included 3 different doses of di(2-ethylhexyl)phthalate or dichloroacetic acid for the recently archived and older studies, respectively. Total RNA was ribo-depleted and sequenced using the Illumina HiSeq platform. In the recently archived study, FFPE samples had 35% lower total counts compared to FROZ samples but high concordance in fold-change values of differentially expressed genes (DEGs) (r2 = 0.99), highly enriched pathways (90% overlap with FROZ), and benchmark dose estimates for preselected target genes (2% difference vs FROZ). In contrast, older FFPE samples had markedly lower total counts (3% of FROZ) and poor concordance in global DEGs and pathways. However, counts from FFPE and FROZ samples still positively correlated (r2 = 0.84 across all transcripts) and showed comparable dose responses for more highly expressed target genes. These findings highlight potential applications and issues in using RNA-sequencing data from FFPE samples. Recently archived FFPE samples were highly similar to FROZ samples in sequencing q
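The concordance values above are reported as r². Assuming r² denotes the squared Pearson correlation, it can be computed as in the minimal sketch below; the fold changes are hypothetical, and whether the study correlated log-scale or linear fold changes is not stated here.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two paired vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical log2 fold changes for the same DEGs in FROZ and FFPE samples
froz = [1.2, -0.8, 2.5, 0.3, -1.9]
ffpe = [1.1, -0.7, 2.4, 0.4, -2.0]
print(round(r_squared(froz, ffpe), 3))
```

Because the correlation is squared, r² measures how tightly the two sets of fold changes track each other but not their direction or scale; a systematic offset between FFPE and FROZ values would not lower it.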
Comparison of alternative approaches for analysing multi-level RNA-seq data
Mohorianu, Irina; Bretman, Amanda; Smith, Damian T.; Fowler, Emily K.; Dalmay, Tamas
2017-01-01
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a normalization based on subsampling without replacement and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression calls in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples. The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments. PMID:28792517
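Normalization by subsampling without replacement can be sketched in its simplest form: each library is randomly thinned to the size of the smallest library, so that all samples have the same depth. The study's handling of the nested design and its effect-size-based DE calling are not reproduced, and the function names and counts are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def subsample_counts(counts: list[int], depth: int, rng: random.Random) -> list[int]:
    """Draw `depth` reads without replacement from one library and re-count them per gene."""
    total = sum(counts)
    if depth > total:
        raise ValueError("target depth exceeds library size")
    # Read indices from bounds[g-1] (or 0 for g = 0) up to bounds[g] - 1 belong to gene g
    bounds = list(accumulate(counts))
    out = [0] * len(counts)
    for read in rng.sample(range(total), depth):
        out[bisect.bisect_right(bounds, read)] += 1
    return out


def normalize_by_subsampling(libraries: list[list[int]], seed: int = 0) -> list[list[int]]:
    """Thin every library to the size of the smallest one."""
    rng = random.Random(seed)
    depth = min(sum(lib) for lib in libraries)
    return [subsample_counts(lib, depth, rng) for lib in libraries]


# Hypothetical per-gene counts for two libraries of different depth (150 and 60 reads)
libs = [[100, 50, 0], [30, 20, 10]]
print(normalize_by_subsampling(libs))
```

Unlike scaling by a size factor, subsampling keeps integer counts and makes the low-count noise comparable across libraries, at the cost of discarding reads from the deeper ones; repeating it with different seeds shows how much the result depends on the draw.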
Wang, Zhuo; Jin, Shuilin; Liu, Guiyou; Zhang, Xiurui; Wang, Nan; Wu, Deliang; Hu, Yang; Zhang, Chiping; Jiang, Qinghua; Xu, Li; Wang, Yadong
2017-05-23
The development of single-cell RNA sequencing has enabled profound discoveries in biology, ranging from the dissection of the composition of complex tissues to the identification of novel cell types and dynamics in some specialized cellular environments. However, the large-scale generation of single-cell RNA-seq (scRNA-seq) data collected at multiple time points remains a challenge to the effective measurement of gene expression patterns in transcriptome analysis. We present an algorithm based on the Dynamic Time Warping score (DTWscore), applied to time-series data, that enables the detection of gene expression changes across scRNA-seq samples and the recovery of potential cell types from complex mixtures of multiple cell types. The DTWscore successfully classifies cells of different types using the most highly variable genes from time-series scRNA-seq data. The study was confined to methods that are implemented and available within the R framework. Sample datasets and R packages are available at https://github.com/xiaoxiaoxier/DTWscore .
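DTWscore's scoring and cell-type recovery are implemented in R and are not reproduced here. The dynamic time warping (DTW) distance that the method builds on can be sketched in Python as a dynamic program that aligns two expression time courses while allowing local stretching in time; the absolute-difference cost and the example series are assumptions.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic DTW: minimal cumulative |a_i - b_j| cost over monotone alignments."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = best cost of aligning the first i points of a with the first j of b
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend from a match, an insertion, or a deletion step
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]


# Hypothetical expression time courses: same shape, different pacing
print(dtw_distance([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 3.0]))
```

Because warping absorbs shifts in timing, two genes with the same expression shape but different pacing receive a small distance, which a point-by-point comparison of equal-length series would not give.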
Transcript profiling reveals expression differences in wild-type and glabrous soybean lines
2011-01-01
Background Trichome hairs affect diverse agronomic traits such as seed weight and yield, prevent insect damage and reduce water loss, but their molecular control has not been extensively studied in soybean. Several detailed models for trichome development have been proposed for Arabidopsis thaliana, but their applicability to important crops such as cotton and soybean is not fully known. Results Two high-throughput transcript sequencing methods, Digital Gene Expression (DGE) Tag Profiling and RNA-Seq, were used to compare the transcriptional profiles of a wild-type (cv. Clark standard, CS) and a mutant (cv. Clark glabrous, i.e., trichomeless or hairless, CG) soybean isoline that carries the dominant P1 allele. DGE data and RNA-Seq data were mapped to the cDNAs (Glyma models) predicted from the reference soybean genome, Williams 82. Extending the model length by 250 bp at both ends resulted in significantly more matches of authentic DGE tags, indicating that many of the predicted gene models are prematurely truncated at the 5' and 3' UTRs. The genome-wide comparative study of the transcript profiles of the wild-type versus mutant line revealed a number of differentially expressed genes. One highly expressed gene in wild-type soybean, Glyma04g35130, was of interest because it has high homology to the cotton gene GhRDL1, which has been identified as being involved in cotton fiber initiation and is a member of the BURP protein family. Comparison of the Glyma04g35130 sequence from Williams 82 with our sequences derived from the CS and CG isolines revealed various SNPs and indels, including the addition of one C nucleotide in CG and an insertion of ~60 bp in the third exon of CS, that cause a frameshift mutation and premature truncation of peptides in both lines as compared to Williams 82.
Conclusion Although not a candidate for the P1 locus, a BURP family member (Glyma04g35130) from soybean has been shown to be abundantly expressed in the CS line and very weakly expressed in the glabrous CG line. RNA-Seq and DGE data are compared and provide experimental data on the expression of predicted soybean gene models as well as an overview of the genes expressed in young shoot tips of two closely related isolines. PMID:22029708
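The observation that extending each Glyma model by 250 bp at both ends captured more authentic DGE tags amounts to a simple interval operation. The sketch below illustrates it with hypothetical coordinates and tag positions; `GeneModel`, `extend_model`, and `tag_hits` are illustrative names, not part of any published pipeline. Strand is ignored and coordinates are assumed to be 0-based and half-open.

```python
from dataclasses import dataclass

@dataclass
class GeneModel:
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

def extend_model(model, flank=250, chrom_length=None):
    """Pad a gene model by `flank` bp on both sides, clamped to chromosome bounds."""
    start = max(0, model.start - flank)
    end = model.end + flank
    if chrom_length is not None:
        end = min(end, chrom_length)
    return GeneModel(model.name, model.chrom, start, end)

def tag_hits(tags, model):
    """Count tags, given as (chrom, position), whose position falls inside the model."""
    return sum(1 for chrom, pos in tags
               if chrom == model.chrom and model.start <= pos < model.end)

# Hypothetical gene model and DGE tag positions on a hypothetical chromosome.
model = GeneModel("gene_X", "chr_hyp", 1000, 2000)
tags = [("chr_hyp", 900), ("chr_hyp", 1500), ("chr_hyp", 2100), ("chr_hyp", 2400)]

print(tag_hits(tags, model))                # tags matched by the original model
print(tag_hits(tags, extend_model(model)))  # tags matched after the +/-250 bp extension
```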
Cabeza, Ricardo A.; Liese, Rebecca; Lingner, Annika; von Stieglitz, Ilsabe; Neumann, Janice; Salinas-Riester, Gabriela; Pommerenke, Claudia; Dittert, Klaus; Schulze, Joachim
2014-01-01
Legume nodules are plant tissues with an exceptionally high concentration of phosphorus (P), which, when P is scarce, is preferentially maintained there rather than being allocated to other plant organs. The hypothesis of this study was that nodules are affected before the P concentration in the organ declines during whole-plant P depletion. Nitrogen (N2) fixation and P concentration in various organs were monitored during a whole-plant P-depletion process in Medicago truncatula. Nodule gene expression was profiled through RNA-seq at day 5 of P depletion. By that point, the P concentration in leaves had reached a lower threshold, whereas it was maintained in nodules. N2-fixation activity per plant diverged from that of fully nourished plants beginning at day 5 of the P-depletion process, primarily because fewer nodules were being formed, while the activity of the existing nodules was maintained for as long as two weeks into P depletion. RNA-seq revealed nodule acclimation at the molecular level, with a total of 1140 differentially expressed genes. Numerous genes for P remobilization from organic structures were increasingly expressed. Various genes involved in nodule malate formation were upregulated, while genes involved in fermentation were downregulated. The strong repression of nodule formation at the onset of P deficiency is reflected in the differential expression of various genes involved in nodulation. It is concluded that plants follow a strategy of maintaining N2 fixation and viable leaf tissue as long as possible during whole-plant P depletion in order to preserve their ability to react to emerging new P sources (e.g. through active P acquisition by roots). PMID:25151618
expVIP: a Customizable RNA-seq Data Analysis and Visualization Platform
2016-01-01
The majority of transcriptome sequencing (RNA-seq) expression studies in plants remain underutilized and inaccessible due to the use of disparate transcriptome references and the lack of skills and resources to analyze and visualize these data. We have developed expVIP, an expression visualization and integration platform, which allows easy analysis of RNA-seq data combined with an intuitive and interactive interface. Users can analyze public and user-specified data sets with minimal bioinformatics knowledge using the expVIP virtual machine. This generates a custom Web browser to visualize, sort, and filter the RNA-seq data and provides outputs for differential gene expression analysis. We demonstrate expVIP’s suitability for polyploid crops and evaluate its performance across a range of biologically relevant scenarios. To exemplify its use in crop research, we developed a flexible wheat (Triticum aestivum) expression browser (www.wheat-expression.com) that can be expanded with user-generated data in a local virtual machine environment. The open-access expVIP platform will facilitate the analysis of gene expression data from a wide variety of species by enabling the easy integration, visualization, and comparison of RNA-seq data across experiments. PMID:26869702
Cross-platform single cell analysis of kidney development shows stromal cells express Gdnf.
Magella, Bliss; Adam, Mike; Potter, Andrew S; Venkatasubramanian, Meenakshi; Chetal, Kashish; Hay, Stuart B; Salomonis, Nathan; Potter, S Steven
2018-02-01
The developing kidney provides a useful model for studying the principles of organogenesis. In this report we use three independent platforms, Drop-Seq, Chromium 10x Genomics and Fluidigm C1, to carry out single cell RNA-Seq (scRNA-Seq) analysis of the E14.5 mouse kidney. Using the software AltAnalyze, in conjunction with the unsupervised approach ICGS, we were able to identify and confirm the presence of 16 distinct cell populations during this stage of active nephrogenesis. Using a novel integrative supervised computational strategy, we were able to successfully harmonize and compare the cell profiles across all three technological platforms. Analysis of possible cross-compartment receptor/ligand interactions identified the nephrogenic zone stroma as a source of GDNF. This was unexpected because the cap mesenchyme nephron progenitors had been thought to be the sole source of GDNF, which is a key driver of branching morphogenesis of the collecting duct system. The expression of Gdnf by stromal cells was validated in several ways, including Gdnf in situ hybridization combined with immunohistochemistry for SIX2, a marker of nephron progenitors, and MEIS1, a marker of stromal cells. Finally, the single cell gene expression profiles generated in this study confirmed and extended previous work showing the presence of multilineage priming during kidney development. Nephron progenitors showed stochastic expression of genes associated with multiple potential differentiation lineages. Copyright © 2017 Elsevier Inc. All rights reserved.
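In its simplest form, screening for cross-compartment ligand/receptor interactions means checking whether a ligand is expressed in one cell population and its receptor in another. The sketch below assumes a hypothetical table of mean expression per population, a single ligand/receptor pair (Gdnf and its receptor Ret), and an arbitrary expression threshold; all values are invented for illustration, and this is not the analysis used in the study.

```python
# Hypothetical mean expression (arbitrary normalized units) per cell population.
mean_expr = {
    "stroma":           {"Gdnf": 2.1, "Ret": 0.0},
    "cap_mesenchyme":   {"Gdnf": 1.5, "Ret": 0.1},
    "ureteric_bud_tip": {"Gdnf": 0.0, "Ret": 3.2},
}
pairs = [("Gdnf", "Ret")]  # (ligand, receptor)

def candidate_interactions(mean_expr, pairs, threshold=1.0):
    """Return (source, ligand, target, receptor) tuples where the ligand is
    expressed in the source population and the receptor in a different one."""
    hits = []
    for ligand, receptor in pairs:
        for src, src_expr in mean_expr.items():
            if src_expr.get(ligand, 0.0) < threshold:
                continue
            for dst, dst_expr in mean_expr.items():
                if dst != src and dst_expr.get(receptor, 0.0) >= threshold:
                    hits.append((src, ligand, dst, receptor))
    return hits

for hit in candidate_interactions(mean_expr, pairs):
    print(hit)
```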
Expression profile of circular RNAs in infantile hemangioma detected by RNA-Seq.
Li, Jun; Li, Qian; Chen, Ling; Gao, Yanli; Li, Jingyun
2018-05-01
Circular RNAs (circRNAs) have emerged as a novel class of widespread non-coding RNAs, and they play crucial roles in various biological processes. However, the characterization and function of circRNAs in infantile hemangioma (IH) remain elusive. In this study, we used RNA-Seq and circRNA prediction to characterize the circRNAs in IH tissue and a matched normal skin control. Specific circRNAs were verified using real-time polymerase chain reaction. Of the 9811 circRNAs identified, 249 were differentially expressed, including 124 upregulated and 125 downregulated circRNAs in the IH group compared with the matched normal skin control group. A set of differentially expressed circRNAs (in particular, hsa_circRNA001885 and hsa_circRNA006612) was confirmed using qRT-PCR. Gene ontology and pathway analysis revealed that, compared with matched normal skin tissues, many processes over-represented in the IH group were related to binding, protein binding, gap junction, and focal adhesion. Specific circRNAs were associated with several microRNAs (miRNAs) predicted using miRanda. Altogether, our findings highlight the potential importance of circRNAs in the biology of IH and its response to treatment.
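Over-representation of a GO term or pathway among differentially expressed features is commonly assessed with a one-sided hypergeometric test. The sketch below computes that p-value from four counts using only the standard library; the counts in the usage line are hypothetical, and the abstract does not state which enrichment test or tool was used.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric X: N background genes, K of them
    annotated to the term, n genes selected (e.g., differentially expressed)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return hits / comb(N, n)

# Hypothetical: 15 of 249 selected genes carry an annotation held by 200 of 10,000 background genes.
p = hypergeom_sf(15, N=10_000, K=200, n=249)
print(f"expected overlap = {249 * 200 / 10_000:.2f}, p = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids overflow for realistic background sizes, at the cost of speed for very large gene sets.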
Liu, Tengfei; Yang, Ping; Chen, Hong; Huang, Yufei; Liu, Yi; Waqas, Yasir; Ahmed, Nisar; Chu, Xiaoya; Chen, Qiusheng
2016-01-01
Important evolutionary and ecological consequences arise from the ability of female turtles to store viable spermatozoa for an extended period. Although previous morphological studies have observed the localization of spermatozoa in the Pelodiscus sinensis oviduct, no systematic study has identified the genes involved in long-term sperm storage. In this study, the oviduct of P. sinensis at different phases (reproductive and hibernation seasons) was prepared for RNA-Seq and gene expression profiling. In total, 2,662 differentially expressed genes (DEGs), including 1,224 up- and 1,438 down-regulated genes, were identified from two cDNA libraries. Functional enrichment analysis indicated that many genes were predominantly involved in the immune response, the apoptosis pathway and the regulation of autophagy. RT-qPCR, ELISA, western blot and IHC analyses showed that the mRNA and protein expression profiles of selected DEGs were consistent with the RNA-Seq results. Remarkably, TUNEL analysis revealed a reduced number of apoptotic cells during sperm storage. IHC and TEM analyses found that autophagy occurred in the oviduct epithelial cells, where the spermatozoa were closely attached. These outcomes provide fundamental insights into the complex regulation of sperm storage and help elucidate the mechanism of sperm storage in P. sinensis. PMID:27628424
Xu, Hai-Ming; Kong, Xiang-Dong; Chen, Fei; Huang, Ji-Xiang; Lou, Xiang-Yang; Zhao, Jian-Yi
2015-10-24
Brassica napus is an important oilseed crop. Dissecting the genetic architecture underlying oil-related biological processes will greatly facilitate the genetic improvement of rapeseed. Differential gene expression during pod development offers a snapshot of the genes responsible for oil accumulation. To identify candidate genes in previously reported linkage peaks, we used RNA sequencing (RNA-Seq) to analyze the pod transcriptomes of the German cultivar Sollux and the Chinese inbred line Gaoyou. RNA samples were collected for RNA-Seq at 5-7, 15-17 and 25-27 days after flowering (DAF). Bioinformatics analysis was performed to investigate differentially expressed genes (DEGs). Gene annotation analysis was integrated with QTL mapping and Brassica napus pod transcriptome profiling to detect potential candidate genes for oil content. A total of 465 and 2,114 candidate DEGs were identified between the two varieties at the same stages and across different periods within each variety, respectively. By combining these DEGs with the quantitative trait locus (QTL) mapping results, 33 DEGs between Sollux and Gaoyou were identified as candidate genes affecting seed oil content, one of which was found to be homologous to Arabidopsis thaliana lipid-related genes. Intervarietal DEGs of lipid pathways in QTL regions represent important candidate genes for oil-related traits. Integrated analysis of transcriptome profiling, QTL mapping and comparative genomics with related species enables efficient identification of the most plausible functional genes underlying oil-content-related traits, offering valuable resources for improving Brassica napus breeding programs. This study provides a comprehensive overview of the pod transcriptomes of two varieties with different oil contents at three developmental stages.
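Combining DEGs with QTL mapping results essentially means keeping the DEGs whose genomic positions fall inside a QTL interval. A minimal sketch with hypothetical gene names, chromosome labels, and coordinates follows; a real analysis would also need to consider gene spans rather than single positions, which this sketch omits.

```python
def degs_in_qtl(degs, qtls):
    """Return names of DEGs whose position lies inside any QTL interval
    (inclusive bounds) on the same chromosome."""
    hits = []
    for name, chrom, pos in degs:
        if any(chrom == q_chrom and start <= pos <= end for q_chrom, start, end in qtls):
            hits.append(name)
    return hits

# Hypothetical DEGs as (name, chromosome, position) and QTLs as (chromosome, start, end).
degs = [("geneA", "A01", 1_200_000), ("geneB", "A01", 5_000_000), ("geneC", "C03", 800_000)]
qtls = [("A01", 1_000_000, 2_000_000), ("C03", 500_000, 900_000)]
print(degs_in_qtl(degs, qtls))
```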
RNA-Seq for Bacterial Gene Expression.
Poulsen, Line Dahl; Vinther, Jeppe
2018-06-01
RNA sequencing (RNA-seq) has become the preferred method for global quantification of bacterial gene expression. With the continued improvements in sequencing technology and data analysis tools, the most labor-intensive and expensive part of an RNA-seq experiment is the preparation of sequencing libraries, which is also essential for the quality of the data obtained. Here, we present a straightforward and inexpensive basic protocol for preparation of strand-specific RNA-seq libraries from bacterial RNA as well as a computational pipeline for the data analysis of sequencing reads. The protocol is based on the Illumina platform and allows easy multiplexing of samples and the removal of sequencing reads that are PCR duplicates. © 2018 by John Wiley & Sons, Inc.
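A common way to remove PCR duplicates is to treat reads that share the same mapping coordinates and strand (and, when the library carries a random molecular barcode, the same barcode) as copies of one original molecule and keep only one of them. The sketch below assumes reads are already aligned and represented as dictionaries with hypothetical field names; it illustrates the idea and is not the pipeline published with the protocol.

```python
def remove_pcr_duplicates(reads):
    """Keep the first read for each (chrom, strand, 5' position, barcode) key.
    The barcode is optional; reads without one are keyed on coordinates alone."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["strand"], read["pos5"], read.get("barcode"))
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept

reads = [
    {"id": "r1", "chrom": "chr", "strand": "+", "pos5": 1050, "barcode": "ACGTAC"},
    {"id": "r2", "chrom": "chr", "strand": "+", "pos5": 1050, "barcode": "ACGTAC"},  # copy of r1
    {"id": "r3", "chrom": "chr", "strand": "+", "pos5": 1050, "barcode": "TTGCAA"},  # same site, new molecule
    {"id": "r4", "chrom": "chr", "strand": "-", "pos5": 1050, "barcode": "ACGTAC"},  # opposite strand
]
print([r["id"] for r in remove_pcr_duplicates(reads)])
```

Without barcodes, coordinate-only collapsing can also discard genuine reads from highly expressed genes, which is one reason barcoded designs are often preferred.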
Recent advances in targeted RNA-Seq technology allow researchers to efficiently and cost-effectively obtain whole transcriptome profiles using picograms of mRNA from human cell lysates. Low mRNA input requirements and sample multiplexing capabilities have made time- and concentrat...
Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud
Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.
2015-01-01
Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053
Effects of MicroRNA-23a on Differentiation and Gene Expression Profiles in 3T3-L1 Adipocytes
Huang, Yong; Huang, Jinxiu; Qi, Renli; Wang, Qi; Wu, Yongjiang; Wang, Jing
2016-01-01
MicroRNAs (miRNAs) are small non-coding RNA molecules that regulate the growth, development, and programmed death of cells. A recently published study showed that miRNA-23a can regulate 3T3-L1 adipocyte differentiation. Here, we confirmed miRNA-23a as a negative regulator of 3T3-L1 adipocyte differentiation. Overexpression of miRNA-23a inhibited differentiation, decreased lipogenesis, and down-regulated mRNA and protein expression of both peroxisome proliferator-activated receptor (PPAR) γ and fatty acid binding protein (FABP) 4, whereas knockdown of miRNA-23a had the opposite effects on differentiation and also increased the number of apoptotic cells. Additionally, digital gene expression profiling sequencing (DGE-Seq) was used to assay changes in gene expression profiles following alterations in the level of miR-23a. In total, overexpression and knockdown of miRNA-23a significantly changed the expression of 313 and 425 genes, respectively. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses indicated that these genes were mainly involved in the stress response, immune system, metabolism, and cell cycle, among other pathways. Additionally, signal transducer and activator of transcription 1 (Stat1) was shown to be a target of miRNA-23a by computational prediction and dual-luciferase reporter assays, indicating that the Janus kinase (Jak)-Stat signaling pathway is implicated in miRNA-23a-mediated regulation of adipogenesis in adipocytes. PMID:27783036
USDA-ARS's Scientific Manuscript database
The role of microRNA expression and genetic variation in microRNA-binding sites of target genes on growth and muscle quality traits is poorly characterized. We used an RNA-Seq approach to investigate their importance for 5 growth and muscle quality traits: whole body weight (WBW), muscle yield, muscle c...
RNA-seq Data: Challenges in and Recommendations for Experimental Design and Analysis.
Williams, Alexander G; Thomas, Sean; Wyman, Stacia K; Holloway, Alisha K
2014-10-01
RNA-seq is widely used to determine differential expression of genes or transcripts as well as identify novel transcripts, identify allele-specific expression, and precisely measure translation of transcripts. Thoughtful experimental design and choice of analysis tools are critical to ensure high-quality data and interpretable results. Important considerations for experimental design include number of replicates, whether to collect paired-end or single-end reads, sequence length, and sequencing depth. Common analysis steps in all RNA-seq experiments include quality control, read alignment, assigning reads to genes or transcripts, and estimating gene or transcript abundance. Our aims are two-fold: to make recommendations for common components of experimental design and assess tool capabilities for each of these steps. We also test tools designed to detect differential expression, since this is the most widespread application of RNA-seq. We hope that these analyses will help guide those who are new to RNA-seq and will generate discussion about remaining needs for tool improvement and development. Copyright © 2014 John Wiley & Sons, Inc.
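Of the common steps listed above, estimating gene or transcript abundance from assigned reads is often expressed as transcripts per million (TPM), which corrects counts first for feature length and then for sequencing depth. The sketch below uses hypothetical counts and lengths, and plain feature lengths in place of the effective lengths that quantification tools usually estimate.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and feature lengths in bp."""
    # Step 1: reads per kilobase, correcting for feature length.
    rpk = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so that the values within one sample sum to one million.
    total = sum(rpk)
    return [r / total * 1e6 for r in rpk]

print(tpm([100, 200, 300], [1000, 2000, 1500]))
```

Because the second step normalizes within a sample, TPM values are comparable as proportions across samples, but they are not a substitute for the count-based models used in differential expression testing.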
TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.
Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han
2017-03-01
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline and are difficult to integrate with one another because of their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
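Filtering low-count genes, normalizing for library size, and log-transforming are typical early steps in such pipelines. The sketch below shows one simple version with NumPy (counts per million, a minimum-CPM filter, and log2 with a small prior count); the thresholds are arbitrary assumptions, and this does not reproduce TRAPR's own functions.

```python
import numpy as np

def filter_and_log_cpm(counts, min_cpm=1.0, min_samples=2, prior=0.5):
    """counts: genes x samples matrix of raw read counts.
    Returns a boolean mask of retained genes and their log2-CPM values."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)              # total reads per sample (column)
    cpm = counts / lib_sizes * 1e6              # library-size normalization
    keep = (cpm >= min_cpm).sum(axis=1) >= min_samples
    log_cpm = np.log2((counts[keep] + prior) / lib_sizes * 1e6)
    return keep, log_cpm

# Hypothetical 3 genes x 3 samples; each sample has one million reads.
counts = [[0, 0, 1],
          [500_000, 400_000, 300_000],
          [500_000, 600_000, 699_999]]
keep, log_cpm = filter_and_log_cpm(counts)
print(keep, log_cpm.round(2))
```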
Logic programming to infer complex RNA expression patterns from RNA-seq data.
Weirick, Tyler; Militello, Giuseppe; Ponomareva, Yuliya; John, David; Döring, Claudia; Dimmeler, Stefanie; Uchida, Shizuka
2018-03-01
To meet the increasing demand in the field, numerous long noncoding RNA (lncRNA) databases are available. Although many lncRNAs are expressed in a cell type-specific and/or time-dependent manner, most lncRNA databases fall short of providing such profiles. We developed a strategy using logic programming to handle the complex organization of organs, their tissues and cell types, as well as gender and developmental time points. To showcase this strategy, we introduce 'RenalDB' (http://renaldb.uni-frankfurt.de), a database providing expression profiles of RNAs in major organs, focusing on kidney tissues and cells. RenalDB uses logic programming to describe complex anatomy, sample metadata and logical relationships defining expression, enrichment or specificity. We validated the content of RenalDB with biological experiments and functionally characterized two long intergenic noncoding RNAs: LOC440173 is important for cell growth or cell survival, whereas PAXIP1-AS1 is a regulator of cell death. We anticipate that RenalDB will be used as a first step toward functional studies of lncRNAs in the kidney.
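RenalDB describes relationships such as expression, enrichment, or specificity as logical rules over sample metadata. The Python sketch below mimics that idea with two rules over a hypothetical expression table; the tissue names, TPM values, RNA identifiers, and the 1 TPM cut-off are all assumptions, and the rules are not those used by RenalDB.

```python
# Hypothetical expression table: RNA -> tissue -> TPM.
expression = {
    "lncRNA_A": {"kidney": 12.0, "liver": 0.2, "heart": 0.1},
    "lncRNA_B": {"kidney": 5.0, "liver": 4.0, "heart": 0.3},
}

def expressed(rna, tissue, threshold=1.0):
    """Rule: an RNA is expressed in a tissue if its TPM reaches the threshold."""
    return expression[rna].get(tissue, 0.0) >= threshold

def specific(rna, tissue):
    """Rule: an RNA is specific to a tissue if it is expressed there and nowhere else."""
    others = (t for t in expression[rna] if t != tissue)
    return expressed(rna, tissue) and not any(expressed(rna, t) for t in others)

for rna in expression:
    print(rna, "kidney-specific:", specific(rna, "kidney"))
```

Writing each definition as a separate predicate keeps the rules composable, which is the main appeal of the logic-programming approach described above.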
RNA-Seq workflow: gene-level exploratory analysis and differential expression
Love, Michael I.; Anders, Simon; Kim, Vladislav; Huber, Wolfgang
2015-01-01
Here we walk through an end-to-end gene-level RNA-Seq differential expression workflow using Bioconductor packages. We will start from the FASTQ files, show how these were aligned to the reference genome, and prepare a count matrix which tallies the number of RNA-seq reads/fragments within each gene for each sample. We will perform exploratory data analysis (EDA) for quality assessment and to explore the relationship between samples, perform differential gene expression analysis, and visually explore the results. PMID:26674615
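The count matrix at the center of this workflow has one row per gene and one column per sample, and each entry is the number of reads or fragments assigned to that gene in that sample. The workflow builds it with Bioconductor tools in R; the Python sketch below only illustrates the tallying step, assuming read-to-gene assignments have already been made and using hypothetical sample and gene names.

```python
from collections import Counter

def count_matrix(assignments, samples, genes):
    """assignments: iterable of (sample, gene) pairs, one per assigned read/fragment.
    Returns a genes x samples list of lists of counts."""
    tally = Counter(assignments)
    return [[tally[(s, g)] for s in samples] for g in genes]

assignments = [("s1", "geneA"), ("s1", "geneA"), ("s1", "geneB"),
               ("s2", "geneA"), ("s2", "geneB"), ("s2", "geneB"), ("s2", "geneB")]
samples, genes = ["s1", "s2"], ["geneA", "geneB", "geneC"]
for gene, row in zip(genes, count_matrix(assignments, samples, genes)):
    print(gene, row)
```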
Nookaew, Intawat; Papini, Marta; Pornputtapong, Natapol; Scalcinati, Gionata; Fagerberg, Linn; Uhlén, Matthias; Nielsen, Jens
2012-01-01
RNA-seq has recently become an attractive method of choice for transcriptome studies, promising several advantages over microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in analyzing RNA-seq data generated with the Illumina platform, and to perform a cross-platform comparison with results obtained using Affymetrix microarrays. As a case study, we used the Saccharomyces cerevisiae strain CEN.PK 113-7D grown under two different conditions (batch and chemostat). Here, we assess the influence of genetic variation on the estimation of gene expression levels using three different read aligners (Gsnap, Stampy and TopHat) on the S288c genome, the capability of five different statistical methods to detect differential gene expression (baySeq, Cuffdiff, DESeq, edgeR and NOISeq), and the consistency between RNA-seq analyses using a reference genome and a de novo assembly approach. High reproducibility among biological replicates (correlation ≥0.99) and high consistency between the two platforms for analysis of gene expression levels (correlation ≥0.91) are reported. The differential expression results from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation, are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays) for gene expression analysis and addresses the contribution of the different steps involved in the analysis of RNA-seq data. PMID:22965124
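Reproducibility figures such as a correlation of at least 0.99 between biological replicates are often Pearson correlations computed on log-transformed expression values. The sketch below shows that calculation with a pseudocount of 1; the expression values are hypothetical, and the exact correlation measure and transformation used in the study are not stated here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def log_expression_correlation(rep1, rep2, pseudocount=1.0):
    """Correlate two replicates after a log2(x + pseudocount) transform."""
    return pearson([math.log2(v + pseudocount) for v in rep1],
                   [math.log2(v + pseudocount) for v in rep2])

rep1 = [10, 100, 1000, 50]  # hypothetical expression values, replicate 1
rep2 = [12, 95, 1020, 48]   # hypothetical expression values, replicate 2
print(round(log_expression_correlation(rep1, rep2), 4))
```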
The Landscape of MicroRNA, Piwi-Interacting RNA, and Circular RNA in Human Saliva
Bahn, Jae Hoon; Zhang, Qing; Li, Feng; Chan, Tak-Ming; Lin, Xianzhi; Kim, Yong; Wong, David T.W.; Xiao, Xinshu
2015-01-01
BACKGROUND Extracellular RNAs (exRNAs) in human body fluids are emerging as effective biomarkers for detection of diseases. Saliva, as the most accessible and noninvasive body fluid, has been shown to harbor exRNA biomarkers for several human diseases. However, the entire spectrum of exRNA from saliva has not been fully characterized. METHODS Using high-throughput RNA sequencing (RNA-Seq), we conducted an in-depth bioinformatic analysis of noncoding RNAs (ncRNAs) in human cell-free saliva (CFS) from healthy individuals, with a focus on microRNAs (miRNAs), piwi-interacting RNAs (piRNAs), and circular RNAs (circRNAs). RESULTS Our data demonstrated robust reproducibility of miRNA and piRNA profiles across individuals. Furthermore, individual variability of these salivary RNA species was highly similar to those in other body fluids or cellular samples, despite the direct exposure of saliva to environmental impacts. By comparative analysis of >90 RNA-Seq data sets of different origins, we observed that piRNAs were surprisingly abundant in CFS compared with other body fluid or intracellular samples, with expression levels in CFS comparable to those found in embryonic stem cells and skin cells. Conversely, miRNA expression profiles in CFS were highly similar to those in serum and cerebrospinal fluid. Using a customized bioinformatics method, we identified >400 circRNAs in CFS. These data represent the first global characterization and experimental validation of circRNAs in any type of extracellular body fluid. CONCLUSIONS Our study provides a comprehensive landscape of ncRNA species in human saliva that will facilitate further biomarker discoveries and lay a foundation for future studies related to ncRNAs in human saliva. PMID:25376581
YM500: a small RNA sequencing (smRNA-seq) database for microRNA research
Cheng, Wei-Chung; Chung, I-Fang; Huang, Tse-Shun; Chang, Shih-Ting; Sun, Hsing-Jen; Tsai, Cheng-Fong; Liang, Muh-Lii; Wong, Tai-Tong; Wang, Hsei-Wei
2013-01-01
MicroRNAs (miRNAs) are small RNAs ∼22 nt in length that are involved in the regulation of a variety of physiological and pathological processes. Advances in high-throughput small RNA sequencing (smRNA-seq), one of the next-generation sequencing applications, have reshaped the miRNA research landscape. In this study, we established an integrative database, YM500 (http://ngs.ym.edu.tw/ym500/), containing analysis pipelines and analysis results for 609 human and mouse smRNA-seq data sets, including public data from the Gene Expression Omnibus (GEO) and some private sources. YM500 collects analysis results for miRNA quantification, isomiR identification (incl. RNA editing), arm switching discovery, and, more importantly, novel miRNA prediction. Wet-lab validation of >100 miRNAs confirmed a high correlation between miRNA profiling and RT-qPCR results (R = 0.84). This database allows researchers to search these four different types of analysis results via our interactive web interface. YM500 allows researchers to define the criteria for isomiRs and also integrates dbSNP information to help researchers distinguish isomiRs from SNPs. A user-friendly interface is provided to integrate miRNA-related information and existing evidence from hundreds of sequencing datasets. The identified novel miRNAs and isomiRs hold potential for both basic research and biotech applications. PMID:23203880
Bouquet, Jerome; Gardy, Jennifer L; Brown, Scott; Pfeil, Jacob; Miller, Ruth R; Morshed, Muhammad; Avina-Zubieta, Antonio; Shojania, Kam; McCabe, Mark; Parker, Shoshana; Uyaguari, Miguel; Federman, Scot; Tang, Patrick; Steiner, Ted; Otterstater, Michael; Holt, Rob; Moore, Richard; Chiu, Charles Y; Patrick, David M
2017-02-15
Chronic fatigue syndrome (CFS) remains poorly understood. Although infections are speculated to trigger the syndrome, a specific infectious agent and underlying pathophysiological mechanism remain elusive. In a previous study, we described similar clinical phenotypes in CFS patients and alternatively diagnosed chronic Lyme syndrome (ADCLS) patients—individuals diagnosed with Lyme disease by testing from private Lyme specialty laboratories but who test negative by reference 2-tiered serologic analysis. Here, we performed blinded RNA-seq analysis of whole blood collected from 25 adults diagnosed with CFS and 13 ADCLS patients, comparing these cases to 25 matched controls and 11 patients with well-controlled systemic lupus erythematosus (SLE). Samples were collected at patient enrollment and not during acute symptom flares. RNA-seq data were used to study host gene expression, B-cell/T-cell receptor profiles (BCR/TCR), and potential viral infections. No differentially expressed genes (DEGs) were found to be significant when CFS or ADCLS cases were compared to controls. Forty-two DEGs were found when SLE cases were compared to controls, consistent with activation of interferon signaling pathways associated with SLE disease. BCR/TCR repertoire analysis did not show significant differences between CFS and controls or ADCLS and controls. Finally, viral sequences corresponding to anelloviruses, human pegivirus 1, herpesviruses, and papillomaviruses were detected in RNA-seq data, but proportions were similar (P = .73) across all genus-level taxonomic categories. Our observations do not support a theory of transcriptionally mediated immune cell dysregulation in CFS and ADCLS, at least outside of periods of acute symptom flares. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America.
Langevin, Stanley A; Bent, Zachary W; Solberg, Owen D; Curtis, Deanna J; Lane, Pamela D; Williams, Kelly P; Schoeniger, Joseph S; Sinha, Anupama; Lane, Todd W; Branda, Steven S
2013-04-01
Use of second generation sequencing (SGS) technologies for transcriptional profiling (RNA-Seq) has revolutionized transcriptomics, enabling measurement of RNA abundances with unprecedented specificity and sensitivity and the discovery of novel RNA species. Preparation of RNA-Seq libraries requires conversion of the RNA starting material into cDNA flanked by platform-specific adaptor sequences. Each of the published methods and commercial kits currently available for RNA-Seq library preparation suffers from at least one major drawback, including long processing times, large starting material requirements, uneven coverage, loss of strand information and high cost. We report the development of a new RNA-Seq library preparation technique that produces representative, strand-specific RNA-Seq libraries from small amounts of starting material in a fast, simple and cost-effective manner. Additionally, we have developed a new quantitative PCR-based assay for precisely determining the number of PCR cycles to perform for optimal enrichment of the final library, a key step in all SGS library preparation workflows.
RNA-Seq of Circulating Tumor Cells in Stage II-III Breast Cancer.
Lang, Julie E; Ring, Alexander; Porras, Tania; Kaur, Pushpinder; Forte, Victoria A; Mineyev, Neal; Tripathy, Debu; Press, Michael F; Campo, Daniel
2018-06-04
We characterized the whole transcriptome of circulating tumor cells (CTCs) in stage II-III breast cancer to evaluate correlations with primary tumor biology. CTCs were isolated from peripheral blood (PB) via immunomagnetic enrichment followed by fluorescence-activated cell sorting (IE/FACS). CTCs, PB, and fresh tumors were profiled using RNA-seq. Formalin-fixed, paraffin-embedded (FFPE) tumors were subjected to RNA-seq and NanoString PAM50 assays with risk of recurrence (ROR) scores. CTCs were detected in 29/33 (88%) patients. We selected 21 cases to attempt RNA-seq (median number of CTCs = 9). Sixteen CTC samples yielded results that passed quality-control metrics, and these samples had a median of 4,311,255 uniquely mapped reads (fewer than PB or tumor samples). For FFPE tumors, the intrinsic subtype predicted from estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status was 85% concordant with the PAM50 subtype. However, the CTC RNA-seq subtype assessed by the PAM50 classification genes was highly discordant both with the subtype predicted by ER/PR/HER2 and with the PAM50 subtype of the tumors. Two patients died of metastatic disease, both of whom had high ROR scores and high CTC counts. We identified significant genes, canonical pathways, upstream regulators, and molecular interaction networks comparing CTCs by various clinical factors. We also identified a 75-gene signature with highest expression in CTCs and tumors taken together that was prognostic in The Cancer Genome Atlas and Molecular Taxonomy of Breast Cancer International Consortium datasets. It is feasible to use RNA-seq of CTCs in non-metastatic patients to discover novel tumor biology characteristics.
GC-Content Normalization for RNA-Seq Data
2011-01-01
Background Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264
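Within-lane GC-content normalization aims to remove the dependence of read counts on gene GC content inside each sample before between-lane normalization is applied. The NumPy sketch below implements one simplified global-scaling variant (group genes into GC-content bins and rescale each bin so its median matches the lane-wide median); the bin count and toy data are assumptions, and it is not the EDASeq implementation.

```python
import numpy as np

def gc_bin_median_normalize(counts, gc, n_bins=4):
    """Simplified within-lane GC normalization for one sample: genes are split
    into GC-content quantile bins, and each bin's counts are rescaled so the
    bin median matches the median over all genes in the lane."""
    counts = np.asarray(counts, dtype=float)
    gc = np.asarray(gc, dtype=float)
    edges = np.quantile(gc, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, gc, side="right") - 1, 0, n_bins - 1)
    overall = np.median(counts)
    out = counts.copy()
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            med = np.median(counts[mask])
            if med > 0:
                out[mask] = counts[mask] * overall / med
    return out

# Toy lane in which high-GC genes receive roughly twice as many reads.
gc = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
counts = [10, 12, 8, 10, 20, 24, 16, 20]
print(gc_bin_median_normalize(counts, gc, n_bins=2))
```

Scaling by a bin median is the crudest form of the idea; regression-based or quantile-based approaches model the GC trend more smoothly.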
RNA-Seq transcriptome profiling of mouse oocytes after in vitro maturation and/or vitrification.
Gao, Lei; Jia, Gongxue; Li, Ai; Ma, Haojia; Huang, Zhengyuan; Zhu, Shien; Hou, Yunpeng; Fu, Xiangwei
2017-10-16
In vitro maturation (IVM) and vitrification have been widely used to prepare oocytes before fertilization; however, potential effects of these procedures, such as expression profile changes, are poorly understood. In this study, mouse oocytes were divided into four groups and subjected to combinations of in vitro maturation and/or vitrification treatments. RNA-seq and in silico pathway analysis were used to identify differentially expressed genes (DEGs) that may be involved in oocyte viability after in vitro maturation and/or vitrification. Our results showed that 1) 69 genes were differentially expressed after IVM, 66 of which were up-regulated. Atp5e and Atp5o were enriched in the most significant gene ontology term "mitochondrial membrane part"; thus, these genes may be promising candidate biomarkers for oocyte viability after IVM. 2) The influence of vitrification on the transcriptome of oocytes was negligible, as no DEGs were found between vitrified and fresh oocytes. 3) The MII stage is more suitable for oocyte vitrification with respect to the transcriptome. This study provides a valuable new theoretical basis to further improve the efficiency of in vitro maturation and/or oocyte vitrification.
Protein Interaction Profile Sequencing (PIP-seq).
Foley, Shawn W; Gregory, Brian D
2016-10-10
Every eukaryotic RNA transcript undergoes extensive post-transcriptional processing from the moment of transcription up through degradation. This regulation is performed by a distinct cohort of RNA-binding proteins which recognize their target transcript by both its primary sequence and secondary structure. Here, we describe protein interaction profile sequencing (PIP-seq), a technique that uses ribonuclease-based footprinting followed by high-throughput sequencing to globally assess both protein-bound RNA sequences and RNA secondary structure. PIP-seq utilizes single- and double-stranded RNA-specific nucleases in the absence of proteins to infer RNA secondary structure. These libraries are also compared to samples that undergo nuclease digestion in the presence of proteins in order to find enriched protein-bound sequences. Combined, these four libraries provide a comprehensive, transcriptome-wide view of RNA secondary structure and RNA protein interaction sites from a single experimental technique. © 2016 by John Wiley & Sons, Inc.
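The comparison between protein-bound (footprinting) and protein-free (structure) libraries can be illustrated with a per-position enrichment score. This is a simplified sketch, not the published PIP-seq peak caller. It assumes one footprinting coverage track and one matched protein-free coverage track over the same positions, scales each to counts per million, and calls positions whose log2 ratio clears a threshold. The function name, `pseudocount`, and `min_log2_ratio` are hypothetical.

```python
import numpy as np

def protected_sites(footprint_cov, structure_cov, pseudocount=1.0, min_log2_ratio=1.0):
    """Hypothetical protein-protection score per position.

    footprint_cov: read coverage from nuclease digestion with proteins present.
    structure_cov: read coverage from the matched protein-free digestion.
    Returns (indices of putatively protein-bound positions, log2 ratios).
    """
    fp = np.asarray(footprint_cov, dtype=float)
    st = np.asarray(structure_cov, dtype=float)
    # Scale each library to counts per million so depths are comparable
    fp_norm = fp / fp.sum() * 1e6
    st_norm = st / st.sum() * 1e6
    ratio = np.log2((fp_norm + pseudocount) / (st_norm + pseudocount))
    return np.flatnonzero(ratio >= min_log2_ratio), ratio
```

A real analysis would need replicate-aware statistics rather than a fixed ratio cutoff. The sketch only shows why the protein-free library serves as the background.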
RNA-Seq Profiling Reveals Novel Hepatic Gene Expression Pattern in Aflatoxin B1 Treated Rats
Merrick, B. Alex; Phadke, Dhiral P.; Auerbach, Scott S.; Mav, Deepak; Stiegelmeyer, Suzy M.; Shah, Ruchir R.; Tice, Raymond R.
2013-01-01
Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized that RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1’s carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability between RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005), compared to 626 transcripts by microarray; the difference was attributed to the base-pair resolution of transcripts by RNA-Seq, probe placement within transcripts, or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially expressed transcripts, including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfTs) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made, including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. 
We find the rat transcriptome contains many previously unidentified, AFB1-responsive exons and transcripts supporting RNA-Seq’s capabilities to provide new insights into AFB1-mediated gene expression leading to hepatocellular carcinoma. PMID:23630614
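The selection criterion reported above (greater than two-fold change at p<0.005) amounts to a simple filter on differential-expression output. This minimal sketch assumes results have already been exported as `(transcript_id, log2_fold_change, p_value)` tuples. The function name and the tuple layout are hypothetical and do not reflect DESeq's own output format.

```python
import math

def select_degs(results, fc_cutoff=2.0, p_cutoff=0.005):
    """Keep transcripts changed more than fc_cutoff-fold (up or down)
    with p-value below p_cutoff.

    results: iterable of (transcript_id, log2_fold_change, p_value).
    """
    log2_cut = math.log2(fc_cutoff)  # two-fold change -> |log2 FC| > 1
    return [tid for tid, lfc, p in results
            if abs(lfc) > log2_cut and p < p_cutoff]
```

For example, a transcript with log2 fold change -2.0 and p = 0.0001 passes, while one with log2 fold change 3.0 and p = 0.01 fails the p-value cutoff.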
Genome-wide mapping of infection-induced SINE RNAs reveals a role in selective mRNA export.
Karijolich, John; Zhao, Yang; Alla, Ravi; Glaunsinger, Britt
2017-06-02
Short interspersed nuclear elements (SINEs) are retrotransposons evolutionarily derived from endogenous RNA Polymerase III RNAs. Though SINE elements have undergone exaptation into gene regulatory elements, how transcribed SINE RNA impacts transcriptional and post-transcriptional regulation is largely unknown. This is partly due to a lack of information regarding which of the loci have transcriptional potential. Here, we present an approach (short interspersed nuclear element sequencing, SINE-seq), which selectively profiles RNA Polymerase III-derived SINE RNA, thereby identifying transcriptionally active SINE loci. Applying SINE-seq to monitor murine B2 SINE expression during a gammaherpesvirus infection revealed transcription from 28 270 SINE loci, with ∼50% of active SINE elements residing within annotated RNA Polymerase II loci. Furthermore, B2 RNA can form intermolecular RNA-RNA interactions with complementary mRNAs, leading to nuclear retention of the targeted mRNA via a mechanism involving p54nrb. These findings illuminate a pathway for the selective regulation of mRNA export during stress via retrotransposon activation. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. PMID:28334904
Castandet, Benoît; Hotto, Amber M.; Strickler, Susan R.; ...
2016-07-06
Although RNA-Seq has revolutionized transcript analysis, organellar transcriptomes are rarely assessed even when present in published datasets. Here, we describe the development and application of a rapid and convenient method, ChloroSeq, to delineate qualitative and quantitative features of chloroplast RNA metabolism from strand-specific RNA-Seq datasets, including processing, editing, splicing, and relative transcript abundance. The use of a single experiment to systematically analyze chloroplast transcript maturation and abundance is of particular interest due to frequent pleiotropic effects observed in mutants that affect chloroplast gene expression and/or photosynthesis. To illustrate its utility, ChloroSeq was applied to published RNA-Seq datasets derived from Arabidopsis thaliana grown under control and abiotic stress conditions, where the organellar transcriptome had not been examined. The most appreciable effects were found for heat stress, which induces a global reduction in splicing and editing efficiency and leads to increased abundance of chloroplast transcripts, including genic, intergenic, and antisense transcripts. Moreover, by concomitantly analyzing nuclear transcripts that encode chloroplast gene expression regulators from the same libraries, we demonstrate the possibility of achieving a holistic understanding of the nucleus-organelle system. In conclusion, ChloroSeq represents a unique method for streamlining RNA-Seq data interpretation of the chloroplast transcriptome and its regulators.
The biogenesis pathway of tRNA-derived piRNAs in Bombyx germ cells
Honda, Shozo; Kawamura, Takuya; Loher, Phillipe; Morichika, Keisuke; Rigoutsos, Isidore
2017-01-01
Transfer RNAs (tRNAs) function in the translational machinery and also serve as a source of short non-coding RNAs (ncRNAs). tRNA-derived ncRNAs show differential expression profiles and play roles in many biological processes beyond translation. Molecular mechanisms that shape and regulate their expression profiles are largely unknown. Here, we report the mechanism of biogenesis for tRNA-derived Piwi-interacting RNAs (td-piRNAs) expressed in Bombyx BmN4 cells. In the cells, two cytoplasmic tRNA species, tRNAAspGUC and tRNAHisGUG, served as major sources for td-piRNAs, which were derived from the 5′-part of the respective tRNAs. cP-RNA-seq identified the two tRNAs as major substrates for the 5′-tRNA halves as well, suggesting a previously uncharacterized link between 5′-tRNA halves and td-piRNAs. An increase in levels of the 5′-tRNA halves, induced by BmNSun2 knockdown, enhanced the td-piRNA expression levels without quantitative change in mature tRNAs, indicating that 5′-tRNA halves, not mature tRNAs, are the direct precursors for td-piRNAs. For the generation of tRNAHisGUG-derived piRNAs, BmThg1l-mediated nucleotide addition to the −1 position of tRNAHisGUG was required, revealing an important function of BmThg1l in piRNA biogenesis. Our study advances the understanding of biogenesis mechanisms and the genesis of specific expression profiles for tRNA-derived ncRNAs. PMID:28645172
SNP discovery in the bovine milk transcriptome using RNA-Seq technology.
Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F
2010-12-01
High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevent false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.
Ali, Shahin S.; Shao, Jonathan; Lary, David J.; Strem, Mary D.; Meinhardt, Lyndel W.; Bailey, Bryan A.
2017-01-01
Phytophthora megakarya (Pmeg) and Phytophthora palmivora (Ppal) cause black pod rot of Theobroma cacao L. (cacao). Of these two clade 4 species, Pmeg is more virulent and is displacing Ppal in many cacao production areas in Africa. Symptoms and species-specific sporangia production were compared when the two species were co-inoculated onto pod pieces in staggered 24 h time intervals. Pmeg sporangia were predominantly recovered from pod pieces with unwounded surfaces even when inoculated 24 h after Ppal. On wounded surfaces, sporangia of Ppal were predominantly recovered if the two species were simultaneously applied or Ppal was applied first, but not if Pmeg was applied first. Pmeg demonstrated an advantage over Ppal when infecting unwounded surfaces, while Ppal had the advantage when infecting wounded surfaces. RNA-Seq was carried out on RNA isolated from control and Pmeg- and Ppal-infected pod pieces 3 days post inoculation to assess their abilities to alter/suppress cacao defense. Expression of 4,482 and 5,264 cacao genes was altered after Pmeg and Ppal infection, respectively, with most genes responding to both species. Neural network self-organizing map analyses separated the cacao RNA-Seq gene expression profiles into 24 classes, 6 of which were largely induced in response to infection. Using KEGG analysis, subsets of genes composing interrelated pathways leading to phenylpropanoid biosynthesis, ethylene and jasmonic acid biosynthesis and action, plant defense signal transduction, and endocytosis showed induction in response to infection. A large subset of genes encoding putative PR-proteins also showed differential expression in response to infection. A subset of 36 cacao genes was used to validate the RNA-Seq expression data and compare infection-induced gene expression patterns in leaves and wounded and unwounded pod husks. Expression patterns between RNA-Seq and RT-qPCR were generally reproducible. 
The level and timing of altered gene expression were influenced by the tissues studied and by wounding. Although gene expression patterns were similar in these susceptible interactions, some genes did show differential expression in a Phytophthora species-dependent manner. The biggest difference was the more intense change in expression in Ppal-inoculated wounded pod pieces, further demonstrating its rapid progression when penetrating through wounds. PMID:28261234
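The self-organizing map step that grouped expression profiles into 24 classes can be sketched as follows. The abstract does not give the SOM configuration, so this sketch makes several assumptions. It uses a 4 x 6 grid (24 nodes, one per class), a Gaussian neighborhood, linearly decaying learning rate and radius, and initialization from randomly chosen profiles. `train_som` and `assign_classes` are hypothetical names, not the authors' software.

```python
import numpy as np

def train_som(data, rows=4, cols=6, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols self-organizing map on expression profiles.

    data: array (n_genes, n_conditions), ideally scaled per gene.
    Returns node weight vectors of shape (rows * cols, n_conditions).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    n_nodes = rows * cols
    # Grid coordinates of each node, used by the neighborhood function
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize nodes from randomly chosen profiles
    weights = data[rng.choice(n, n_nodes, replace=n < n_nodes)].astype(float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            x = data[i]
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighborhood weights
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def assign_classes(data, weights):
    """Map each profile to the index of its nearest SOM node (its class)."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)
```

Because neighboring nodes are updated together, profiles that land on adjacent nodes tend to be similar. That topology is what makes SOM classes convenient to inspect for infection-induced patterns.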
Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.
Kang, Hyun Min; Subramaniam, Meena; Targ, Sasha; Nguyen, Michelle; Maliskova, Lenka; McCarthy, Elizabeth; Wan, Eunice; Wong, Simon; Byrnes, Lauren; Lanata, Cristina M; Gate, Rachel E; Mostafavi, Sara; Marson, Alexander; Zaitlen, Noah; Criswell, Lindsey A; Ye, Chun Jimmie
2018-01-01
Droplet single-cell RNA-sequencing (dscRNA-seq) has enabled rapid, massively parallel profiling of transcriptomes. However, assessing differential expression across multiple individuals has been hampered by inefficient sample processing and technical batch effects. Here we describe a computational tool, demuxlet, that harnesses natural genetic variation to determine the sample identity of each droplet containing a single cell (singlet) and detect droplets containing two cells (doublets). These capabilities enable multiplexed dscRNA-seq experiments in which cells from unrelated individuals are pooled and captured at higher throughput than in standard workflows. Using simulated data, we show that 50 single-nucleotide polymorphisms (SNPs) per cell are sufficient to assign 97% of singlets and identify 92% of doublets in pools of up to 64 individuals. Given genotyping data for each of eight pooled samples, demuxlet correctly recovers the sample identity of >99% of singlets and identifies doublets at rates consistent with previous estimates. We apply demuxlet to assess cell-type-specific changes in gene expression in 8 pooled lupus patient samples treated with interferon (IFN)-β and perform eQTL analysis on 23 pooled samples.
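The idea of using natural genetic variation to identify a droplet's donor can be illustrated with a simple likelihood comparison. This sketch does not reproduce demuxlet's actual model, which among other things accounts for base qualities. It assumes each droplet's reads are summarized as `(snp_index, carries_alt_allele)` pairs, genotypes are coded as 0/1/2 alternate-allele counts, the sequencing error rate `err` is fixed, and a doublet is an equal 50:50 mixture of two donors. All function names and the `doublet_margin` threshold are hypothetical.

```python
import math
from itertools import combinations

def alt_prob(genotype, err):
    """P(read shows alt allele | genotype = number of alt alleles, 0/1/2)."""
    f = genotype / 2.0
    return f * (1 - err) + (1 - f) * err

def droplet_loglik(reads, probs):
    """Log-likelihood of (snp_index, is_alt) reads given per-SNP alt probabilities."""
    ll = 0.0
    for snp, is_alt in reads:
        p = probs[snp]
        ll += math.log(p if is_alt else 1 - p)
    return ll

def demultiplex(reads, genotypes, err=0.01, doublet_margin=2.0):
    """genotypes: dict sample -> list of alt-allele counts per SNP.

    Returns ("singlet", sample) or ("doublet", (sample_a, sample_b)).
    """
    single = {s: droplet_loglik(reads, [alt_prob(x, err) for x in g])
              for s, g in genotypes.items()}
    best = max(single, key=single.get)
    best_pair, best_pair_ll = None, -math.inf
    for a, b in combinations(genotypes, 2):
        # 50:50 mixture of the two donors' allele probabilities
        probs = [0.5 * (alt_prob(x, err) + alt_prob(y, err))
                 for x, y in zip(genotypes[a], genotypes[b])]
        ll = droplet_loglik(reads, probs)
        if ll > best_pair_ll:
            best_pair, best_pair_ll = (a, b), ll
    if best_pair is not None and best_pair_ll - single[best] > doublet_margin:
        return ("doublet", best_pair)
    return ("singlet", best)
```

With more informative SNPs per cell, the gap between the correct and incorrect hypotheses grows. This is consistent with the abstract's point that a modest number of SNPs per cell suffices for assignment.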
Wood, Oliver; Woo, Jeongmin; Seumois, Gregory; Savelyeva, Natalia; McCann, Katy J; Singh, Divya; Jones, Terry; Peel, Lailah; Breen, Michael S; Ward, Matthew; Garrido Martin, Eva; Sanchez-Elsner, Tilman; Thomas, Gareth; Vijayanand, Pandurangan; Woelk, Christopher H; King, Emma; Ottensmeier, Christian
2016-08-30
Human papilloma virus (HPV)-associated head and neck squamous cell carcinoma (HNSCC) has a better prognosis than its HPV negative (HPV(-)) counterpart. This may be due to the higher numbers of tumor-infiltrating lymphocytes (TILs) in HPV positive (HPV(+)) tumors. RNA-Sequencing (RNA-Seq) was used to evaluate whether the differences in clinical behaviour simply reflect a numerical difference in TILs or whether there is a fundamental behavioural difference between TILs in these two settings. Thirty-nine HNSCC tumors were scored for TIL density by immunohistochemistry. After the removal of 16 TILlow tumors, RNA-Seq analysis was performed on 23 TILhigh/med tumors (HPV(+) n=10 and HPV(-) n=13). Differentially expressed genes (DEGs) were identified using edgeR. Immune subset analysis was performed using Functional Analysis of Individual RNA-Seq/Microarray Expression (FAIME) and immune gene RNA transcript count analysis. In total, 1,634 DEGs were identified, with a dominant immune signature observed in HPV(+) tumors. After normalizing the expression profiles to account for differences in B- and T-cell number, 437 significant DEGs remained. A B-cell associated signature distinguished HPV(+) from HPV(-) tumors and included the DEGs CD200, GGA2, ADAM28, STAG3, SPIB, VCAM1, BCL2 and ICOSLG; the immune signal relative to T-cells was qualitatively similar between TILs of both tumor cohorts. Our findings were validated in two independent cohorts using TCGA data and tumor-infiltrating B-cells from additional HPV(+) HNSCC patients. A B-cell associated signal segregated tumors by HPV status. Our data suggest that the role of B-cells in the adaptive immune response to HPV(+) HNSCC requires re-assessment. PMID:27462861
Gene expression distribution deconvolution in single-cell RNA sequencing.
Wang, Jingshu; Huang, Mo; Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Murray, John; Raj, Arjun; Li, Mingyao; Zhang, Nancy R
2018-06-26
Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene's expression distribution across cells, thus allowing the assessment of the dispersion, nonzero fraction, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, we propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMI). We develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND's noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers. Copyright © 2018 the Author(s). Published by PNAS.
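To see why raw counts overstate dispersion, consider a simple Poisson noise model. Suppose the observed UMI count is Y ~ Poisson(alpha * X), where X is true expression and alpha is a capture efficiency. Then E[Y] = alpha * E[X] and Var(Y) = alpha * E[X] + alpha^2 * Var(X). The sketch below inverts those two identities by the method of moments. It is not DESCEND, which deconvolves the whole distribution and adjusts for covariates such as cell size. It also assumes a single known alpha shared by all cells. The function name is hypothetical.

```python
import numpy as np

def poisson_moment_correction(counts, alpha=1.0):
    """Method-of-moments estimate of the mean and variance of true expression X,
    assuming Y ~ Poisson(alpha * X) independently per cell with a common,
    known capture efficiency alpha.

    E[Y]   = alpha * E[X]
    Var(Y) = alpha * E[X] + alpha**2 * Var(X)
    """
    y = np.asarray(counts, dtype=float)
    m, v = y.mean(), y.var()          # population (ddof=0) moments of observed counts
    mean_x = m / alpha
    var_x = max((v - m) / alpha ** 2, 0.0)  # truncate at zero if noise explains all variance
    return mean_x, var_x
```

For counts [0, 1, 2, 3, 4], the observed variance equals the mean, so the estimated biological variance is zero. All of the spread is consistent with Poisson sampling noise alone.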
Langevin, Stanley A.; Bent, Zachary W.; Solberg, Owen D.; Curtis, Deanna J.; Lane, Pamela D.; Williams, Kelly P.; Schoeniger, Joseph S.; Sinha, Anupama; Lane, Todd W.; Branda, Steven S.
2013-01-01
Use of second generation sequencing (SGS) technologies for transcriptional profiling (RNA-Seq) has revolutionized transcriptomics, enabling measurement of RNA abundances with unprecedented specificity and sensitivity and the discovery of novel RNA species. Preparation of RNA-Seq libraries requires conversion of the RNA starting material into cDNA flanked by platform-specific adaptor sequences. Each of the published methods and commercial kits currently available for RNA-Seq library preparation suffers from at least one major drawback, including long processing times, large starting material requirements, uneven coverage, loss of strand information and high cost. We report the development of a new RNA-Seq library preparation technique that produces representative, strand-specific RNA-Seq libraries from small amounts of starting material in a fast, simple and cost-effective manner. Additionally, we have developed a new quantitative PCR-based assay for precisely determining the number of PCR cycles to perform for optimal enrichment of the final library, a key step in all SGS library preparation workflows. PMID:23558773
Sun, Zhengda; Wang, Chih-Yang; Lawson, Devon A; Kwek, Serena; Velozo, Hugo Gonzalez; Owyong, Mark; Lai, Ming-Derg; Fong, Lawrence; Wilson, Mark; Su, Hua; Werb, Zena; Cooke, Daniel L
2018-02-16
Tumor endothelial cells (TECs) play an indispensable role in tumor growth and metastasis, although much of the detailed mechanism remains elusive. In this study we characterized and compared the global gene expression profiles of TECs and control ECs isolated from human breast cancer tissues and reduction mammoplasty tissues, respectively, by single-cell RNA sequencing (scRNA-seq). Using the scRNA-seq libraries that passed quality control, we found 1302 genes differentially expressed between these two EC phenotypes. Both principal component analysis (PCA) and heat map-based hierarchical clustering separated the cancerous and control ECs into two distinct clusters, and MetaCore disease biomarker analysis indicated that these differentially expressed genes are highly correlated with breast neoplasm diseases. Gene Set Enrichment Analysis (GSEA) enriched these genes in extracellular matrix (ECM) signaling pathways and highlighted 127 ECM-associated genes. External validation verified that some of these ECM-associated genes are not only generally overexpressed in various cancer tissues but also specifically overexpressed in colorectal cancer ECs and lymphoma ECs. In conclusion, our data demonstrated that ECM-associated genes play pivotal roles in breast cancer EC biology and that some of them could serve as potential TEC biomarkers for various cancers.
Time Series Expression Analyses Using RNA-seq: A Statistical Approach
Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.
2013-01-01
RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021
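Of the approaches listed, the autoregressive model is the simplest to illustrate. The sketch below fits a lag-one model, y_t = c + phi * y_(t-1) + e_t, to one gene's time course by ordinary least squares. It is a sketch under stated assumptions, not the authors' implementation. It assumes the series has already been transformed to a roughly continuous scale, for example log2(count + 1), and it omits any count-specific error model or multi-gene testing. The name `fit_ar1` is hypothetical.

```python
import numpy as np

def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_(t-1) + e_t for one gene.

    series: expression values ordered by time point (e.g. log2(count + 1)).
    Returns (c, phi, residuals).
    """
    y = np.asarray(series, dtype=float)
    x_prev, x_curr = y[:-1], y[1:]
    design = np.column_stack([np.ones_like(x_prev), x_prev])  # intercept + lag-1 term
    (c, phi), *_ = np.linalg.lstsq(design, x_curr, rcond=None)
    resid = x_curr - (c + phi * x_prev)
    return c, phi, resid
```

The coefficient phi captures the dependence between consecutive time points that the abstract says such methods should model explicitly. A series generated exactly by y_t = 1 + 0.5 * y_(t-1) recovers c = 1 and phi = 0.5.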
Gene Expression Profiling of Liver Cancer Stem Cells by RNA-Sequencing
Lam, Chi Tat; Ng, Michael N. P.; Yu, Wan Ching; Lau, Joyce; Wan, Timothy; Wang, Xiaoqi; Yan, Zhixiang; Liu, Hang; Fan, Sheung Tat
2012-01-01
Background Accumulating evidence supports the view that tumor growth and cancer relapse are driven by cancer stem cells. Our previous work has demonstrated the existence of CD90+ liver cancer stem cells (CSCs) in hepatocellular carcinoma (HCC). Nevertheless, the characteristics of these cells are still poorly understood. In this study, we employed RNA sequencing (RNA-Seq), a more sensitive technique, to compare the gene expression profiles of CD90+ cells sorted from tumor (CD90+CSCs) with those from parallel non-tumorous liver tissues (CD90+NTSCs) and to elucidate the roles of putative target genes in hepatocarcinogenesis. Methodology/Principal Findings CD90+ cells were sorted from tumor and adjacent non-tumorous human liver tissues, respectively, using fluorescence-activated cell sorting. The amplified RNAs of CD90+ cells from 3 HCC patients were subjected to RNA-Seq analysis. A differential gene expression profile was established between CD90+CSCs and CD90+NTSCs, validated by quantitative real-time PCR (qRT-PCR) on the same set of amplified RNAs, and further confirmed in an independent cohort of 12 HCC patients. Five hundred genes were differentially expressed (119 up-regulated and 381 down-regulated genes) between CD90+CSCs and CD90+NTSCs. Gene ontology analysis indicated that the over-expressed genes in CD90+CSCs were associated with inflammation, drug resistance and lipid metabolism. Among the differentially expressed genes, glypican-3 (GPC3), a member of the glypican family, was markedly elevated in CD90+CSCs compared to CD90+NTSCs. Immunohistochemistry demonstrated that GPC3 was highly expressed in forty-two human liver tumor tissues but absent in adjacent non-tumorous liver tissues. Flow cytometry indicated that GPC3 was highly expressed in liver CD90+CSCs and mature cancer cells in liver cancer cell lines and human liver tumor tissues. Furthermore, GPC3 expression was positively correlated with the number of CD90+CSCs in liver tumor tissues. 
Conclusions/Significance The identified genes, such as GPC3, that are distinctly expressed in liver CD90+CSCs may be promising candidates for HCC therapy that does not damage normal liver stem cells. PMID:22606345
Rong-Mullins, Xiaoqing; Ayers, Michael C.; Summers, Mahmoud; Gallagher, Jennifer E. G.
2017-01-01
Cellular metabolism can change the potency of a chemical’s tumorigenicity. 4-nitroquinoline-1-oxide (4NQO) is a tumorigenic drug widely used in animal models for cancer research. Polymorphisms of the transcription factor Yrr1 confer different levels of resistance to 4NQO in Saccharomyces cerevisiae. To study how different Yrr1 alleles regulate gene expression leading to resistance, three isogenic S. cerevisiae strains carrying different Yrr1 alleles were profiled via RNA sequencing (RNA-Seq) and chromatin immunoprecipitation coupled with sequencing (ChIP-Seq) in the presence and absence of 4NQO. In response to 4NQO, all alleles of Yrr1 drove the expression of SNQ2 (a multidrug transporter), which was highest in the presence of 4NQO resistance-conferring alleles, and overexpression of SNQ2 alone was sufficient to overcome 4NQO-sensitive growth. After the ChIP-Seq peaks were refined using shape metrics, Yrr1 was found to associate strongly with three loci, including SNQ2. In addition to a known Yrr1 target, SNG1, Yrr1 also bound upstream of RPL35B; however, overexpression of these genes did not confer 4NQO resistance. RNA-Seq data also implicated nucleotide synthesis pathways, including the de novo purine pathway, and showed that the ribonucleotide reductase pathways were downregulated in response to 4NQO. Conversion of a 4NQO-sensitive allele to a 4NQO-resistant allele by a single point mutation mimicked the 4NQO-resistant allele in phenotype. While the 4NQO-resistant allele increased the expression of the ADE genes in the de novo purine biosynthetic pathway, the mutant Yrr1 increased expression of ADE genes even in the absence of 4NQO. In strains carrying the wild-type alleles, these same ADE genes were increased only in the presence of 4NQO, indicating that the point mutation activated Yrr1 to upregulate a pathway normally activated only in response to stress. The various Yrr1 alleles also influenced growth on different carbon sources by altering the function of the mitochondria. 
Hence, the trade-off for 4NQO resistance was poor growth on nonfermentable carbon sources, which in turn varied with the Yrr1 allele expressed in the isogenic yeast. The oxidation state of the yeast affected 4NQO toxicity by altering the reactive oxygen species (ROS) generated by cellular metabolism. The integration of RNA-Seq and ChIP-Seq elucidated how Yrr1 regulates global gene transcription in response to 4NQO and how various Yrr1 alleles confer differential resistance to 4NQO. This study provides guidance for further investigation into how Yrr1 regulates cellular responses to 4NQO, as well as transcriptomic resources for further analysis of the effect of transcription factor variation on carbon source utilization. PMID:29208650
Kudo, Toru; Sasaki, Yohei; Terashima, Shin; Matsuda-Imai, Noriko; Takano, Tomoyuki; Saito, Misa; Kanno, Maasa; Ozaki, Soichi; Suwabe, Keita; Suzuki, Go; Watanabe, Masao; Matsuoka, Makoto; Takayama, Seiji; Yano, Kentaro
2016-10-13
In quantitative gene expression analysis, normalization using a reference gene as an internal control is frequently performed for appropriate interpretation of the results. Efforts have been devoted to exploring superior novel reference genes using microarray transcriptomic data and to evaluating commonly used reference genes by targeted analysis. However, because the set of specifically detectable genes depends entirely on probe design in microarray analysis, exploration using microarray data may miss some of the best reference gene candidates. Recently emerging RNA sequencing (RNA-seq) provides an ideal resource for comprehensive exploration of reference genes, because it can, in principle, detect all expressed genes, including unknown ones. We report the results of a comprehensive exploration of reference genes using public RNA-seq data from four plants: Arabidopsis thaliana (Arabidopsis), Glycine max (soybean), Solanum lycopersicum (tomato), and Oryza sativa (rice). To select reference genes suitable for the broadest possible range of experimental conditions, candidates were surveyed in four steps: (1) evaluation of the basal expression level of each gene in each experiment; (2) evaluation of the expression stability of each gene in each experiment; (3) evaluation of the expression stability of each gene across experiments; and (4) selection of top-ranked genes, after ranking by the number of experiments in which the gene was stably expressed. Using this procedure, 13, 10, 12, and 21 top reference gene candidates were proposed for Arabidopsis, soybean, tomato, and rice, respectively. Microarray expression data confirmed that, under broad experimental conditions, the proposed reference genes were expressed more stably than commonly used reference genes. 
These novel reference genes will be useful for analyzing gene expression profiles across experiments carried out under various experimental conditions.
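The four-step survey above can be illustrated with a minimal sketch. The abstract does not give the exact criteria, so this assumes that basal expression is the mean expression in an experiment, that stability is the coefficient of variation (CV) across that experiment's samples, and that `min_mean` and `max_cv` are hypothetical thresholds; the function name `rank_reference_genes` is likewise illustrative.

```python
import numpy as np


def rank_reference_genes(experiments, gene_ids, min_mean=10.0, max_cv=0.2, top_n=3):
    """Rank genes by the number of experiments in which they are stably expressed.

    experiments: list of genes x samples arrays (same gene order in each).
    Thresholds are illustrative assumptions, not values from the study.
    """
    n_stable = np.zeros(len(gene_ids), dtype=int)
    for expr in experiments:
        expr = np.asarray(expr, dtype=float)
        mean = expr.mean(axis=1)                      # step 1: basal expression level
        with np.errstate(divide="ignore", invalid="ignore"):
            cv = expr.std(axis=1, ddof=1) / mean      # step 2: within-experiment stability
        stable = (mean >= min_mean) & (cv <= max_cv)  # NaN CVs (mean 0) compare False
        n_stable += stable                            # step 3: tally across experiments
    order = np.argsort(-n_stable, kind="stable")      # step 4: rank by stable-experiment count
    return [(gene_ids[i], int(n_stable[i])) for i in order[:top_n]]
```

For two toy experiments in which `g1` is flat in both, `g2` is flat in only one, and `g3` is barely expressed, the ranking puts `g1` first with a count of 2.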
Hennessy, Rosanna C; Glaring, Mikkel A; Olsson, Stefan; Stougaard, Peter
2017-08-10
Few studies to date report the transcriptional response of biocontrol bacteria toward phytopathogens. To gain insight into the mechanism potentially underlying the antagonism of the antimicrobial-producing strain Pseudomonas fluorescens In5 against the phytopathogens Rhizoctonia solani and Pythium aphanidermatum, global RNA sequencing was performed. Differential gene expression of P. fluorescens In5 in response to either R. solani or P. aphanidermatum was profiled using transcriptome sequencing (RNA-seq). Total RNA was isolated, in biological triplicates, from single cultures of P. fluorescens In5 and from 48 h dual cultures with each pathogen. RNA-seq libraries were constructed following a default Illumina stranded RNA protocol including rRNA depletion and were sequenced 2 × 100 bases on an Illumina HiSeq, generating approximately 10 million reads per sample. No significant changes in global gene expression were recorded during dual culture of P. fluorescens In5 with either pathogen; rather, each pathogen appeared to induce expression of a specific set of genes. A particularly strong transcriptional response to R. solani was observed, and notably several genes possibly associated with secondary metabolite detoxification and metabolism were highly upregulated in response to the fungus. A total of 23 genes were significantly upregulated and seven genes significantly downregulated, each by at least threefold, in response to R. solani compared with the no-fungus control. In contrast, only one gene was significantly upregulated and three transcripts significantly downregulated by more than threefold in response to P. aphanidermatum. Genes known to be involved in the synthesis of secondary metabolites, e.g., genes for non-ribosomal peptide synthetases and hydrogen cyanide production, were not differentially expressed at the time points studied. 
This study demonstrates that genes possibly involved in metabolite detoxification are highly upregulated in P. fluorescens In5 when co-cultured with plant pathogens, in particular the fungus R. solani. This highlights the importance of studying microbe-microbe interactions to better understand how different systems function in vitro and, ultimately, in natural systems where biocontrol agents can be used for the sustainable management of plant diseases.
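The gene counts above combine a significance cut-off with a threefold change filter. The abstract does not name the statistical test or multiple-testing correction, so the sketch below assumes Benjamini-Hochberg adjusted p-values below a hypothetical `alpha` of 0.05, with fold change expressed as treated/control (e.g. R. solani co-culture versus the no-fungus control).

```python
import numpy as np


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def call_degs(fold_change, pvals, min_fold=3.0, alpha=0.05):
    """Flag genes significantly up- or downregulated by at least min_fold."""
    fc = np.asarray(fold_change, dtype=float)   # treated / control ratio
    padj = benjamini_hochberg(pvals)
    significant = padj < alpha
    up = significant & (fc >= min_fold)
    down = significant & (fc <= 1.0 / min_fold)
    return up, down, padj
```

Requiring both conditions mirrors the abstract's wording ("significantly upregulated ... with at least a threefold change"); a large fold change alone on a noisy gene would not be counted.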
RNA-Seq-Based Transcript Structure Analysis with TrBorderExt.
Wang, Yejun; Sun, Ming-An; White, Aaron P
2018-01-01
RNA-Seq has become a routine strategy for genome-wide gene expression comparisons in bacteria. Despite its lower resolution in transcript border parsing compared with dRNA-Seq, TSS-EMOTE, Cappable-seq, Term-seq, and other methods, directional RNA-Seq still has advantages: low cost, quantification, and transcript border analysis at medium resolution (±10-20 nt). To facilitate mining of directional RNA-Seq datasets, especially for transcript structure analysis, we developed a tool, TrBorderExt, which can accurately parse transcript start sites and termination sites in bacteria. This chapter describes a detailed, step-by-step protocol for using the software package to identify bacterial transcript borders from raw RNA-Seq data. The package was developed in Perl and R and is freely accessible at http://www.szu-bioinf.org/TrBorderExt .
SC3 - consensus clustering of single-cell RNA-Seq data
Kiselev, Vladimir Yu.; Kirschner, Kristina; Schaub, Michael T.; Andrews, Tallulah; Yiu, Andrew; Chandra, Tamir; Natarajan, Kedar N; Reik, Wolf; Barahona, Mauricio; Green, Anthony R; Hemberg, Martin
2017-01-01
Single-cell RNA-seq (scRNA-seq) enables a quantitative cell-type characterisation based on global transcriptome profiles. We present Single-Cell Consensus Clustering (SC3), a user-friendly tool for unsupervised clustering which achieves high accuracy and robustness by combining multiple clustering solutions through a consensus approach. We demonstrate that SC3 is capable of identifying subclones based on the transcriptomes from neoplastic cells collected from patients. PMID:28346451
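The consensus idea behind SC3 can be sketched as follows: run a clustering algorithm several times, record how often each pair of cells lands in the same cluster, and cluster that consensus matrix. This is a simplified illustration, not SC3 itself; SC3 also varies distance metrics and data transformations, whereas this sketch only repeats k-means with different random initializations and then applies complete-linkage hierarchical clustering to 1 - consensus. The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def kmeans_labels(x, k, rng, n_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per row (cell) of x."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every cell to every center
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def consensus_clusters(x, k, n_runs=20, seed=0):
    """Consensus of repeated k-means runs, cut by complete-linkage clustering."""
    rng = np.random.default_rng(seed)
    n = len(x)
    consensus = np.zeros((n, n))
    for _ in range(n_runs):
        labels = kmeans_labels(x, k, rng)
        consensus += labels[:, None] == labels[None, :]  # 1 where two cells co-cluster
    consensus /= n_runs
    dist = 1.0 - consensus
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="complete")
    return fcluster(tree, t=k, criterion="maxclust"), consensus
```

Pairs of cells that co-cluster in every run get distance 0, so an unstable single k-means run has little influence on the final partition.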
Yu, Shijiang; Ding, Lili; Luo, Ren; Li, Xiaojiao; Yang, Juan; Liu, Haoqiang; Cong, Lin; Ran, Chun
2016-01-01
Dialeurodes citri is a major pest in citrus-producing areas, and large-scale outbreaks have occurred increasingly often in recent years. Lecanicillium attenuatum is an important entomopathogenic fungus that can parasitize and kill D. citri; we isolated it from the cadavers of D. citri larvae. However, the pest's robust immune defenses make infection by entomopathogenic fungi difficult. Here we used RNA sequencing (RNA-Seq) to build a transcriptome database for D. citri and performed digital gene expression profiling to screen for genes that act in the immune defense of D. citri larvae infected with a pathogenic fungus. De novo assembly generated 84,733 unigenes with a mean length of 772 nt. All unigenes were searched against the GO, Nr, Swiss-Prot, COG, and KEGG databases, and a total of 28,190 (33.3%) unigenes were annotated. We identified 129 immunity-related unigenes in the transcriptome database, related to pattern recognition receptors, signal transduction factors, and response factors. From the digital gene expression profile, we identified 441 unigenes that were differentially expressed in D. citri infected with L. attenuatum. Based on calculated log2 ratios, we identified genes with pronounced fold changes in expression, including cuticle protein, vitellogenin, cathepsin, prophenoloxidase, clip-domain serine protease, lysozyme, and others. Subsequent quantitative real-time polymerase chain reaction analysis verified the results. The identified genes may serve as target genes for microbial control of D. citri. PMID:27644092
Huang, Ke-Lin; Zhang, Mei-Li; Ma, Guang-Jing; Wu, Huan; Wu, Xiao-Ming; Ren, Feng; Li, Xue-Bao
2017-01-01
Seed oil content is an important agronomic trait in oilseed rape. However, the molecular mechanism of oil accumulation in rapeseed remains unclear. In this report, RNA sequencing (RNA-Seq) was performed to explore differentially expressed genes in siliques of two Brassica napus lines (HFA and LFA, which have high and low seed oil contents, respectively) at 15 and 25 days after pollination (DAP). The RNA-Seq results showed that 65746 and 66033 genes were detected in siliques of the low-oil line at 15 and 25 DAP, and 65236 and 65211 genes were detected in siliques of the high-oil line at 15 and 25 DAP, respectively. By comparative analysis, differentially expressed genes (DEGs) were identified in siliques of these lines. The DEGs were involved in multiple pathways, including metabolic pathways, biosynthesis of secondary metabolites, photosynthesis, pyruvate metabolism, fatty acid metabolism, glycerophospholipid metabolism, and DNA binding. DEGs were also related to photosynthesis, starch and sugar metabolism, pyruvate metabolism, and lipid metabolism at different developmental stages, resulting in differential oil accumulation in seeds. Furthermore, RNA-Seq and qRT-PCR data revealed that some transcription factors positively regulate seed oil content. Thus, our data provide valuable information for further exploring the molecular mechanism of lipid biosynthesis and oil accumulation in B. napus.
Huang, Ke-Lin; Zhang, Mei-Li; Ma, Guang-Jing; Wu, Huan; Wu, Xiao-Ming; Ren, Feng
2017-01-01
Seed oil content is an important agronomic trait in oilseed rape. However, the molecular mechanism of oil accumulation in rapeseed remains unclear. In this report, RNA sequencing (RNA-Seq) was performed to explore differentially expressed genes in siliques of two Brassica napus lines (HFA and LFA, which have high and low seed oil contents, respectively) at 15 and 25 days after pollination (DAP). The RNA-Seq results showed that 65746 and 66033 genes were detected in siliques of the low-oil line at 15 and 25 DAP, and 65236 and 65211 genes were detected in siliques of the high-oil line at 15 and 25 DAP, respectively. By comparative analysis, differentially expressed genes (DEGs) were identified in siliques of these lines. The DEGs were involved in multiple pathways, including metabolic pathways, biosynthesis of secondary metabolites, photosynthesis, pyruvate metabolism, fatty acid metabolism, glycerophospholipid metabolism, and DNA binding. DEGs were also related to photosynthesis, starch and sugar metabolism, pyruvate metabolism, and lipid metabolism at different developmental stages, resulting in differential oil accumulation in seeds. Furthermore, RNA-Seq and qRT-PCR data revealed that some transcription factors positively regulate seed oil content. Thus, our data provide valuable information for further exploring the molecular mechanism of lipid biosynthesis and oil accumulation in B. napus. PMID:28594951
Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling
2015-03-01
Dynamic changes in gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of RNA-seq expression data across time and space, classifying genes according to their biological function from RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of the multivariate Poisson distribution, the model accommodates the discrete nature of the read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by a first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm, with model selection used to determine the optimal number of gene clusters and to obtain estimates of the Poisson parameters that describe the time-dependent expression pattern of genes in each cluster. The model was demonstrated by analyzing a real data set from an experiment aimed at linking the pattern of gene expression to catkin development in white poplar, and its usefulness was validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating a global view of expression dynamics and an understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press.
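The core of the approach, an EM algorithm for a mixture of Poisson count profiles, can be sketched as below. This is a deliberate simplification and not the authors' model: time points are treated as conditionally independent, so the first-order autoregressive dependence is omitted, and the number of clusters `k` is fixed rather than chosen by model selection. The function name and the farthest-point initialization are illustrative choices.

```python
import numpy as np


def poisson_mixture_em(counts, k, n_iter=500, tol=1e-8, seed=0):
    """Cluster genes (rows) by their count profiles over time points (columns).

    Each cluster j has one Poisson mean per time point, lam[j, t]; time points
    are treated as independent (no autoregressive term in this sketch).
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    rng = np.random.default_rng(seed)
    # farthest-point initialization: a random gene, then repeatedly the gene
    # farthest (squared Euclidean) from all genes chosen so far
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d = ((counts[:, None, :] - counts[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(d.min(axis=1).argmax()))
    lam = counts[idx] + 0.5
    pi = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log P(gene i, cluster j) without the log(x!) term, which is
        # the same for every cluster and cancels in the responsibilities
        log_p = counts @ np.log(lam).T - lam.sum(axis=1) + np.log(pi)
        m = log_p.max(axis=1, keepdims=True)
        w = np.exp(log_p - m)
        ll = float((m[:, 0] + np.log(w.sum(axis=1))).sum())
        resp = w / w.sum(axis=1, keepdims=True)
        # M-step: mixing weights and responsibility-weighted mean counts
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        lam = np.maximum(resp.T @ counts / nk[:, None], 1e-10)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return resp.argmax(axis=1), lam, pi
```

In practice one would fit several values of `k` and compare them with a criterion such as BIC, which corresponds to the model-selection step described in the abstract.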
Introduction to Single-Cell RNA Sequencing.
Olsen, Thale Kristin; Baryawno, Ninib
2018-04-01
During the last decade, high-throughput sequencing methods have revolutionized the entire field of biology. The opportunity to study entire transcriptomes in great detail using RNA sequencing (RNA-seq) has fueled many important discoveries, and RNA-seq is now a routine method in biomedical research. However, RNA-seq is typically performed in "bulk," and the data represent an average of gene expression patterns across thousands to millions of cells, which might obscure biologically relevant differences between cells. Single-cell RNA-seq (scRNA-seq) is one approach to overcoming this problem. By isolating single cells, capturing their transcripts, and generating sequencing libraries in which the transcripts are mapped to individual cells, scRNA-seq allows assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. Here, we present the most common scRNA-seq protocols in use today and the basics of data analysis, and we discuss factors that are important to consider before planning and designing an scRNA-seq project. © 2018 by John Wiley & Sons, Inc.
Atak, Zeynep Kalender; Gianfelici, Valentina; Hulselmans, Gert; De Keersmaecker, Kim; Devasia, Arun George; Geerdens, Ellen; Mentens, Nicole; Chiaretti, Sabina; Durinck, Kaat; Uyttebroeck, Anne; Vandenberghe, Peter; Wlodarska, Iwona; Cloos, Jacqueline; Foà, Robin; Speleman, Frank; Cools, Jan; Aerts, Stein
2013-01-01
RNA-seq is a promising technology for re-sequencing protein-coding genes to identify single nucleotide variants (SNVs), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for detecting driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors, and cooperative point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein-altering variations, as well as several new candidates, including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalization and batch effect removal, and used these levels to classify patients into T-ALL subtypes. Finally, we detected gene fusions, several of which can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1, while others result in novel fusion transcripts encoding activated kinases (SSBP2-FER and TPM3-JAK2) or involving MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and successfully applied them to detect translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.
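Optimizing SNV detection against exome data amounts to scoring RNA-seq calls against an exome "truth" set. A minimal sketch, assuming variants are represented as hypothetical `(chrom, pos, ref, alt)` tuples: because RNA-seq can only reveal variants in expressed genes, the optional `covered` set restricts the truth set to positions with adequate RNA coverage before computing recall. How the study actually defined its comparison is not stated in the abstract.

```python
def snv_concordance(rna_calls, exome_calls, covered=None):
    """Precision and recall of RNA-seq SNV calls against exome calls.

    Each call is a (chrom, pos, ref, alt) tuple. If `covered` is given, it is a
    set of (chrom, pos) positions with enough RNA-seq coverage, and exome
    variants outside it are excluded from the truth set.
    """
    rna = set(rna_calls)
    truth = set(exome_calls)
    if covered is not None:
        truth = {v for v in truth if (v[0], v[1]) in covered}
    true_pos = len(rna & truth)
    precision = true_pos / len(rna) if rna else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return true_pos, precision, recall
```

Without the coverage restriction, recall mostly measures gene expression rather than caller quality, which is why the two numbers are worth reporting separately.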
Zheng, Min; Lu, Jianguo; Zhao, Dongye
2018-05-24
Increasing use of stabilized iron sulfide (FeS) nanoparticles implies an elevated release of these materials into the environment. To understand the potential impacts and underlying mechanisms of nanoparticle-induced stress, we used transcriptome sequencing (RNA-seq) to characterize the liver transcriptomes of adult zebrafish exposed to 10 mg/L carboxymethyl cellulose (CMC)-stabilized FeS nanoparticles for 96 h, revealing striking differences in hepatic gene expression profiles. The exposure caused significant expression alterations in genes related to immune and inflammatory responses, detoxification, oxidative stress, and DNA damage/repair. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway "complement and coagulation cascades" was significantly up-regulated under nanoparticle exposure. Quantitative real-time polymerase chain reaction of twelve genes confirmed the RNA-seq results. We identified several candidate genes commonly regulated in liver, which may serve as gene indicators of nanoparticle exposure. Hepatic inflammation was further confirmed by histological observation of pyknotic nuclei and vacuole formation upon exposure. Tissue accumulation tests showed a 2.2-fold higher iron concentration in fish tissue upon exposure. This study provides preliminary mechanistic insights into the potential toxic effects of organic-matter-stabilized FeS nanoparticles, which will improve our understanding of the genotoxicity caused by stabilized nanoparticles.
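Pathway-level claims such as a KEGG pathway being significantly up-regulated are commonly backed by an over-representation test. The abstract does not state which test was used; the sketch below shows the standard one-sided hypergeometric test, asking how likely it is to see at least `k` pathway genes among `n_deg` differentially expressed genes drawn from `n_genes` background genes. No KEGG data are accessed here.

```python
from math import comb


def hypergeom_enrichment_p(k, n_deg, pathway_size, n_genes):
    """One-sided P(X >= k), X ~ Hypergeometric(N=n_genes, K=pathway_size, n=n_deg)."""
    upper = min(pathway_size, n_deg)
    hits = sum(
        comb(pathway_size, i) * comb(n_genes - pathway_size, n_deg - i)
        for i in range(k, upper + 1)
    )
    # exact integer arithmetic; Python's int / int division is correctly rounded
    return hits / comb(n_genes, n_deg)
```

For a toy background of 10 genes, a 4-gene pathway, and 3 DEGs, observing at least 2 pathway genes has probability (6·6 + 4·1)/120 = 1/3. With many pathways tested, these p-values would still need a multiple-testing correction.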
Li, Shan; Dong, Xia; Su, Zhengchang
2013-07-30
Although prokaryotic gene transcription has been studied for decades, many aspects of the process remain poorly understood. In particular, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. However, similar studies had not been conducted in the model bacterium E. coli K12, so it was unknown whether the bacterium possesses similarly complex transcriptomes. Furthermore, although RNA-seq has become the major method for analyzing the complexity of prokaryotic transcriptomes, accurately assembling full-length transcripts from short RNA-seq reads remains challenging. To fill these gaps, we profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in bacterial cells, combined with TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/), a highly accurate and robust algorithm and tool for assembling full-length transcripts. We found that 46.9-63.4% of expressed operons were utilized in their putative alternative forms, 72.23-89.54% of genes had putative asRNA transcripts, and 51.37-72.74% of intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases. As has been demonstrated in many other prokaryotes, E. coli K12 also has highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads.
2013-01-01
Background Although prokaryotic gene transcription has been studied for decades, many aspects of the process remain poorly understood. In particular, recent studies have revealed that transcriptomes in many prokaryotes are far more complex than previously thought. Genes in an operon are often alternatively and dynamically transcribed under different conditions, and a large portion of genes and intergenic regions have antisense RNA (asRNA) and non-coding RNA (ncRNA) transcripts, respectively. However, similar studies had not been conducted in the model bacterium E. coli K12, so it was unknown whether the bacterium possesses similarly complex transcriptomes. Furthermore, although RNA-seq has become the major method for analyzing the complexity of prokaryotic transcriptomes, accurately assembling full-length transcripts from short RNA-seq reads remains challenging. Results To fill these gaps, we profiled the transcriptomes of E. coli K12 under different culture conditions and growth phases using a highly specific directional RNA-seq technique that can capture various types of transcripts in bacterial cells, combined with TruHMM (http://bioinfolab.uncc.edu/TruHmm_package/), a highly accurate and robust algorithm and tool for assembling full-length transcripts. We found that 46.9-63.4% of expressed operons were utilized in their putative alternative forms, 72.23-89.54% of genes had putative asRNA transcripts, and 51.37-72.74% of intergenic regions had putative ncRNA transcripts under different culture conditions and growth phases. Conclusions As has been demonstrated in many other prokaryotes, E. coli K12 also has highly complex and dynamic transcriptomes under different culture conditions and growth phases. Such complex and dynamic transcriptomes might play important roles in the physiology of the bacterium. 
TruHMM is a highly accurate and robust algorithm for assembling full-length transcripts in prokaryotes using directional RNA-seq short reads. PMID:23899370
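The abstract does not describe TruHMM's model, so the following is only a generic illustration of how a hidden Markov model can segment strand-specific coverage into transcribed and untranscribed stretches. It assumes two states with Poisson emissions; the rates `lam_off`, `lam_on` and the self-transition probability `p_stay` are hypothetical parameters. Viterbi decoding returns the most probable state path, and runs of the "transcribed" state give candidate transcript borders.

```python
import math


def viterbi_transcribed(coverage, lam_off=0.5, lam_on=20.0, p_stay=0.99):
    """Label each position 0 (untranscribed) or 1 (transcribed) by Viterbi decoding."""
    rates = (lam_off, lam_on)
    log_stay, log_switch = math.log(p_stay), math.log(1.0 - p_stay)

    def log_emit(state, c):
        lam = rates[state]
        return c * math.log(lam) - lam - math.lgamma(c + 1)  # Poisson log-pmf

    score = [math.log(0.5) + log_emit(s, coverage[0]) for s in (0, 1)]
    back = []  # back[t - 1][s]: best previous state for state s at position t
    for c in coverage[1:]:
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best] + log_emit(s, c))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):  # trace back from the last position
        state = ptr[state]
        path.append(state)
    return path[::-1]


def transcribed_segments(path):
    """Convert a 0/1 state path into inclusive (start, end) transcribed runs."""
    runs, start = [], None
    for i, s in enumerate(path + [0]):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs
```

A real assembler would also model strand, variable expression levels, and read-level noise; the two-state version only shows why an HMM yields contiguous segments instead of position-by-position thresholding.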
Pisarska, Margareta D; Akhlaghpour, Marzieh; Lee, Bora; Barlow, Gillian M; Xu, Ning; Wang, Erica T; Mackey, Aaron J; Farber, Charles R; Rich, Stephen S; Rotter, Jerome I; Chen, Yii-der I; Goodarzi, Mark O; Guller, Seth; Williams, John
2016-11-01
Multi-platform testing to understand global changes in gene expression based on genetic and epigenetic modifications is evolving. Chorionic villus sampling (CVS) material, obtained for prenatal testing, is limited but can be used to understand ongoing human pregnancies. However, optimal storage, processing, and utilization of CVS for multi-platform testing have not been established. Leftover CVS samples were flash-frozen or preserved in RNAlater. Standard isolation kits were modified to isolate quality DNA and RNA from samples as small as 2-5 mg. RNAlater samples had significantly higher RNA yields and quality and were successfully used in microarray and RNA-sequencing (RNA-seq) analyses. RNA-seq libraries generated from 200 ng versus 800 ng of RNA showed similar biological coefficients of variation. RNAlater samples had lower DNA yields and quality, which improved when the elution buffer was heated to 70 °C. DNA purification was not necessary for bisulfite conversion and genome-wide methylation profiling. CVS cells were propagated and continued to express genes found in freshly isolated chorionic villi. CVS samples preserved in RNAlater are superior. Our optimized techniques provide specimens for genetic, epigenetic, and gene expression studies from a single small sample, which can be used to develop diagnostics and treatments using a systems biology approach in the prenatal period. © 2016 John Wiley & Sons, Ltd.
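The biological coefficient of variation (BCV) is the square root of the negative binomial dispersion φ in the model Var = μ + φμ². As a rough illustration only (tools such as edgeR estimate dispersion by likelihood with sharing of information across genes, which this sketch does not do), a per-gene method-of-moments estimate is φ ≈ (s² − x̄)/x̄², floored at zero. The input is assumed to be library-size-normalized counts with replicates in columns.

```python
import numpy as np


def bcv_moments(counts):
    """Per-gene BCV = sqrt(phi), with phi = (var - mean) / mean^2 floored at 0.

    counts: genes x replicates array of normalized counts.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    safe_mean = np.where(mean > 0, mean, 1.0)  # avoid dividing by zero
    phi = np.where(mean > 0, (var - mean) / safe_mean ** 2, 0.0)
    return np.sqrt(np.clip(phi, 0.0, None))
```

Comparing, for example, the median of these values between the 200 ng and 800 ng library sets gives a crude version of the comparison reported above; variance at or below the Poisson level yields a BCV of 0.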
Tu, Ying; Xu, Dan; Feng, Jiaqi; He, Li
2017-01-01
Sensitive skin (SS) is a condition of subjective cutaneous hyper-reactivity. The role of long non-coding RNAs (lncRNAs) in subjects with SS is unclear. Therefore, the aim of the present study was to provide a comprehensive profile of the mRNAs and lncRNAs in subjects with SS. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses characterized the associated protein-coding genes. In addition, a lncRNA-mRNA co-expression network was constructed to identify potential regulatory targets; the results were verified by quantitative real-time PCR (qRT-PCR) and RNA-seq analyses of samples from patients with SS and normal controls. Compared with the normal skin group, 266 novel lncRNAs and 6750 annotated lncRNAs were identified in the SS group. A total of 71 lncRNA transcripts and 2615 mRNA transcripts were differentially expressed (P < 0.05). The heat-map signature of the SS samples distinguished them from the normal skin samples, and the majority of genes in enriched pathways participated in focal adhesion, PI3K-Akt signaling, and cancer-related pathways. Five transcripts were selected for qRT-PCR analysis, and the results were consistent with RNA-seq. The results suggested that LNC_000265 may play a role in the epidermal barrier structure of patients with SS. The data suggest novel genes and pathways that may be involved in the pathogenesis of SS and highlight potential targets for individualized treatment. PMID:29383128
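A lncRNA-mRNA co-expression network is often built by correlating every lncRNA profile with every mRNA profile and keeping strongly correlated pairs as edges. The abstract does not specify the measure or cut-off, so this sketch assumes Pearson correlation and a hypothetical threshold `min_abs_r`; each row must vary across samples (a constant row has undefined correlation).

```python
import numpy as np


def coexpression_edges(lnc, mrna, lnc_ids, mrna_ids, min_abs_r=0.9):
    """Return (lncRNA, mRNA, r) triples whose Pearson |r| meets the threshold.

    lnc: L x samples array; mrna: M x samples array (same sample order).
    """
    def center_and_normalize(x):
        x = np.asarray(x, dtype=float)
        x = x - x.mean(axis=1, keepdims=True)
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    # dot products of centered, unit-length rows are Pearson correlations
    r = center_and_normalize(lnc) @ center_and_normalize(mrna).T
    rows, cols = np.nonzero(np.abs(r) >= min_abs_r)
    return [(lnc_ids[i], mrna_ids[j], float(r[i, j])) for i, j in zip(rows, cols)]
```

Negative correlations are kept as well, since an lncRNA may repress as well as promote its putative targets.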
Quantification of differential gene expression by multiplexed targeted resequencing of cDNA
Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.
2017-01-01
Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but reagent and sequencing costs can be prohibitive for hypothesis-driven studies that aim to quantify differential expression of a limited number of genes. Here we present an approach for quantifying differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs), which enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantifying relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective, high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677
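Counting individual molecules relies on the single-molecule tag carried by each probe: reads sharing a target and a tag are copies of one captured molecule. A minimal sketch, assuming reads have already been reduced to hypothetical `(target_id, tag)` pairs and ignoring tag sequencing errors, collapses reads to molecules and then compares a target's share of molecules between two samples as a log2 ratio.

```python
import math
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads to molecules: distinct single-molecule tags per target."""
    tags = defaultdict(set)
    for target, tag in reads:
        tags[target].add(tag)
    return {target: len(t) for target, t in tags.items()}


def log2_ratio(counts_a, counts_b, target, pseudo=0.5):
    """log2 of the target's molecule share in sample A relative to sample B."""
    share_a = (counts_a.get(target, 0) + pseudo) / sum(counts_a.values())
    share_b = (counts_b.get(target, 0) + pseudo) / sum(counts_b.values())
    return math.log2(share_a / share_b)
```

Counting tags rather than reads removes PCR duplication bias, which is the main reason molecule counts give more accurate differential-expression estimates than raw read counts.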
IAOseq: inferring abundance of overlapping genes using RNA-seq data.
Sun, Hong; Yang, Shuang; Tun, Liangliang; Li, Yixue
2015-01-01
Overlapping transcription is a common mechanism for regulating gene expression. A major limitation of overlapping-transcription assays is the lack of high-throughput expression data. We developed a new tool (IAOseq) that uses read distributions along transcribed regions to estimate the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq estimated overlapping transcription levels more accurately. For same-strand overlapping transcription, few existing high-throughput methods can distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods overestimated the expression levels of same-strand overlapping genes by an average of 1.6-fold. This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries, and IAOseq could help elucidate the complex regulatory mechanisms mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.
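The abstract does not detail IAOseq's algorithm, so the following is a generic illustration of why reads in shared regions need to be apportioned rather than counted for every overlapping gene. It uses the standard EM approach for ambiguous reads: each read is compatible with one or more genes, a read from gene g is equally likely to start anywhere on its effective length, and shared reads are split in proportion to θ_g / length_g.

```python
import numpy as np


def em_read_fractions(compat, lengths, n_iter=500, tol=1e-10):
    """Estimate the fraction of reads originating from each gene.

    compat: reads x genes 0/1 matrix (1 if the read is compatible with the gene);
    every read must be compatible with at least one gene.
    lengths: effective length of each gene.
    """
    compat = np.asarray(compat, dtype=float)
    lengths = np.asarray(lengths, dtype=float)
    theta = np.full(compat.shape[1], 1.0 / compat.shape[1])
    for _ in range(n_iter):
        # E-step: posterior that each read came from each compatible gene
        w = compat * (theta / lengths)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: new read fractions are the mean posterior per gene
        new_theta = w.mean(axis=0)
        converged = np.abs(new_theta - theta).max() < tol
        theta = new_theta
        if converged:
            break
    return theta
```

With two equal-length genes, 80 reads unique to the first, 20 unique to the second, and 50 in the overlap, the fixed point is θ = (0.8, 0.2): the shared reads are split 40/10 rather than credited fully to both genes, which is the kind of double counting that leads to overestimated expression.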
Single cell RNA Seq reveals dynamic paracrine control of cellular variation
Shalek, Alex K.; Satija, Rahul; Shuga, Joe; Trombetta, John J.; Gennert, Dave; Lu, Diana; Chen, Peilin; Gertner, Rona S.; Gaublomme, Jellert T.; Yosef, Nir; Schwartz, Schraga; Fowler, Brian; Weaver, Suzanne; Wang, Jing; Wang, Xiaohui; Ding, Ruihua; Raychowdhury, Raktima; Friedman, Nir; Hacohen, Nir; Park, Hongkun; May, Andrew P.; Regev, Aviv
2014-01-01
High-throughput single-cell transcriptomics offers an unbiased approach for understanding the extent, basis, and function of gene expression variation between seemingly identical cells. Here, we sequence single-cell RNA-Seq libraries prepared from over 1,700 primary mouse bone-marrow-derived dendritic cells (DCs) spanning several experimental conditions. We find substantial variation between identically stimulated DCs, in both the fraction of cells detectably expressing a given mRNA and the transcript's level within expressing cells. Distinct gene modules are characterized by different temporal heterogeneity profiles. In particular, a "core" module of antiviral genes is expressed very early by a few "precocious" cells, but is later activated in all cells. By stimulating cells individually in sealed microfluidic chambers, analyzing DCs from knockout mice, and modulating secretion and extracellular signaling, we show that this response is coordinated via interferon-mediated paracrine signaling. Surprisingly, preventing cell-to-cell communication also substantially reduces variability in the expression of an early-induced "peaked" inflammatory module, suggesting that paracrine signaling additionally represses part of the inflammatory program. Our study highlights the importance of cell-to-cell communication in controlling cellular heterogeneity and reveals general strategies that multicellular populations use to establish complex dynamic responses. PMID:24919153
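The two facets of variation described above, the fraction of cells detectably expressing a gene and the expression level within those cells, can be computed directly from a genes x cells matrix. A minimal sketch, assuming log-scale expression values and a hypothetical detection threshold `detect`; genes detected in no cell get NaN for the within-expresser mean.

```python
import numpy as np


def expression_heterogeneity(expr, detect=1.0):
    """Per gene: fraction of cells expressing it, and mean level in those cells.

    expr: genes x cells array (e.g. log-scale expression).
    """
    expr = np.asarray(expr, dtype=float)
    expressing = expr >= detect
    frac = expressing.mean(axis=1)
    n_expr = expressing.sum(axis=1)
    sums = np.where(expressing, expr, 0.0).sum(axis=1)
    mean_in_expressers = np.divide(
        sums, n_expr, out=np.full(expr.shape[0], np.nan), where=n_expr > 0
    )
    return frac, mean_in_expressers
```

Separating the two quantities distinguishes a gene switched on strongly in a few "precocious" cells from one expressed at a uniformly low level in every cell, even when both have the same population average.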
Comparison of software packages for detecting differential expression in RNA-seq studies
Seyednasrollah, Fatemeh; Laiho, Asta; Elo, Laura L
2015-01-01
RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. © The Author 2013. Published by Oxford University Press. PMID:24300110
RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.
D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano
2015-01-01
The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms, which allow massive and cheap sequencing of selected RNA fractions and also provide information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). To provide researchers with an effective and user-friendly resource for analyzing RNA-Seq data, we present RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates state-of-the-art bioinformatics tools for RNA-Seq analysis with in-house scripts to offer the user a comprehensive strategy for data analysis. RAP can perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), and detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). The pipeline can also identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call statistically significant differences in gene and transcript expression, splicing patterns and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user-friendly web interface, the RAP workflow can be customized by the user and is automatically executed on our cloud computing environment. This strategy gives users without specific bioinformatics and IT skills access to bioinformatics tools and computational resources.
RAP provides a set of tabular and graphical results that help users browse, filter and export analyzed data according to their needs.
Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong; Li, Mingyao; Zhang, Nancy R
2017-11-02
Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. PMID:29036714
Variation-preserving normalization unveils blind spots in gene expression profiling
Roca, Carlos P.; Gomes, Susana I. L.; Amorim, Mónica J. B.; Scott-Fordsmand, Janeck J.
2017-01-01
RNA-Seq and gene expression microarrays provide comprehensive profiles of gene activity, but lack of reproducibility has hindered their application. A key challenge in the data analysis is the normalization of gene expression levels, which is currently performed following the implicit assumption that most genes are not differentially expressed. Here, we present a mathematical approach to normalization that makes no assumption of this sort. We have found that variation in gene expression is much larger than currently believed, and that it can be measured with available assays. Our results also explain, at least partially, the reproducibility problems encountered in transcriptomics studies. We expect that this improvement in detection will help efforts to realize the full potential of gene expression profiling, especially in analyses of cellular processes involving complex modulations of gene expression. PMID:28276435
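For contrast, the sketch below implements the widely used median-of-ratios size factors (the approach popularized by DESeq), which embodies exactly the assumption questioned above: the median gene is taken to be unchanged between samples. This is not the authors' variation-preserving method; it is shown only to make the implicit assumption visible.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """counts: genes x samples raw counts. Returns one size factor per sample."""
    counts = np.asarray(counts, dtype=float)
    # keep genes with non-zero counts in every sample (log of 0 is undefined)
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    log_geo_mean = log_counts.mean(axis=1)  # per-gene pseudo-reference sample
    # median over genes of each sample's log-ratio to the pseudo-reference
    return np.exp(np.median(log_counts - log_geo_mean[:, None], axis=0))


sf = median_of_ratios_size_factors([[10, 20], [30, 60], [5, 10], [0, 7]])
print(sf)
```

If more than half of the genes shifted in the same direction, the median ratio would absorb part of that shift, and the resulting size factors would hide genuine expression changes.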
Berardocco, Martina; Radeghieri, Annalisa; Busatto, Sara; Gallorini, Marialucia; Raggi, Chiara; Gissi, Clarissa; D'Agnano, Igea; Bergese, Paolo; Felsani, Armando; Berardi, Anna C
2017-10-10
Liver cancer (LC) is one of the most common cancers and the third leading cause of cancer-related deaths worldwide. Extracellular vesicle (EV) cargoes, which are selectively enriched in RNA, offer great promise for the diagnosis, prognosis and treatment of LC. Our study analyzed the RNA cargoes of EVs derived from four liver cancer cell lines, HuH7, Hep3B, HepG2 (hepatocellular carcinoma) and HuH6 (hepatoblastoma), generating two different sets of sequencing libraries for each: one library was size-selected for small RNAs and the other targeted the whole transcriptome. We report genome-wide expression data for coding and non-coding transcripts, microRNAs, isomiRs and snoRNAs, providing the first comprehensive overview of the extracellular-vesicle RNA cargo released from LC cell lines. The EV-RNA expression profiles of the four liver cancer cell lines share a similar background, but cell-specific features clearly emerge, showing marked heterogeneity of the EV cargo among the individual cell lines for both coding and non-coding RNA species.
Structure-seq2: sensitive and accurate genome-wide profiling of RNA structure in vivo
Ritchey, Laura E.; Su, Zhao; Tang, Yin; Tack, David C.
2017-01-01
RNA serves many functions in biology such as splicing, temperature sensing, and innate immunity. These functions are often determined by the structure of RNA. There is thus a pressing need to understand RNA structure and how it changes during diverse biological processes both in vivo and genome-wide. Here, we present Structure-seq2, which provides nucleotide-resolution RNA structural information in vivo and genome-wide. This optimized version of our original Structure-seq method increases sensitivity by at least 4-fold and improves data quality by minimizing formation of a deleterious by-product, reducing ligation bias, and improving read coverage. We also present a variation of Structure-seq2 in which a biotinylated nucleotide is incorporated during reverse transcription, which greatly facilitates the protocol by eliminating two PAGE purification steps. We benchmark Structure-seq2 on both mRNA and rRNA structure in rice (Oryza sativa). We demonstrate that Structure-seq2 can lead to new biological insights. Our Structure-seq2 datasets uncover hidden breaks in chloroplast rRNA and identify a previously unreported N1-methyladenosine (m1A) in a nuclear-encoded Oryza sativa rRNA. Overall, Structure-seq2 is a rapid, sensitive, and unbiased method to probe RNA in vivo and genome-wide that facilitates new insights into RNA biology. PMID:28637286
Tye, Coralee E; Boyd, Joseph R; Page, Natalie A; Falcone, Michelle M; Stein, Janet L; Stein, Gary S; Lian, Jane B
2018-12-01
Long noncoding RNAs (lncRNAs) have recently emerged as novel regulators of lineage commitment, differentiation, development, viability, and disease progression. Few studies have examined their role in osteogenesis; however, given their critical and wide-ranging roles in other tissues, lncRNAs are most likely vital regulators of osteogenesis. In this study, we extensively characterized lncRNA expression in mesenchymal cells during commitment and differentiation to the osteoblast lineage using a whole transcriptome sequencing approach (RNA-Seq). Using mouse primary mesenchymal stromal cells (mMSC), we identified 1438 annotated lncRNAs expressed during MSC differentiation, 462 of which are differentially expressed. We performed guilt-by-association analysis using lncRNA and mRNA expression profiles to identify lncRNAs influencing MSC commitment and differentiation. These findings open novel dimensions for exploring lncRNAs in regulating normal bone formation and in skeletal disorders.
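Guilt-by-association analysis infers a candidate function for an lncRNA from the mRNAs whose expression profiles track it most closely. A minimal sketch, assuming Pearson correlation across samples as the association measure (the study may use a different measure or cutoff); names and data are hypothetical:

```python
import numpy as np


def top_correlated_mrnas(lnc_expr, mrna_expr, mrna_names, k=3):
    """Rank mRNAs by Pearson correlation with one lncRNA profile.

    lnc_expr: 1-D expression profile of the lncRNA across samples.
    mrna_expr: mRNAs x samples matrix (no constant rows, or r becomes NaN).
    Returns the k (name, r) pairs with the highest correlation.
    """
    x = np.asarray(lnc_expr, dtype=float)
    Y = np.asarray(mrna_expr, dtype=float)
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=1, keepdims=True)
    r = (Yc @ xc) / (np.linalg.norm(Yc, axis=1) * np.linalg.norm(xc))
    order = np.argsort(-r)[:k]
    return [(mrna_names[i], float(r[i])) for i in order]


print(top_correlated_mrnas([1, 2, 3, 4, 5],
                           [[2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [1, 3, 2, 5, 4]],
                           ["A", "B", "C"], k=3))
```

The functional annotations (for example, GO terms) of the top-ranked mRNAs would then be tested for enrichment to suggest what the lncRNA might regulate.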
Qiao, Yan; Zhang, Jinjin; Zhang, Jinwen; Wang, Zhiwei; Ran, An; Guo, Haixia; Wang, Di; Zhang, Junlian
2017-02-01
Light is a major environmental factor that affects metabolic pathways and stimulates the production of secondary metabolites in potato. However, the adaptive changes in potato metabolic pathways and physiological functions triggered by light are only partly explained by gene expression changes. Regulation of secondary metabolic pathways in potato has been extensively studied at the transcriptional level, but little is known about the mechanisms of post-transcriptional regulation by miRNAs. To identify light-responsive miRNAs/mRNAs and construct putative metabolic pathways regulated by miRNA-mRNA pairs, an integrated omics (sRNAome and transcriptome) analysis was performed on potato under light stimulation. A total of 31 and 48 miRNAs were identified as differentially expressed in the leaves and tubers, respectively. Among the differentially expressed genes (DEGs), 1353 genes in the leaves and 1841 genes in the tubers were upregulated by light, while 1595 genes in the leaves and 897 genes in the tubers were downregulated. MapMan enrichment analyses showed that genes related to the MVA pathway, alkaloid-like compounds, phenylpropanoids, flavonoids and carotenoid metabolism were significantly upregulated, while genes associated with major CHO metabolism were repressed in the leaves and tubers. Integrated miRNA and mRNA profiles revealed that light-responsive miRNAs are important regulators of alkaloid metabolism, UMP salvage, lipid biosynthesis and cellulose catabolism. Moreover, several miRNAs may participate in glycoalkaloid metabolism via the JA signaling pathway, UDP-glucose biosynthesis and hydroxylation reactions. This study provides a global view of miRNA and mRNA expression profiles in potato in response to light; our results suggest that miRNAs might play important roles in secondary metabolic pathways, especially in glycoalkaloid biosynthesis. These findings should deepen our understanding of the genetic regulation of secondary metabolite pathways and pave the way for future applications of genetically engineered potato.
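One common step in integrating miRNA and mRNA profiles is to keep miRNA-target pairs whose changes run in opposite directions, since a miRNA typically represses its targets. A minimal sketch, assuming target predictions are supplied externally and using a hypothetical log2 fold-change cutoff; all miRNA and gene names are placeholders, not results from the study:

```python
def anticorrelated_pairs(mirna_lfc, mrna_lfc, targets, min_abs_lfc=1.0):
    """Return (miRNA, target) pairs that change in opposite directions.

    mirna_lfc / mrna_lfc: dicts of log2 fold changes (e.g., light vs dark).
    targets: dict mapping each miRNA to an iterable of predicted target genes.
    min_abs_lfc: hypothetical cutoff for a meaningful change.
    """
    pairs = []
    for mir, mir_fc in mirna_lfc.items():
        if abs(mir_fc) < min_abs_lfc:
            continue  # miRNA not responsive enough
        for gene in targets.get(mir, ()):
            gene_fc = mrna_lfc.get(gene)
            if gene_fc is None or abs(gene_fc) < min_abs_lfc:
                continue  # target not measured or not changed
            if mir_fc * gene_fc < 0:  # opposite signs
                pairs.append((mir, gene))
    return pairs


mir = {"mirX": 2.0, "mirY": -1.5, "mirZ": 0.2}
mrna = {"g1": -1.2, "g2": 1.1, "g3": 3.0, "g4": 2.0}
targets = {"mirX": ["g1", "g2"], "mirY": ["g3", "g5"], "mirZ": ["g4"]}
print(anticorrelated_pairs(mir, mrna, targets))
```

The retained pairs can then be mapped onto metabolic pathways to propose which processes each light-responsive miRNA may regulate.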
Patino, Luz Helena; Ramírez, Juan David
2017-04-01
The kinetoplastids include a large number of parasites responsible for serious diseases in humans and animals (e.g., Leishmania and Trypanosoma brucei) that are endemic in several regions of the world. These parasites have digenetic life cycles involving morphological and genetic changes that allow them to adapt to the different microenvironments in their vertebrate and invertebrate hosts. Recent advances in 'omics' technologies, specifically transcriptomics, have revealed aspects of these molecular changes. Various techniques have been used to evaluate gene expression profiles during the different life-cycle stages of these parasites and during host-parasite interactions. However, some of them have serious drawbacks that limit the precise study and full understanding of their transcriptomes. RNA-seq, which overcomes the drawbacks of these traditional methods, has therefore recently been adopted. In this review, we present the studies that have used RNA-seq so far and have expanded our knowledge of the biology of these parasites and their interactions with their hosts. Copyright © 2017 Elsevier B.V. All rights reserved.
McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.
2013-01-01
Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data with a range of distributional properties reflected in real mRNA-seq data, including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to the standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations, where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods in computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples.
ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943
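The core Relief-F update rewards a feature that differs between a sample and its nearest neighbors of the other class ("misses") and penalizes one that differs from its nearest neighbors of the same class ("hits"). A minimal two-class NumPy sketch, assuming Manhattan distance on min-max scaled features and one fixed k for every feature; it does not implement ReliefSeq's gene-wise adaptive k:

```python
import numpy as np


def relieff(X, y, k=3):
    """Basic two-class Relief-F importance scores.

    X: samples x features (e.g., samples x genes); y: class labels.
    Assumes each class has at least two samples. Fixed k for all features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0          # avoid dividing by zero for constant features
    Xs = (X - lo) / span           # scale each feature to [0, 1]
    w = np.zeros(p)
    for i in range(n):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)  # Manhattan distance to all samples
        same = np.flatnonzero(y == y[i])
        same = same[same != i]                 # a sample is not its own neighbor
        other = np.flatnonzero(y != y[i])
        hits = same[np.argsort(dist[same])[:k]]
        misses = other[np.argsort(dist[other])[:k]]
        # differences on near hits lower the score; on near misses raise it
        w -= np.abs(Xs[hits] - Xs[i]).mean(axis=0) / n
        w += np.abs(Xs[misses] - Xs[i]).mean(axis=0) / n
    return w


rng = np.random.default_rng(0)
y = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(20, 5))
X[:, 0] = y + 0.05 * rng.normal(size=20)  # feature 0 carries a main effect
print(relieff(X, y, k=3))
```

Because neighbors are chosen using all features jointly, Relief-F can also score features that matter only in combination, which is the interaction sensitivity the abstract emphasizes.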
Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona
2014-01-01
The continuously prolonged human lifespan is accompanied by an increasing incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means of identifying novel biomarkers and is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events and to integrate them with both protein-domain and microRNA binding annotations. We applied the novel workflow to RNA-Seq data from the leukocytes of Parkinson's disease (PD) patients before and after Deep Brain Stimulation (DBS) treatment and compared them with healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually exclusive exons. The PD-regulated regions of FUS and HNRNP A/B included prion-like domains. We also present a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq data from the brains of PD patients and unaffected controls revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes, including U1, which showed increases in both leukocytes and brain.
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and in two brain regions, the amygdala and the substantia nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases. PMID:24651478
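Differential splicing is often summarized per event by a "percent spliced in" (PSI) value computed from junction reads. The sketch below is a generic PSI calculation for a cassette exon, shown as an assumption to illustrate the idea; it is not the AltAnalyze workflow described above, and the junction layout is hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads, n_inclusion_junctions=2):
    """PSI for a cassette exon from junction read counts.

    inclusion_reads: reads on the junctions that include the exon (two junctions)
    exclusion_reads: reads on the single junction that skips the exon
    Inclusion reads are divided by the number of inclusion junctions so that
    both isoforms are compared on a per-junction basis.
    """
    inc = inclusion_reads / n_inclusion_junctions
    total = inc + exclusion_reads
    if total == 0:
        return float("nan")  # event not covered by any junction read
    return inc / total


# Change in exon usage between two hypothetical samples (delta PSI)
delta_psi = percent_spliced_in(60, 10) - percent_spliced_in(20, 30)
print(delta_psi)
```

A difference in PSI between groups, tested across replicates, is one way to flag events such as the altered exon usage reported above.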
Wolff, Alexander; Bayerlová, Michaela; Gaedcke, Jochen; Kube, Dieter; Beißbarth, Tim
2018-01-01
Pipeline comparisons for gene expression data are highly valuable for applied real-data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. To give a global insight into pipeline performance, such comparisons should cover read mapping, counting and differential gene expression analysis for RNA-Seq data, or preprocessing, normalization and differential gene expression analysis for microarray data. Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart at the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt lymphoma cell line data and rectal cancer patient data. The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. TopHat's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish reached only 84.81% and 54.44%. The correlation of gene expression between microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67-0.69) than for the cell line dataset (ρ = 0.87-0.88). The exception was Cufflinks, whose correlations were substantially lower (ρ = 0.21-0.29 and 0.34-0.53). For both datasets, very few differentially expressed genes were identified on the microarray platform. For RNA-Seq, we checked the agreement of differentially expressed genes and of GO-term enrichment results across the pipelines. In conclusion, STAR combined with HTSeq-Count, followed by STAR with RSEM and then Sailfish, generated the differentially expressed genes best suited to the dataset at hand and in agreement with most of the other transcriptomics pipelines.
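Cross-platform agreement of this kind is usually measured per sample over matched genes. A minimal sketch, assuming ρ denotes Spearman's rank correlation (the abstract does not say so explicitly); `scipy.stats.spearmanr` computes the same statistic, but the version below uses only NumPy so the ranking step is visible:

```python
import numpy as np


def rank_average_ties(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank of the tie block
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(rank_average_ties(a), rank_average_ties(b))[0, 1])


# e.g., microarray log-intensities vs RNA-Seq counts for the same genes
print(spearman_rho([7.1, 8.4, 5.0, 9.9], [120, 800, 15, 5000]))
```

A rank-based measure suits this comparison because microarray intensities and RNA-Seq counts are on different, nonlinearly related scales.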
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates.
Low, Joel Z B; Khang, Tsung Fei; Tammi, Martti T
2017-12-28
In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and the resulting normalized gene counts are then used as input for parametric or non-parametric differential gene expression tests. However, a distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. We developed a fast Bayesian method that uses sequencing coverage information, determined from the concentration of an RNA sample, to estimate the posterior distribution of a true gene count. Our method performed better than or comparably to NOISeq and GFOLD in simulations and in experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with a limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq) and is available at https://github.com/joel-lzb/CORNAS.
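The idea that one observed count is compatible with a distribution of true counts can be made concrete with a simple model. Assume (for illustration only; this is not claimed to be CORNAS's exact model) that each of N true transcripts is sequenced independently with probability p, where p reflects sequencing coverage, and that the prior on N is flat. For an observed count x, the posterior is then negative binomial, P(N = x + j | x) = C(x + j, x) p^(x+1) (1 - p)^j, with posterior mean (x + 1)/p - 1.

```python
import math


def true_count_posterior(observed, p, max_extra=5000):
    """Posterior P(N = observed + j | observed) for j = 0..max_extra.

    Illustrative binomial-sampling model with a flat prior on N;
    p in (0, 1) stands in for the sequencing coverage. Computed in log
    space because the binomial coefficients overflow floats quickly.
    """
    x = observed
    log_pmf = [
        math.lgamma(x + j + 1) - math.lgamma(x + 1) - math.lgamma(j + 1)
        + (x + 1) * math.log(p) + j * math.log1p(-p)
        for j in range(max_extra + 1)
    ]
    return [math.exp(v) for v in log_pmf]


def posterior_mean(observed, p, max_extra=5000):
    pmf = true_count_posterior(observed, p, max_extra)
    return sum((observed + j) * w for j, w in enumerate(pmf))


# Lower coverage (smaller p) spreads the posterior over larger true counts
print(posterior_mean(10, 0.5), posterior_mean(10, 0.1))
```

The same observed count of 10 thus implies very different true abundances at different coverages, which is the information conventional normalized-count tests discard.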
Krause, Sue A; Pandit, Aniruddha; Davies, Shireen A
2018-01-01
FlyAtlas 2 (www.flyatlas2.org) is part successor, part complement to the FlyAtlas database and web application for studying the expression of the genes of Drosophila melanogaster in different tissues of adults and larvae. Although generated in the same lab with the same fly line raised on the same diet as FlyAtlas, the FlyAtlas 2 resource employs a completely new set of expression data based on RNA-Seq rather than microarray analysis, and so it allows the user to obtain information on the expression of different transcripts of a gene. Furthermore, the data for somatic tissues are now available for both male and female adult flies, allowing studies of sexual dimorphism. Gene coverage has been extended by the inclusion of microRNAs and many of the RNA genes included in Release 6 of the Drosophila reference genome. The web interface has been modified to accommodate the extra data, but at the same time has been adapted for viewing on small mobile devices. Users also have access to the RNA-Seq reads displayed alongside the annotated Drosophila genome in the (external) UCSC browser, and are able to link out to the previous FlyAtlas resource to compare the data obtained by RNA-Seq with those obtained using microarrays. PMID:29069479
Salinas, Yasmmyn D.; Shi, YiJun; Greenwood, Michael; Hoe, See Ziau; Murphy, David; Gainer, Harold
2015-01-01
Magnocellular neurons (MCNs) in the hypothalamo-neurohypophysial system (HNS) are highly specialized to release large amounts of arginine vasopressin (Avp) or oxytocin (Oxt) into the bloodstream and play critical roles in the regulation of body fluid homeostasis. The MCNs are osmosensory neurons that are excited by exposure to hypertonic solutions and inhibited by hypotonic solutions. The MCNs respond to systemic hypertonic and hypotonic stimulation with large changes in the expression of their Avp and Oxt genes, and microarray studies have shown that these osmotic perturbations also cause large changes in global gene expression in the HNS. In this paper, we examine gene expression in the rat supraoptic nucleus (SON) under normosmotic and chronic salt-loading (SL) conditions, for the first time using “new-generation” RNA sequencing (RNA-Seq) methods. We reliably detect 9,709 genes as present in the SON by RNA-Seq, and the expression of 552 of these genes changed as a result of chronic SL. These genes reflect diverse functions, and 42 of them are involved in either transcriptional or translational processes. In addition, we compare the SON transcriptome resolved by RNA-Seq with the SON transcriptome determined by Affymetrix microarrays in rats under the same osmotic conditions, and find 6,466 genes present in the SON in both data sets, whereas 1,040 expressed genes were found only in the microarray data and 2,762 only in the RNA-Seq data. These data provide the research community with a comprehensive view of the SON transcriptome under normosmotic conditions and of the changes in specific gene expression evoked by salt loading. PMID:25897513
MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies.
Kumar, Pankaj; Halama, Anna; Hayat, Shahina; Billing, Anja M; Gupta, Manish; Yousri, Noha A; Smith, Gregory M; Suhre, Karsten
2015-01-01
The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered across four different databases: Sequence Read Archive (SRA), Biosample, Bioprojects, and Gene Expression Omnibus (GEO). Although the NCBI web interface can provide all of the metadata, retrieving study- or project-level information often requires significant effort, following multiple hyperlinks across separate pages. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe "MetaRNA-Seq," a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.
Experimental Design and Power Calculation for RNA-seq Experiments.
Wu, Zhijin; Wu, Hao
2016-01-01
Power calculation is a critical component of RNA-seq experimental design. The flexibility of RNA-seq experiments and the wide dynamic range of transcription they measure make RNA-seq an attractive technology for whole-transcriptome analysis. These features, together with the high dimensionality of RNA-seq data, complicate experimental design and make an analytical power calculation no longer realistic. In this chapter we review the major factors that influence the statistical power to detect differential expression, and give examples of power assessment using the R package PROPER.
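When no closed-form power formula is realistic, power can be estimated by simulation: generate many synthetic experiments under an assumed effect size and count how often the test rejects. The sketch below does this for a single gene with negative binomial counts; it is a simplified stand-in (a normal-approximation test on log counts, with hypothetical default parameters), not PROPER's procedure.

```python
import numpy as np


def simulated_power(n_per_group, mean=100.0, fold_change=2.0, dispersion=0.1,
                    n_sims=500, z_crit=1.96, seed=0):
    """Fraction of simulated experiments in which one gene is called DE.

    Counts are negative binomial with variance mean + dispersion * mean**2.
    The test is a normal approximation to a Welch-type test on log(count + 1).
    """
    rng = np.random.default_rng(seed)
    r = 1.0 / dispersion  # NB "number of successes" parameter

    def draw(mu):
        # numpy's negative_binomial(n, p) has mean n * (1 - p) / p
        return rng.negative_binomial(r, r / (r + mu), size=(n_sims, n_per_group))

    a = np.log(draw(mean) + 1.0)
    b = np.log(draw(mean * fold_change) + 1.0)
    diff = b.mean(axis=1) - a.mean(axis=1)
    se = np.sqrt(a.var(axis=1, ddof=1) / n_per_group
                 + b.var(axis=1, ddof=1) / n_per_group)
    return float(np.mean(np.abs(diff / se) > z_crit))


for n in (2, 3, 5, 10):
    print(n, simulated_power(n))
```

A real design study would repeat this over many genes with realistic means and dispersions and control the false discovery rate rather than use a per-gene cutoff.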
Su, Song; Liu, Jiang; He, Kai; Zhang, Mengyu; Feng, Chunhong; Peng, Fangyi; Li, Bo; Xia, Xianming
2016-04-01
Hepatic injury provoked by cold storage is a major problem affecting liver transplantation, as exposure to cold induces apoptosis in hepatic tissues. Long noncoding RNAs (lncRNAs) are increasingly understood to regulate apoptosis, but the contribution of lncRNAs to cold-induced liver injury remains unknown. Using RNA-seq, we determined the differential lncRNA expression profile in mouse livers after cold storage and found that expression of the lncRNA TUG1 was significantly down-regulated. Overexpression of TUG1 attenuated cold-induced apoptosis in mouse hepatocytes and liver sinusoidal endothelial cells (LSECs), in part by blocking the mitochondrial apoptosis and endoplasmic reticulum (ER) stress pathways. Moreover, TUG1 attenuated apoptosis, inflammation, and oxidative stress in vivo in livers subjected to cold storage. Overexpression of TUG1 also improved hepatocyte function and prolonged hepatic graft survival in mice. These results suggest that the lncRNA TUG1 exerts a protective effect against cold-induced liver damage by inhibiting apoptosis in mice, and point to TUG1 as a potential target for the prevention of cold-induced liver damage in liver transplantation. RNA-seq data are available from GEO under accession number GSE76609. © 2016 Federation of European Biochemical Societies.
Cell fixation and preservation for droplet-based single-cell transcriptomics.
Alles, Jonathan; Karaiskos, Nikos; Praktiknjo, Samantha D; Grosswendt, Stefanie; Wahle, Philipp; Ruffault, Pierre-Louis; Ayoub, Salah; Schreyer, Luisa; Boltengagen, Anastasiya; Birchmeier, Carmen; Zinzen, Robert; Kocks, Christine; Rajewsky, Nikolaus
2017-05-19
Recent developments in droplet-based microfluidics allow the transcriptional profiling of thousands of individual cells in a quantitative, highly parallel and cost-effective way. A critical, often limiting step is the preparation of cells in an unperturbed state, not altered by stress or ageing. Other challenges are rare cells that need to be collected over several days, or samples prepared at different times or locations. Here, we used chemical fixation to address these problems. Methanol fixation allowed us to stabilise and preserve dissociated cells for weeks without compromising single-cell RNA sequencing data. Using mixtures of fixed, cultured human and mouse cells, we first showed that individual transcriptomes could be confidently assigned to one of the two species. Single-cell gene expression from live and fixed samples correlated well with bulk mRNA-seq data. We then applied methanol fixation to transcriptionally profile primary cells from dissociated, complex tissues. Low-RNA-content cells from Drosophila embryos, as well as mouse hindbrain and cerebellum cells prepared by fluorescence-activated cell sorting, were successfully analysed after fixation, storage and single-cell droplet RNA-seq. We were able to identify diverse cell populations, including neuronal subtypes. As an additional resource, we provide 'dropbead', an R package for exploratory data analysis, visualization and filtering of Drop-seq data. We expect that the availability of a simple cell fixation method will open up many new opportunities in diverse biological contexts to analyse transcriptional dynamics at single-cell resolution.
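In a human/mouse mixing ("species-mixing") experiment, each cell barcode is typically assigned by the fraction of its reads or transcripts that map to each genome; barcodes with substantial signal from both are likely doublets. A minimal sketch of that classification, where both thresholds are hypothetical choices rather than values from the study:

```python
def assign_species(human_reads, mouse_reads, min_fraction=0.9, min_reads=500):
    """Classify one cell barcode from its human- and mouse-mapped counts.

    min_fraction: hypothetical purity needed to call a single species.
    min_reads: hypothetical floor below which a barcode is not trusted.
    """
    total = human_reads + mouse_reads
    if total < min_reads:
        return "low-quality"
    frac_human = human_reads / total
    if frac_human >= min_fraction:
        return "human"
    if 1 - frac_human >= min_fraction:
        return "mouse"
    return "mixed"  # likely a doublet containing one cell of each species


for h, m in [(4500, 100), (50, 950), (500, 500), (10, 20)]:
    print(h, m, assign_species(h, m))
```

A low proportion of "mixed" barcodes in fixed samples, comparable to live controls, is the kind of evidence that transcriptomes can be confidently assigned after fixation.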
Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...
2015-03-12
The identification of transcription units (TUs) encoded in a bacterial genome is essential for elucidating the transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs were predicted from the four RNA-seq datasets, of which 44% contain multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads, and the evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we also applied it to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program, named SeqTU, is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
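The abstract names the two feature families (expression-level continuity and variance) but not their exact definitions. The sketch below is therefore a hypothetical feature computation, not SeqTU's implementation.

It makes two assumptions:
- Continuity is the fraction of intergenic bases that are covered by reads.
- Variance is summarised as the coefficient of variation of coverage across gene A, the gap and gene B.

A classifier trained on known operons would then consume such features.

```python
import statistics
from typing import List, Tuple


def boundary_features(
    coverage: List[int], gap: Tuple[int, int], min_cov: int = 1
) -> Tuple[float, float]:
    """Compute two illustrative features for the gap between two adjacent genes.

    coverage: per-base, strand-specific read coverage spanning gene A, the
              intergenic gap and gene B.
    gap:      (start, end) half-open indices of the gap within `coverage`.
    Returns (continuity, cv): the fraction of gap bases with coverage >= min_cov,
    and the coefficient of variation of coverage over the whole window.
    """
    start, end = gap
    gap_cov = coverage[start:end]
    # An empty gap (overlapping genes) is treated as fully continuous.
    continuity = (
        sum(c >= min_cov for c in gap_cov) / len(gap_cov) if gap_cov else 1.0
    )
    mean = statistics.fmean(coverage)
    cv = statistics.pstdev(coverage) / mean if mean > 0 else 0.0
    return continuity, cv


# A gene pair read through by one transcript vs. a pair separated by an uncovered gap.
same_tu = [10] * 5 + [9] * 3 + [10] * 5
split = [10] * 5 + [0] * 3 + [50] * 5
print(boundary_features(same_tu, (5, 8)), boundary_features(split, (5, 8)))
```

High continuity with low variation suggests the two genes share a TU. An uncovered gap with a jump in coverage suggests a boundary.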
A Pipeline for High-Throughput Concentration Response Modeling of Gene Expression for Toxicogenomics
House, John S.; Grimm, Fabian A.; Jima, Dereje D.; Zhou, Yi-Hui; Rusyn, Ivan; Wright, Fred A.
2017-01-01
Cell-based assays are an attractive option to measure gene expression response to exposure, but the cost of whole-transcriptome RNA sequencing has been a barrier to the use of gene expression profiling for in vitro toxicity screening. In addition, standard RNA sequencing adds variability due to variable transcript length and amplification. Targeted probe-sequencing technologies such as TempO-Seq, with transcriptomic representation that can vary from hundreds of genes to the entire transcriptome, may reduce some components of variation. Analyses of high-throughput toxicogenomics data require renewed attention to read-calling algorithms and simplified dose–response modeling for datasets with relatively few samples. Using data from induced pluripotent stem cell-derived cardiomyocytes treated with chemicals at varying concentrations, we describe here and make available a pipeline for handling expression data generated by TempO-Seq to align reads, clean and normalize raw count data, identify differentially expressed genes, and calculate transcriptomic concentration–response points of departure. The methods are extensible to other forms of concentration–response gene-expression data, and we discuss the utility of the methods for assessing variation in susceptibility and the diseased cellular state. PMID:29163636
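The abstract does not specify the pipeline's concentration–response model. The sketch below therefore uses a deliberately simple stand-in to show what a per-gene point of departure (POD) is.

It rests on three assumptions:
- log2 fold change is fitted as a straight line in log10(concentration).
- The benchmark response is |log2 FC| = 1.
- The function name is hypothetical.

The POD is the concentration at which the fitted line reaches the benchmark response.

```python
import math
from typing import Optional, Sequence


def linear_pod(
    concs: Sequence[float], log2fc: Sequence[float], bmr: float = 1.0
) -> Optional[float]:
    """Estimate a transcriptomic point of departure (POD) for one gene.

    Fits log2FC = a + b * log10(conc) by ordinary least squares and returns the
    concentration at which the fitted line reaches +bmr (rising response) or
    -bmr (falling response). Returns None if the fit is undefined or flat.
    """
    x = [math.log10(c) for c in concs]
    n = len(x)
    mx, my = sum(x) / n, sum(log2fc) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return None  # need at least two distinct concentrations
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, log2fc))
    b = sxy / sxx
    a = my - b * mx
    if b == 0:
        return None
    target = bmr if b > 0 else -bmr
    return 10 ** ((target - a) / b)


concs = [0.1, 1.0, 10.0, 100.0]
resp = [0.5 + 0.5 * math.log10(c) for c in concs]  # log2FC rises 0.5 per decade
print(linear_pod(concs, resp))  # close to 10.0
```

A production pipeline would typically fit several nonlinear models and check goodness of fit before reporting a POD. The linear form here only keeps the arithmetic transparent.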
Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.
Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J
2015-11-15
High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.
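The "line of constant dispersion parameter" corresponds to the negative binomial mean-variance law Var = μ + φμ², here with φ ≈ 0.01. At μ = 1000 reads, the Poisson part contributes 1000 and the overdispersion term φμ² contributes 10,000. The overdispersion term therefore dominates for well-expressed genes.

A minimal numpy sketch that simulates replicate counts for one gene and compares them with the formula follows. The mean of 1000 and the seed are arbitrary choices.

```python
import numpy as np


def expected_variance(mean: float, dispersion: float) -> float:
    """Negative binomial mean-variance law with a constant dispersion."""
    return mean + dispersion * mean**2


def nb_counts(mean: float, dispersion: float, n_reps: int, seed: int = 0) -> np.ndarray:
    """Simulate read counts for one gene across n_reps replicates.

    numpy parameterises the negative binomial by (n, p); choosing
    n = 1 / dispersion and p = n / (n + mean) gives the requested mean and
    variance mean + dispersion * mean**2.
    """
    n = 1.0 / dispersion
    p = n / (n + mean)
    rng = np.random.default_rng(seed)
    return rng.negative_binomial(n, p, size=n_reps)


counts = nb_counts(mean=1000, dispersion=0.01, n_reps=48)
print(counts.mean(), counts.var(ddof=1), expected_variance(1000, 0.01))
```

With only 48 replicates the sample variance scatters noticeably around 11,000. Many more draws are needed to see it converge.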
Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment
Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.
2015-01-01
Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307
DrImpute: imputing dropout events in single cell RNA sequencing data.
Gong, Wuming; Kwak, Il-Youp; Pota, Pruthvi; Koyano-Nakagawa, Naoko; Garry, Daniel J
2018-06-08
The single-cell RNA sequencing (scRNA-seq) technique began a new era by allowing gene expression to be observed at the single-cell level. However, scRNA-seq data also contain a large amount of technical and biological noise. Because of the low number of RNA transcripts in each cell and the stochastic nature of gene expression, there is a high chance that nonzero entries are recorded as zeros; these are called dropout events. We developed DrImpute to impute dropout events in scRNA-seq data. We show that DrImpute performs significantly better than existing imputation algorithms at separating dropout zeros from true zeros. We also demonstrate that DrImpute can significantly improve the performance of existing tools for clustering, visualization and lineage reconstruction on nine published scRNA-seq datasets. DrImpute can serve as a useful addition to the existing statistical tools for single-cell RNA-seq analysis. DrImpute is implemented in R and is available at https://github.com/gongx030/DrImpute .
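The abstract does not describe DrImpute's algorithm. The sketch below swaps in a plain k-nearest-neighbour average to illustrate the general idea of filling dropouts from similar cells. It is not DrImpute.

Its assumptions are:
- Similarity is Pearson correlation between cells.
- k = 2.
- The function name is hypothetical.

A zero stays zero when the neighbouring cells also have zero for that gene. This is one crude way to leave likely true zeros alone.

```python
import numpy as np


def knn_impute_zeros(expr: np.ndarray, k: int = 2) -> np.ndarray:
    """Replace zero entries with the mean of each cell's k most similar cells.

    expr: genes x cells matrix of log-scale expression (no constant cells).
    Nonzero entries are never changed. A zero stays zero when the
    neighbouring cells also have zero for that gene.
    """
    corr = np.corrcoef(expr.T)          # cells x cells Pearson correlation
    np.fill_diagonal(corr, -np.inf)     # a cell is not its own neighbour
    imputed = expr.astype(float).copy()
    for j in range(expr.shape[1]):
        neighbours = np.argsort(corr[j])[::-1][:k]
        fill = expr[:, neighbours].mean(axis=1)
        zeros = expr[:, j] == 0
        imputed[zeros, j] = fill[zeros]
    return imputed


expr = np.array([
    [5, 5, 5, 0, 0, 0],  # gene high in cells 0-2
    [0, 0, 0, 6, 6, 6],  # gene high in cells 3-5 (true zeros in cells 0-2)
    [2, 2, 2, 2, 2, 2],  # housekeeping-like gene
    [0, 4, 4, 1, 1, 1],  # cell 0 has a likely dropout here
], dtype=float)
print(knn_impute_zeros(expr)[:, 0])
```

Averaging over several neighbourhood definitions, rather than a single one, would make such estimates more robust.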
Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho
2015-10-28
Recently, rapid improvements in technology and decreases in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Because normalization is important in the analysis of RNA-Seq data, various normalization approaches have been proposed. A comparison of recently proposed normalization methods is required to generate suitable guidelines for selecting the most appropriate approach for future experiments. In this paper, we compared eight non-abundance estimation normalization methods (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq data of 35- and 76-nucleotide reads produced in the MAQC project, as well as on simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we computed the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For the 35-nucleotide RNA-Seq data, RPKM showed the highest correlation, but for the 76-nucleotide data it showed lower correlation than the other methods. ERPKM did not improve on RPKM. Between the two abundance estimation normalization methods, for the 35-nucleotide data, Sailfish gave higher correlation than RSEM, and both were better than not using abundance estimation. However, for the 76-nucleotide data, the results achieved by RSEM were similar to those obtained without abundance estimation and were much better than those with Sailfish. Furthermore, we found that adding a poly-A tail increased the number of alignments but did not improve normalization results.
Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We therefore suggest ignoring the poly-A tail during differential gene expression analysis.
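The comparison rests on normalizing counts and then rank-correlating the results with qRT-PCR values. The sketch below uses standard textbook forms of two of the compared methods (RPKM and upper quartile) plus a tie-aware Spearman correlation.

Two points are assumptions:
- The quantile convention passed to statistics.quantiles is one choice among several. Published tools may compute the upper quartile differently.
- The helper names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def rpkm(counts: Sequence[int], lengths_bp: Sequence[int]) -> List[float]:
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def upper_quartile(counts: Sequence[int]) -> List[float]:
    """Scale a sample's counts by the 75th percentile of its nonzero counts."""
    nonzero = [c for c in counts if c > 0]
    uq = statistics.quantiles(nonzero, n=4, method="inclusive")[2]
    return [c / uq for c in counts]


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical sample: raw counts, transcript lengths and matched qRT-PCR values.
counts, lengths = [100, 200, 50, 400], [1000, 2000, 500, 1000]
qpcr = [1.0, 1.1, 0.9, 4.0]
print(spearman(rpkm(counts, lengths), qpcr))
```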
From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing.
Marinov, Georgi K; Williams, Brian A; McCue, Ken; Schroth, Gary P; Gertz, Jason; Myers, Richard M; Wold, Barbara J
2014-03-01
Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30-100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.
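The abstract does not give the spike-in calibration model. A common way to turn reads into molecule counts is sketched below, under two assumptions:
- log10(reads) is linear in log10(known input molecules) for the spike-ins.
- That fit, inverted, applies to endogenous genes.

Summing the per-gene estimates then gives a total mRNA content per cell. The spike-in values and helper names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def fit_loglog(molecules: Sequence[float], reads: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(reads) = intercept + slope * log10(molecules)."""
    x = [math.log10(m) for m in molecules]
    y = [math.log10(r) for r in reads]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def molecules_per_cell(
    gene_reads: Dict[str, float], slope: float, intercept: float
) -> Dict[str, float]:
    """Invert the spike-in fit to convert each gene's reads into molecules."""
    return {
        gene: 10 ** ((math.log10(r) - intercept) / slope)
        for gene, r in gene_reads.items()
        if r > 0  # genes with no reads cannot be placed on the log scale
    }


# Hypothetical spike-ins: known input molecules and their observed reads.
slope, intercept = fit_loglog([10, 100, 1000], [50, 500, 5000])
cell = molecules_per_cell({"geneA": 2500, "geneB": 40}, slope, intercept)
print(cell, sum(cell.values()))
```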
CircRNAs in the tree shrew (Tupaia belangeri) brain during postnatal development and aging.
Lu, CaiXia; Sun, XiaoMei; Li, Na; Wang, WenGuang; Kuang, DeXuan; Tong, PinFen; Han, YuanYuan; Dai, JieJie
2018-04-30
Circular RNAs (circRNAs) are a novel type of non-coding RNA expressed across different species and tissues. At present, little is known about the expression and function of circRNAs in the tree shrew brain. In this study, we used RNA-seq to identify 35,007 circRNAs in hippocampus and cerebellum samples from infant (aged 47-52 days), young (aged 15-18 months), and old (aged 78-86 months) tree shrews. We observed no significant changes in the overall circRNA expression profiles in different brain regions over time. However, circRNAs tended to be downregulated in the cerebellum over time. Real-time RT-PCR analysis verified the presence of circRNAs. KEGG analysis implicated ubiquitin-mediated proteolysis, the MAPK signaling pathway, the phosphatidylinositol signaling system, long-term depression, the Rap1 signaling pathway, and long-term potentiation in both brain regions. We also observed that 29,087 (83.1%) tree shrew circRNAs shared homology with human circRNAs. Competing endogenous RNA network analysis suggested that novel_circRNA_007362 may function as a sponge for 24 miRNAs to regulate UBE4B expression. Thus, we obtained comprehensive circRNA expression profiles in the tree shrew brain during postnatal development and aging, which may help to elucidate the functions of circRNAs during brain aging and in age-related diseases.
Vignali, Marissa; Armour, Christopher D; Chen, Jingyang; Morrison, Robert; Castle, John C; Biery, Matthew C; Bouzek, Heather; Moon, Wonjong; Babak, Tomas; Fried, Michal; Raymond, Christopher K; Duffy, Patrick E
2011-03-01
Malaria caused by Plasmodium falciparum results in approximately 1 million annual deaths worldwide, with young children and pregnant mothers at highest risk. Disease severity might be related to parasite virulence factors, but expression profiling studies of parasites to test this hypothesis have been hindered by extensive sequence variation in putative virulence genes and a preponderance of host RNA in clinical samples. We report here the application of RNA sequencing to clinical isolates of P. falciparum, using not-so-random (NSR) primers to successfully exclude human ribosomal RNA and globin transcripts and enrich for parasite transcripts. Using NSR-seq, we confirmed earlier microarray studies showing upregulation of a distinct subset of genes in parasites infecting pregnant women, including that encoding the well-established pregnancy malaria vaccine candidate var2csa. We also describe a subset of parasite transcripts that distinguished parasites infecting children from those infecting pregnant women and confirmed this observation using quantitative real-time PCR and mass spectrometry proteomic analyses. Based on their putative functional properties, we propose that these proteins could have a role in childhood malaria pathogenesis. Our study provides proof of principle that NSR-seq represents an approach that can be used to study clinical isolates of parasites causing severe malaria syndromes as well as other blood-borne pathogens and blood-related diseases.
Vignali, Marissa; Armour, Christopher D.; Chen, Jingyang; Morrison, Robert; Castle, John C.; Biery, Matthew C.; Bouzek, Heather; Moon, Wonjong; Babak, Tomas; Fried, Michal; Raymond, Christopher K.; Duffy, Patrick E.
2011-01-01
Malaria caused by Plasmodium falciparum results in approximately 1 million annual deaths worldwide, with young children and pregnant mothers at highest risk. Disease severity might be related to parasite virulence factors, but expression profiling studies of parasites to test this hypothesis have been hindered by extensive sequence variation in putative virulence genes and a preponderance of host RNA in clinical samples. We report here the application of RNA sequencing to clinical isolates of P. falciparum, using not-so-random (NSR) primers to successfully exclude human ribosomal RNA and globin transcripts and enrich for parasite transcripts. Using NSR-seq, we confirmed earlier microarray studies showing upregulation of a distinct subset of genes in parasites infecting pregnant women, including that encoding the well-established pregnancy malaria vaccine candidate var2csa. We also describe a subset of parasite transcripts that distinguished parasites infecting children from those infecting pregnant women and confirmed this observation using quantitative real-time PCR and mass spectrometry proteomic analyses. Based on their putative functional properties, we propose that these proteins could have a role in childhood malaria pathogenesis. Our study provides proof of principle that NSR-seq represents an approach that can be used to study clinical isolates of parasites causing severe malaria syndromes as well as other blood-borne pathogens and blood-related diseases. PMID:21317536
Zhang, Jihong; Zeng, Li; Chen, Shaoyang; Sun, Helong; Ma, Shuang
2018-05-01
Salinity stress can adversely affect plant growth and development. However, there is very little molecular information on NaCl resistance and volatile emissions in Lycopersicum esculentum. To investigate the effects of salt stress on the release of volatile compounds, we quantified and compared transcriptome changes by RNA-Seq analysis and volatile constituents by gas chromatography/mass spectrometry (GC/MS) coupled with solid-phase microextraction (SPME) after exposure to continuous salt stress. GC-MS analysis revealed that NaCl stress changed the species and quantity of volatile compounds released. In this research, de novo transcriptome assembly of tomato leaves exposed to NaCl stress yielded 21,578 unigenes representing 44,714 assembled unique transcripts. The total number of differentially expressed genes was 7210 after exposure to NaCl, including 6200 down-regulated and 1208 up-regulated genes. Among these differentially expressed genes (DEGs), eighteen were associated with volatile biosynthesis. Of the unigenes, 3454 were mapped to 131 KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, mainly those involved in RNA transport, plant-pathogen interactions, and plant hormone signal transduction. qRT-PCR analysis showed that NaCl exposure affected the expression profiles of the biosynthesis genes for eight volatile compounds (IPI, GPS, and TPS, etc.), which corresponded well with the RNA-Seq and GC-MS results. Our results suggest that NaCl stress affects the emission of volatile substances from L. esculentum leaves by regulating the expression of genes involved in the biosynthesis of volatile organic compounds. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
Yang, Chuanping; Wei, Hairong
2015-02-01
Microarray and RNA-seq experiments have become an important part of modern genomics and systems biology. Obtaining meaningful biological data from these experiments is an arduous task that demands close attention to many details. Negligence at any step can lead to gene expression data containing inadequate or composite information that is recalcitrant to pattern extraction. It is therefore imperative to consider experimental design carefully before launching a time-consuming and costly experiment. Currently, most genomics experiments have two objectives: (1) to generate two or more groups of comparable data for identifying differentially expressed genes, gene families, biological processes, or metabolic pathways under experimental conditions; (2) to build local gene regulatory networks and identify hierarchically important regulators governing biological processes and pathways of interest. Because the first objective aims to identify the active molecular identities and the second provides a basis for understanding the underlying molecular mechanisms by inferring treatment-mediated causal relationships, an optimal experiment should produce biologically relevant and extractable data that meet both objectives without substantially increasing the cost. This review discusses the major issues that researchers commonly face when embarking on microarray or RNA-seq experiments and summarizes important aspects of experimental design, aiming to help researchers plan how to generate gene expression profiles with low background noise but more interaction information, to facilitate novel biological discoveries in modern plant genomics. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.
Łabaj, Paweł P; Leparc, Germán G; Linggi, Bryan E; Markillie, Lye Meng; Wiley, H Steven; Kreil, David P
2011-07-01
Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This in turn allows improvements by computational or experimental means to be considered. We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, a recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. However, most of the measurement power (75%) is spent on only 7% of the known transcriptome, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error < 20%. Building on established tools, we then introduce a new approach for mapping and analysing sequencing reads that substantially improves performance in gene expression profiling, increasing the number of transcripts that can be reliably quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In the discussion we outline possible experimental and computational strategies for further improving quantification precision. rnaseq10@boku.ac.at
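A "relative error < 20%" criterion can be illustrated with technical replicates. The sketch takes relative error to be the coefficient of variation (standard deviation divided by mean) of each transcript's estimates across replicates. That definition is an assumption, and the study may define relative error differently. The function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def fraction_reliable(
    estimates: Dict[str, Sequence[float]], max_rel_error: float = 0.2
) -> float:
    """Fraction of transcripts whose replicate CV is below max_rel_error.

    estimates maps a transcript id to its expression estimates from >= 2
    technical replicates. Transcripts with zero mean count as unreliable.
    """
    reliable = 0
    for values in estimates.values():
        mean = statistics.fmean(values)
        if mean > 0 and statistics.stdev(values) / mean < max_rel_error:
            reliable += 1
    return reliable / len(estimates)


reps = {
    "t1": [100, 102, 98],  # CV 0.02: reliable
    "t2": [1, 5, 0],       # low expression, CV > 1: unreliable
    "t3": [0, 0, 0],       # not detected
    "t4": [50, 55, 45],    # CV 0.1: reliable
}
print(fraction_reliable(reps))  # 0.5
```

The toy data mirror the pattern described in the abstract: weakly expressed transcripts such as t2 are the ones that fail the precision cutoff.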
Cho, Jin-Hyung; Huang, Ben S.; Gray, Jesse M.
2016-01-01
The stable formation of remote fear memories is thought to require neuronal gene induction in cortical ensembles that are activated during learning. However, the set of genes expressed specifically in these activated ensembles is not known; knowledge of such transcriptional profiles may offer insights into the molecular program underlying stable memory formation. Here we use RNA-Seq to identify genes whose expression is enriched in activated cortical ensembles labeled during associative fear learning. We first establish that mouse temporal association cortex (TeA) is required for remote recall of auditory fear memories. We then perform RNA-Seq in TeA neurons that are labeled by the activity reporter Arc-dVenus during learning. We identify 944 genes with enriched expression in Arc-dVenus+ neurons. These genes include markers of L2/3, L5b, and L6 excitatory neurons but not glial or inhibitory markers, confirming Arc-dVenus to be an excitatory neuron-specific but non-layer-specific activity reporter. Cross comparisons to other transcriptional profiles show that 125 of the enriched genes are also activity-regulated in vitro or induced by visual stimulus in the visual cortex, suggesting that they may be induced generally in the cortex in an experience-dependent fashion. Prominent among the enriched genes are those encoding potassium channels that down-regulate neuronal activity, suggesting the possibility that part of the molecular program induced by fear conditioning may initiate homeostatic plasticity. PMID:27557751
Study of formation of green eggshell color in ducks through global gene expression.
Xu, Fa Qiong; Li, Ang; Lan, Jing Jing; Wang, Yue Ming; Yan, Mei Jiao; Lian, Sen Yang; Wu, Xu
2018-01-01
The green eggshell color produced by ducks is a threshold trait that can be influenced by various factors, such as heredity, environment and nutrition. The aim of this study was to investigate the genetic regulation of green eggshell formation in Youxian ducks. We performed an integrative analysis of mRNA and miRNA expression profiles in shell gland samples from ducks by RNA-Seq. We found 124 differentially expressed genes that were associated with various pathways, such as the ATP-binding cassette (ABC) transporter and solute carrier (SLC) superfamily pathways. A total of 31 differentially expressed miRNAs were found between ducks laying green eggs and those laying white eggs. KEGG pathway analysis of the predicted miRNA target genes also indicated the functional characteristics of these miRNAs: they were involved in the ABC transporter pathway and the SLC superfamily. qRT-PCR was used to validate the global gene expression results and showed a correlation between the RNA-seq and qRT-PCR results. Moreover, a miRNA-mRNA interaction network was established by correlation analysis of the differentially expressed mRNAs and miRNAs. Compared with ducks that lay white eggs, ducks that lay green eggs showed six up-regulated miRNAs that had regulatory effects on 35 down-regulated genes, and seven down-regulated miRNAs that influenced 46 up-regulated genes. For example, the ABC transporter pathway could be regulated through gga-miR-144-3p (up-regulated) together with ABCG2 (up-regulated) and other miRNAs and genes. This study provides valuable information about mRNA and miRNA regulation in duck shell gland tissues, and a foundation for further study of eggshell color formation and marker-assisted selection in Youxian duck breeding.
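A correlation-based miRNA-mRNA network could be assembled roughly as sketched below.

It makes three assumptions:
- Predicted targets are supplied from elsewhere.
- Only strongly anti-correlated pairs are kept, matching the abstract's pairing of up-regulated miRNAs with down-regulated genes and vice versa.
- The -0.8 cutoff and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def inverse_pairs(
    mirna: Dict[str, Sequence[float]],
    mrna: Dict[str, Sequence[float]],
    targets: Dict[str, List[str]],
    max_r: float = -0.8,
) -> List[Tuple[str, str, float]]:
    """Keep predicted miRNA-target pairs whose profiles are strongly anti-correlated."""
    edges = []
    for mir, genes in targets.items():
        for gene in genes:
            r = pearson(mirna[mir], mrna[gene])
            if r <= max_r:
                edges.append((mir, gene, r))
    return edges


# Hypothetical expression across four shell-gland samples.
mirna = {"miR-x": [1.0, 2.0, 3.0, 4.0]}
mrna = {"GENE1": [8.0, 6.0, 4.0, 2.0], "GENE2": [1.0, 2.0, 3.0, 4.0]}
print(inverse_pairs(mirna, mrna, {"miR-x": ["GENE1", "GENE2"]}))
```

This filter keeps only inverse relationships. A same-direction pair, such as the gga-miR-144-3p/ABCG2 example, would need a separate rule.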
Qu, Xiancheng; Hu, Menghong; Shang, Yueyong; Pan, Lisha; Jia, Peixuan; Fu, Chunxue; Liu, Qigen; Wang, Youji
2018-01-01
Next-generation sequencing was used to analyze the effects of toxic microcystin-LR (MC-LR) on silver carp (Hypophthalmichthys molitrix). Silver carp were intraperitoneally injected with MC-LR, and RNA-seq and miRNA-seq of the liver were analyzed at 0.25, 0.5, and 1 h. The expression of glutathione S-transferase (GST), which acts as a marker gene for MC-LR, was tested to determine the earliest time point at which GST transcription was initiated in the liver tissues of MC-LR-treated silver carp. Hepatic RNA-seq/miRNA-seq analysis and data integration analysis were conducted at the identified time point. Quantitative PCR (qPCR) was performed to detect the expression of the following genes at the three time points: heme oxygenase 1 (HO-1), interleukin-10 receptor 1 (IL-10R1), apolipoprotein A-I (apoA-I), and heme binding protein 2 (HBP2). Results showed that liver GST expression was remarkably decreased at 0.25 h (P < 0.05). RNA-seq at this time point revealed that the liver tissue contained 97,505 unigenes, including 184 significantly differentially expressed unigenes and 75 unknown genes. Gene Ontology (GO) term enrichment analysis showed that 35 of the 145 enriched GO terms were significantly enriched, mainly related to the immune system regulation network. KEGG pathway enrichment analysis showed that 18 of the 189 pathways were significantly enriched, the most significant being the ribosome pathway, which contained 77 differentially expressed genes. miRNA-seq analysis indicated that the most common miRNA length was 22 nucleotides (nt), followed by 21 and 23 nt. A total of 286 known miRNAs, 332 known miRNA precursor sequences, and 438 new miRNAs were predicted. A total of 1,048,575 mRNA–miRNA interaction sites were obtained, and 21,252 and 21,241 target genes were predicted for the known and new miRNAs, respectively.
qPCR revealed that HO-1, IL-10R1, apoA-I, and HBP2 were significantly differentially expressed and might play important roles in the toxicity and liver detoxification of MC-LR in fish. These results were consistent with those of high-throughput sequencing, supporting the accuracy of our sequencing data. RNA-seq and miRNA-seq analyses of silver carp liver injected with MC-LR provided new insights into the toxic effects of MC-LR and the antitoxic mechanisms against MC-LR in fish. The RNA-seq and miRNA-seq data are available from NCBI under accession number SRP075165. PMID:29692738
MIDAS: Mining differentially activated subpaths of KEGG pathways from multi-class RNA-seq data.
Lee, Sangseon; Park, Youngjune; Kim, Sun
2017-07-15
Pathway-based analysis of high-throughput transcriptome data is a widely used approach to investigating biological mechanisms. Because a pathway consists of multiple functions, recent approaches determine condition-specific sub-pathways, or subpaths. However, there are several challenges. First, few existing methods use explicit gene expression information from RNA-seq. More importantly, subpath activity is usually an average of statistical scores, e.g., correlations, of the edges in a candidate subpath, which fails to reflect gene expression quantity information. In addition, none of the existing methods can handle multiple phenotypes. To address these technical problems, we designed and implemented an algorithm, MIDAS, that determines condition-specific subpaths, each of which has different activities across multiple phenotypes. MIDAS fully uses gene expression quantity information together with network centrality information to determine condition-specific subpaths. To test the performance of our tool, we used TCGA breast cancer RNA-seq gene expression profiles covering five molecular subtypes, and identified 36 differentially activated subpaths. The utility of MIDAS was demonstrated in four ways. All 36 subpaths are well supported by the literature. Subsequently, we showed that these subpaths had good discriminant power for classifying the five cancer subtypes and also had prognostic power in survival analysis. Finally, in a performance comparison with a recent subpath prediction method, PATHOME, our method identified more subpaths and many more genes that are well supported by the literature. http://biohealth.snu.ac.kr/software/MIDAS/. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Transcriptomic Profiling Analysis of Arabidopsis thaliana Treated with Exogenous Myo-Inositol
Ye, Wenxing; Ren, Weibo; Kong, Lingqi; Zhang, Wanjun; Wang, Tao
2016-01-01
Myo-inositol (MI) is a crucial substance in plant growth and developmental processes. It is commonly added to culture medium to promote adventitious shoot development. In our previous work, MI was found to influence Agrobacterium-mediated transformation. In this report, high-throughput RNA sequencing (RNA-Seq) was used to investigate differentially expressed genes in one-month-old Arabidopsis seedlings grown on MI-free or MI-supplemented culture medium. The results showed that 21,288 and 21,299 genes were detected with and without MI treatment, respectively. The detected genes included 184 new genes that were not annotated in the Arabidopsis thaliana reference genome. Additionally, 183 differentially expressed genes (DEGs; FDR ≤ 0.05, log2 FC ≥ 1) were identified, including 93 up-regulated and 90 down-regulated genes. The DEGs were involved in multiple pathways, such as cell wall biosynthesis, biotic and abiotic stress response, chromosome modification, and substrate transport. Some significantly differentially expressed genes provided valuable information for exploring the functions of exogenous MI. The RNA-Seq results showed that exogenous MI could alter gene expression and signal transduction in plant cells. These results provide a systematic, detailed understanding of the functions of exogenous MI and a foundation for future studies. PMID:27603208
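The DEG cutoffs (FDR ≤ 0.05, log2 FC ≥ 1) can be applied as a simple filter, sketched below under three assumptions:
- FDR means Benjamini-Hochberg adjusted p-values.
- The fold-change cutoff applies to |log2 FC|, since both up- and down-regulated genes are reported.
- The per-gene p-values come from a differential expression test that is not shown.

The helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(
    genes: Dict[str, Tuple[float, float]],
    fdr_max: float = 0.05,
    min_abs_log2fc: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """genes maps gene id -> (log2 fold change, raw p-value); returns (up, down)."""
    ids = list(genes)
    fdr = bh_adjust([genes[g][1] for g in ids])
    up, down = [], []
    for g, q in zip(ids, fdr):
        lfc = genes[g][0]
        if q <= fdr_max and abs(lfc) >= min_abs_log2fc:
            (up if lfc > 0 else down).append(g)
    return up, down


example = {"A": (2.0, 0.01), "B": (-1.5, 0.02), "C": (0.5, 0.001), "D": (3.0, 0.5)}
print(call_degs(example))  # (['A'], ['B'])
```

In the example, gene C passes the FDR cutoff but fails the fold-change cutoff. Gene D has a large fold change but is not significant.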
Modification of N6-methyladenosine RNA methylation on heat shock protein expression.
Yu, Jiayao; Li, Yi; Wang, Tian; Zhong, Xiang
2018-01-01
This study was conducted to investigate the effect of N6-methyladenosine (m6A) RNA methylation on heat shock proteins (HSPs) and to dissect the profile of HSP RNA methylation. The results showed that mRNA of the m6A methyltransferase METTL3 decreased in response to heat shock stress in HepG2 cells, whereas mRNA of the m6A-specific binding protein YTHDF2 was upregulated in a manner similar to HSP70 induction. Immunofluorescence staining showed that the majority of YTHDF2 was present in the cytosol; however, nearly all YTHDF2 translocated from the cytosol into the nucleus after heat shock. siRNA-mediated METTL3 knockdown significantly changed HSP70, HSP60, and HSP27 mRNA expression in HepG2 cells, but did not affect mRNA lifetime. Silencing YTHDF2 with siRNA did not change HSP70 expression, but significantly increased HSP90, HSP60, and HSPB1 mRNA expression. In addition, m6A-seq revealed that HSP m6A methylation peaks are mainly enriched on exons and around stop codons, and show a unique distribution profile in the 5'UTR and 3'UTR. Knockdown of METTL3 changed the methylation patterns of HSP transcripts. In conclusion, m6A RNA methylation regulates HSP gene expression. Differential expression of HSPs modulated by m6A may depend on the m6A site and the abundance of the target gene. This finding provides insights into new regulatory mechanisms of HSPs in normal and stress situations.
Mascarenhas, Roshan; Pietrzak, Maciej; Smith, Ryan M; Webb, Amy; Wang, Danxin; Papp, Audrey C; Pinsonneault, Julia K; Seweryn, Michal; Rempala, Grzegorz; Sadee, Wolfgang
2015-01-01
mRNA translation into proteins is highly regulated, but the role of mRNA isoforms, noncoding RNAs (ncRNAs), and genetic variants remains poorly understood. mRNA levels on polysomes have been shown to correlate well with expressed protein levels, pointing to polysomal loading as a critical factor. To study regulation and genetic factors of protein translation we measured levels and allelic ratios of mRNAs and ncRNAs (including microRNAs) in lymphoblast cell lines (LCL) and in polysomal fractions. We first used targeted assays to measure polysomal loading of mRNA alleles, confirming reported genetic effects on translation of OPRM1 and NAT1, and detecting no effect of rs1045642 (3435C>T) in ABCB1 (MDR1) on polysomal loading while supporting previous results showing increased mRNA turnover of the 3435T allele. Use of high-throughput sequencing of complete transcript profiles (RNA-Seq) in three LCLs revealed significant differences in polysomal loading of individual RNA classes and isoforms. Correlated polysomal distribution between protein-coding and non-coding RNAs suggests interactions between them. Allele-selective polysome recruitment revealed strong genetic influence for multiple RNAs, attributable either to differential expression of RNA isoforms or to differential loading onto polysomes, the latter defining a direct genetic effect on translation. Genes identified by different allelic RNA ratios between cytosol and polysomes were enriched with published expression quantitative trait loci (eQTLs) affecting RNA functions, and associations with clinical phenotypes. Polysomal RNA-Seq combined with allelic ratio analysis provides a powerful approach to study polysomal RNA recruitment and regulatory variants affecting protein translation.
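The allelic-ratio comparison between cytosol and polysomes described above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes that reference and alternative allele read counts at one heterozygous site are already available for each fraction, and it uses Fisher's exact test (`scipy.stats.fisher_exact`), which ignores overdispersion between replicates. The function name and counts are hypothetical.

```python
from scipy.stats import fisher_exact


def allelic_loading_test(cyto_ref: int, cyto_alt: int,
                         poly_ref: int, poly_alt: int):
    """Compare allele fractions in cytosolic vs. polysomal RNA at one site.

    Returns (reference-allele fraction in cytosol,
             reference-allele fraction on polysomes,
             two-sided Fisher's exact p-value).
    A shift in the fraction suggests allele-selective polysome loading.
    """
    ratio_cyto = cyto_ref / (cyto_ref + cyto_alt)
    ratio_poly = poly_ref / (poly_ref + poly_alt)
    # 2x2 table: rows are fractions, columns are alleles.
    _, p_value = fisher_exact([[cyto_ref, cyto_alt], [poly_ref, poly_alt]])
    return ratio_cyto, ratio_poly, p_value
```

A balanced site (50/50 reads in both fractions) yields no evidence of allele-selective loading, whereas a site that shifts from 50/50 in cytosol to 90/10 on polysomes yields a very small p-value.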
Qiu, Yiguo; Yu, Peng; Lin, Ru; Fu, Xinyu; Hao, Bingtao
2017-01-01
Purpose Endotoxin-induced uveitis (EIU) is a well-established mouse model for studying human acute inflammatory uveitis. The purpose of this study was to investigate the genome-wide retinal transcriptome profile of EIU. Methods The anterior segment of the mouse eyes was examined with a slit lamp, and clinical scores were evaluated simultaneously. Histological changes in the posterior segment of the eyes were evaluated with hematoxylin and eosin (H&E) staining. High-throughput RNA sequencing (RNA-seq) on the Illumina HiSeq 2500 platform was applied to characterize the retinal transcriptome profiles of lipopolysaccharide (LPS)-treated and untreated mice. Differentially expressed genes (DEGs) were validated with real-time PCR. Results At 24 hours after challenge, the clinical score of the LPS group (3.83±0.75, mean ± standard deviation [SD]) was significantly higher than that of the control group (0.08±0.20, mean ± SD; p<0.001). Histological evaluation showed a large number of inflammatory cells infiltrating the vitreous cavity in the LPS group compared with the control group. A total of 478 DEGs were identified with RNA-seq, of which 406 were upregulated and 72 were downregulated in the LPS group. Gene Ontology (GO) enrichment analysis showed three significantly enriched upregulated terms. Twenty-one upregulated and seven downregulated pathways were significantly enriched in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Real-time PCR of eleven inflammatory response-, complement system-, fibrinolytic system-, and cell stress-related genes gave results similar to those of RNA-seq. Conclusions This is the first report of the retinal transcriptome profile of the EIU mouse obtained with RNA-seq. The results indicate that abnormal changes in inflammatory response-, complement system-, fibrinolytic system-, and cell stress-related genes occurred concurrently in EIU. 
These genes may play an important role in the pathogenesis of EIU. This study will lead to a better understanding of the underlying mechanisms and shed light on discovering novel therapeutic targets for ocular inflammation. PMID:28706439
Qiu, Yiguo; Yu, Peng; Lin, Ru; Fu, Xinyu; Hao, Bingtao; Lei, Bo
2017-01-01
Endotoxin-induced uveitis (EIU) is a well-established mouse model for studying human acute inflammatory uveitis. The purpose of this study was to investigate the genome-wide retinal transcriptome profile of EIU. The anterior segment of the mouse eyes was examined with a slit lamp, and clinical scores were evaluated simultaneously. Histological changes in the posterior segment of the eyes were evaluated with hematoxylin and eosin (H&E) staining. High-throughput RNA sequencing (RNA-seq) on the Illumina HiSeq 2500 platform was applied to characterize the retinal transcriptome profiles of lipopolysaccharide (LPS)-treated and untreated mice. Differentially expressed genes (DEGs) were validated with real-time PCR. At 24 hours after challenge, the clinical score of the LPS group (3.83±0.75, mean ± standard deviation [SD]) was significantly higher than that of the control group (0.08±0.20, mean ± SD; p<0.001). Histological evaluation showed a large number of inflammatory cells infiltrating the vitreous cavity in the LPS group compared with the control group. A total of 478 DEGs were identified with RNA-seq, of which 406 were upregulated and 72 were downregulated in the LPS group. Gene Ontology (GO) enrichment analysis showed three significantly enriched upregulated terms. Twenty-one upregulated and seven downregulated pathways were significantly enriched in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Real-time PCR of eleven inflammatory response-, complement system-, fibrinolytic system-, and cell stress-related genes gave results similar to those of RNA-seq. This is the first report of the retinal transcriptome profile of the EIU mouse obtained with RNA-seq. The results indicate that abnormal changes in inflammatory response-, complement system-, fibrinolytic system-, and cell stress-related genes occurred concurrently in EIU. These genes may play an important role in the pathogenesis of EIU. 
This study will lead to a better understanding of the underlying mechanisms and shed light on discovering novel therapeutic targets for ocular inflammation.
USDA-ARS's Scientific Manuscript database
About 447 million RNA-Seq sequences were generated from 40 RNA libraries covering 8 different berry developmental stages of the table grape ‘Kyoho’ and its early-ripening bud mutant ‘Fengzao’. These sequences were mapped to 23,178 and 22,982 genes in the flesh and peel tissues, respectively. While m...
The power and promise of RNA-seq in ecology and evolution.
Todd, Erica V; Black, Michael A; Gemmell, Neil J
2016-03-01
Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the necessary power to address a research question with statistical robustness. In the rush to adopt new and improved genomic research methods, limitations of technology and experimental design may be initially neglected. Here, we review these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in nonmodel species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies and means that larger sample sizes are often needed to achieve the same degree of statistical power as clinical studies based on data from cell lines or inbred animal models. Sequencing costs have plummeted, yet RNA-seq studies still underutilize biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off, while taking into account study-specific factors affecting power, are currently lacking. Study designs that prioritize sequencing depth over replication fail to capitalize on the power of RNA-seq technology for DE inference. Significant recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in the context of RNA-seq DE analysis. We synthesize progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments relevant in eco-evolutionary and clinical settings alike. © 2016 John Wiley & Sons Ltd.
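To make the depth-versus-replication trade-off concrete, the following Python sketch estimates statistical power by simulation for a single gene. It is not the rule-of-thumb guide from the review: the negative binomial parameters (mean count, dispersion, fold change) and the use of Welch's t-test on log2(count + 1) are illustrative assumptions chosen for simplicity; dedicated RNA-seq power tools use count-based tests.

```python
import numpy as np
from scipy import stats


def simulate_power(n_reps, mean_count=50.0, fold_change=2.0, dispersion=0.2,
                   alpha=0.05, n_sims=2000, seed=0):
    """Estimate power to detect `fold_change` with `n_reps` replicates per group.

    Counts follow a negative binomial with variance mu + dispersion * mu**2
    (higher dispersion mimics the larger biological variance of field samples).
    Each simulated gene is tested with Welch's t-test on log2(count + 1).
    """
    rng = np.random.default_rng(seed)
    nb_n = 1.0 / dispersion  # NumPy's "n" parameter

    def nb_counts(mu):
        # Generator.negative_binomial(n, p) has mean n * (1 - p) / p,
        # so p = n / (n + mu) gives mean mu.
        return rng.negative_binomial(nb_n, nb_n / (nb_n + mu),
                                     size=(n_sims, n_reps))

    control = nb_counts(mean_count)
    treated = nb_counts(mean_count * fold_change)
    _, pvals = stats.ttest_ind(np.log2(treated + 1), np.log2(control + 1),
                               axis=1, equal_var=False)
    # Fraction of simulated genes called significant = estimated power.
    return float(np.mean(pvals < alpha))
```

Comparing, for example, `simulate_power(3)` with `simulate_power(10)` shows how adding biological replicates raises power at a fixed effect size, and increasing `dispersion` shows how higher biological variance erodes it.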
Wang, Yejun; MacKenzie, Keith D; White, Aaron P
2015-05-07
As sequencing costs continue to fall, RNA-seq has gradually become the first choice for comparative transcriptome studies of bacteria. Unlike microarrays, RNA-seq directly detects cDNA derived from mRNA transcripts at single-nucleotide resolution. This not only allows researchers to determine the absolute expression levels of genes but also conveys information about transcript structure. Few automatic software tools have yet been established to investigate large-scale RNA-seq data for bacterial transcript structure analysis. In this study, 54 directional RNA-seq libraries from Salmonella serovar Typhimurium (S. Typhimurium) 14028s were examined for potential relationships between read mapping patterns and transcript structure. We developed an empirical method, combined with statistical tests, to automatically detect key transcript features, including transcriptional start sites (TSSs), transcriptional termination sites (TTSs) and operon organization. Using our method, we obtained 2,764 TSSs and 1,467 TTSs for 1,331 and 844 different genes, respectively. Identification of TSSs facilitated further discrimination of 215 putative sigma 38 regulons and 863 potential sigma 70 regulons. Combining the TSSs and TTSs with intergenic distance and co-expression information, we comprehensively annotated the operon organization in S. Typhimurium 14028s. Our results show that directional RNA-seq can be used to detect transcriptional borders at an acceptable resolution of ±10-20 nucleotides. Technical limitations of the RNA-seq procedure may prevent single-nucleotide resolution. The automatic transcript border detection methods, statistical models and operon organization pipeline that we have described could be widely applied to RNA-seq studies in other bacteria. Furthermore, the TSSs, TTSs, operons, promoters and untranslated regions that we have defined for S. Typhimurium 14028s may constitute valuable resources for comparative analyses with other Salmonella serotypes.
Cross-platform normalization of microarray and RNA-seq data for machine learning applications
Thompson, Jeffrey A.; Tan, Jie
2016-01-01
Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019
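Quantile normalization, one of the methods evaluated alongside TDM, can be sketched in a few lines of NumPy. This sketch does not implement TDM itself (the authors provide an R package for that); it assumes a genes-by-samples matrix and breaks ties by order of appearance, a simplification that published implementations may handle differently.

```python
import numpy as np


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes-by-samples matrix.

    Every column is mapped onto the same reference distribution: the mean of
    the sorted columns. Ties are broken by order of appearance (a simplification).
    """
    x = np.asarray(matrix, dtype=float)
    # Rank of each value within its column (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0)
    # Reference distribution: row-wise mean of the column-sorted matrix.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace each value with the reference value at its rank.
    return reference[ranks]
```

After normalization every sample has an identical value distribution, which is what lets a model trained on one platform's distribution be applied to data from another; a log2 transformation, the simplest method compared in the study, would be `np.log2(x + 1)`.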
Todd, Antonette R; Donofrio, Nicole; Sripathi, Venkateswara R; McClean, Phillip E; Lee, Rian K; Pastor-Corrales, Marcial; Kalavacharla, Venu Kal
2017-05-23
Common bean (Phaseolus vulgaris L.) is an important legume, useful for its high protein and dietary fiber content. The fungal pathogen Uromyces appendiculatus (Pers.) Unger can cause major losses in susceptible varieties of the common bean. The Ur-3 locus provides race-specific resistance to virulent strains or races of the bean rust pathogen, along with Crg (Complements resistance gene), which is required for Ur-3-mediated rust resistance. In this study, we inoculated two common bean genotypes (resistant "Sierra" and susceptible crg) with rust race 53 of U. appendiculatus, isolated leaf RNA at specific time points, and sequenced their transcriptomes. First, molecular markers were used to locate and identify a 250 kb deletion on chromosome 10 in the mutant crg (which carries a deletion at the Crg locus). Next, we identified differential expression of several disease resistance genes within the 250 kb delineated region between Mock Inoculated (MI) and Inoculated (I) samples of "Sierra" leaf RNA. Both marker-assisted molecular profiling and RNA-seq were used to identify possible transcriptomic locations of interest for resistance of the common bean to race 53. Identification of differential expression among samples in disease resistance clusters in the bean genome may elucidate significant genes underlying rust resistance. Along with preserving favorable traits in the crop, the current research may also aid the global sustainability of food stocks needed by many populations.
Todd, Antonette R.; Donofrio, Nicole; Sripathi, Venkateswara R.; McClean, Phillip E.; Lee, Rian K.; Pastor-Corrales, Marcial; Kalavacharla, Venu (Kal)
2017-01-01
Common bean (Phaseolus vulgaris L.) is an important legume, useful for its high protein and dietary fiber content. The fungal pathogen Uromyces appendiculatus (Pers.) Unger can cause major losses in susceptible varieties of the common bean. The Ur-3 locus provides race-specific resistance to virulent strains or races of the bean rust pathogen, along with Crg (Complements resistance gene), which is required for Ur-3-mediated rust resistance. In this study, we inoculated two common bean genotypes (resistant “Sierra” and susceptible crg) with rust race 53 of U. appendiculatus, isolated leaf RNA at specific time points, and sequenced their transcriptomes. First, molecular markers were used to locate and identify a 250 kb deletion on chromosome 10 in the mutant crg (which carries a deletion at the Crg locus). Next, we identified differential expression of several disease resistance genes within the 250 kb delineated region between Mock Inoculated (MI) and Inoculated (I) samples of “Sierra” leaf RNA. Both marker-assisted molecular profiling and RNA-seq were used to identify possible transcriptomic locations of interest for resistance of the common bean to race 53. Identification of differential expression among samples in disease resistance clusters in the bean genome may elucidate significant genes underlying rust resistance. Along with preserving favorable traits in the crop, the current research may also aid the global sustainability of food stocks needed by many populations. PMID:28545258
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.
Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A
2018-04-24
mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with lncRNAs previously assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
Beta-Poisson model for single-cell RNA-seq data analyses.
Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Rantalainen, Mattias; Pawitan, Yudi
2016-07-15
Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is bimodality in the cellular distribution, even for highly expressed genes, caused primarily by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models commonly used for bulk-cell RNA-sequencing cannot capture this property. We introduce a beta-Poisson mixture model that captures the bimodality of the single-cell gene expression distribution. We further integrate the model into the generalized linear model framework to perform differential expression analyses. The whole analytical procedure is called BPSC. Results from several real single-cell RNA-seq datasets indicate that ∼90% of the transcripts are well characterized by the beta-Poisson model; the model fit from BPSC is better than that of the standard gamma-Poisson model for > 80% of the transcripts. Moreover, in differential expression analyses of simulated and real datasets, BPSC performs well against edgeR, a conventional method widely used for bulk-cell RNA-sequencing data, and against scde and MAST, two recent methods specifically designed for single-cell RNA-seq data. An R package BPSC for model fitting and differential expression analyses of single-cell RNA-seq data is available under the GPL-3 license at https://github.com/nghiavtr/BPSC. Contact: yudi.pawitan@ki.se or mattias.rantalainen@ki.se. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
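The bimodality that motivates BPSC can be reproduced with a short simulation. The Python sketch below assumes a simplified three-parameter beta-Poisson form, X | p ~ Poisson(λp) with p ~ Beta(α, β); BPSC's own parameterization and fitting procedure may differ, and the parameter values are hypothetical. When α and β are both below 1, the Beta distribution is U-shaped, so many cells yield zero counts while others cluster near the high-expression mode.

```python
import numpy as np


def sample_beta_poisson(lam, alpha, beta, n_cells, seed=0):
    """Draw per-cell counts from X | p ~ Poisson(lam * p), p ~ Beta(alpha, beta)."""
    rng = np.random.default_rng(seed)
    p = rng.beta(alpha, beta, size=n_cells)  # per-cell latent expression level in [0, 1]
    return rng.poisson(lam * p)


def beta_poisson_mean(lam, alpha, beta):
    """Mean of the simplified beta-Poisson model: lam * E[p]."""
    return lam * alpha / (alpha + beta)
```

For instance, `sample_beta_poisson(100.0, 0.3, 0.3, 200_000)` gives a sample mean close to `beta_poisson_mean(100.0, 0.3, 0.3)` together with a substantial fraction of zero counts, which a single gamma-Poisson distribution with the same mean of 50 would almost never produce.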
Eicher, John D.; Wakabayashi, Yoshiyuki; Vitseva, Olga; Esa, Nada; Yang, Yanqin; Zhu, Jun; Freedman, Jane E.; McManus, David D.; Johnson, Andrew D.
2016-01-01
Transcripts in platelets are largely produced in precursor megakaryocytes but remain physiologically active, as platelets translate RNAs and regulate protein/RNA levels. Recent studies using transcriptome sequencing (RNA-seq) characterized the platelet transcriptome in limited numbers of non-diseased individuals. Here, we expand upon these studies by performing RNA-seq on platelets from 32 patients with acute myocardial infarction (MI). Our goals were to characterize the platelet transcriptome in a population of patients with acute MI and to relate gene expression to platelet aggregation measures and to ST-segment elevation MI (STEMI) (n=16) versus non-STEMI (NSTEMI) (n=16) subtypes. Similar to other studies, we detected 9,565 expressed transcripts, including several known platelet-enriched markers (e.g., PPBP, OST4). Our RNA-seq data strongly correlated with independently ascertained platelet expression data and showed enrichment for platelet-related pathways (e.g., wound response, hemostasis, and platelet activation), as well as actin-related and post-transcriptional processes. Several transcripts displayed suggestively higher (FBXL4, ECHDC3, KCNE1, TAOK2, AURKB, ERG, and FKBP5) and lower (MIAT, PVRL3, and PZP) expression in STEMI platelets compared with NSTEMI platelets. We also identified transcripts correlated with platelet aggregation to TRAP (ATP6V1G2, SLC2A3), collagen (CEACAM1, ITGA2), and ADP (PDGFB, PDGFC, ST3GAL6). Our study adds to current platelet gene expression resources by providing transcriptome-wide analyses of platelets isolated from patients with acute MI. In concert with prior studies, we identify various genes for further study with regard to platelet function and acute MI. Future platelet RNA-seq studies examining more diverse sets of healthy and diseased samples will add to our understanding of platelet thrombotic and non-thrombotic functions. PMID:26367242
High-throughput detection of RNA processing in bacteria.
Gill, Erin E; Chan, Luisa S; Winsor, Geoffrey L; Dobson, Neil; Lo, Raymond; Ho Sui, Shannan J; Dhillon, Bhavjinder K; Taylor, Patrick K; Shrestha, Raunak; Spencer, Cory; Hancock, Robert E W; Unrau, Peter J; Brinkman, Fiona S L
2018-03-27
Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate in unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq), and both datasets were compared with conventional RNA-Seq performed under a variety of growth conditions. The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely -10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed under swarming and biofilm formation conditions. 
This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcriptional regulation. NOTE: All sequence data have been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data are viewable using Jbrowse on www.pseudomonas.com .
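The motif and TSS-pair analysis described above can be sketched in Python. The motif TAT(A/T)ATA is closed under reverse complementation (TATAATA and TATTATA are reverse complements of each other), which is why one motif can sit between two antiparallel TSSs. The sketch assumes that TSS coordinates are integer genome positions listed per strand, and it pairs plus- and minus-strand TSSs exactly 18 bp apart without checking their relative orientation; the function names are hypothetical, and this is not the authors' correlation analysis.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping motif occurrences are all reported.
MOTIF = re.compile(r"(?=(TAT[AT]ATA))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str) -> List[int]:
    """0-based start positions of TAT(A/T)ATA on the given strand."""
    return [m.start() for m in MOTIF.finditer(seq.upper())]


def antiparallel_pairs(plus_tss: List[int], minus_tss: List[int],
                       spacing: int = 18) -> List[Tuple[int, int]]:
    """(plus, minus) TSS pairs on opposite strands exactly `spacing` bp apart."""
    minus_set = set(minus_tss)
    pairs = []
    for p in plus_tss:
        for m in (p - spacing, p + spacing):
            if m in minus_set:
                pairs.append((p, m))
    return sorted(pairs)
```

Because the motif set maps onto itself under reverse complementation, searching only the forward strand with `find_motif` already finds every occurrence on both strands.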
Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea
Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai; ...
2015-10-28
The basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using strand-specific cDNA libraries constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts and counted separately for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate the RNA-seq data, expression levels of selected genes were compared using RPKM values from RNA-seq and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation of fruiting.
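RPKM, used above to compare RNA-seq with qRT-PCR, divides a read count by transcript length in kilobases and by library size in millions of mapped reads, which simplifies to count × 10^9 / (length in bp × total mapped reads). A minimal Python sketch follows; because the libraries are strand-specific, it computes RPKM separately for forward and reverse (antisense) counts. Using the total number of mapped reads in the library as the denominator is an assumption, since the study's exact normalization is not stated here, and the counts are hypothetical.

```python
from typing import Dict


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def strand_rpkm(counts: Dict[str, int], transcript_length_bp: int,
                total_mapped_reads: int) -> Dict[str, float]:
    """RPKM for each strand's count, e.g. {"forward": ..., "reverse": ...}."""
    return {strand: rpkm(c, transcript_length_bp, total_mapped_reads)
            for strand, c in counts.items()}
```

For a hypothetical 2,000 bp transcript with 500 forward reads in a library of 10 million mapped reads, `rpkm(500, 2000, 10_000_000)` evaluates to 25.0.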
Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai
The basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using strand-specific cDNA libraries constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts and counted separately for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate the RNA-seq data, expression levels of selected genes were compared using RPKM values from RNA-seq and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation of fruiting.
Daveau, Romain; Combaret, Valérie; Pierre-Eugène, Cécile; Cazes, Alex; Louis-Brennetot, Caroline; Schleiermacher, Gudrun; Ferrand, Sandrine; Pierron, Gaëlle; Lermine, Alban; Frio, Thomas Rio; Raynal, Virginie; Vassal, Gilles; Barillot, Emmanuel; Delattre, Olivier; Janoueix-Lerosey, Isabelle
2013-01-01
Neuroblastoma is a pediatric cancer of the peripheral nervous system in which structural chromosome aberrations are emblematic of aggressive tumors. In this study, we performed an in-depth analysis of somatic rearrangements in two neuroblastoma cell lines and two primary tumors using paired-end sequencing of mate-pair libraries and RNA-seq. The cell lines presented with typical genetic alterations of neuroblastoma, and the two tumors belong to the group of neuroblastomas exhibiting a chromothripsis profile. Inter- and intra-chromosomal rearrangements were identified in the four samples, allowing, in particular, characterization of unbalanced translocations at high resolution. Using complementary experiments, we further characterized 51 rearrangements at base-pair resolution, revealing 59 DNA junctions. In a subset of cases, complex rearrangements were observed with templated insertion of fragments of nearby sequences. Although we did not identify particular known motifs in the local environment of the breakpoints, we documented frequent microhomologies at the junctions of both chromothripsis- and non-chromothripsis-associated breakpoints. RNA-seq experiments confirmed expression of several predicted chimeric genes and genes with disrupted exon structure, including ALK, NBAS, FHIT, PTPRD and ODZ4. Our study therefore indicates that both non-homologous end joining-mediated repair and replicative processes may account for genomic rearrangements in neuroblastoma. RNA-seq analysis allows identification of the subset of abnormal transcripts expressed from genomic rearrangements that may be involved in neuroblastoma oncogenesis. PMID:23991058
Singh, Anil Kumar; Sharma, Vishal; Pal, Awadhesh Kumar; Acharya, Vishal; Ahuja, Paramvir Singh
2013-08-01
NAC [no apical meristem (NAM), Arabidopsis thaliana transcription activation factor (ATAF1/2) and cup-shaped cotyledon (CUC2)] proteins belong to one of the largest plant-specific transcription factor (TF) families and play important roles in plant developmental processes, responses to biotic and abiotic cues, and hormone signalling. Our genome-wide analysis identified 110 StNAC genes in potato encoding 136 proteins, including 14 membrane-bound TFs. The physical map positions of StNAC genes on the 12 potato chromosomes were non-random, and 40 genes were distributed in 16 clusters. The StNAC proteins were phylogenetically clustered into 12 subgroups. Phylogenetic analysis of StNACs together with their Arabidopsis and rice counterparts divided these proteins into 18 subgroups. Our comparative analysis also identified 36 putative TNAC proteins, which appear to be restricted to the Solanaceae family. In silico expression analysis using Illumina RNA-seq transcriptome data revealed tissue-specific, biotic stress-, abiotic stress-, and hormone-responsive expression profiles of StNAC genes. Several StNAC genes, including StNAC072 and StNAC101, which are orthologs of the known stress-responsive Arabidopsis gene RESPONSIVE TO DEHYDRATION 26 (RD26), were identified as highly abiotic stress responsive. Quantitative real-time polymerase chain reaction analysis largely corroborated the expression profiles of StNAC genes revealed by the RNA-seq data. Taken together, this analysis points to putative functions of several StNAC TFs and provides a blueprint for their functional characterization and utilization in potato improvement.
Cui, Yi; Han, Jin; Xiao, Zhifeng; Qi, Yiduo; Zhao, Yannan; Chen, Bing; Fang, Yongxiang; Liu, Sumei; Wu, Xianming; Dai, Jianwu
2017-01-01
Recently, with the development of space programs, there are growing concerns about the influence of spaceflight on tissue engineering. The purpose of this study was therefore to determine the variations of neural stem cells (NSCs) during spaceflight. RNA-Sequencing (RNA-Seq)-based transcriptomic profiling of NSCs identified many differentially expressed mRNAs and miRNAs between the space and Earth groups. These differentially expressed genes were then subjected to bioinformatic evaluation using gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, and miRNA-mRNA network analyses. The results showed that NSCs maintained greater stemness during spaceflight, although their growth rate slowed. Furthermore, the results indicated that NSCs tended to differentiate into neurons under outer-space conditions. Detailed genomic analyses of NSCs during spaceflight will help to elucidate the molecular mechanisms underlying their differentiation and proliferation in outer space.
Rozenberg, Andrey; Leese, Florian; Weiss, Linda C; Tollrian, Ralph
2016-01-01
Tag-Seq is a high-throughput approach used for discovering SNPs and characterizing gene expression. In comparison to RNA-Seq, Tag-Seq eases data processing and allows detection of rare mRNA species using only one tag per transcript molecule. However, reduced library complexity raises the issue of PCR duplicates, which distort gene expression levels. Here we present a novel Tag-Seq protocol that uses the least biased methods for RNA library preparation combined with a novel approach for joint PCR template and sample labeling. In our protocol, input RNA is fragmented by hydrolysis, and poly(A)-bearing RNAs are selected and directly ligated to mixed DNA-RNA P5 adapters. The P5 adapters contain i5 barcodes composed of sample-specific (moderately) degenerate base regions (mDBRs), which later allow detection of PCR duplicates. The P7 adapter is attached via reverse transcription with individual i7 barcodes added during the amplification step. The resulting libraries can be sequenced on an Illumina sequencer. After sample demultiplexing and PCR duplicate removal with a free software tool we designed, the data are ready for downstream analysis. Our protocol was tested on RNA samples from predator-induced and control Daphnia microcrustaceans.
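The duplicate-removal logic enabled by the degenerate base regions can be sketched in Python: reads from the same sample that share both a DBR and a mapping position are treated as copies of one template molecule. This is a hypothetical illustration, not the authors' software tool; it assumes that DBRs have already been extracted from the reads and requires exact DBR matches, whereas a real tool may tolerate sequencing errors in the DBR.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Read(NamedTuple):
    sample: str   # sample assigned during demultiplexing
    dbr: str      # degenerate base region read from the P5 adapter
    chrom: str
    pos: int      # mapped 5' position
    strand: str


def remove_pcr_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read for each (sample, DBR, position) combination."""
    seen = set()
    unique = []
    for r in reads:
        key = (r.sample, r.dbr, r.chrom, r.pos, r.strand)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def count_per_sample(reads: Iterable[Read]) -> Dict[str, int]:
    """Number of (deduplicated) tags per sample."""
    counts: Dict[str, int] = defaultdict(int)
    for r in reads:
        counts[r.sample] += 1
    return dict(counts)
```

Two reads at the same position with different DBRs are kept as independent molecules, which is exactly what lets a one-tag-per-transcript protocol distinguish genuine high expression from PCR amplification.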
ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis
2011-01-01
Background: Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor binding and histone modifications. Previous reports compared only a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries on ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results: Both technologies produce highly reproducible profiles within each platform; ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more, and narrower, peaks. The sets of peaks identified by the two technologies can differ significantly, but the extent of the difference depends on the factor and the analysis algorithm. Importantly, we found significant variation among multiple sequencing profiles of input DNA libraries, most likely arising from differences in both experimental conditions and sequencing depth. We further show that using an inappropriate input DNA profile can affect the average signal profiles around genomic features and the peak-calling results, highlighting the importance of high-quality input DNA data for normalization in ChIP-seq analysis. Conclusions: Our findings highlight the biases present in each platform, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high-quality, deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108
Gene expression analysis of induced pluripotent stem cells from aneuploid chromosomal syndromes
2013-01-01
Background: Human aneuploidy is the leading cause of early pregnancy loss, mental retardation, and multiple congenital anomalies. Because of the high mortality associated with aneuploidy, the pathophysiological mechanisms of aneuploidy syndromes remain largely unknown. Previous studies focused mostly on whether dosage compensation occurs, whereas the next-generation transcriptome sequencing technology RNA-seq is expected to eventually uncover the mechanisms of gene expression regulation and the related pathological phenotypes in human aneuploidy. Results: Using RNA-seq, we profiled the transcriptomes of four human aneuploid induced pluripotent stem cell (iPSC) lines generated from monosomy X (Turner syndrome), trisomy 8 (Warkany syndrome 2), trisomy 13 (Patau syndrome), and partial trisomy 11:22 (Emanuel syndrome), as well as two umbilical cord matrix iPSC lines as euploid controls, to examine how phenotypic abnormalities develop with an aberrant karyotype. A total of 466 million 50-bp reads were obtained from the six iPSC lines, and over 13,000 mRNAs were identified by gene annotation. Global analysis of gene expression profiles and functional analysis of differentially expressed (DE) genes were performed. Over 5,000 DE genes were identified between each aneuploid line and the euploid iPSCs, and 9 KEGG pathways were enriched in all four aneuploid samples. Conclusions: Our results demonstrate that an extra or missing chromosome has extensive effects on the whole transcriptome. Functional analysis of differentially expressed genes reveals that the genes most affected in aneuploid individuals are related to central nervous system development and tumorigenesis. PMID:24564826
Tannir, Nizar M.; Williams, Michelle D.; Chen, Yunxin; Yao, Hui; Zhang, Jianping; Thompson, Erika J.; Meric-Bernstam, Funda; Medeiros, L. Jeffrey; Weinstein, John N.
2013-01-01
Elucidation of tumor-DNA virus associations in many cancer types has enhanced our knowledge of fundamental oncogenesis mechanisms and provided a basis for cancer prevention initiatives. RNA-Seq is a novel tool to comprehensively assess such associations. We interrogated RNA-Seq data from 3,775 malignant neoplasms in The Cancer Genome Atlas database for the presence of viral sequences. Viral integration sites were also detected in expressed transcripts using a novel approach. The detection capacity of RNA-Seq was compared to available clinical laboratory data. Human papillomavirus (HPV) transcripts were detected using RNA-Seq analysis in head-and-neck squamous cell carcinoma, uterine endometrioid carcinoma, and squamous cell carcinoma of the lung. Detection of HPV by RNA-Seq correlated with detection by in situ hybridization and immunohistochemistry in squamous cell carcinoma tumors of the head and neck. Hepatitis B virus and Epstein-Barr virus (EBV) were detected using RNA-Seq in hepatocellular carcinoma and gastric carcinoma tumors, respectively. Integration sites of viral genes and oncogenes were detected in cancers harboring HPV or hepatitis B virus but not in EBV-positive gastric carcinoma. Integration sites of expressed viral transcripts frequently involved known coding areas of the host genome. No DNA virus transcripts were detected in acute myeloid leukemia, cutaneous melanoma, low- and high-grade gliomas of the brain, and adenocarcinomas of the breast, colon and rectum, lung, prostate, ovary, kidney, and thyroid. In conclusion, this study provides a large-scale overview of the landscape of DNA viruses in human malignant cancers. While further validation is necessary for specific cancer types, our findings highlight the utility of RNA-Seq in detecting tumor-associated DNA viruses and identifying viral integration sites that may unravel novel mechanisms of cancer pathogenesis. PMID:23740984
Lattimore, Vanessa L.; Pearson, John F.; Currie, Margaret J.; Spurdle, Amanda B.; Robinson, Bridget A.; Walker, Logan C.
2018-01-01
PCR-based RNA splicing assays are commonly used in diagnostic and research settings to assess the potential effects of variants of uncertain clinical significance in BRCA1 and BRCA2. The Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium completed a multicentre investigation to evaluate differences in assay design and the integrity of published data, raising a number of methodological questions associated with cell culture conditions and PCR-based protocols. We utilized targeted RNA-seq to re-assess BRCA1 and BRCA2 mRNA isoform expression patterns in lymphoblastoid cell lines (LCLs) previously used in the multicentre ENIGMA study. Capture of the targeted cDNA sequences was carried out using 34 BRCA1 and 28 BRCA2 oligonucleotides from the Illumina TruSeq Targeted RNA Expression platform. Our results show that targeted RNA-seq analysis of LCLs overcomes many of the methodological limitations associated with PCR-based assays, leading us to make the following observations and recommendations: (1) technical replicates (n > 2) of variant carriers should be used to capture methodology-induced variability associated with RNA-seq assays, (2) LCLs can undergo multiple freeze/thaw cycles and can be cultured for up to 2 weeks without noticeably influencing isoform expression levels, (3) nonsense-mediated decay inhibitors are essential prior to splicing assays for comprehensive mRNA isoform detection, and (4) quantitative assessment of exon:exon junction levels across BRCA1 and BRCA2 can help distinguish between normal and aberrant isoform expression patterns. Experimentally derived recommendations from this study will facilitate the application of targeted RNA-seq platforms for the quantitation of BRCA1 and BRCA2 mRNA aberrations associated with sequence variants of uncertain clinical significance. PMID:29774201
Townsley, Brad T; Covington, Michael F; Ichihashi, Yasunori; Zumstein, Kristina; Sinha, Neelima R
2015-01-01
Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding, and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies, but preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for producing strand-specific RNA-seq libraries that uses the terminal breathing of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high-quality strand-specific RNA-seq libraries. The method is optimized for 3-prime Digital Gene Expression (DGE) libraries, can easily be extended to full-transcript-coverage shotgun (SHO) strand-specific libraries, and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for preparing RNA-seq libraries.
Wei, Guanyun; Sun, Lianjie; Li, Ruimin; Li, Lei; Xu, Jiao; Ma, Fei
2018-04-01
Bacterial pathogen infections can lead to dynamic changes in microRNA (miRNA) and mRNA expression profiles, which may synergistically control the outcome of immune responses. To reveal the role of dynamic miRNA-mRNA regulation in Drosophila innate immune responses, we analyzed in detail the paired miRNA and mRNA expression profiles at three time points in adult male Drosophila infected with Micrococcus luteus (M. luteus), using RNA-seq and small RNA-seq data. Our results demonstrate that differentially expressed miRNAs and mRNAs change extensively and dynamically over the three time points of M. luteus infection. Pathway enrichment analysis indicates that the differentially expressed genes are involved in diverse signaling pathways, including Toll and Imd as well as other signaling pathways, at all three time points. Remarkably, changes in miRNA expression lag behind changes in mRNA expression over the three time points, implying that time should be considered as a parameter when the functions of miRNAs and mRNAs are studied further. In particular, the dynamic miRNA-mRNA regulatory networks show that miRNAs may synergistically regulate the expression of genes in different signaling pathways to promote or inhibit innate immune responses and maintain homeostasis in Drosophila, and some new regulators involved in the Drosophila innate immune response have been identified. Our findings strongly suggest that miRNA regulation is a key mechanism for cooperatively fine-tuning the expression of genes in diverse signaling pathways to maintain the innate immune response and homeostasis in Drosophila. Taken together, the present study reveals a novel role of dynamic miRNA-mRNA regulation in the immune response to bacterial infection and provides new insight into the molecular regulatory mechanisms underlying Drosophila innate immune responses. Copyright © 2017 Elsevier Ltd. All rights reserved.
RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application
2015-01-01
Background: The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms that allow massive, inexpensive sequencing of selected RNA fractions while also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulatory pathways makes RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges for efficiently analyzing and storing sequence data and results. Methods: To provide researchers with an effective and user-friendly resource for analyzing RNA-Seq data, we present RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates state-of-the-art bioinformatics tools for RNA-Seq analysis with in-house scripts to offer the user a comprehensive data analysis strategy. RAP can perform quality checks (using FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with TopHat, Cufflinks and HTSeq), and detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). The pipeline can also identify splicing junctions and constitutive or alternative polyadenylation sites (via custom analysis modules) and test for statistically significant differences in gene and transcript expression, splicing patterns and polyadenylation site usage (using Cuffdiff2 and DESeq). Results: Through a user-friendly web interface, the RAP workflow can be customized by the user and is automatically executed on our cloud computing environment.
This strategy provides access to bioinformatics tools and computational resources without requiring specific bioinformatics or IT skills. RAP provides a set of tabular and graphical results that help users browse, filter and export analyzed data according to their needs. PMID:26046471
Hemphill, D D; McIlwraith, C W; Slayden, R A; Samulski, R J; Goodrich, L R
2016-05-01
IGF-I is one of several anabolic factors being investigated for the treatment of osteoarthritis (OA). Because of its short biological half-life, extended administration is required for more robust cartilage healing. Here we create a self-complementary adeno-associated virus (AAV) gene therapy vector carrying the IGF-I transgene. Various biochemical assays were performed to investigate the cellular response to scAAVIGF-I treatment versus an scAAVGFP positive transduction control and a non-transduced negative control culture. RNA-sequencing analysis was also performed to establish a differential regulation profile of scAAVIGF-I-transduced chondrocytes. Biochemical analyses indicated an average media IGF-I concentration of 608 ng/ml in the scAAVIGF-I-transduced chondrocytes. This increase in IGF-I led to increased expression of collagen type II and aggrecan and increased protein concentrations of cellular collagen type II and media glycosaminoglycan versus both controls. RNA-seq revealed a global regulatory pattern consisting of 113 differentially regulated GO categories, including those for chondrocyte and cartilage development and regulation of apoptosis. This research substantiates that the scAAVIGF-I gene therapy vector increased production of IGF-I to clinically relevant levels, with a biological response by chondrocytes conducive to increased cartilage healing. The RNA-seq further established a set of differentially expressed genes and gene ontologies induced by the scAAVIGF-I vector while controlling for AAV infection. This dataset provides a static representation of the cellular transcriptome that, although it consists of only one time point, will allow further gene expression analyses comparing additional cartilage-healing therapeutics or a transient cellular response. Copyright © 2015. Published by Elsevier Ltd.
RNA-seq Analysis of Early Hepatic Response to Handling and Confinement Stress in Rainbow Trout
Liu, Sixin; Gao, Guangtu; Palti, Yniv; Cleveland, Beth M.; Weber, Gregory M.; Rexroad, Caird E.
2014-01-01
Fish under intensive rearing conditions experience various stressors that negatively affect survival, growth, reproduction and fillet quality. Identifying and characterizing the molecular mechanisms underlying stress responses will facilitate the development of strategies aimed at improving animal welfare and aquaculture production efficiency. In this study, we used RNA-seq to identify transcripts that are differentially expressed in the rainbow trout liver in response to handling and confinement stress. These stressors were selected because of their relevance to aquaculture production. Total RNA was extracted from the livers of individual fish in five tanks of eight fish each: three tanks of fish subjected to a 3-hour handling and confinement stress and two control tanks. Equal amounts of total RNA from six individual fish were pooled by tank to create five RNA-seq libraries, which were sequenced in one lane of an Illumina HiSeq 2000. Three sequencing runs yielded a total of 491,570,566 reads, which were mapped onto the previously generated stress reference transcriptome to identify 316 differentially expressed transcripts (DETs). Twenty-one DETs were selected for qPCR to validate the RNA-seq approach. The fold changes in gene expression identified by RNA-seq and qPCR were highly correlated (R2 = 0.88). Several gene ontology terms, including transcription factor activity and biological processes such as glucose metabolic process, were enriched among these DETs. Pathways involved in the response to handling and confinement stress were implicated by mapping the DETs to reference pathways in the KEGG database. Accession Numbers: Raw RNA-seq reads have been submitted to the NCBI Short Read Archive under accession number SRP022881. Customized Perl Scripts: All customized scripts described in this paper are available from Dr. Guangtu Gao or the corresponding author. PMID:24558395
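The rainbow trout abstract reports an R2 of 0.88 between RNA-seq and qPCR fold changes but does not say how it was computed. A minimal sketch, assuming R2 is the squared Pearson correlation of per-transcript log2 fold changes (treated vs. control) measured by the two methods; the function names are hypothetical and no data from the study are used.

```python
import math


def log2_fold_change(treated: float, control: float) -> float:
    """log2 ratio of treated to control expression; both values must be > 0."""
    return math.log2(treated / control)


def pearson_r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    r = cov / math.sqrt(var_x * var_y)
    return r * r
```

In use, one list would hold the RNA-seq log2 fold changes and the other the qPCR log2 fold changes for the same transcripts, in the same order. Squaring r discards the sign, so a strong inverse relationship would also score close to 1; checking the sign of r separately guards against that.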
OP17 MICRORNA PROFILING USING SMALL RNA-SEQ IN PAEDIATRIC LOW GRADE GLIOMAS
Jeyapalan, Jennie N.; Jones, Tania A.; Tatevossian, Ruth G.; Qaddoumi, Ibrahim; Ellison, David W.; Sheer, Denise
2014-01-01
INTRODUCTION: MicroRNAs regulate gene expression at the post-transcriptional level by targeting mRNAs for translational repression or degradation. In paediatric low-grade gliomas, a few key genetic mutations have been identified, including BRAF fusions, FGFR1 duplications and MYB rearrangements. Our aim in the current study is to profile aberrant microRNA expression in paediatric low-grade gliomas and determine the role of epigenetic changes in the aetiology and behaviour of these tumours. METHOD: MicroRNA profiling of tumour samples (6 pilocytic, 2 diffuse, 2 pilomyxoid astrocytomas) and normal brain controls (4 adult normal brain samples and a primary glial progenitor cell line) was performed using small RNA sequencing. Bioinformatic analysis included sequence alignment, analysis of the number of reads (CPM, counts per million) and differential expression. RESULTS: Sequence alignment identified 695 microRNAs, whose expression was compared in tumours v. normal brain. PCA and hierarchical clustering showed separate groups for tumours and normal brain. Computational analysis identified approximately 400 differentially expressed microRNAs in the tumours compared to matched location controls. These findings will next be validated and integrated with the extensive genetic and epigenetic information we have previously obtained for the full tumour cohort. CONCLUSION: We have identified microRNAs that are differentially expressed in paediatric low-grade gliomas. As microRNAs are known to target genes involved in the initiation and progression of cancer, they provide critical information on tumour pathogenesis and are an important class of biomarkers.
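Counts per million (CPM) rescales each microRNA's read count by the library's total read count, CPM = count × 10^6 / total, so that samples sequenced to different depths can be compared. A minimal sketch, assuming one library is given as a dictionary from microRNA name to raw count; the function name and the placeholder names in the check are hypothetical, and real pipelines typically add filtering, library-size normalization factors, or a log transform on top of this.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that the library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    # Each feature's share of the library, expressed per million reads.
    return {name: count * 1_000_000 / total for name, count in counts.items()}
```

Because CPM only corrects for sequencing depth, a few very highly expressed microRNAs can still dominate the denominator; that is one reason differential expression tools usually apply further normalization.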
Berardocco, Martina; Radeghieri, Annalisa; Busatto, Sara; Gallorini, Marialucia; Raggi, Chiara; Gissi, Clarissa; D’Agnano, Igea; Bergese, Paolo; Felsani, Armando; Berardi, Anna C.
2017-01-01
Liver cancer (LC) is one of the most common cancers and the third leading cause of cancer-related deaths worldwide. Extracellular vesicle (EV) cargoes, which are selectively enriched in RNA, offer great promise for the diagnosis, prognosis and treatment of LC. Our study analyzed the RNA cargoes of EVs derived from four liver cancer cell lines, HuH7, Hep3B, HepG2 (hepatocellular carcinoma) and HuH6 (hepatoblastoma), generating two different sets of sequencing libraries for each. One library was size-selected for small RNAs and the other targeted the whole transcriptome. Here we report genome-wide data on the expression levels of coding and non-coding transcripts, microRNAs, isomiRs and snoRNAs, providing the first comprehensive overview of the extracellular-vesicle RNA cargo released from LC cell lines. The EV-RNA expression profiles of the four liver cancer cell lines share a similar background, but cell-specific features clearly emerge, showing marked heterogeneity of the EV cargo among the individual cell lines for both coding and non-coding RNA species. PMID:29137313
Yu, Ying; Wu, Guangwen; Yuan, Hongmei; Cheng, Lili; Zhao, Dongsheng; Huang, Wengong; Zhang, Shuquan; Zhang, Liguo; Chen, Hongyu; Zhang, Jian; Guan, Fengzhi
2016-05-27
MicroRNAs (miRNAs) play a critical role in responses to biotic and abiotic stress and have been characterized in a large number of plant species. Although flax (Linum usitatissimum L.) is one of the most important fiber and oil crops worldwide, no reports have been published describing flax miRNAs (Lus-miRNAs) induced in response to saline, alkaline, and saline-alkaline stresses. In this work, combined small RNA and degradome deep sequencing was used to analyze flax libraries constructed after alkaline-salt stress (AS2), neutral salt stress (NSS), alkaline stress (AS), and the non-stressed control (CK). From the CK, AS, AS2, and NSS libraries, a total of 118, 119, 122, and 120 known Lus-miRNAs and 233, 213, 211, and 212 novel Lus-miRNAs were identified, respectively. After assessment of differential expression profiles, 17 known Lus-miRNAs and 36 novel Lus-miRNAs were selected and used to predict putative target genes. Gene ontology term enrichment analysis revealed target genes involved in responses to stimuli, including signaling and catalytic activity. Eight Lus-miRNAs were selected for qRT-PCR analysis to confirm the accuracy and reliability of the miRNA-seq results. The qRT-PCR results showed that the stress-induced changes in the expression of these miRNAs mirrored the trends observed by miRNA-seq. Degradome sequencing and transcriptome profiling showed that 29 miRNA-target pairs displayed inverse expression patterns under saline, alkaline, and saline-alkaline stresses. According to the target prediction analysis, the miR398a-targeted gene encodes a copper/zinc superoxide dismutase, and miR530 was found to specifically target WRKY family transcription factors, suggesting that these two miRNAs and their targets may be significantly involved in the saline, alkaline, and saline-alkaline stress response in flax.
Identification and characterization of flax miRNAs, their target genes, functional annotations, and gene expression patterns are reported in this work. These findings will enhance our understanding of flax miRNA regulatory mechanisms under saline, alkaline, and saline-alkaline stresses and provide a foundation for future elucidation of the specific functions of these miRNAs.
Li, X-y; Yao, X; Li, S-n; Suo, A-l; Ruan, Z-p; Liang, X; Kong, Y; Zhang, W-g; Yao, Y
2014-01-01
Multiple genetic alterations that affect the progression of acute myeloid leukemia (AML) have been discovered, and growing evidence indicates that aberrant splicing plays an important role in cancer. We present RNA-Seq profiling of an AML patient who achieved complete remission after treatment, to analyze aberrant gene splicing during treatment. We sequenced 3.97 and 3.32 Gbp of clean data from the AML and remission samples, respectively. First, by analyzing biomarkers associated with AML to assist routine clinical tests, we confirmed that the patient had a normal karyotype, with NPM1 and IDH2 mutations and deregulation of related genes such as BAALC, ERG, MN1 and the HOX family. We then performed alternative splicing detection on the AML and remission samples, detecting 91 differential splicing events in 68 differentially spliced genes (DSGs) with mixture of isoforms (MISO). Considering Psi values (Ψ) and confidence intervals, 25 differentially expressed isoforms were identified as higher-confidence isoforms, which were associated with RNA processing, cellular macromolecule catabolic process and DNA binding according to GO enrichment analysis. An exon 2-skipping event in the oncogene FOS (FBJ murine osteosarcoma viral oncogene homolog) was detected and validated in this study. FOS plays a critical role in regulating cell proliferation, differentiation and transformation. The exon 2-skipping isoform of FOS increased significantly after treatment. The RNA-Seq data provide highly accurate and comprehensive supplements to conventional clinical tests for AML. Moreover, splicing aberrations could be another source of biomarkers and even therapeutic targets. More information on splicing may also aid a better understanding of leukemogenesis.
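Psi (Ψ, "percent spliced in") is the fraction of transcripts from a gene that include a given exon. MISO estimates Ψ with a Bayesian model over all reads and reports confidence intervals; the sketch below is NOT MISO but a simpler junction-count ratio for an exon-skipping event, shown only to make the quantity concrete. It assumes reads spanning either junction of the middle exon support inclusion, reads joining the two flanking exons directly support skipping, and counts are divided by the number of junctions each isoform offers (two for inclusion, one for skipping). Function and parameter names are hypothetical.

```python
def percent_spliced_in(inclusion_reads: int, skipping_reads: int,
                       inclusion_junctions: int = 2,
                       skipping_junctions: int = 1) -> float:
    """Naive Psi for a cassette-exon (exon-skipping) event from junction reads.

    inclusion_reads: reads spanning either junction of the middle exon.
    skipping_reads: reads spanning the junction that joins the flanking exons.
    """
    # Normalize by how many junctions each isoform contributes, so the
    # inclusion isoform is not favored just for offering two junctions.
    inc = inclusion_reads / inclusion_junctions
    skip = skipping_reads / skipping_junctions
    if inc + skip == 0:
        raise ValueError("no junction reads support either isoform")
    return inc / (inc + skip)


def delta_psi(psi_before: float, psi_after: float) -> float:
    """Change in Psi between two conditions, e.g. diagnosis vs. remission."""
    return psi_after - psi_before
```

Under this definition, an increase in an exon-skipping isoform after treatment would show up as a negative delta Psi for the skipped exon. Unlike MISO, this ratio gives no confidence interval, so low-coverage events can produce unstable values.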
Tan, Jean-Marie; Payne, Elizabeth J.; Lin, Lynlee L.; Sinnya, Sudipta; Raphael, Anthony P.; Lambie, Duncan; Frazer, Ian H.; Dinger, Marcel E.; Soyer, H. Peter
2017-01-01
Identification of appropriate reference genes (RGs) is critical to accurate data interpretation in quantitative real-time PCR (qPCR) experiments. In this study, we utilised next-generation RNA sequencing (RNA-seq) to analyse the transcriptome of a panel of non-melanoma skin cancer lesions, identifying genes that are consistently expressed across all samples. Genes encoding ribosomal proteins were amongst the most stable in this dataset. The RNA-seq data were validated by qPCR, confirming the suitability of a set of highly stable genes for use as qPCR RGs. These genes will provide a valuable resource for normalising qPCR data in the analysis of non-melanoma skin cancer. PMID:28852586
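The abstract does not name the stability measure used to select candidate reference genes. One simple choice, assumed here for illustration, is the coefficient of variation (CV = standard deviation / mean) of normalized expression across samples, where a lower CV means more consistent expression. The function name and the placeholder gene names in the check are hypothetical; dedicated tools such as geNorm or NormFinder use different, more elaborate criteria.

```python
import statistics


def rank_by_stability(expression: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank genes from most to least stable by coefficient of variation.

    expression maps a gene name to its normalized expression in each sample
    (at least two samples per gene are required for a standard deviation).
    """
    ranked: list[tuple[str, float]] = []
    for gene, values in expression.items():
        mean = statistics.fmean(values)
        if mean <= 0:
            continue  # unexpressed genes cannot serve as reference genes
        cv = statistics.stdev(values) / mean
        ranked.append((gene, cv))
    ranked.sort(key=lambda item: item[1])  # lowest CV (most stable) first
    return ranked
```

CV is scale-free, so highly and moderately expressed genes can be compared directly; in practice one would also require a minimum expression level so that the chosen genes are reliably detectable by qPCR.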
RISC RNA sequencing for context-specific identification of in vivo microRNA targets.
Matkovich, Scot J; Van Booven, Derek J; Eschenbacher, William H; Dorn, Gerald W
2011-01-07
MicroRNAs (miRs) are expanding our understanding of cardiac disease and have the potential to transform cardiovascular therapeutics. One miR can target hundreds of individual mRNAs, but existing methodologies are not sufficient to accurately and comprehensively identify these mRNA targets in vivo. Our objective was to develop methods permitting unbiased identification of in vivo miR targets, using massively parallel sequencing of mouse cardiac transcriptomes combined with sequencing of mRNA associated with mouse cardiac RNA-induced silencing complexes (RISCs). We optimized techniques for expression profiling of small amounts of RNA without introducing amplification bias and applied them to anti-Argonaute 2-immunoprecipitated RISCs (RISC-Seq) from mouse hearts. By comparing RNA-sequencing results for cardiac RISC and transcriptome from the same individual hearts, we defined 1645 mRNAs consistently targeted to mouse cardiac RISCs. We used this approach in hearts overexpressing miRs from Myh6 promoter-driven precursors (programmed RISC-Seq) to identify 209 in vivo targets of miR-133a and 81 in vivo targets of miR-499. Consistent with the fact that miR-133a and miR-499 have widely differing "seed" sequences and belong to different miR families, only 6 targets were common to miR-133a- and miR-499-programmed hearts. RISC-sequencing is a highly sensitive method for general RISC profiling and individual miR target identification in biological context and is applicable to any tissue and any disease state.
Sharma, Davinder; Golla, Naresh; Singh, Dheer; Onteru, Suneel K
2018-03-01
Next-generation sequencing (NGS)-based RNA sequencing (RNA-Seq) and transcriptome profiling offer an opportunity to unveil complex biological processes. Successful RNA-Seq and transcriptome profiling require a large amount of high-quality RNA. However, isolating NGS-quality RNA is extremely difficult from recalcitrant adipose tissue (AT), with its high lipid content and low cell numbers. Further, the amount and biochemical composition of AT lipid vary among animal species, which can pose different degrees of resistance to RNA extraction. Currently available approaches may work effectively in one species but be almost unproductive in another. Herein, we report a two-step protocol for the extraction of NGS-quality RNA from AT across a broad range of animal species. © 2017 Wiley Periodicals, Inc.
Integrating single-cell transcriptomic data across different conditions, technologies, and species.
Butler, Andrew; Hoffman, Paul; Smibert, Peter; Papalexi, Efthymia; Satija, Rahul
2018-06-01
Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.
Lewallen, Eric A.; Bonin, Carolina A.; Li, Xin; Smith, Jay; Karperien, Marcel; Larson, A. Noelle; Lewallen, David G.; Cool, Simon M.; Westendorf, Jennifer J.; Krych, Aaron J.; Leontovich, Alexey A.; Im, Hee-Jeong; van Wijnen, Andre J.
2018-01-01
Osteoarthritis (OA) is a disabling degenerative joint disease that causes pain and has limited treatment options. To permit early diagnosis and treatment of OA, a high-resolution mechanistic understanding of human chondrocytes in normal and diseased states is necessary. In this study, we used next-generation RNA sequencing (RNA-seq) to assess the biological effects of OA-related changes in the synovial microenvironment on chondrocytes embedded within anatomically intact cartilage from joints with different pathological grades. We determined the transcriptome of primary articular chondrocytes derived from pristine knees and ankles, as well as from joints affected by OA. The GALAXY bioinformatics platform was used to facilitate biological interpretation. Comparisons of patient samples by k-means clustering, hierarchical clustering and principal component analysis revealed that primary chondrocytes exhibit OA grade-related differences in gene expression, including genes involved in cell adhesion, ECM production and immune response. We conclude that diseased synovial microenvironments in joints with different histopathological OA grades directly alter gene expression in chondrocytes. One ramification of this finding is that anatomically intact cartilage from OA joints is not an ideal source of healthy chondrocytes and should not be used to generate a normal baseline for the molecular characterization of diseased joints. PMID:27378743
Whole transcriptome profiling of taste bud cells.
Sukumaran, Sunil K; Lewandowski, Brian C; Qin, Yumei; Kotha, Ramana; Bachmanov, Alexander A; Margolskee, Robert F
2017-08-08
Analysis of single-cell RNA-Seq data can provide insights into the specific functions of individual cell types that compose complex tissues. Here, we examined gene expression in two distinct subpopulations of mouse taste cells: Tas1r3-expressing type II cells and physiologically identified type III cells. Our RNA-Seq libraries met high quality control standards and accurately captured differential expression of marker genes for type II (e.g. the Tas1r genes, Plcb2, Trpm5) and type III (e.g. Pkd2l1, Ncam, Snap25) taste cells. Bioinformatics analysis showed that genes regulating responses to stimuli were up-regulated in type II cells, while pathways related to neuronal function were up-regulated in type III cells. We also identified highly expressed genes and pathways associated with chemotaxis and axon guidance, providing new insights into the mechanisms underlying integration of new taste cells into the taste bud. We validated our results by immunohistochemically confirming expression of selected genes encoding synaptic (Cplx2 and Pclo) and semaphorin signalling pathway (Crmp2, PlexinB1, Fes and Sema4a) components. The approach described here could provide a comprehensive map of gene expression for all taste cell subpopulations and will be particularly relevant for cell types in taste buds and other tissues that can be identified only by physiological methods.
Comparative Transcriptomic Analyses of Vegetable and Grain Pea (Pisum sativum L.) Seed Development
Liu, Na; Zhang, Guwen; Xu, Shengchun; Mao, Weihua; Hu, Qizan; Gong, Yaming
2015-01-01
Understanding the molecular mechanisms that regulate pea seed development is extremely important for pea breeding. In this study, we used high-throughput RNA-Seq and bioinformatics analyses to examine the changes in gene expression during seed development in vegetable pea and grain pea, and to compare the gene expression profiles of these two pea types. RNA-Seq generated 18.7 G of raw data, which were then de novo assembled into 77,273 unigenes with a mean length of 930 bp. Our results illustrate that transcriptional control during pea seed development is a highly coordinated process. There were 459 and 801 genes differentially expressed between vegetable pea and grain pea at the early and late seed maturation stages, respectively. Soluble sugar and starch metabolism-related genes were significantly activated during pea seed development, coinciding with the onset of sugar and starch accumulation in the seeds. A comparative analysis of genes involved in sugar and starch biosynthesis in vegetable pea (high seed soluble sugar and low starch) and grain pea (high seed starch and low soluble sugar) revealed that differential expression of related genes at late development stages results in a negative correlation between soluble sugar and starch biosynthetic flux in vegetable and grain pea seeds. RNA-Seq data were validated by real-time quantitative RT-PCR analysis of 30 randomly selected genes. To our knowledge, this work represents the first report of seed development transcriptomics in pea. The obtained results provide a foundation to support future efforts to unravel the underlying mechanisms that control the developmental biology of pea seeds, and serve as a valuable resource for improving pea breeding. PMID:26635856
Esteve-Codina, Anna; Arpi, Oriol; Martinez-García, Maria; Pineda, Estela; Mallo, Mar; Gut, Marta; Carrato, Cristina; Rovira, Anna; Lopez, Raquel; Tortosa, Avelina; Dabad, Marc; Del Barco, Sonia; Heath, Simon; Bagué, Silvia; Ribalta, Teresa; Alameda, Francesc; de la Iglesia, Nuria
2017-01-01
The molecular classification of glioblastoma (GBM) based on gene expression might better explain outcome and response to treatment than clinical factors. Whole transcriptome sequencing using next-generation sequencing platforms is rapidly becoming accepted as a tool for measuring gene expression for both research and clinical use. Fresh frozen (FF) tissue specimens of GBM are difficult to obtain since tumor tissue obtained at surgery is often scarce and necrotic and diagnosis is prioritized over freezing. After diagnosis, leftover tissue is usually stored as formalin-fixed paraffin-embedded (FFPE) tissue. However, RNA from FFPE tissues is usually degraded, which could hamper gene expression analysis. We compared RNA-Seq data obtained from matched pairs of FF and FFPE GBM specimens. Of the eleven matched FFPE-FF pairs, only three FFPE samples yielded informative results. Several quality-control measurements showed that RNA from FFPE samples was highly degraded but maintained transcriptomic similarities to RNA from FF samples. Certain issues regarding mutation analysis and subtype prediction were detected. Nevertheless, our results suggest that RNA-Seq of FFPE GBM specimens provides reliable gene expression data that can be used in molecular studies of GBM if the RNA is sufficiently preserved. PMID:28122052
Use of archival resources has been limited to date by inconsistent methods for genomic profiling of degraded RNA from formalin-fixed paraffin-embedded (FFPE) samples. RNA-sequencing offers a promising way to address this problem. Here we evaluated transcriptomic dose responses us...
Chen, Jian; Lin, Mingyan; Foxe, John J; Pedrosa, Erika; Hrabovsky, Anastasia; Carroll, Reed; Zheng, Deyou; Lachman, Herbert M
2013-01-01
Induced pluripotent stem cell (iPSC) technology provides an opportunity to study neuropsychiatric disorders through the capacity to grow patient-specific neurons in vitro. Skin fibroblasts obtained by biopsy have been the most reliable source of cells for reprogramming. However, using other somatic cells obtained by less invasive means would be ideal, especially in children with autism spectrum disorders (ASD) and other neurodevelopmental conditions. In addition to fibroblasts, iPSCs have been developed from cord blood, lymphocytes, hair keratinocytes, and dental pulp from deciduous teeth. Of these, dental pulp would be a good source for studying neurodevelopmental disorders in children because obtaining the material is non-invasive. We investigated its suitability for disease modeling by using RNA-seq to profile gene expression in differentiated neurons derived from iPSCs made from dental pulp extracted from deciduous teeth (T-iPSCs) and from fibroblasts (F-iPSCs). This is the first RNA-seq analysis comparing gene expression profiles in neurons derived from iPSCs made from different somatic cells. For the most part, gene expression profiles were quite similar, with only 329 genes showing differential expression at a nominally significant p-value (p<0.05), of which 63 remained significant after genome-wide multiple-testing correction (FDR <0.05). The most striking difference was the lower expression of numerous members of all four HOX gene families in neurons derived from T-iPSCs. In addition, increased expression was seen for several transcription factors expressed in the developing forebrain (FOXP2, OTX1, and LHX2, for example). Overall, pathway analysis revealed that genes with higher expression in neurons derived from T-iPSCs were enriched for genes implicated in schizophrenia (SZ).
The findings suggest that neurons derived from T-iPSCs are suitable for modeling neuropsychiatric disorders and may have some advantages over those derived from F-iPSCs.
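The abstract does not name the procedure behind its genome-wide correction (FDR < 0.05). As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not confirmed by the source), the step from nominal p-values to FDR-controlled calls could look like this; the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values for ten genes (not data from the study).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
nominal = [i for i, p in enumerate(pvals) if p < 0.05]
fdr_hits = benjamini_hochberg(pvals, alpha=0.05)
print(len(nominal), len(fdr_hits))  # 5 2
```

As in the abstract, a set of nominally significant genes shrinks to a smaller subset once the false discovery rate is controlled.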
DOE Office of Scientific and Technical Information (OSTI.GOV)
Omasits, U.; Quebatte, Maxime; Stekhoven, Daniel J.
2013-11-01
Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low-abundance, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ~90% of the expressed protein-coding genes. Genes that were detected at the transcript but not the protein level were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor.
Omasits, Ulrich; Quebatte, Maxime; Stekhoven, Daniel J.; Fortes, Claudia; Roschitzki, Bernd; Robinson, Mark D.; Dehio, Christoph; Ahrens, Christian H.
2013-01-01
Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low-abundance, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ∼90% of the expressed protein-coding genes. Genes that were detected at the transcript but not the protein level were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor. PMID:23878158
Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney
2012-01-01
RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
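The core idea, counting unique barcodes rather than reads, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it keys molecules by gene rather than by the full cDNA sequence, and it assumes barcodes were read without errors (the study designed its barcodes to tolerate sequencing errors, which this sketch ignores). The function name, gene names, and barcodes are hypothetical.

```python
from collections import defaultdict


def count_reads_and_molecules(reads):
    """Compare read counts with unique-barcode (molecule) counts per gene.

    reads: iterable of (gene, barcode_5p, barcode_3p) tuples, one per read pair.
    Reads sharing a gene and the same barcode pair are treated as PCR copies
    of a single original cDNA molecule.
    """
    read_counts = defaultdict(int)
    barcodes = defaultdict(set)
    for gene, bc_5p, bc_3p in reads:
        read_counts[gene] += 1
        barcodes[gene].add((bc_5p, bc_3p))
    molecule_counts = {gene: len(pairs) for gene, pairs in barcodes.items()}
    return dict(read_counts), molecule_counts


# Hypothetical reads: geneA has one molecule amplified 3x plus a second molecule.
reads = [
    ("geneA", "ACGT", "TTGA"),
    ("geneA", "ACGT", "TTGA"),
    ("geneA", "ACGT", "TTGA"),
    ("geneA", "GGCA", "CATC"),
    ("geneB", "TACG", "AGGT"),
]
read_counts, molecule_counts = count_reads_and_molecules(reads)
print(read_counts)      # {'geneA': 4, 'geneB': 1}
print(molecule_counts)  # {'geneA': 2, 'geneB': 1}
```

Read counts inflate geneA because of PCR duplication, while the barcode count recovers the two original molecules.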
Uniform, optimal signal processing of mapped deep-sequencing data.
Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam
2013-07-01
Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
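To illustrate the general idea of signal detection on mapped-read coverage, here is a generic matched-filter sketch: a template peak shape is slid along a coverage track, and positions with a high, locally maximal response are called. This is not DFilter itself; the paper derives its filter from signal processing theory, whereas this sketch uses a fixed, hand-picked template. The template, threshold, coverage values, and function names are all hypothetical.

```python
def matched_filter(signal, template):
    """Correlate a zero-mean template with a 1-D read-coverage signal.

    Returns one response value per position; positions where the template
    does not fit entirely inside the signal are left at 0.0.
    """
    n, w = len(signal), len(template)
    mean_t = sum(template) / w
    # Subtracting the mean makes a flat background score ~0.
    t = [x - mean_t for x in template]
    half = w // 2
    response = [0.0] * n
    for centre in range(half, n - w + half + 1):
        start = centre - half
        response[centre] = sum(t[j] * signal[start + j] for j in range(w))
    return response


def call_peaks(response, threshold):
    """Report positions that exceed the threshold and are local maxima."""
    return [
        i for i in range(1, len(response) - 1)
        if response[i] > threshold
        and response[i] >= response[i - 1]
        and response[i] >= response[i + 1]
    ]


# Hypothetical coverage: flat background of 1 read, enriched region at 14-16.
coverage = [1] * 30
coverage[14:17] = [6, 6, 6]
template = [0, 0, 1, 1, 1, 0, 0]  # assumed peak shape, not a DFilter filter
resp = matched_filter(coverage, template)
print(call_peaks(resp, threshold=5.0))  # [15]
```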
Transcriptomic Analysis of Paulownia Infected by Paulownia Witches'-Broom Phytoplasma
Zhu, Shui-Fang; Lin, Cai-Li; Tian, Guo-Zhong; Xu, Xia; Zhao, Wen-Jun
2013-01-01
Phytoplasmas are plant pathogenic bacteria that have no cell wall and are responsible for major crop losses throughout the world. Phytoplasma-infected plants show a variety of symptoms, and the mechanisms by which phytoplasmas physiologically alter their host plants are of considerable interest but poorly understood. In this study we undertook a detailed analysis of Paulownia infected by Paulownia witches’-broom (PaWB) Phytoplasma using high-throughput mRNA sequencing (RNA-Seq) and digital gene expression (DGE). RNA-Seq analysis identified 74,831 unigenes, which were subsequently used as reference sequences for DGE analysis of diseased and healthy Paulownia in field grown and tissue cultured plants. Our study revealed that dramatic changes occurred in the gene expression profile of Paulownia after PaWB Phytoplasma infection. Genes encoding key enzymes in cytokinin biosynthesis, such as isopentenyl diphosphate isomerase and isopentenyltransferase, were significantly induced in the infected Paulownia. Genes involved in cell wall biosynthesis and degradation were largely up-regulated and genes related to photosynthesis were down-regulated after PaWB Phytoplasma infection. Our systematic analysis provides comprehensive transcriptomic data about plants infected by Phytoplasma. This information will help further our understanding of the detailed interaction mechanisms between plants and Phytoplasma. PMID:24130859
TopHat: discovering splice junctions with RNA-Seq
Trapnell, Cole; Pachter, Lior; Salzberg, Steven L.
2009-01-01
Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19289445
Profiling RNA editing in human tissues: towards the inosinome Atlas
Picardi, Ernesto; Manzari, Caterina; Mastropasqua, Francesca; Aiello, Italia; D’Erchia, Anna Maria; Pesole, Graziano
2015-01-01
Adenine to Inosine RNA editing is a widespread co- and post-transcriptional mechanism mediated by ADAR enzymes acting on double stranded RNA. It has a plethora of biological effects, appears to be particularly pervasive in humans with respect to other mammals, and is implicated in a number of diverse human pathologies. Here we present the first human inosinome atlas comprising 3,041,422 A-to-I events identified in six tissues from three healthy individuals. Matched directional total-RNA-Seq and whole genome sequence datasets were generated and analysed within a dedicated computational framework, also capable of detecting hyper-edited reads. Inosinome profiles are tissue specific and edited gene sets consistently show enrichment of genes involved in neurological disorders and cancer. Overall frequency of editing also varies, but is strongly correlated with ADAR expression levels. The inosinome database is available at: http://srv00.ibbe.cnr.it/editing/. PMID:26449202
Novel functional microRNAs from virus-free and infected Vitis vinifera plants under water stress
Pantaleo, Vitantonio; Vitali, Marco; Boccacci, Paolo; Miozzi, Laura; Cuozzo, Danila; Chitarra, Walter; Mannini, Franco; Lovisolo, Claudio; Gambino, Giorgio
2016-01-01
MicroRNAs (miRNAs) are small non-coding RNAs that regulate the post-transcriptional control of several pathway intermediates, thus playing pivotal roles in plant growth, development and response to biotic and abiotic stresses. In recent years, the grapevine genome release, small(s)-RNAseq and degradome-RNAseq together have allowed the discovery and characterisation of many miRNA species, thus rendering the discovery of additional miRNAs difficult and uncertain. Taking advantage of the miRNA responsiveness to stresses and the availability of virus-free Vitis vinifera plants and those infected only by a latent virus, we have analysed grapevines subjected to drought in greenhouse conditions. The sRNA-seq and other sequence-specific molecular analyses have allowed us to characterise conserved miRNA expression profiles in association with specific eco-physiological parameters. In addition, we here report 12 novel grapevine-specific miRNA candidates and describe their expression profile. We show that latent viral infection can influence the miRNA profiles of V. vinifera in response to drought. Moreover, study of eco-physiological parameters showed that photosynthetic rate, stomatal conductance and hydraulic resistance to water transport were significantly influenced by drought and viral infection. Although no unequivocal cause–effect explanation could be attributed to each miRNA target, their contribution to the drought response is discussed. PMID:26833264
Wang, Wenlan; Xue, Li; Li, Ya; Li, Rong; Xie, Xiaoping; Bao, Junxiang; Hai, Chunxu; Li, Jinsheng
2016-01-01
To elucidate the altered gene network in the brains of carbon monoxide (CO)-poisoned rats after treatment with hyperbaric oxygen (HBO₂), RNA sequencing (RNA-seq) analysis was performed to examine differentially expressed genes (DEGs) in brain tissue samples from nine male rats: a normal control group, a CO poisoning group, and an HBO₂ treatment group (three rats/group). Reverse transcription polymerase chain reaction (RT-PCR) and real-time quantitative PCR were used to validate the DEGs in another 18 male rats (six rats/group). RNA-seq revealed that two genes were upregulated (log2 fold changes of 4.18 and 8.76; p<0.05) in the CO-poisoned rats relative to the control rats, and that two genes were upregulated (log2 fold changes of 3.88 and 7.69) and 23 genes were downregulated (log2 fold changes of 3.49-15.12; p<0.05) in the brains of the HBO₂-treated rats relative to the CO-poisoned rats. Target prediction of DEGs by gene network analysis and pathway analysis suggested that the regulation of gene expression in dopamine metabolism and nitric oxide (NO) synthesis was significantly affected by CO poisoning and HBO₂ treatment. Results of RT-PCR and real-time quantitative PCR indicated that four genes (Pomc, GH-1, Pr1 and Fshβ) associated with hormone secretion in the hypothalamic-pituitary system have potential as prognostic markers for CO poisoning. This study is the first RNA-seq profile of HBO₂ treatment in rats with acute CO poisoning. It concludes that changes in hormone secretion in the hypothalamic-pituitary system, dopamine metabolism and NO synthesis are involved in brain damage and behavioral abnormalities after CO poisoning, and that HBO₂ therapy may regulate these changes.
Tabassum, Rubina; Sivadas, Ambily; Agrawal, Vartika; Tian, Haozheng; Arafat, Dalia; Gibson, Greg
2015-08-13
Personalized medicine is predicated on the notion that individual biochemical and genomic profiles are relatively constant in times of good health and to some extent predictive of disease or therapeutic response. We report a pilot study quantifying gene expression and methylation profile consistency over time, addressing the reasons for individual uniqueness, and its relation to N = 1 phenotypes. Whole blood samples from four African American women, four Caucasian women, and four Caucasian men drawn from the Atlanta Center for Health Discovery and Well Being study at three successive 6-month intervals were profiled by RNA-Seq, miRNA-Seq, and Illumina Methylation 450 K arrays. Standard regression approaches were used to evaluate the proportion of variance for each type of omic measure among individuals, and to quantify correlations among measures and with clinical attributes related to wellness. Longitudinal omic profiles were in general highly consistent over time, with an average of 67 % variance in transcript abundance, 42 % in CpG methylation level (but 88 % for the most differentiated CpG per gene), and 50 % in miRNA abundance among individuals, which are all comparable to 74 % variance among individuals for 74 clinical traits. One third of the variance could be attributed to differential blood cell type abundance, which was also fairly stable over time, and a lesser amount to expression quantitative trait loci (eQTL) effects. Seven conserved axes of covariance that capture diverse aspects of immune function explained over half of the variance. These axes also explained a considerable proportion of individually extreme transcript abundance, namely approximately 100 genes that were significantly up-regulated or down-regulated in each person and were in some cases enriched for relevant gene activities that plausibly associate with clinical attributes. 
A similar fraction of genes had individually divergent methylation levels, but these did not overlap with the transcripts, and fewer than 20 % of genes had significantly correlated methylation and gene expression. People express an "omic personality" consisting of peripheral blood transcriptional and epigenetic profiles that are constant over the course of a year and reflect various types of immune activity. Baseline genomic profiles can provide a window into the molecular basis of traits that might be useful for explaining medical conditions or guiding personalized health decisions.
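The "proportion of variance among individuals" can be illustrated with a one-way decomposition in which individual identity is the only factor: the between-individual sum of squares divided by the total sum of squares gives the share attributable to stable person-to-person differences. This is a minimal sketch under that assumption; the study's actual regression models (which also consider factors such as blood cell type abundance) are not reproduced, and the function name and values are hypothetical.

```python
def fraction_variance_between(groups):
    """Share of total variance explained by individual identity (one-way R^2).

    groups: dict mapping individual ID -> list of repeated measurements
    (e.g. one transcript's abundance at successive time points).
    """
    values = [v for vals in groups.values() for v in vals]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Between-individual sum of squares: spread of each person's mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return ss_between / ss_total


# Hypothetical transcript abundance for two people sampled three times each.
abundance = {"person1": [10.0, 11.0, 12.0], "person2": [20.0, 21.0, 19.0]}
print(round(fraction_variance_between(abundance), 3))  # 0.968
```

A value near 1 means repeated samples from the same person cluster tightly relative to differences between people, which is the kind of temporal consistency the abstract describes.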
Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi
2018-03-09
High-throughput single-cell RNA-seq methods assign only limited unique molecular identifier (UMI) counts to single cells as gene expression values from shallow sequencing reads, and detect limited numbers of genes. We therefore developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements to the reaction steps make it possible to convert initial reads to UMI counts effectively, at a rate of 30-50%, and to detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.
Petegrosso, Raphael; Tolar, Jakub
2018-01-01
Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code is available at https://github.com/kuanglab/scVDMC. PMID:29630593
Michel, Audrey M.; Mullan, James P. A.; Velayudhan, Vimalkumar; O'Connor, Patrick B. F.; Donohue, Claire A.; Baranov, Pavel V.
2016-01-01
Ribosome profiling (ribo-seq) is a technique that uses high-throughput sequencing to reveal the exact locations and densities of translating ribosomes at the entire transcriptome level. The technique has become very popular since its inception in 2009. Yet experimentalists who generate ribo-seq data often have to rely on bioinformaticians to process and analyze their data. We present RiboGalaxy (http://ribogalaxy.ucc.ie), a freely available Galaxy-based web server for processing and analyzing ribosome profiling data with the visualization functionality provided by GWIPS-viz (http://gwips.ucc.ie). RiboGalaxy offers researchers a suite of tools specifically tailored for processing ribo-seq and corresponding mRNA-seq data. Researchers can take advantage of the published workflows which reduce the multi-step alignment process to a minimum of inputs from the user. Users can then explore their own aligned data as custom tracks in GWIPS-viz and compare their ribosome profiles to existing ribo-seq tracks from published studies. In addition, users can assess the quality of their ribo-seq data, determine the strength of the triplet periodicity signal, generate meta-gene ribosome profiles as well as analyze the relative impact of mRNA sequence features on local read density. RiboGalaxy is accompanied by extensive documentation and tips for helping users. In addition we provide a forum (http://gwips.ucc.ie/Forum) where we encourage users to post their questions and feedback to improve the overall RiboGalaxy service. PMID:26821742
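One common way to gauge the strength of the triplet periodicity signal is to count how ribosome P-sites distribute across the three reading frames of annotated coding sequences; strong periodicity concentrates reads in a single frame. The sketch below assumes P-site offsets have already been inferred from read alignments and a non-empty input; it does not reproduce RiboGalaxy's own tools, and the function name and offsets are hypothetical.

```python
from collections import Counter


def frame_distribution(psite_offsets):
    """Fraction of ribosome P-sites in each reading frame.

    psite_offsets: non-empty list of P-site positions, in nucleotides relative
    to the first nucleotide of the start codon (0-based), one per read.
    """
    counts = Counter(offset % 3 for offset in psite_offsets)
    total = sum(counts.values())
    # Counter returns 0 for frames with no reads.
    return [counts[frame] / total for frame in range(3)]


# Hypothetical P-site offsets: most fall in frame 0.
offsets = [0, 3, 6, 9, 12, 1, 2]
print(frame_distribution(offsets))
```

With strong periodicity the first fraction dominates; a split of roughly one third per frame suggests weak or absent periodicity.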
Tao, Wensi; Ayala-Haedo, Juan A; Field, Matthew G; Pelaez, Daniel; Wester, Sara T
2017-12-01
The purpose of this study was to characterize the intrinsic cellular properties of orbital adipose-derived stem cells (OASC) from patients with thyroid-associated orbitopathy (TAO) and healthy controls. Orbital adipose tissue was collected from a total of nine patients: four controls and five patients with TAO. Isolated OASC were characterized with mesenchymal stem cell-specific markers. Orbital adipose-derived stem cells were differentiated into three lineages: chondrocytes, osteocytes, and adipocytes. Genes involved in the adipogenesis, chondrogenesis, and osteogenesis pathways were selected for reverse transcription PCR to assay differentiation capacity. RNA sequencing (RNA-seq) was performed, and results were compared to assess differences in gene expression between TAO patients and controls. Selected top-ranked results were confirmed by RT-PCR. Orbital adipose-derived stem cells isolated from orbital fat expressed high levels of mesenchymal stem cell markers but low levels of pluripotent stem cell markers. Orbital adipose-derived stem cells isolated from TAO patients exhibited an increase in adipogenesis and a decrease in chondrogenesis and osteogenesis. RNA-seq disclosed 54 differentially expressed genes. In TAO OASC, expression of early neural crest progenitor markers (WNT signaling, ZIC genes and MSX2) was lost. Meanwhile, ectopic expression of HOXB2 and HOXB3 was found in the OASC from TAO patients. Our results suggest that there are intrinsic genetic and cellular differences in the OASC populations derived from TAO patients. The upregulation of adipogenesis in OASC of TAO may be consistent with the clinical phenotype. Downregulation of early neural crest markers and ectopic expression of HOXB2 and HOXB3 in TAO OASC demonstrate dysregulation of developmental and tissue patterning pathways.
Tao, Wensi; Ayala-Haedo, Juan A.; Field, Matthew G.; Pelaez, Daniel; Wester, Sara T.
2017-01-01
Purpose The purpose of this study was to characterize the intrinsic cellular properties of orbital adipose-derived stem cells (OASC) from patients with thyroid-associated orbitopathy (TAO) and healthy controls. Methods Orbital adipose tissue was collected from a total of nine patients: four controls and five patients with TAO. Isolated OASC were characterized with mesenchymal stem cell–specific markers. Orbital adipose-derived stem cells were differentiated into three lineages: chondrocytes, osteocytes, and adipocytes. Genes involved in the adipogenesis, chondrogenesis, and osteogenesis pathways were selected for reverse transcription PCR to assay the differentiation capacities. RNA sequencing (RNA-seq) was performed and the results were compared to assess differences in gene expression between TAO and controls. Selected top-ranked results were confirmed by RT-PCR. Results Orbital adipose-derived stem cells isolated from orbital fat expressed high levels of mesenchymal stem cell markers but low levels of pluripotent stem cell markers. Orbital adipose-derived stem cells isolated from TAO patients exhibited an increase in adipogenesis and a decrease in chondrogenesis and osteogenesis. RNA-seq disclosed 54 differentially expressed genes. In TAO OASC, expression of early neural crest progenitor markers (WNT signaling, ZIC genes, and MSX2) was lost. Meanwhile, ectopic expression of HOXB2 and HOXB3 was found in the OASC from TAO. Conclusion Our results suggest that there are intrinsic genetic and cellular differences in the OASC populations derived from TAO patients. The upregulation of adipogenesis in OASC of TAO may be consistent with the clinical phenotype. Downregulation of early neural crest markers and ectopic expression of HOXB2 and HOXB3 in TAO OASC demonstrate dysregulation of developmental and tissue patterning pathways. PMID:29214313
Cochain, Clément; Vafadarnejad, Ehsan; Arampatzi, Panagiota; Jaroslav, Pelisek; Winkels, Holger; Ley, Klaus; Wolf, Dennis; Saliba, Antoine-Emmanuel; Zernecke, Alma
2018-03-15
Rationale: It is assumed that atherosclerotic arteries contain several macrophage subsets endowed with specific functions. The precise identity of these subsets is poorly characterized, as they have been defined by the expression of a restricted number of markers. Objective: We have applied single-cell RNA-seq as an unbiased profiling strategy to interrogate and classify aortic macrophage heterogeneity at the single-cell level in atherosclerosis. Methods and Results: We performed single-cell RNA sequencing of total aortic CD45+ cells extracted from the non-diseased (chow-fed) and atherosclerotic (11 weeks of high-fat diet) aorta of Ldlr-/- mice. Unsupervised clustering singled out 13 distinct aortic cell clusters. Among the myeloid cell populations, Resident-like macrophages with a gene expression profile similar to aortic resident macrophages were found in healthy and diseased aortae, whereas monocytes, monocyte-derived dendritic cells (MoDC), and two populations of macrophages were almost exclusively detectable in atherosclerotic aortae, comprising Inflammatory macrophages showing enrichment in Il1b, and previously undescribed TREM2hi macrophages. Differential gene expression and gene ontology enrichment analyses revealed specific gene expression patterns distinguishing these three macrophage subsets and MoDC, and uncovered putative functions of each cell type. Notably, TREM2hi macrophages appeared to be endowed with specialized functions in lipid metabolism and catabolism, and presented a gene expression signature reminiscent of osteoclasts, suggesting a role in lesion calcification. TREM2 expression was moreover detected in human lesional macrophages. Importantly, these macrophage populations were also present in advanced atherosclerosis and in Apoe-/- aortae, indicating the relevance of our findings to different stages of atherosclerosis and different mouse models.
Conclusions: These data provide an unprecedented view of the transcriptional landscape and phenotypic heterogeneity of aortic macrophages and MoDCs in atherosclerotic aortae, and identify previously unrecognized macrophage populations and their gene expression signatures, suggesting specialized functions. Our findings will open up novel opportunities to explore distinct myeloid cell populations and their functions in atherosclerosis.
Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes.
Zhang, Jin; Griffith, Malachi; Miller, Christopher A; Griffith, Obi L; Spencer, David H; Walker, Jason R; Magrini, Vincent; McGrath, Sean D; Ly, Amy; Helton, Nichole M; Trissal, Maria; Link, Daniel C; Dang, Ha X; Larson, David E; Kulkarni, Shashikant; Cordes, Matthew G; Fronick, Catrina C; Fulton, Robert S; Klco, Jeffery M; Mardis, Elaine R; Ley, Timothy J; Wilson, Richard K; Maher, Christopher A
2017-11-01
To detect diverse and novel RNA species comprehensively, we compared deep small RNA sequencing and RNA sequencing (RNA-seq) methods applied to a primary acute myeloid leukemia (AML) sample. We discovered previously unannotated small RNAs by deep sequencing a library prepared with a broader insert-size selection. We analyzed the long noncoding RNA (lncRNA) landscape in AML by comparing deep sequencing from multiple RNA-seq library construction methods for the sample that we studied and then integrating RNA-seq data from 179 AML cases. This identified lncRNAs that are completely novel, differentially expressed, and associated with specific AML subtypes. Our study revealed the complexity of the noncoding RNA transcriptome through a combined strategy of strand-specific small RNA and total RNA-seq. This dataset will serve as an invaluable resource for future RNA-based analyses. Copyright © 2017 ISEH – Society for Hematology and Stem Cells. Published by Elsevier Inc. All rights reserved.
Kao, Damian; Felix, Daniel; Aboobaker, Aziz
2013-11-16
Planarians can regenerate entire animals from a small fragment of the body. The regenerating fragment is able to create new tissues and remodel existing tissues to form a complete animal. Thus, fragments with very different starting components eventually converge on the same solution. In this study, we performed an extensive RNA-seq time-course on regenerating head and tail fragments to observe the differences and similarities of the transcriptional landscape between head and tail fragments during regeneration. We consolidated existing transcriptomic data for S. mediterranea to generate a high-confidence set of transcripts for use in genome-wide expression studies. Our RNA-seq time-course covered regenerating head and tail fragments from 0 hours to 3 days. We found that the transcriptome profiles of head and tail regeneration were very different at the start of regeneration; however, an unexpected convergence of transcriptional profiles occurred at 48 hours, when head and tail fragments are still morphologically distinct. By comparing differentially expressed transcripts at various time points, we revealed that this divergence/convergence pattern is caused by a shared regulatory program that runs early in heads and later in tails. Additionally, we performed RNA-seq on smed-prep(RNAi) tail fragments, which ultimately fail to regenerate anterior structures. We found that the gene regulatory program in response to smed-prep(RNAi) displays the opposite trend to the shared regulatory program described above. Using annotation data and comparative approaches, we also identified a set of approximately 4,800 triclad-specific transcripts that were enriched among the genes displaying differential expression during the regeneration time-course.
The transcriptome of head and tail regeneration provides a rich resource for investigating the global expression changes that occur during regeneration. We show that very different regenerative scenarios utilize a shared core regenerative program. Furthermore, our consolidated transcriptome and annotations allowed us to identify triclad-specific transcripts that are enriched within this core regulatory program. Our data support the hypothesis that both conserved aspects of animal developmental programs and recent evolutionary innovations work in concert to control regeneration.
Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei
2018-01-01
Accurately reconstructing gene co-expression networks is of great importance for uncovering the genetic architecture underlying diverse complex phenotypes. The recent availability of high-throughput RNA sequencing (RNA-seq) has made genome-wide detection and quantification of novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression networks have still not been well explored. Using massive-scale RNA-seq samples, we designed an ensemble pipeline, called NetMiner, for building genome-scale, high-quality Gene Co-expression Networks (GCNs) by integrating three frequently used inference algorithms. We constructed an RNA-seq-based GCN for rice, a monocot species. The quality of the resulting network was evaluated against curated gene functional association data sets, and it clearly outperformed each single method. In addition, the network's capability to associate genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provide a valuable and highly reliable data source for selecting key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
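The abstract does not name the three inference algorithms that NetMiner integrates, so the sketch below stands in three simple co-expression measures (Pearson, Spearman and Kendall correlation) and combines them by summing each gene pair's rank under every measure, one common way to build an ensemble network. Function names and toy data are hypothetical; this is not NetMiner's code, which is available from the repository given above.

```python
import math
from itertools import combinations


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        d = (x[i] - x[j]) * (y[i] - y[j])
        s += (d > 0) - (d < 0)  # +1 concordant, -1 discordant, 0 tied
    return s / (n * (n - 1) / 2)


def ensemble_edges(expr, top_k):
    """Rank gene pairs under each measure, sum the ranks, keep the top_k pairs."""
    pairs = list(combinations(sorted(expr), 2))
    rank_sum = dict.fromkeys(pairs, 0)
    for measure in (pearson, spearman, kendall_tau_a):
        score = {p: abs(measure(expr[p[0]], expr[p[1]])) for p in pairs}
        for rank, pair in enumerate(sorted(pairs, key=score.get, reverse=True)):
            rank_sum[pair] += rank
    return sorted(pairs, key=rank_sum.get)[:top_k]


expr = {  # toy expression values across five samples
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 3, 4, 1, 2],
    "D": [1, 1, 2, 1, 1],
}
print(ensemble_edges(expr, top_k=1))  # -> [('A', 'B')]
```

Rank aggregation keeps measures on different scales from dominating one another; a genome-scale version would use vectorized correlation routines rather than pure-Python loops.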
Zhu, Youyin; Li, Yongqiang; Xin, Dedong; Chen, Wenrong; Shao, Xu; Wang, Yue; Guo, Weidong
2015-01-25
Bud dormancy is a critical biological process allowing Chinese cherry (Prunus pseudocerasus) to survive in winter. Due to the lack of genomic information, the molecular mechanisms triggering endodormancy release in flower buds have remained unclear. Hence, we used Illumina RNA-Seq technology to carry out de novo transcriptome assembly and digital gene expression profiling of flower buds. Approximately 47 million clean reads were assembled into 50,604 sequences with an average length of 837 bp. A total of 37,650 unigene sequences were successfully annotated. In total, 128 pathways were annotated by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, and metabolic pathways, biosynthesis of secondary metabolites, and plant hormone signal transduction accounted for relatively high percentages in flower buds. During the critical period of endodormancy release, 1644 significantly differentially expressed genes (DEGs) were identified from the expression profiles. DEGs related to oxidoreductase activity were especially abundant in the Gene Ontology (GO) molecular function category. KEGG pathway analysis demonstrated that DEGs were involved in various metabolic processes, including phytohormone metabolism. Quantitative real-time PCR (qRT-PCR) analysis indicated that expression of DEGs involved in abscisic acid and gibberellin biosynthesis decreased, whereas expression of DEGs encoding their degradation enzymes increased, and GID1 was down-regulated. Concomitant with endodormancy release, MADS-box transcription factors, including P. pseudocerasus dormancy-associated MADS-box (PpcDAM), Agamous-like2, and APETALA3-like genes, showed remarkable epigenetic roles. The newly generated transcriptome and gene expression profiling data provide valuable genetic information for revealing transcriptomic variation during bud dormancy in Chinese cherry. These data should be useful for future studies of bud dormancy in Prunus fruit trees lacking genomic information. Copyright © 2014 Elsevier B.V. All rights reserved.
Wan, Zhiyi; Lu, Yanan; Rui, Lei; Yu, Xiaoxue; Yang, Fang; Tu, Chengfang; Li, Zandong
2017-06-20
Most female birds develop only a left ovary, whereas males develop bilateral testes. The mechanism underlying this process is still not completely understood. Here, we provide a comprehensive transcriptional analysis of female chicken gonads and identify novel candidate side-biased genes. RNA-Seq analysis was carried out on total RNA harvested from the left and right gonads on embryonic day 6 (E6), E12, and post-hatching day 1 (D1). By comparing the gene expression profiles between the left and right gonads, 347 differentially expressed genes (DEGs) were obtained on E6, 3730 on E12, and 2787 on D1. Side-specific genes were primarily derived from the autosomes rather than the sex chromosomes. Gene ontology and pathway analysis showed that the DEGs were most enriched in the Piwi-interacting RNA (piRNA) metabolic process, germ plasm, chromatoid body, P granule, neuroactive ligand-receptor interaction, microbial metabolism in diverse environments, and methane metabolism. A total of 111 DEGs, five gene ontology (GO) terms, and three pathways were significantly different between the left and right gonads across all the developmental stages. We also report the number and percentage of DEGs within each of eight development-dependent expression patterns in the left and right gonads of female chickens.
RnaSeqSampleSize: real data based sample size estimation for RNA sequencing.
Zhao, Shilin; Li, Chung-I; Guo, Yan; Sheng, Quanhu; Shyr, Yu
2018-05-30
One of the most important, and often neglected, components of a successful RNA sequencing (RNA-Seq) experiment is sample size estimation. A few negative binomial model-based methods have been developed to estimate sample size based on the parameters of a single gene. However, thousands of genes are quantified and tested for differential expression simultaneously in RNA-Seq experiments. Thus, additional issues should be carefully addressed, including the false discovery rate across multiple statistical tests and the widely distributed read counts and dispersions of different genes. To solve these issues, we developed a sample size and power estimation method named RnaSeqSampleSize, based on the distributions of gene average read counts and dispersions estimated from real RNA-seq data. Datasets from previous, similar experiments such as The Cancer Genome Atlas (TCGA) can be used as a point of reference. Read counts and their dispersions were estimated from the reference's distribution; using that information, we estimated and summarized the power and sample size. RnaSeqSampleSize is implemented in R and can be installed from Bioconductor. A user-friendly graphical web interface is provided at http://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/ . RnaSeqSampleSize provides a convenient and powerful way to estimate power and sample size for an RNA-Seq experiment. It is also equipped with several unique features, including estimation for genes or pathways of interest, power curve visualization, and parameter optimization.
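RnaSeqSampleSize itself draws count and dispersion distributions from reference data, which the sketch below does not attempt. As a much simpler, single-gene stand-in, it assumes a normal (Wald) approximation to a two-group negative binomial comparison, plus a rule that converts a target FDR into a per-gene significance level given an assumed number of truly differentially expressed genes. All inputs are illustrative, not values from the paper.

```python
import math
from statistics import NormalDist


def per_gene_alpha(fdr, n_genes, n_de, power):
    """Per-test significance level targeting a given FDR.

    With m0 null genes and r1 = n_de * power expected true discoveries,
    FDR ~= m0 * alpha / (m0 * alpha + r1); solving for alpha gives the result.
    """
    m0 = n_genes - n_de
    r1 = n_de * power
    return r1 * fdr / (m0 * (1 - fdr))


def samples_per_group(mean_count, dispersion, fold_change, alpha, power):
    """Normal approximation for a two-group negative binomial comparison.

    Var(log mean) per group ~= (1/mean_count + dispersion) / n, so the log fold
    change between two groups has variance ~= 2 * (1/mean_count + dispersion) / n.
    """
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)  # two-sided test
    unit_variance = 1 / mean_count + dispersion
    n = 2 * z_total ** 2 * unit_variance / math.log(fold_change) ** 2
    return math.ceil(n)


alpha = per_gene_alpha(fdr=0.05, n_genes=10_000, n_de=100, power=0.8)
print(alpha, samples_per_group(10, 0.1, 2.0, alpha, 0.8))
```

The FDR-adjusted alpha is far smaller than 0.05, which is exactly why genome-wide testing drives sample sizes up relative to a single-gene calculation.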
Li, Shi-Weng; Leng, Yan; Shi, Rui-Fang
2017-02-17
Hydrogen peroxide (H2O2) is known to function as a signalling molecule involved in the modulation of various physiological processes in plants. H2O2 has been shown to act as a promoter of adventitious root formation in hypocotyl cuttings. In this study, RNA-Seq was performed to reveal the molecular mechanisms underlying H2O2-induced adventitious rooting. RNA-Seq data revealed that H2O2 treatment greatly increased the numbers of clean reads and expressed genes, and the abundance of gene expression, relative to the water treatment. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses indicated that a profound change in gene function occurred in the 6-h H2O2 treatment, and that H2O2 mainly enhanced gene expression levels at the 6-h time point but reduced them at the 24-h time point compared with the water treatment. In total, 4579, 3525, and 7383 differentially expressed unigenes (DEGs; 2-fold change > 2) were selected in the 6-h, 24-h, and 6- to 24-h treatments, respectively; of these, 78.3%, 64.0%, and 40.8% were up-regulated and 21.7%, 36.0%, and 59.2% were down-regulated. The number of DEGs in the 6-h treatment was 29.9% higher than that in the 24-h treatment. The functions of the most highly regulated genes were associated with stress response, cell redox homeostasis and oxidative stress response, cell wall loosening and modification, metabolic processes, and transcription factors (TFs), as well as plant hormone signalling, including the auxin, ethylene, cytokinin, gibberellin, and abscisic acid pathways. Notably, a large number of genes encoding heat shock proteins (HSPs) and heat shock transcription factors (HSFs) were significantly up-regulated during H2O2 treatments.
Furthermore, real-time quantitative PCR (qRT-PCR) results showed that, during H2O2 treatments, the expression levels of ARFs, IAAs, AUXs, NACs, RD22, AHKs, MYBs, PIN1, AUX15A, LBD29, LBD41, ADH1b, and QORL were significantly up-regulated at the 6- and/or 24-h time points. In contrast, PER1 and PER2 were significantly down-regulated by H2O2 treatment. These qRT-PCR results strongly correlated with the RNA-Seq data. Using RNA-Seq and qRT-PCR, we analysed the global changes in gene expression and functional profiles during H2O2-induced adventitious rooting in mung bean seedlings. These results strengthen the current understanding of H2O2-induced adventitious rooting and the molecular traits of H2O2 priming in plants.
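The DEG figures reported for the H2O2 time points can be cross-checked with a few lines of arithmetic. Because the published percentages are rounded to one decimal place, the up/down counts recovered below are approximate back-calculations, not numbers reported by the authors.

```python
# DEG totals and up-regulated percentages as reported in the abstract.
reported = {
    "6 h": (4579, 78.3),
    "24 h": (3525, 64.0),
    "6-24 h": (7383, 40.8),
}

approx_counts = {}
for comparison, (total, pct_up) in reported.items():
    up = round(total * pct_up / 100)  # percentages are rounded, so counts are approximate
    approx_counts[comparison] = (up, total - up)
    print(f"{comparison}: ~{up} up, ~{total - up} down")

# Relative excess of 6-h DEGs over 24-h DEGs.
excess_pct = (reported["6 h"][0] / reported["24 h"][0] - 1) * 100
print(f"6 h vs 24 h: {excess_pct:.1f}% more DEGs")
```

The recovered down-regulated shares (about 21.7%, 36.0% and 59.2%) and the 29.9% excess agree with the figures stated in the abstract.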
Linnorm: improved statistical analysis for single cell RNA-seq expression data
Yip, Shun H.; Wang, Panwen; Kocher, Jean-Pierre A.; Sham, Pak Chung
2017-01-01
Abstract Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is designed to remove technical noise while preserving biological variation in scRNA-seq data, so that existing statistical methods perform better. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy. PMID:28981748
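Linnorm's own model is not described in this abstract. For orientation only, the sketch below shows the kind of simple baseline that scRNA-seq normalization methods are commonly compared against: per-cell library-size scaling to counts per million (CPM) followed by a log2 transform. It is not Linnorm's algorithm, and the pseudocount of 1 is an arbitrary illustrative choice.

```python
import math


def log_cpm(counts_by_cell, pseudocount=1.0):
    """Library-size (counts-per-million) scaling followed by a log2 transform."""
    normalized = {}
    for cell, counts in counts_by_cell.items():
        library_size = sum(counts)
        if library_size == 0:
            raise ValueError(f"cell {cell!r} has no counts")
        normalized[cell] = [
            math.log2(c / library_size * 1e6 + pseudocount) for c in counts
        ]
    return normalized


cells = {"cell_1": [1, 3, 0], "cell_2": [10, 30, 0]}
result = log_cpm(cells)
print(result["cell_1"] == result["cell_2"])  # same composition, different depth
```

A baseline like this removes sequencing-depth differences between cells but does nothing about other sources of technical noise, which is the gap that dedicated methods such as Linnorm aim to address.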
Nepal, Chirag; Coolen, Marion; Hadzhiev, Yavor; Cussigh, Delphine; Mydel, Piotr; Steen, Vidar M.; Carninci, Piero; Andersen, Jesper B.; Bally-Cuif, Laure; Müller, Ferenc; Lenhard, Boris
2016-01-01
MicroRNAs (miRNAs) play a major role in the post-transcriptional regulation of target genes, especially in development and differentiation. Our understanding of the transcriptional regulation of miRNA genes is limited by inadequate annotation of primary miRNA (pri-miRNA) transcripts. Here, we used CAGE-seq and RNA-seq to provide genome-wide identification of the pri-miRNA core promoter repertoire and its dynamic usage during zebrafish embryogenesis. We assigned pri-miRNA promoters to 152 precursor miRNAs (pre-miRNAs), the majority of which were supported by promoter-associated post-translational histone modifications (H3K4me3, H2A.Z) and RNA polymerase II (RNAPII) occupancy. We validated seven miR-9 pri-miRNAs by in situ hybridization and showed that their expression patterns are similar to that of mature miR-9. In addition, processing of an alternative intronic promoter of miR-9–5 was validated by 5′ RACE PCR. Developmental profiling revealed a subset of pri-miRNAs that are maternally inherited. Moreover, we show that promoter-associated H3K4me3, H2A.Z and RNAPII marks are not only present at pri-miRNA promoters but are also specifically enriched at pre-miRNAs, suggesting chromatin-level regulation of pre-miRNAs. Furthermore, we demonstrated that CAGE-seq also detects 3′-end processing of pre-miRNAs at Drosha cleavage sites, which correlates with miRNA-offset RNA (moRNA) production, and provides a new tool for detecting Drosha processing events and predicting pre-miRNA processing by a genome-wide assay. PMID:26673698
Mapping RNA-seq Reads with STAR
Dobin, Alexander; Gingeras, Thomas R.
2015-01-01
Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920
Mapping RNA-seq Reads with STAR.
Dobin, Alexander; Gingeras, Thomas R
2015-09-03
Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.
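As a hedged illustration of a typical STAR run, the sketch below assembles an alignment command in Python using standard STAR options for a coordinate-sorted BAM and per-gene read counts, with optional chimeric-read detection. The index path, FASTQ names and the chimeric segment length are placeholders, and the function name is hypothetical; confirm option behaviour against the STAR manual for the installed version.

```python
import shlex


def star_align_cmd(genome_dir, fastq_files, out_prefix, threads=8, detect_chimeras=False):
    """Assemble a STAR alignment command line as an argument list."""
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,            # index built beforehand (--runMode genomeGenerate)
        "--readFilesIn", *fastq_files,        # one file (single-end) or two (paired-end)
        "--outFileNamePrefix", out_prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Per-gene counts (ReadsPerGene.out.tab); needs gene annotations supplied
        # at index build time or via --sjdbGTFfile.
        "--quantMode", "GeneCounts",
    ]
    if any(f.endswith(".gz") for f in fastq_files):
        cmd += ["--readFilesCommand", "zcat"]  # decompress gzipped FASTQ on the fly
    if detect_chimeras:
        cmd += ["--chimSegmentMin", "12"]      # illustrative minimum chimeric segment length
    return cmd


cmd = star_align_cmd("/data/star_index", ["R1.fastq.gz", "R2.fastq.gz"], "sample1_")
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with unusual file names when it is passed to `subprocess.run`.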
SERE: single-parameter quality control and sample comparison for RNA-Seq.
Schulze, Stefan K; Kanwar, Rahul; Gölzenleuchter, Meike; Therneau, Terry M; Beutler, Andreas S
2012-10-03
Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. In contrast, the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
SERE: Single-parameter quality control and sample comparison for RNA-Seq
2012-01-01
Background Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. In contrast, the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen’s simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter. PMID:23033915
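The abstracts describe how SERE is interpreted but not how it is computed. Based on our reading of the published method, SERE compares the observed spread of counts between libraries with the spread expected under pure Poisson sampling: for each bin, the expected count in each library is the bin total split in proportion to library size, and SERE is the square root of the resulting Pearson chi-square statistic per degree of freedom. Treat this as a sketch to be checked against the paper, not a reference implementation.

```python
import math


def sere(libraries):
    """Simple Error Ratio Estimate for two or more count libraries.

    libraries: list of equal-length count vectors (one per library, same bin order).
    Bins with zero counts in every library are skipped.
    """
    s = len(libraries)
    lib_sizes = [sum(lib) for lib in libraries]
    total = sum(lib_sizes)
    chi2 = 0.0
    n_bins = 0
    for counts in zip(*libraries):
        bin_total = sum(counts)
        if bin_total == 0:
            continue
        n_bins += 1
        for observed, lib_size in zip(counts, lib_sizes):
            expected = bin_total * lib_size / total  # share proportional to depth
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n_bins * (s - 1)))


print(sere([[10, 20, 0, 5], [10, 20, 0, 5]]))  # duplicated data -> 0.0
print(sere([[100, 0], [0, 100]]))              # globally different -> 10.0
```

Under Poisson-only variation the chi-square per degree of freedom averages about 1, which is why a SERE near 1 indicates faithful replicates.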
Sun, Lina; Yang, Hongsheng; Chen, Muyan; Ma, Deyou; Lin, Chenggang
2013-01-01
Background Sea cucumbers (Holothuroidea; Echinodermata) have the capacity to regenerate lost tissues and organs. Although the histological and cytological aspects of intestine regeneration have been extensively studied, little is known of the genetic mechanisms involved. There has, however, been a renewed effort to develop a database of Expressed Sequence Tags (ESTs) in Apostichopus japonicus, an economically important species that occurs in China. This is important for studies on genetic breeding, molecular markers and special physiological phenomena. We have also constructed a library of ESTs obtained from the regenerative body wall and intestine of A. japonicus. The database has increased to ∼30000 ESTs. Results We used RNA-Seq to determine gene expression profiles associated with intestinal regeneration in A. japonicus at 3, 7, 14 and 21 days post evisceration (dpe). These were compared with profiles obtained from a normally functioning intestine. Approximately 5 million (M) reads were sequenced in every library. Over 2400 up-regulated genes (>10%) and over 1000 down-regulated genes (∼5%) were observed at 3 and 7 dpe (log2Ratio≥1, FDR≤0.001). Specific GO terms indicated that the DEGs (differentially expressed genes) perform important functions at every regeneration stage. Besides some expected pathways (for example, the Ribosome and Spliceosome pathway terms), the “Notch signaling pathway,” the “ECM-receptor interaction” and the “Cytokine-cytokine receptor interaction” were significantly enriched. We also investigated the expression profiles of developmental genes, ECM-associated genes and cytoskeletal genes. Twenty of the most important DEGs were verified by real-time PCR, which resulted in a trend concordance of almost 100% between the two techniques.
Conclusion Our studies demonstrated dynamic changes in global gene expression during intestine regeneration and presented a series of candidate genes and enriched pathways that contribute to intestine regeneration in sea cucumbers. This provides a foundation for future studies on the genetics/molecular mechanisms associated with intestine regeneration. PMID:23936330
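The sea cucumber study reports a DEG cutoff of log2Ratio ≥ 1 with FDR ≤ 0.001 but does not say how the FDR was computed. The sketch below assumes Benjamini-Hochberg adjusted p-values and applies the ratio threshold to the absolute log2 ratio so that both up- and down-regulated genes are kept; both choices are assumptions, and the gene names and values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2_ratios, pvalues, min_abs_log2=1.0, max_fdr=0.001):
    qvalues = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, lr, q in zip(genes, log2_ratios, qvalues)
        if abs(lr) >= min_abs_log2 and q <= max_fdr
    ]


genes = ["g1", "g2", "g3", "g4"]
print(select_degs(genes, [2.3, -1.4, 0.4, 1.1], [1e-6, 2e-5, 1e-7, 0.2]))
# -> ['g1', 'g2']
```

Here g3 is highly significant but fails the fold-change filter, and g4 passes the fold-change filter but not the FDR cutoff, which shows why both criteria are reported together.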
Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data.
Racle, Julien; de Jonge, Kaat; Baumgaertner, Petra; Speiser, Daniel E; Gfeller, David
2017-11-13
Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).
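The renormalization step based on cell-type-specific mRNA content can be illustrated in a few lines. A cell type with more mRNA per cell contributes more to bulk mRNA than its share of cells, so mRNA-level proportions are divided by each type's relative mRNA content and rescaled to sum to 1. This sketch assumes the mRNA-level proportions have already been estimated by deconvolution; the content values are invented for the example and are not EPIC's parameters.

```python
def mrna_to_cell_fractions(mrna_proportions, mrna_per_cell):
    """Convert mRNA-level proportions into cell-number fractions.

    mrna_proportions: cell type -> share of bulk mRNA attributed to that type.
    mrna_per_cell: cell type -> relative mRNA content per cell (same units for all).
    """
    weighted = {ct: p / mrna_per_cell[ct] for ct, p in mrna_proportions.items()}
    total = sum(weighted.values())
    return {ct: w / total for ct, w in weighted.items()}


# Hypothetical inputs: tumor cells assumed to carry 3x the mRNA of T cells.
fractions = mrna_to_cell_fractions(
    {"tumor": 0.6, "T cell": 0.4},
    {"tumor": 3.0, "T cell": 1.0},
)
print(fractions)  # tumor ~1/3, T cell ~2/3
```

Although tumor cells account for 60% of the mRNA in this toy case, they make up only about a third of the cells once their higher mRNA content is taken into account.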
Improving RNA-Seq expression estimates by correcting for fragment bias
2011-01-01
The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood-based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies. PMID:21410973
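The full correction described here is likelihood-based and handles many transcripts jointly. The simplified sketch below shows only the core idea for a single transcript, assuming a known, hypothetical positional bias function: the expected number of fragments scales with the sum of per-position start weights (an effective length), so dividing the observed count by that sum gives a bias-corrected abundance. With uniform weights it reduces to the familiar effective length L - F + 1.

```python
def effective_length(transcript_length, fragment_length, start_weight):
    """Sum of relative fragment-start weights over all valid start positions.

    start_weight(rel_pos) gives the relative likelihood that a fragment starts
    at relative position rel_pos in [0, 1). Weights are assumed to be scaled so
    that 1.0 is the library-wide average; a constant 1.0 means no bias.
    """
    n_starts = max(transcript_length - fragment_length + 1, 0)
    return sum(start_weight(i / transcript_length) for i in range(n_starts))


def corrected_abundance(fragment_count, transcript_length, fragment_length, start_weight):
    return fragment_count / effective_length(transcript_length, fragment_length, start_weight)


def uniform(rel_pos):
    return 1.0


def toward_3prime(rel_pos):
    return 0.5 + rel_pos  # hypothetical bias favouring starts near the 3' end


print(effective_length(100, 10, uniform))        # 91 start positions
print(effective_length(100, 10, toward_3prime))  # smaller: this transcript is under-sampled
print(corrected_abundance(91, 100, 10, uniform))
```

When a transcript's start positions are, on average, less likely than the library-wide norm, its effective length shrinks and the same fragment count implies a higher abundance.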
Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions.
Evans, Ciaran; Hardin, Johanna; Stoebel, Daniel M
2017-02-27
RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
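To make the point about assumptions concrete, the sketch below implements median-of-ratios size factors as used by DESeq/DESeq2, a method chosen here as an example and not one discussed in the abstract. Its key assumption is that most genes are not differentially expressed, which is why the median ratio is used; the toy data show one strongly changed gene leaving the size factors untouched.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: sample name -> list of gene counts (same gene order in every sample).
    Genes with a zero count in any sample are excluded from the reference.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if min(values) > 0:
            log_geo_means.append((g, sum(math.log(v) for v in values) / len(values)))
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm for g, lgm in log_geo_means]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Sample B is sequenced twice as deeply as A: factors differ by a factor of 2.
print(size_factors({"A": [10, 20, 30], "B": [20, 40, 60]}))
# One gene changes 100-fold; the median ignores it and both factors stay 1.0.
print(size_factors({"A": [10, 20, 30, 40, 50], "B": [10, 20, 30, 40, 5000]}))
```

If the assumption fails, for example when a large fraction of genes shift in the same direction, the median itself moves and the size factors absorb part of the biological signal.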
Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie
2018-01-01
Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigation are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database, TranslatomeDB (http://www.translatomedb.net/), which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of the same species and type, at both the transcriptome and translatome levels. The translation indices (translation ratio, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity. All datasets were analyzed using a unified, robust, accurate and experimentally verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and analyze them with the identical unified pipeline. We believe that TranslatomeDB is a comprehensive platform and knowledgebase for translatome and proteome research, freeing biologists from the complex searching, analysis and comparison of huge sequencing datasets without needing local computational power. PMID:29106630
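Translational efficiency is commonly computed as the ratio of ribosome-footprint (Ribo-seq) abundance to mRNA-seq abundance for the same gene. The following is a minimal sketch. It assumes each library has already been reduced to per-gene read counts. Counts are scaled to counts per million, and if both measurements of a gene are normalized by the same gene length, that length cancels in the ratio. The function names and the pseudocount are illustrative assumptions, and this is not TranslatomeDB's FANSe3/edgeR pipeline.

```python
def counts_per_million(counts):
    """Scale raw per-gene counts so that the library sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def translational_efficiency(ribo_counts, mrna_counts, pseudocount=1.0):
    """Ribo-seq CPM divided by mRNA-seq CPM for genes present in both libraries.

    The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    """
    ribo = counts_per_million(ribo_counts)
    mrna = counts_per_million(mrna_counts)
    return {
        gene: (ribo[gene] + pseudocount) / (mrna[gene] + pseudocount)
        for gene in ribo
        if gene in mrna
    }
```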
Cornwell, MacIntosh; Vangala, Mahesh; Taing, Len; Herbert, Zachary; Köster, Johannes; Li, Bo; Sun, Hanfei; Li, Taiwen; Zhang, Jian; Qiu, Xintao; Pun, Matthew; Jeselsohn, Rinath; Brown, Myles; Liu, X Shirley; Long, Henry W
2018-04-12
RNA sequencing has become a ubiquitous technology used throughout life sciences as an effective method of measuring RNA abundance quantitatively in tissues and cells. The increase in use of RNA-seq technology has led to the continuous development of new tools for every step of analysis from alignment to downstream pathway analysis. However, effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts. Using the workflow management system Snakemake, we have developed a user-friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been created in a modular fashion to allow for the rapid incorporation of new tools to expand its capabilities. This capacity has already been exploited to include very recently developed tools that explore immune infiltrate and T-cell CDR (Complementarity-Determining Regions) reconstruction abilities. The pipeline has been conveniently packaged such that minimal computational skills are required to download and install the dozens of software packages that VIPER uses. VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively with a built-in capacity for customization and expansion.
do Nascimento, Naíla C; Guimaraes, Ana M S; Dos Santos, Andrea P; Chu, Yuefeng; Marques, Lucas M; Messick, Joanne B
2018-06-18
Pigs are popular animal models in biomedical research. RNA-Seq is becoming the predominant tool to investigate transcriptional changes in the pig's response to infection. The high sensitivity of this tool requires strict control of the study design, beginning with the selection of healthy animals, to provide accurate interpretation of research data. Pigs chronically infected with Mycoplasma suis often show no obvious clinical signs; however, the infection may affect the validity of animal research. The goal of this study was to investigate whether this silent infection is also silent at the host transcriptional level. Therefore, immunocompetent pigs were experimentally infected with M. suis, and transcriptional profiles of whole blood, generated by RNA-Seq, were analyzed and compared with those of non-infected animals. RNA-Seq showed 55 differentially expressed (DE) genes in the M. suis-infected pigs. Down-regulation of genes related to innate immunity (tlr8, chemokines, chemokine receptors) and genes containing IFN gamma-activated sequence (gbp1, gbp2, il15, cxcl10, casp1, cd274) suggests a general suppression of the immune response in the infected animals. Sixteen (29.09%) of the DE genes were involved in two protein interaction networks: one involving chemokines, chemokine receptors and interleukin-15, and another involving the complement cascade. Genes related to vascular permeability, blood coagulation, and endothelium integrity were also DE in infected pigs. These findings suggest that M. suis subclinical infection causes significant alterations in blood mRNA levels, which could impact data interpretation of research using pigs. Screening of pigs for M. suis infection before initiating animal studies is strongly recommended.
Statistical modeling of isoform splicing dynamics from RNA-seq time series data.
Huang, Yuanhua; Sanguinetti, Guido
2016-10-01
Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet there are no methods that explicitly leverage the experimental design to improve isoform quantification. Here, we present DICEseq, the first isoform quantification method tailored to correlated RNA-seq experiments. DICEseq explicitly models the correlations between different RNA-seq experiments to aid the quantification of isoforms across experiments. Numerical experiments on simulated datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. On real datasets, our results show that DICEseq provides substantially more reproducible and robust quantifications, increasing the correlation of estimates from replicate datasets by up to 10% on genes with low or moderate expression levels (bottom third of all genes). Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Our results have strong implications for the design of RNA-seq experiments, and offer a novel tool for improved analysis of such datasets. Python code is freely available at http://diceseq.sf.net. Contact: G.Sanguinetti@ed.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Davoli, R; Gaffo, E; Zappaterra, M; Bortoluzzi, S; Zambonelli, P
2018-06-01
The identification of the molecular mechanisms regulating pathways associated with the potential for fat deposition in pigs can lead to the detection of key genes and markers for the genetic improvement of fat traits. Interactions of microRNAs (miRNAs) with target RNAs regulate gene expression and modulate pathway activation in cells and tissues. In pigs, miRNA discovery is far from saturation, and the knowledge of miRNA expression in backfat tissue and particularly of the impact of miRNA variations is still fragmentary. Using RNA-seq, we characterized the small RNA (sRNA) expression profiles in Italian Large White pig backfat tissue. Comparing two groups of pigs divergent for backfat deposition, we detected 31 significant differentially expressed (DE) sRNAs: 14 up-regulated (including ssc-miR-132, ssc-miR-146b, ssc-miR-221-5p, ssc-miR-365-5p and the moRNA ssc-moR-21-5p) and 17 down-regulated (including ssc-miR-136, ssc-miR-195, ssc-miR-199a-5p and ssc-miR-335). To understand the biological impact of the observed miRNA expression variations, we used the expression correlation of DE miRNA target transcripts expressed in the same samples to define a regulatory network of 193 interactions between DE miRNAs and 40 DE target transcripts showing opposite expression profiles and being involved in specific pathways. Several miRNAs and mRNAs in the network were found to be expressed from backfat-related pig QTL. These results provide insight into the complex mechanisms influencing fat traits, shed light on a new aspect of the genetic regulation of fat deposition in pigs and facilitate the prospective implementation of innovative strategies of pig genetic improvement based on genomic markers. © 2018 Stichting International Foundation for Animal Genetics.
DEsingle for detecting three types of differential expression in single-cell RNA-seq data.
Miao, Zhun; Deng, Ke; Wang, Xiaowo; Zhang, Xuegong
2018-04-24
The excess of zeros in single-cell RNA-seq data includes "real" zeros due to the on-off nature of gene transcription in single cells and "dropout" zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is currently under consideration by Bioconductor. Contact: zhangxg@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.
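DEsingle builds on a zero-inflated negative binomial (ZINB) model. The following sketch shows only that generic model. It assumes the common mean/size parameterization: with probability `pi` an observation is an extra (inflated) zero, and otherwise it is drawn from a negative binomial with mean `mu` and size `theta`. It also splits P(Y = 0) into the share contributed by each component. The sketch does not reproduce how DEsingle maps these components onto "real" and "dropout" zeros or how it estimates the parameters, and the function names are hypothetical.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu (> 0) and size/dispersion theta (> 0)."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, mu, theta, pi):
    """ZINB pmf: an extra zero with probability pi, otherwise an NB draw."""
    nb_part = (1 - pi) * nb_pmf(k, mu, theta)
    return pi + nb_part if k == 0 else nb_part


def zero_components(mu, theta, pi):
    """Fractions of all zeros coming from the inflation vs. the NB component."""
    inflated = pi
    sampled = (1 - pi) * nb_pmf(0, mu, theta)
    total = inflated + sampled
    return inflated / total, sampled / total
```

For example, with `mu = 5`, `theta = 2` and `pi = 0.3`, P(Y = 0) is 5/14. About 84% of those zeros come from the inflation component.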
Damron, F. Heath; Oglesby-Sherrouse, Amanda G.; Wilks, Angela; Barbier, Mariette
2016-01-01
Determining bacterial gene expression during infection is fundamental to understanding pathogenesis. In this study, we used dual RNA-seq to simultaneously measure gene expression of P. aeruginosa and of the murine host in response to respiratory infection. Bacterial genes encoding products involved in metabolism and virulence were differentially expressed during infection, and the type III and VI secretion systems were highly expressed in vivo. Strikingly, heme acquisition, ferric-enterobactin transport, and pyoverdine biosynthesis genes were found to be significantly up-regulated during infection. In the mouse, we profiled the acute immune response to P. aeruginosa and identified the pro-inflammatory cytokines involved in the acute response to the bacterium in the lung. Additionally, we identified numerous host iron sequestration systems upregulated during infection. Overall, this work sheds light on how P. aeruginosa triggers a pro-inflammatory response and competes for iron with the host during infection, as iron is one of the central elements for which both pathogen and host fight during acute pneumonia. PMID:27982111
Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.
Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan
2017-01-01
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.
Song, Li; Florea, Liliana
2015-01-01
Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
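The key idea described for Rcorrector is that a k-mer's count is judged against its neighbors in the same read rather than against one global cut-off. The following sketch illustrates only that idea and makes several simplifications. First, k-mers are counted in a plain hash map rather than a De Bruijn graph. Second, a k-mer is flagged as untrusted when its count falls below a fixed fraction (`ratio`, an assumed parameter) of the highest count within a small `window` of neighboring k-mers. This is not Rcorrector's actual thresholding rule, and no correction step is attempted.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_kmers(read, counts, k, ratio=0.2, window=3):
    """Start positions of k-mers whose count is low relative to nearby k-mers."""
    kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    flagged = []
    for i, kmer in enumerate(kmers):
        # Local threshold: a fraction of the highest count in the neighborhood,
        # so highly and lowly expressed transcripts get different cut-offs.
        lo, hi = max(0, i - window), min(len(kmers), i + window + 1)
        local_max = max(counts[x] for x in kmers[lo:hi])
        if counts[kmer] < ratio * local_max:
            flagged.append(i)
    return flagged
```

For example, suppose there are ten copies of `ACGTACGTGA` and one read `ACGTTCGTGA` that carries an error at position 4. With k = 4, the four k-mers overlapping the error (start positions 1 to 4) are flagged, and the error-free read has none flagged.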
Bhuiyan, Ali Akbar; Li, Jingjin; Wu, Zhenyang; Ni, Pan; Adetula, Adeyinka Abiola; Wang, Haiyan; Zhang, Cheng; Tang, Xiaohui; Bhuyan, Anjuman Ara; Zhao, Shuhong; Du, Xiaoyong
2017-01-01
Gastrointestinal nematodes (GINs) are among the most economically important parasites of small ruminants and a major animal health concern in many regions of the world. However, the molecular mechanisms of the host response to GIN infections in goats are still poorly understood. In this study, two genetically distinct goat populations, one relatively resistant and the other susceptible to GIN infections, were identified in Yichang goats, and four individuals from each group were chosen to compare mRNA expression profiles using RNA-seq. The field experiment showed lower worm burden and delayed and reduced egg production in the relatively resistant group compared with the susceptible group. RNA-seq analysis showed that 2369 genes, 1407 up-regulated and 962 down-regulated, were significantly (p < 0.001) differentially expressed between the two groups. Functional annotation of the 298 genes more highly expressed in the resistant group yielded a total of 46 significant (p < 0.05) functional annotation clusters, including 31 genes (9 in innate immunity, 13 in immunity, and 9 in innate immune response) related to immune biosynthetic process, as well as transforming growth factor (TGF)-β, mitogen-activated protein kinase (MAPK), and cell adhesion molecule (CAM) pathways. Our findings provide insights that are immediately relevant for improving host resistance to GIN infections and will help elucidate the mechanisms underlying the resistance of goats to GIN infections. PMID:28368324
Stranded Whole Transcriptome RNA-Seq for All RNA Types
Yan, Pearlly X.; Fang, Fang; Buechlein, Aaron; Ford, James B.; Tang, Haixu; Huang, Tim H.; Burow, Matthew E.; Liu, Yunlong; Rusch, Douglas B.
2015-01-01
Stranded whole transcriptome RNA-Seq described in this unit captures quantitative expression data for all types of RNA including, but not limited to, miRNA (microRNA), piRNA (Piwi-interacting RNA), snoRNA (small nucleolar RNA), lincRNA (large non-coding intergenic RNA), SRP RNA (signal recognition particle RNA), tRNA (transfer RNA), mtRNA (mitochondrial RNA) and mRNA (messenger RNA). The size and nature of these types of RNA are irrelevant to the approach described here. Barcoded libraries for multiplexing on the Illumina platform are generated with this approach, but it can be applied to other platforms with a few modifications. PMID:25599667
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
2017-02-06
Next-generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although negative binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads, using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data-adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, NB regression shows inflated Type-I error rates, whereas the classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally bring the Type-I error rates of all methods close to the nominal alpha levels of 0.05 and 0.01. Type-I error rates are controlled after applying the data-adaptive method. The NB, BL, and FL regressions gain power with large sample size, large log2 fold-change, and low dispersion. The FL regression has power comparable to NB regression. We conclude that implementing the data-adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
Getting the most out of RNA-seq data analysis.
Khang, Tsung Fei; Lau, Ching Yee
2015-01-01
Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation, such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls, interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets, one representing a strong and the other a mild biological effect size, we simulated different replicate size scenarios and tested the performance of several commonly used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30 and 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes for systems-level analysis, as their PPV and sensitivity trade-offs were superior to those of the other methods. Conclusion.
When biological effect size is weak, systems-level investigation is not possible using RNA-seq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high-confidence differentially expressed genes for targeted qPCR validation; for downstream systems-level analysis, combined results from DESeq2 and edgeR are useful.
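The comparison above rests on two standard metrics. Positive predictive value, PPV = TP / (TP + FP), is the fraction of called genes that are truly differentially expressed. Sensitivity, TP / (TP + FN), is the fraction of truly differentially expressed genes that are called. The following is a minimal sketch that treats a reference set of genes as the ground truth. It does not show how the authors derived their reference, and the gene names in the test are hypothetical.

```python
def ppv_and_sensitivity(called, truly_de):
    """PPV and sensitivity of a set of DE calls against a reference set."""
    called, truly_de = set(called), set(truly_de)
    tp = len(called & truly_de)   # called and truly DE
    fp = len(called - truly_de)   # called but not DE
    fn = len(truly_de - called)   # DE but missed
    ppv = tp / (tp + fp) if called else float("nan")
    sensitivity = tp / (tp + fn) if truly_de else float("nan")
    return ppv, sensitivity
```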
Akbari, Omar S; Antoshechkin, Igor; Amrhein, Henry; Williams, Brian; Diloreto, Race; Sandler, Jeremy; Hay, Bruce A
2013-09-04
Mosquitoes are vectors of a number of important human and animal diseases. The development of novel vector control strategies requires a thorough understanding of mosquito biology. To facilitate this, we used RNA-seq to identify novel genes and provide the first high-resolution view of the transcriptome throughout development and in response to blood feeding in a mosquito vector of human disease, Aedes aegypti, the primary vector for Dengue and yellow fever. We characterized mRNA expression at 34 distinct time points throughout Aedes development, including adult somatic and germline tissues, by using polyA+ RNA-seq. We identify a total of 14,238 new transcribed regions corresponding to 12,597 new loci, as well as many novel transcript isoforms of previously annotated genes. Altogether these results increase the annotated fraction of the transcribed genome into long polyA+ RNAs by more than twofold. We also identified a number of patterns of shared gene expression, as well as genes and/or exons expressed sex-specifically or sex-differentially. Expression profiles of small RNAs in ovaries, early embryos, testes, and adult male and female somatic tissues were also determined, resulting in the identification of 38 new Aedes-specific miRNAs and ~291,000 small RNA new transcribed regions, many of which are likely to be endogenous small-interfering RNAs and Piwi-interacting RNAs. Genes of potential interest for transgene-based vector control strategies are also highlighted. Our data have been incorporated into a user-friendly genome browser located at www.Aedes.caltech.edu, with relevant links to Vectorbase (www.vectorbase.org).
Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.
Haghverdi, Laleh; Lun, Aaron T L; Morgan, Michael D; Marioni, John C
2018-06-01
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
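The core of the method is finding pairs of cells, one from each batch, that are each other's nearest neighbors. The following plain-Python sketch assumes small dense expression vectors and Euclidean distance. It finds mutual nearest neighbor pairs and then, as a deliberate simplification, shifts the whole second batch by the average difference across pairs. The published method instead computes locally weighted correction vectors and works on cosine-normalized data. Treat this only as an illustration of the MNN pairing step. The function names are hypothetical.

```python
import math


def nearest(query, points, k):
    """Indices of the k points closest to query by Euclidean distance."""
    order = sorted(range(len(points)), key=lambda j: math.dist(query, points[j]))
    return set(order[:k])


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Pairs (i, j) where b[j] is among a[i]'s k nearest and vice versa."""
    a_to_b = [nearest(x, batch_b, k) for x in batch_a]
    b_to_a = [nearest(y, batch_a, k) for y in batch_b]
    return sorted(
        (i, j) for i in range(len(batch_a)) for j in a_to_b[i] if i in b_to_a[j]
    )


def correct_batch(batch_a, batch_b, k=1):
    """Shift batch_b by the mean MNN difference (a global simplification)."""
    pairs = mutual_nearest_neighbors(batch_a, batch_b, k)
    if not pairs:
        raise ValueError("no mutual nearest neighbors: no shared population found")
    dim = len(batch_a[0])
    shift = [
        sum(batch_b[j][d] - batch_a[i][d] for i, j in pairs) / len(pairs)
        for d in range(dim)
    ]
    return [[y[d] - shift[d] for d in range(dim)] for y in batch_b]
```

In the test data, the third cell of the second batch has no counterpart in the first batch. It therefore contributes no pair but is still shifted. This mirrors the point that only a subset of the population needs to be shared between batches.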
Liu, Tian; Cheng, Anchun; Wang, Mingshu; Jia, Renyong; Yang, Qiao; Wu, Ying; Sun, Kunfeng; Zhu, Dekang; Chen, Shun; Liu, Mafeng; Zhao, XinXin; Chen, Xiaoyue
2017-09-13
Duck plague virus (DPV), a member of the alphaherpesvirus subfamily, can cause significant economic losses on duck farms in China. The DPV Chinese virulent strain (CHv) is highly pathogenic and can cause mass mortality in ducks. Attenuated DPV vaccines (CHa) have been deployed against duck plague, with billions of doses used in China each year. Although DPV has been studied for many years, a comprehensive understanding of the molecular mechanisms underlying the pathogenicity of the CHv strain and the protection conferred by the CHa strain in ducks is still lacking. In the present study, we used RNA-seq to profile the transcriptome of duck spleens for the first time, in order to identify differentially expressed genes (DEGs) associated with CHv and CHa infection at 24 h. Compared with mock-infected ducks, 748 DEGs and 484 DEGs were identified after CHv and CHa infection, respectively. Pathway analysis of the DEGs highlighted biological processes involved in host immune response, cell apoptosis and viral invasion. The genes expressed in those pathways differed between CHv-infected and CHa-vaccinated duck spleens. These results may provide valuable information for exploring the basis of the pathogenicity caused by the CHv strain and the protection activated by the CHa strain.
Kumar, Mukesh; Belcaid, Mahdi; Nerurkar, Vivek R.
2016-01-01
Differential host responses may be critical determinants of the distinct pathologies of West Nile virus (WNV) NY99 (pathogenic) and WNV Eg101 (non-pathogenic) strains. We employed RNA-seq technology to analyze global differential gene expression in WNV-infected mouse brains and to identify the host cellular factors leading to lethal encephalitis. We identified 1,400 and 278 transcripts that were differentially expressed after WNV NY99 and WNV Eg101 infection, respectively, and 147 genes were common to infection with both viruses. Genes that were up-regulated in infection with both viruses were mainly associated with interferon signaling. Genes associated with inflammation and cell death/apoptosis were only expressed after WNV NY99 infection. We demonstrate that differences in the activation of key pattern recognition receptors resulted in the induction of unique innate immune profiles, which corresponded with the induction of interferon and inflammatory responses. Pathway analysis of differentially expressed genes indicated that after WNV NY99 infection, TREM-1 mediated activation of toll-like receptors leads to the high inflammatory response. In conclusion, we have identified both common and specific responses to WNV NY99 and WNV Eg101 infections as well as genes linked to potential resistance to infection that may be targets for therapeutics. PMID:27211830
Mao, Shihong; Goodrich, Robert J; Hauser, Russ; Schrader, Steven M; Chen, Zhen; Krawetz, Stephen A
2013-10-01
Different semen storage and sperm purification methods may affect the integrity of isolated spermatozoal RNA. RNA-Seq was applied to determine whether semen storage methods (pelleted vs. liquefied) and somatic cell lysis buffer (SCLB) vs. PureSperm (PS) purification methods affect the quantity and quality of sperm RNA. The results indicate that the method of semen storage does not markedly impact RNA profiling, whereas the choice of purification can yield significant differences. RNA-Seq showed that the majority of mitochondrial and mid-piece associated transcripts were lost after SCLB purification, which indicated that the mid-piece of spermatozoa may have been compromised. In addition, the number of stable transcript pairs from SCLB samples was lower than that from PS samples. This study supports the view that PS purification better maintains the integrity of spermatozoal RNAs.
Ho, Ming-Fen; Lummertz da Rocha, Edroaldo; Zhang, Cheng; Ingle, James N; Goss, Paul E; Shepherd, Lois E; Kubo, Michiaki; Wang, Liewei; Li, Hu; Weinshilboum, Richard M
2018-06-01
T-cell leukemia 1A (TCL1A) single-nucleotide polymorphisms (SNPs) have been associated with aromatase inhibitor-induced musculoskeletal adverse events. We previously demonstrated that TCL1A is inducible by estradiol (E2) and plays a critical role in the regulation of cytokines, chemokines, and Toll-like receptors in a TCL1A SNP genotype- and estrogen-dependent fashion. Furthermore, TCL1A SNP-dependent expression phenotypes can be "reversed" by exposure to selective estrogen receptor modulators such as 4-hydroxytamoxifen (4OH-TAM). The present study was designed to comprehensively characterize the role of TCL1A in transcriptional regulation across the genome by performing RNA sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) assays with lymphoblastoid cell lines. RNA-seq identified 357 genes that were regulated in a TCL1A SNP- and E2-dependent fashion with expression patterns that were 4OH-TAM reversible. ChIP-seq for the same cells identified 57 TCL1A binding sites that could be regulated by E2 in a SNP-dependent fashion. Even more striking, nuclear factor-κB (NF-κB) p65 bound to those same DNA regions. In summary, TCL1A is a novel transcription factor with expression that is regulated in a SNP- and E2-dependent fashion, a pattern of expression that can be reversed by 4OH-TAM. Integrated RNA-seq and ChIP-seq results suggest that TCL1A also acts as a transcriptional coregulator with NF-κB p65, an important immune system transcription factor. Copyright © 2018 by The American Society for Pharmacology and Experimental Therapeutics.
Linnorm: improved statistical analysis for single cell RNA-seq expression data.
Yip, Shun H; Wang, Panwen; Kocher, Jean-Pierre A; Sham, Pak Chung; Wang, Junwen
2017-12-15
Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noise and simultaneously preserve biological variation in scRNA-seq data, so that the performance of existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
An RNA-Seq based gene expression atlas of the common bean.
O'Rourke, Jamie A; Iniguez, Luis P; Fu, Fengli; Bucciarelli, Bruna; Miller, Susan S; Jackson, Scott A; McClean, Philip E; Li, Jun; Dai, Xinbin; Zhao, Patrick X; Hernandez, Georgina; Vance, Carroll P
2014-10-06
Common bean (Phaseolus vulgaris) is grown throughout the world and comprises roughly 50% of the grain legumes consumed worldwide. Despite this, genetic resources for common beans have been lacking. Next-generation sequencing has facilitated our investigation of the gene expression profiles associated with biologically important traits in common bean. An increased understanding of gene expression in common bean will improve our understanding of gene expression patterns in other legume species. Combining recently developed genomic resources for Phaseolus vulgaris, including predicted gene calls, with RNA-Seq technology, we measured the gene expression patterns from 24 samples collected from seven tissues at developmentally important stages and from three nitrogen treatments. Gene expression patterns throughout the plant were analyzed to better understand changes due to nodulation, seed development, and nitrogen utilization. We have identified 11,010 genes differentially expressed with a fold change ≥ 2 and a P-value < 0.05 between different tissues at the same time point, 15,752 genes differentially expressed within a tissue due to changes in development, and 2,315 genes expressed only in a single tissue. These analyses identified 2,970 genes with expression patterns that appear to be directly dependent on the source of available nitrogen. Finally, we have assembled these data in a publicly available database, The Phaseolus vulgaris Gene Expression Atlas (Pv GEA), http://plantgrn.noble.org/PvGEA/ . Using the website, researchers can query gene expression profiles of their gene of interest, search for genes expressed in different tissues, or download the dataset in a tabular form. These data provide the basis for a gene expression atlas, which will facilitate functional genomic studies in common bean.
Analysis of this dataset has identified genes important in regulating seed composition and has increased our understanding of nodulation and impact of the nitrogen source on assimilation and distribution throughout the plant.
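The thresholds quoted above (fold change ≥ 2, P-value < 0.05) amount to a simple two-condition filter. The sketch below is a hypothetical illustration, not the Pv GEA pipeline: the `GeneResult` record, the symmetric treatment of up- and down-regulation, and the pseudocount of 1 are all assumptions, and P-values are taken as given from whatever statistical test was used.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str
    mean_a: float   # mean normalized expression in tissue/condition A (hypothetical field)
    mean_b: float   # mean normalized expression in tissue/condition B (hypothetical field)
    p_value: float  # P-value from whichever differential test was applied


def is_differentially_expressed(r: GeneResult, min_fold: float = 2.0,
                                alpha: float = 0.05, pseudocount: float = 1.0) -> bool:
    """True if the fold change is >= min_fold in either direction and p < alpha."""
    # The pseudocount (an assumption) avoids division by zero for unexpressed genes.
    log2_fc = math.log2((r.mean_b + pseudocount) / (r.mean_a + pseudocount))
    return abs(log2_fc) >= math.log2(min_fold) and r.p_value < alpha


results = [
    GeneResult("g1", 10.0, 50.0, 0.01),   # up about 4.6-fold, significant
    GeneResult("g2", 10.0, 15.0, 0.001),  # under 2-fold, filtered out
    GeneResult("g3", 100.0, 10.0, 0.04),  # down about 9-fold, significant
    GeneResult("g4", 10.0, 50.0, 0.2),    # large change but not significant
]
de_genes = [r.gene_id for r in results if is_differentially_expressed(r)]
print(de_genes)  # ['g1', 'g3']
```

Working on log2 ratios makes the 2-fold cutoff symmetric: a gene halving in expression passes just as a gene doubling does.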
Carlson, Hanqian L; Quinn, Jeffrey J; Yang, Yul W; Thornburg, Chelsea K; Chang, Howard Y; Stadler, H Scott
2015-12-01
Gene expression profiling in E11 mouse embryos identified high expression of the long noncoding RNA (lncRNA) LncRNA-HIT in the undifferentiated limb mesenchyme, gut, and developing genital tubercle. In the limb mesenchyme, LncRNA-HIT was found to be retained in the nucleus, forming a complex with p100 and CBP. Analysis of the genome-wide distribution of LncRNA-HIT-p100/CBP complexes by ChIRP-seq revealed LncRNA-HIT-associated peaks at multiple loci in the murine genome. Ontological analysis of the genes contacted by LncRNA-HIT-p100/CBP complexes indicates a primary role for these loci in chondrogenic differentiation. Functional analysis using siRNA-mediated reductions in LncRNA-HIT or p100 transcripts revealed a significant decrease in expression of many of the LncRNA-HIT-associated loci. LncRNA-HIT siRNA treatments also impaired the ability of the limb mesenchyme to form cartilage, reducing mesenchymal cell condensation and the formation of cartilage nodules. Mechanistically, the LncRNA-HIT siRNA treatments impacted pro-chondrogenic gene expression by reducing H3K27ac or p100 activity, confirming that LncRNA-HIT is essential for chondrogenic differentiation in the limb mesenchyme. Taken together, these findings reveal a fundamental epigenetic mechanism functioning during early limb development, in which LncRNA-HIT and its associated proteins promote the expression of multiple genes whose products are necessary for the formation of cartilage. PMID:26633036
McArt, Darragh G.; Dunne, Philip D.; Blayney, Jaine K.; Salto-Tellez, Manuel; Van Schaeybroeck, Sandra; Hamilton, Peter W.; Zhang, Shu-Dong
2013-01-01
The advent of next-generation sequencing (NGS) technologies has expanded the area of genomic research, offering high coverage and increased sensitivity over older microarray platforms. Although the current cost of next-generation sequencing still exceeds that of microarray approaches, rapid advances in NGS will likely make it the platform of choice for future research in differential gene expression. Connectivity mapping is a procedure for examining the connections among diseases, genes, and drugs through differential gene expression; it was initially based on microarray technology, with which a large collection of compound-induced reference gene expression profiles has been accumulated. In this work, we aim to test the feasibility of incorporating NGS RNA-Seq data into the current connectivity mapping framework by constructing a differentially expressed gene signature from an NGS dataset and using the microarray-based reference profiles. This would allow connections to be established between the NGS gene signature and the microarray reference profiles, avoiding the cost of re-creating drug profiles with NGS technology. We applied the connectivity mapping approach to a publicly available NGS dataset of androgen-stimulated LNCaP cells in order to extract candidate compounds that could inhibit the proliferative phenotype of LNCaP cells, and to evaluate their potential in a laboratory setting. In addition, we analyzed an independent microarray dataset with similar experimental settings. We found a high level of concordance between the top compounds identified using the gene signatures from the two datasets. The nicotine derivative cotinine was returned as the top candidate among the overlapping compounds with potential to suppress this proliferative phenotype. Subsequent lab experiments validated this connectivity mapping hit, showing that cotinine inhibits cell proliferation in an androgen-dependent manner.
Thus the results in this study suggest a promising prospect of integrating NGS data with connectivity mapping. PMID:23840550
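The abstract does not describe how connections were scored, and the study's own scoring may differ. As an illustration only, the sketch below implements the Kolmogorov-Smirnov-style connectivity score of the original Connectivity Map: an up-tag list and a down-tag list from the query signature are located in a reference profile's ranked gene list, and the score is nonzero only when the two lists move in opposite directions. Normalization across reference profiles is omitted, and all function names are hypothetical.

```python
def ks_score(tag_ranks, n):
    """Signed KS-style enrichment of tag genes in a ranked list of n genes.

    tag_ranks are 1-based positions in the reference ranking (rank 1 = most
    up-regulated by the reference compound). Positive: tags sit near the top.
    """
    v = sorted(tag_ranks)
    t = len(v)
    if t == 0:
        raise ValueError("no tag genes found in the reference ranking")
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b


def connectivity_score(reference_ranking, up_tags, down_tags):
    """Unnormalized connectivity between a query signature and one reference profile."""
    n = len(reference_ranking)
    rank = {gene: i + 1 for i, gene in enumerate(reference_ranking)}
    ks_up = ks_score([rank[g] for g in up_tags if g in rank], n)
    ks_down = ks_score([rank[g] for g in down_tags if g in rank], n)
    # Up and down tags must shift in opposite directions to count as a connection.
    if (ks_up > 0) == (ks_down > 0):
        return 0.0
    return ks_up - ks_down


reference = [f"g{i}" for i in range(1, 11)]
print(connectivity_score(reference, ["g1", "g2"], ["g9", "g10"]))  # about 1.7
print(connectivity_score(reference, ["g9", "g10"], ["g1", "g2"]))  # about -1.7
```

In this scheme a strongly negative score flags a reference compound whose profile opposes the query signature, which is the usual criterion when searching for compounds that might reverse a phenotype such as androgen-driven proliferation.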
Solana, Jordi; Kao, Damian; Mihaylova, Yuliana; Jaber-Hijazi, Farah; Malla, Sunir; Wilson, Ray; Aboobaker, Aziz
2012-01-01
Planarian stem cells, or neoblasts, drive the almost unlimited regeneration capacities of freshwater planarians. Neoblasts are traditionally described by their morphological features and by the fact that they are the only proliferative cell type in asexual planarians. Therefore, they can be specifically eliminated by irradiation. Irradiation, however, is likely to induce transcriptome-wide changes in gene expression that are not associated with neoblast ablation. This has hindered an accurate description of their specific transcriptomic profile. We introduce the use of Smed-histone-2B RNA interference (RNAi) for genetic ablation of neoblast cells in Schmidtea mediterranea as an alternative to irradiation. We characterize the rapid, neoblast-specific phenotype induced by Smed-histone-2B RNAi, resulting in neoblast ablation. We compare and triangulate RNA-seq time-course data obtained after neoblast ablation by irradiation and by Smed-histone-2B RNAi. Our analyses show that Smed-histone-2B RNAi eliminates neoblast gene expression with high specificity and discrimination from gene expression in other cellular compartments. We compile a high-confidence list of genes downregulated by both irradiation and Smed-histone-2B RNAi and validate their expression in neoblast cells. Lastly, we analyze the overall expression profile of neoblast cells. Our list of neoblast genes parallels their morphological features and is highly enriched for nuclear components, chromatin remodeling factors, RNA splicing factors, RNA granule components and the machinery of cell division. Our data reveal that the regulation of planarian stem cells relies on posttranscriptional regulatory mechanisms and suggest that planarians are an ideal model for this understudied aspect of stem cell biology.
Czimmerer, Zsolt; Varga, Tamas; Kiss, Mate; Vázquez, Cesaré Ovando; Doan-Xuan, Quang Minh; Rückerl, Dominik; Tattikota, Sudhir Gopal; Yan, Xin; Nagy, Zsuzsanna S; Daniel, Bence; Poliska, Szilard; Horvath, Attila; Nagy, Gergely; Varallyay, Eva; Poy, Matthew N; Allen, Judith E; Bacso, Zsolt; Abreu-Goodger, Cei; Nagy, Laszlo
2016-05-31
IL-4-driven alternative macrophage activation and proliferation are characteristic features of both antihelminthic immune responses and wound healing, in contrast to classical macrophage activation, which primarily occurs during inflammatory responses. The signaling pathways defining the genome-wide microRNA expression profile, as well as the cellular functions controlled by microRNAs during alternative macrophage activation, are largely unknown. Hence, in the current work we examined the regulation and function of IL-4-regulated microRNAs in human and mouse alternative macrophage activation. We utilized microarray-based microRNA profiling to detect the dynamic expression changes during human monocyte-macrophage differentiation and IL-4-mediated alternative macrophage activation. The expression changes and upstream regulatory pathways of selected microRNAs were further investigated in human and mouse in vitro and in vivo models of alternative macrophage activation by integrating small RNA-seq, ChIP-seq, ChIP-quantitative PCR, and gene expression data. MicroRNA-controlled gene networks and corresponding functions were identified using a combination of transcriptomic, bioinformatic, and functional approaches. The IL-4-controlled microRNA expression pattern was identified in models of human and mouse alternative macrophage activation. IL-4-dependent induction of miR-342-3p and repression of miR-99b along with miR-125a-5p occurred in both human and murine macrophages in vitro. In addition, a similar expression pattern was observed in peritoneal macrophages of Brugia malayi nematode-implanted mice in vivo. By using IL4Rα- and STAT6-deficient macrophages, we were able to show that IL-4-dependent regulation of miR-342-3p, miR-99b, and miR-125a-5p is mediated by the IL-4Rα-STAT6 signaling pathway. The combination of gene expression studies and chromatin immunoprecipitation experiments demonstrated that both miR-342-3p and its host gene, EVL, are coregulated directly by STAT6.
Finally, we found that miR-342-3p is capable of controlling macrophage survival through targeting an anti-apoptotic gene network including Bcl2l1. Our findings identify a conserved IL-4/STAT6-regulated microRNA signature in alternatively activated human and mouse macrophages. Moreover, our study indicates that miR-342-3p likely plays a pro-apoptotic role in such cells, thereby providing a negative feedback arm to IL-4-dependent macrophage proliferation.
Xiao, Ru-Yue; Hao, Junjun; Ding, Yi-Hong; Che, Yan-Yun; Zou, Xiao-Ju; Liang, Bin
2016-10-17
Owing to an imbalance between energy intake and expenditure, obesity has become a common chronic disorder that is highly associated with many metabolic diseases. Pu-erh tea, a traditional Chinese beverage, is believed to have numerous health benefits, such as anti-obesity effects. However, the mechanisms underlying its anti-obesity effect are yet to be understood. Here, we took advantage of transcriptional profiling by RNA sequencing (RNA-Seq) to examine the global gene expression changes induced by Pu-erh tea. The model organism Caenorhabditis elegans was treated with different concentrations of Pu-erh tea water extract (PTE; 0 g/mL, 0.025 g/mL, and 0.05 g/mL). Compared with the control, PTE decreased lipid droplet size and fat accumulation. RNA-Seq detected 18,073 and 18,105 genes expressed in the 0.025 g/mL and 0.05 g/mL PTE-treated groups, respectively. Interestingly, the expression of the vitellogenin family (vit-1, vit-2, vit-3, vit-4 and vit-5) was significantly decreased by PTE, which was validated by qPCR analysis. Furthermore, vit-1(ok2616), vit-3(ok2348) and vit-5(ok3239) mutants are insensitive to PTE-triggered fat reduction. In conclusion, our RNA-Seq transcriptional profile suggests that Pu-erh tea lowers fat accumulation primarily through repression of the vitellogenin (vit) family, in addition to our previously reported SREBP (sterol regulatory element binding protein)-SCD (stearoyl-CoA desaturase) axis.
Sturgill, David; Malone, John H; Sun, Xia; Smith, Harold E; Rabinow, Leonard; Samson, Marie-Laure; Oliver, Brian
2013-11-09
Background: The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped, and errors resulting in incorrect splicing calls occur in every experiment. Results: We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat-induced gaps. Using a series of quantitative and qualitative filters, we eliminated the diagnosed errors in the simulation and then applied these filters to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method outperformed commonly used RNA-seq methods in identifying known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks, called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Conclusions: Splice-junction-centric analysis of RNA-Seq data provides advantages in specificity for the detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction-centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools. PMID:24209455
Recovery of high-quality RNA from laser capture microdissected human and rodent pancreas.
Butler, Alexandra E; Matveyenko, Aleksey V; Kirakossian, David; Park, Johanna; Gurlo, Tatyana; Butler, Peter C
Laser capture microdissection (LCM) is a powerful method to isolate specific populations of cells for subsequent analysis such as gene expression profiling, for example by microarrays or RNA sequencing (RNA-Seq). This technique has been applied to frozen as well as formalin-fixed, paraffin-embedded (FFPE) specimens, with variable outcomes regarding the quality and quantity of extracted RNA. The goal of the study was to develop methods to isolate high-quality RNA from islets of Langerhans and pancreatic duct glands (PDG) isolated by LCM. We report an optimized protocol for frozen sections that minimizes RNA degradation by adding RNase inhibitors at multiple steps during the experiment, and that maximizes recovery of expected transcripts from the samples as assessed by quantitative real-time polymerase chain reaction (RT-PCR). This technique reproducibly delivered intact RNA (RIN values 6-7). Using quantitative RT-PCR, the expected profiles of insulin, glucagon, mucin6 (Muc6), and cytokeratin-19 (CK-19) mRNA in PDGs and pancreatic islets were detected. The described experimental protocol for frozen pancreas tissue might also be useful for other tissues with moderate to high levels of intrinsic ribonuclease (RNase) activity.
Bar, Ido; Cummins, Scott; Elizur, Abigail
2016-03-10
Controlling and managing the breeding of bluefin tuna (Thunnus spp.) in captivity is an imperative step towards obtaining a sustainable supply of these fish in aquaculture production systems. Germ cell transplantation (GCT) is an innovative technology for the production of inter-species surrogates, by transplanting undifferentiated germ cells derived from a donor species into larvae of a host species. The transplanted surrogates then grow and mature to produce donor-derived seed, thus providing a simpler alternative to maintaining large-bodied broodstock such as the bluefin tuna. Implementation of GCT for new species requires the development of molecular tools to follow the fate of the transplanted germ cells. These tools are based on key reproductive and germ cell-specific genes. RNA sequencing (RNA-Seq) provides a rapid, cost-effective method for high-throughput gene identification in non-model species. This study utilized RNA-Seq to identify key genes expressed in the gonads of Southern bluefin tuna (Thunnus maccoyii, SBT) and their specific expression patterns in male and female gonad cells. Key genes involved in the reproductive molecular pathway, and specifically in germ cell development in gonads, were identified by analysis of RNA-Seq transcriptomes of male and female SBT gonad cells. Expression profiles of transcripts from ovary and testis cells were compared, as well as those from a germ cell-enriched testis fraction prepared with a Percoll gradient, as used in GCT studies. Ovary cells demonstrated over-expression of genes related to stem cell maintenance, while in testis cells, transcripts encoding reproduction-associated receptors and genes involved in sex steroid and hormone synthesis and signaling were over-expressed. Within the testis cells, the Percoll-enriched fraction showed over-expression of genes related to post-meiotic germ cell populations.
Gonad development and germ cell related genes were identified from SBT gonads and their expression patterns in ovary and testis cells were determined. These expression patterns correlate with the reproductive developmental stage of the sampled fish. The majority of the genes described in this study were sequenced for the first time in T. maccoyii. The wealth of SBT gonadal and germ cell-related gene sequences made publicly available by this study provides an extensive resource for further GCT and reproductive molecular biology studies of this commercially valuable fish.
Evaluation of microRNA alignment techniques
Kaspi, Antony; El-Osta, Assam
2016-01-01
Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), few are specifically designed for mapping smRNA reads, including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. Using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform best practice for smRNA mapping and enable more accurate smRNA detection and more accurate quantification of expression and RNA editing. PMID:27284164
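The abstract does not spell out how sensitivity and accuracy were scored. A common convention for simulated reads, assumed here, is to call an alignment correct when it lands on the true chromosome within a chosen position tolerance; sensitivity is then correct alignments over all simulated reads, and accuracy is correct alignments over all aligned reads. The sketch is hypothetical (record and function names included) and assumes at most one reported alignment per read, so multi-mapping reads are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatedRead:
    read_id: str
    true_chrom: str
    true_pos: int  # simulated origin of the read


def score_mapper(reads, alignments, tolerance=0):
    """Return (sensitivity, accuracy) for one mapper on simulated reads.

    alignments maps read_id -> (chrom, pos) for every read the mapper aligned;
    unaligned reads are simply absent from the dict.
    """
    aligned = [r for r in reads if r.read_id in alignments]
    correct = [
        r for r in aligned
        if alignments[r.read_id][0] == r.true_chrom
        and abs(alignments[r.read_id][1] - r.true_pos) <= tolerance
    ]
    sensitivity = len(correct) / len(reads) if reads else 0.0
    accuracy = len(correct) / len(aligned) if aligned else 0.0
    return sensitivity, accuracy


reads = [
    SimulatedRead("r1", "chr1", 100),
    SimulatedRead("r2", "chr1", 200),
    SimulatedRead("r3", "chr2", 50),
    SimulatedRead("r4", "chr3", 10),  # left unaligned by this mapper
]
alignments = {"r1": ("chr1", 100), "r2": ("chr1", 205), "r3": ("chr2", 50)}
print(score_mapper(reads, alignments))               # (0.5, 0.666...)
print(score_mapper(reads, alignments, tolerance=5))  # (0.75, 1.0)
```

Separating the two ratios matters for short reads: a mapper can look highly accurate simply by declining to align ambiguous ~21 nt reads, which shows up as low sensitivity instead.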
Wimmer, Isabella; Tröscher, Anna R; Brunner, Florian; Rubino, Stephen J; Bien, Christian G; Weiner, Howard L; Lassmann, Hans; Bauer, Jan
2018-04-20
Formalin-fixed paraffin-embedded (FFPE) tissues are valuable resources commonly used in pathology. However, formalin fixation modifies nucleic acids, which complicates the isolation of high-quality RNA for genetic profiling. Here, we assessed the feasibility and reliability of microarray studies analysing transcriptome data from fresh, fresh-frozen (FF) and FFPE tissues. We show that reproducible microarray data can be generated from only 2 ng of FFPE-derived RNA. For RNA quality assessment, fragment size distribution (DV200) and qPCR proved most suitable. During RNA isolation, extending tissue lysis time to 10 hours reduced high-molecular-weight species, while additional incubation at 70 °C markedly increased RNA yields. Since FF- and FFPE-derived microarrays constitute different data entities, we used indirect measures to investigate gene signal variation and relative gene expression. Whole-genome analyses revealed high concordance rates, while single-gene comparisons showed greater data variation in FFPE than in FF arrays. Using an experimental model, gene set enrichment analysis (GSEA) of FFPE-derived microarrays and fresh tissue-derived RNA-Seq datasets yielded similarly affected pathways, confirming the applicability of FFPE tissue in global gene expression analysis. Our study provides a workflow comprising RNA isolation, quality assessment and microarray profiling using minimal RNA input, thus enabling hypothesis-generating pathway analyses from limited amounts of precious, pathologically significant FFPE tissues.
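DV200 is conventionally defined as the percentage of RNA fragments longer than 200 nucleotides. A minimal sketch, assuming the fragment-size profile is available as parallel lists of sizes (nt) and signal intensities, for example as exported from an electropherogram; the function name and input format are hypothetical, and vendor software may define the integration region differently.

```python
def dv200(sizes_nt, signal):
    """Percentage of total signal contributed by fragments longer than 200 nt."""
    if len(sizes_nt) != len(signal):
        raise ValueError("sizes_nt and signal must have equal length")
    total = sum(signal)
    if total <= 0:
        raise ValueError("total signal must be positive")
    above = sum(s for size, s in zip(sizes_nt, signal) if size > 200)
    return 100.0 * above / total


# A heavily fragmented FFPE-like profile versus an intact one (made-up numbers)
print(dv200([100, 150, 250, 1000], [40, 30, 20, 10]))  # 30.0
print(dv200([100, 150, 250, 1000], [10, 10, 30, 50]))  # 80.0
```

Because formalin fixation fragments RNA, a size-based metric such as DV200 can remain informative for FFPE samples where whole-profile integrity scores saturate at low values.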
TSSAR: TSS annotation regime for dRNA-seq data.
Amman, Fabian; Wolfinger, Michael T; Lorenz, Ronny; Hofacker, Ivo L; Stadler, Peter F; Findeiß, Sven
2014-03-27
Differential RNA sequencing (dRNA-seq) is a high-throughput screening technique designed to examine the architecture of bacterial operons in general and the precise position of transcription start sites (TSS) in particular. Hitherto, dRNA-seq data were analyzed by visualizing the sequencing reads mapped to the reference genome and manually annotating reliable positions. This is very labor-intensive and, being subjective, prone to bias. Here, we present TSSAR, a tool for automated de novo TSS annotation from dRNA-seq data that respects the statistics of dRNA-seq libraries. TSSAR uses the premise that the number of sequencing reads starting at a certain genomic position within a transcriptionally active region follows a Poisson distribution with a parameter that depends on the local strength of expression. The difference of two dRNA-seq library counts thus follows a Skellam distribution. This provides a statistical basis to identify significantly enriched primary transcripts. We assessed performance by analyzing a publicly available dRNA-seq data set using TSSAR and two simple approaches that utilize user-defined score cutoffs. We evaluated the ability of each approach to reproduce the manual TSS annotation. Furthermore, the same data set was used to reproduce 74 TSS in H. pylori that had been experimentally validated by reliable techniques such as RACE or primer extension. Both analyses showed that TSSAR outperforms the static cutoff-dependent approaches. Having an automated and efficient tool for analyzing dRNA-seq data facilitates the use of the dRNA-seq technique and promotes its application to more sophisticated analyses. For instance, it becomes possible to monitor the plasticity and dynamics of the transcriptome architecture triggered by different stimuli and growth conditions. The main asset of a novel tool for dRNA-seq analysis that reaches out to a broad user community is usability.
As such, we provide TSSAR both as an intuitive RESTful Web service (http://rna.tbi.univie.ac.at/TSSAR) together with a set of post-processing and analysis tools, and as a stand-alone version for use in high-throughput dRNA-seq data analysis pipelines.
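To illustrate the statistical premise above: if the read-start counts at one position in the two libraries are independent Poisson variables with means mu1 and mu2, their difference follows a Skellam distribution, and its upper-tail probability measures how surprising an excess in the enriched library is. The sketch below computes that tail by conditioning on the second count and summing Poisson terms. It assumes mu1 and mu2 are already known (TSSAR's estimation of them from local expression strength is not reproduced), and the function names are hypothetical. For production use, SciPy offers the same distribution as `scipy.stats.skellam`.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Poisson(mu), computed in log space for stability."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def poisson_sf(m: int, mu: float) -> float:
    """P(X >= m), summed directly over the upper tail (avoids 1 - CDF cancellation)."""
    if m <= 0:
        return 1.0
    upper = max(m, int(mu)) + int(10 * math.sqrt(mu)) + 30  # negligible mass beyond
    return sum(poisson_pmf(i, mu) for i in range(m, upper + 1))


def skellam_sf(k: int, mu1: float, mu2: float) -> float:
    """P(X1 - X2 >= k) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).

    Conditions on X2: P(X1 - X2 >= k) = sum_j P(X2 = j) * P(X1 >= k + j).
    """
    j_max = int(mu2 + 10 * math.sqrt(mu2)) + 30
    return sum(poisson_pmf(j, mu2) * poisson_sf(k + j, mu1) for j in range(j_max + 1))


# Probability of seeing at least 8 more read starts in the enriched library
# than in the control when both have the same expected count of 3.
print(skellam_sf(8, 3.0, 3.0))
```

A small tail probability at a position indicates enrichment beyond what equal expression would explain, which is the kind of evidence used to call a primary TSS; when mu2 is 0, the function reduces to the ordinary Poisson tail.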
Lukša, Juliana; Ravoitytė, Bazilė; Konovalovas, Aleksandras; Aitmanaitė, Lina; Butenko, Anzhelika; Yurchenko, Vyacheslav; Serva, Saulius; Servienė, Elena
2017-07-25
The competitive and naturally occurring yeast killer phenotype is governed by coinfection with dsRNA viruses. The long-term relationship between the host cell and viruses appears to be beneficial and co-adaptive; however, the impact of viral dsRNA on host gene expression has barely been investigated. Here, we determined the transcriptomic profiles of the host Saccharomyces cerevisiae upon loss of the M-2 dsRNA alone and of the M-2 along with the L-A-lus dsRNAs. We provide a comprehensive study based on high-throughput RNA-Seq data, Gene Ontology and the analysis of interaction networks. We identified 486 genes differentially expressed after curing yeast cells of the M-2 dsRNA and 715 genes affected by the elimination of both M-2 and L-A-lus dsRNAs. We report that most of the transcriptional responses induced by viral dsRNAs are moderate. Differentially expressed genes are related to ribosome biogenesis, mitochondrial functions, stress response, and biosynthesis of lipids and amino acids. Our study also provides insight into virus-host and virus-virus interplay. PMID:28757599
The emerging genomics and systems biology research lead to systems genomics studies.
Yang, Mary Qu; Yoshigoe, Kenji; Yang, William; Tong, Weida; Qin, Xiang; Dunker, A; Chen, Zhongxue; Arabnia, Hamid R; Liu, Jun S; Niemierko, Andrzej; Yang, Jack Y
2014-01-01
Synergistically integrating multi-layer genomic data at the systems level can not only lead to deeper insights into the molecular mechanisms related to disease initiation and progression, but also guide pathway-based biomarker and drug target identification. With the advent of high-throughput next-generation sequencing technologies, sequencing both DNA and RNA has generated multi-layer genomic data that can provide DNA polymorphism, non-coding RNA, messenger RNA, gene expression, isoform and alternative splicing information. Systems biology, on the other hand, studies complex biological systems, in particular through the systematic study of complex molecular interactions within specific cells or organisms. Genomics and molecular systems biology can be merged into the study of genomic profiles and their implicated biological functions at the cellular or organism level. This emerging field can be referred to as systems genomics or genomic systems biology. The Mid-South Bioinformatics Centre (MBC) and the Joint Bioinformatics Ph.D. Program of the University of Arkansas at Little Rock and the University of Arkansas for Medical Sciences are particularly interested in promoting education and research advancement in this emerging field. Based on past investigations and research outcomes, MBC is further utilizing differential gene and isoform/exon expression from RNA-seq and co-regulation from ChIP-seq specific to different phenotypes, in combination with protein-protein and protein-DNA interactions, to construct high-level gene networks for an integrative genome-phenome investigation at the systems biology level.
Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline
Rahmatallah, Yasir; Emmert-Streib, Frank
2016-01-01
Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not finding differentially expressed genes, but gaining insight into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in the form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarray practice, as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to sample size and heterogeneity, and sensitivity to different types of selection bias) on simulated and real RNA-seq data. Not surprisingly, the performance of the various GSA approaches depends only on the statistical hypothesis they test, not on whether the test was developed for microarray or RNA-seq data. Interestingly, we found that competitive methods have lower power and lower robustness to sample heterogeneity than self-contained methods, leading to poor reproducibility of results. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in the tested gene sets. These properties of competitive methods have previously been overlooked. Our evaluation provides a concise guideline for selecting the GSA approaches that perform best under particular experimental settings in the context of RNA-seq. PMID:26342128
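The difference between the two hypothesis types can be made concrete with a toy example. The sketch below is not any of the methods evaluated in the study. It uses an assumed per-gene statistic (squared difference of group means), a self-contained test that permutes sample labels exhaustively (H0: no gene in the set is associated with the phenotype; feasible only for tiny designs), and a competitive test that compares the set against randomly drawn gene sets of the same size (H0: the set is no more associated than other genes). All function names are hypothetical.

```python
import itertools
import random
from statistics import mean


def gene_stat(row, group_a):
    """Squared difference of group means for one gene (an assumed, simple statistic)."""
    a = [x for i, x in enumerate(row) if i in group_a]
    b = [x for i, x in enumerate(row) if i not in group_a]
    return (mean(a) - mean(b)) ** 2


def set_stat(expr, gene_set, group_a):
    return mean(gene_stat(expr[g], group_a) for g in gene_set)


def self_contained_p(expr, gene_set, group_a):
    """Self-contained test: exhaustively permute sample labels."""
    n_samples = len(next(iter(expr.values())))
    observed = set_stat(expr, gene_set, group_a)
    labelings = list(itertools.combinations(range(n_samples), len(group_a)))
    hits = sum(set_stat(expr, gene_set, set(lab)) >= observed for lab in labelings)
    return hits / len(labelings)


def competitive_p(expr, gene_set, group_a, n_perm=1000, seed=0):
    """Competitive test: compare the set with random gene sets of equal size."""
    rng = random.Random(seed)
    stats = {g: gene_stat(row, group_a) for g, row in expr.items()}
    observed = mean(stats[g] for g in gene_set)
    genes = list(expr)
    hits = sum(
        mean(stats[g] for g in rng.sample(genes, len(gene_set))) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


de_row = [10.0, 10.0, 10.0, 0.0, 0.0, 0.0]   # shifted between the two groups
flat_row = [5.0] * 6                          # unchanged
expr = {f"g{i}": (de_row if i < 4 else flat_row) for i in range(20)}
all_de = {f"g{i}": de_row for i in range(20)}
gene_set, group_a = ["g0", "g1", "g2", "g3"], {0, 1, 2}

print(self_contained_p(expr, gene_set, group_a), competitive_p(expr, gene_set, group_a))
print(self_contained_p(all_de, gene_set, group_a), competitive_p(all_de, gene_set, group_a))
```

When every gene in the toy matrix is equally shifted, the competitive p-value becomes 1 (the set is no more associated than any random set), while the self-contained p-value stays at its smallest attainable value for a 3-versus-3 design (0.1), reflecting the different null hypotheses.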
RNA-Skim: a rapid method for RNA-Seq quantification at transcript level
Zhang, Zhaojun; Wang, Wei
2014-01-01
Motivation: The RNA-Seq technique has been demonstrated to be a revolutionary means of exploring the transcriptome because it provides deep coverage and base pair-level resolution. RNA-Seq quantification has proven to be an efficient alternative to microarrays in gene expression studies, and it is a critical component of RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require the alignment of fragments to either a genome or a transcriptome, entailing a time-consuming and intricate alignment step. To improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, has recently been proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of designing an efficient alignment-free method for transcriptome quantification. Even though Sailfish is substantially faster than alignment-dependent methods such as Cufflinks, using all k-mers in the transcriptome for quantification impedes the scalability of the method. Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity and introduces the notion of sig-mers, a special type of k-mer uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable to that of any state-of-the-art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem to smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses <4% of the k-mers and <10% of the CPU time required by Sailfish.
It is able to finish transcriptome quantification in <10 min per sample using just a single thread on a commodity computer, which represents a >100-fold speedup over state-of-the-art alignment-based methods, while delivering comparable or higher accuracy. Availability and implementation: The software is available at http://www.csbio.unc.edu/rs. Contact: weiwang@cs.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24931995
Mykles, Donald L.; Burnett, Karen G.; Durica, David S.; Joyce, Blake L.; McCarthy, Fiona M.; Schmidt, Carl J.; Stillman, Jonathon H.
2016-01-01
High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the “Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology” symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. PMID:27639274
Zhao, Shanrong; Xi, Li; Quan, Jie; Xi, Hualin; Zhang, Ying; von Schack, David; Vincent, Michael; Zhang, Baohong
2016-01-08
RNA sequencing (RNA-seq), a next-generation sequencing technique for transcriptome profiling, is being increasingly used, in part driven by the decreasing cost of sequencing. Nevertheless, the analysis of the massive amounts of data generated by large-scale RNA-seq remains a challenge. Multiple algorithms pertinent to basic analyses have been developed, and there is an increasing need to automate the use of these tools so as to obtain results in an efficient and user-friendly manner. Increased automation and improved visualization of the results will help make the results and findings of the analyses readily available to experimental scientists. By combining the best open source tools developed for RNA-seq data analyses with the most advanced web 2.0 technologies, we have implemented QuickRNASeq, a pipeline for large-scale RNA-seq data analyses and visualization. The QuickRNASeq workflow consists of three main steps. In Step #1, each individual sample is processed, including mapping RNA-seq reads to a reference genome, counting the numbers of mapped reads, quality control of the aligned reads, and SNP (single nucleotide polymorphism) calling. Step #1 is computationally intensive, and can be processed in parallel. In Step #2, the results from individual samples are merged, and an integrated and interactive project report is generated. All analysis results in the report are accessible via a single HTML entry webpage. Step #3 is the data interpretation and presentation step. The rich visualization features implemented here allow end users to interactively explore the results of RNA-seq data analyses, and to gain more insights into RNA-seq datasets. In addition, we used a real-world dataset to demonstrate the simplicity and efficiency of QuickRNASeq in RNA-seq data analyses and interactive visualizations. The seamless integration of automated capabilities with interactive visualizations in QuickRNASeq is not available in other published RNA-seq pipelines. 
The high degree of automation and interactivity in QuickRNASeq leads to a substantial reduction in the time and effort required prior to further downstream analyses and interpretation of the findings. QuickRNASeq advances primary RNA-seq data analyses to the next level of automation, and is mature for public release and adoption.
Li, Shi-Weng; Shi, Rui-Fang; Leng, Yan
2015-01-01
Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change > 2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stages, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. 
The GO enrichment, pathway mapping, and gene expression profiles reveal molecular traits for root induction and initiation. This study provides a platform for functional genomic research with this species. PMID:26177103
Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.
Davidson, Nadia M; Oshlack, Alicia
2018-05-01
RNA sequencing (RNA-seq) analyses can benefit from performing both genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating an assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Building the transcriptome from a combination of assembled and annotated transcripts yields a more comprehensive transcriptome for the majority of organisms. In addition, RNA-seq data are mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods.
Formalin-fixed paraffin-embedded (FFPE) samples provide a vast untapped resource for chemical safety and translational science. To date, genomic profiling of FFPE samples has been limited by poor RNA quality and inconsistent results with limited utility in dose-response assessmen...
Pazhamala, Lekha T.; Agarwal, Gaurav; Bajaj, Prasad; Kumar, Vinay; Kulshreshtha, Akanksha; Saxena, Rachit K.; Varshney, Rajeev K.
2016-01-01
Seed development is an important event in the plant life cycle that has interested humankind for ages, especially in crops of economic importance. Pigeonpea is an important grain legume of the semi-arid tropics, used mainly for its protein-rich seeds. To understand the transcriptional programming during pod and seed development, RNA-seq data were generated from the embryo sac on the day of anthesis (0 DAA) and from seed and pod wall (5, 10, 20 and 30 DAA) of pigeonpea variety “Asha” (ICPL 87119) using Illumina HiSeq 2500. About 684 million sequencing reads were generated from nine samples, which resulted in the identification of 27,441 expressed genes after sequence analysis. These genes were studied for differential expression, co-expression, and temporal and spatial expression patterns. We also used the RNA-seq data to identify important seed-specific transcription factors, biological processes and associated pathways during seed development in pigeonpea. This comprehensive gene expression study, spanning flowering to mature pod development in pigeonpea, would be crucial for identifying candidate genes involved in seed traits directly or indirectly related to yield and quality. The dataset will serve as an important resource for gene discovery and for deciphering the molecular mechanisms underlying various seed-related traits. PMID:27760186
Cheng, Yunqing; Liu, Jianfeng; Zhang, Huidi; Wang, Ju; Zhao, Yixin; Geng, Wanting
2015-01-01
A high ratio of blank fruit in hazelnut (Corylus heterophylla Fisch) is a very common phenomenon that causes serious yield losses in northeast China. The development of blank fruit in the Corylus genus is known to be associated with embryo abortion. However, little is known about the molecular mechanisms responsible for embryo abortion during the nut development stage. Genomic information for C. heterophylla Fisch is not available; therefore, transcriptome and gene expression profiling data for developing and abortive ovules are needed. In this study, de novo transcriptome sequencing and RNA-seq analysis were conducted using short-read sequencing technology (Illumina HiSeq 2000). The results of the transcriptome assembly analysis revealed genetic information associated with the fruit development stage. Two digital gene expression libraries were constructed, one for a full (normally developing) ovule and one for an empty (abortive) ovule. Transcriptome sequencing and assembly revealed 55,353 unigenes, including 18,751 clusters and 36,602 singletons. These unigenes were annotated using the public databases NR, NT, Swiss-Prot, KEGG, COG, and GO. Using digital gene expression profiling, gene expression differences between developing and abortive ovules were identified. A total of 1,637 and 715 unigenes were significantly upregulated and downregulated, respectively, in abortive ovules compared with developing ovules. Quantitative real-time polymerase chain reaction analysis was used to verify the differential expression of some genes. The transcriptome and digital gene expression profiling data of normally developing and abortive ovules in hazelnut provide comprehensive information that will improve our understanding of the molecular mechanisms of abortive ovule formation in hazelnut.
Mulligan, Megan K; Mozhui, Khyobeni; Pandey, Ashutosh K; Smith, Maren L; Gong, Suzhen; Ingels, Jesse; Miles, Michael F; Lopez, Marcelo F; Lu, Lu; Williams, Robert W
2017-02-01
Genetic factors that influence the transition from initial drinking to dependence remain enigmatic. Recent studies have leveraged chronic intermittent ethanol (CIE) paradigms to measure changes in brain gene expression in a single strain at 0, 8, and 72 h, and even 7 days, following CIE. We extend these findings using LCM RNA-seq to profile expression in 11 brain regions in two inbred strains, C57BL/6J (B6) and DBA/2J (D2), 72 h following multiple cycles of ethanol self-administration and CIE. Linear models identified differential expression based on treatment, region, strain, or interactions with treatment. Nearly 40% of genes showed a robust effect (FDR < 0.01) of region, and hippocampus CA1, cortex, bed nucleus of the stria terminalis, and nucleus accumbens core had the highest numbers of differentially expressed genes after treatment. Another 8% of differentially expressed genes demonstrated a robust effect of strain. As expected based on similar studies in B6, treatment had a much smaller impact on expression; only 72 genes (p < 0.01) were modulated by treatment (independent of region or strain). Strikingly, many more genes (415) showed a strain-specific and largely opposite response to treatment and were enriched in processes related to RNA metabolism, transcription factor activity, and mitochondrial function. Over three times as many changes in gene expression were detected in D2 as in B6, and weighted gene co-expression network analysis (WGCNA) module comparison identified more modules enriched for treatment effects in D2. Substantial strain differences exist in the temporal pattern of transcriptional neuroadaptation to CIE, and these may drive individual differences in risk of addiction following excessive alcohol consumption.
Li, Chaoqun; Cao, Feifei; Li, Shengli; Huang, Shenglin; Li, Wei; Abumaria, Nashat
2018-01-01
Although studies provide insights into the neurobiology of stress and depression, the exact molecular mechanisms underlying their pathologies remain largely unknown. Long non-coding RNA (lncRNA) has been implicated in brain function and behavior, and a potential link between lncRNA and psychiatric disorders has been proposed. However, it remains undetermined whether lncRNA regulation in the brain contributes to stress or depression pathologies. In this study, we combined a validated animal model of depression-like symptoms, namely learned helplessness, with RNA-seq, Gene Ontology and co-expression network analyses to profile the expression patterns of lncRNA and mRNA in the mouse hippocampus. We identified 6346 differentially expressed transcripts. Among them, 340 lncRNAs and 3559 protein-coding mRNAs were differentially expressed in helpless mice in comparison with control and/or non-helpless mice (inescapable-stress-resilient mice). Gene Ontology and pathway enrichment analyses indicated that induction of helplessness altered the expression of mRNAs enriched in fundamental biological functions implicated in stress/depression neurobiology, such as synaptic, metabolic, cell survival and proliferation, developmental and chromatin modification functions. To explore the possible regulatory roles of the altered lncRNAs, we constructed co-expression networks composed of the lncRNAs and mRNAs. Of our differentially expressed lncRNAs, 17% showed significant correlation with genes. Functional co-expression analysis linked the identified lncRNAs to several cellular mechanisms implicated in stress/depression neurobiology. Importantly, 57% of the identified regulatory lncRNAs significantly correlated with 18 different synapse-related functions. Thus, the current study identifies for the first time distinct groups of lncRNAs regulated by induction of learned helplessness in the mouse brain. 
Our results suggest that lncRNA-directed regulatory mechanisms might contribute to stress-induced pathologies; in particular, to inescapable stress-induced synaptic modifications. PMID:29375311
Pröll, Maren Julia; Neuhoff, Christiane; Schellander, Karl; Uddin, Muhammad Jasim; Cinar, Mehmet Ulas; Sahadevan, Sudeep; Qu, Xueqi; Islam, Md. Aminul; Poirier, Mikhael; Müller, Marcel A.; Drosten, Christian; Tesfaye, Dawit; Tholen, Ernst; Große-Brinkhaus, Christine
2017-01-01
The porcine reproductive and respiratory syndrome (PRRS) is an infectious disease that leads to high financial and production losses in the global swine industry. The pathogenesis of this disease depends on a multitude of factors, and its control remains problematic. The immune system defends against infectious diseases; dendritic cells (DCs) in particular play a crucial role in activating the immune response after viral infection. However, understanding of the immune response, and of the genetic impact on the immune response to PRRS virus (PRRSV), remains incomplete. We therefore investigated the regulation of the host immune response to PRRSV in porcine lung DCs using RNA-sequencing (RNA-Seq). Lung DCs from two pig breeds (Pietrain and Duroc) were collected before (0 hours) and at various times after infection (3, 6, 9, 12, and 24 hours post infection (hpi)). RNA-Seq analysis revealed a total of 20,396 predicted porcine genes, including breed-specific differentially expressed immune genes. Infected Pietrain and Duroc lung DCs showed opposite gene expression courses during the first time points post infection, with Duroc lung DCs reacting more strongly and distinctly than Pietrain lung DCs during this period (3, 6, 9, 12 hpi). Additionally, cluster analysis revealed time-dependent co-expressed groups of genes involved in immune-relevant pathways. The key clusters and pathways identified help to explain the biological and functional background of lung DCs after PRRSV infection and suggest IL-1β1 as an important candidate gene. RNA-Seq was also used to characterize PRRSV replication in each breed: PRRSV infected and replicated differently in lung DCs of the two breeds. These results could be useful for investigating immunity traits in pig breeding and for enhancing pig health. PMID:29140992
Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data
Racle, Julien; de Jonge, Kaat; Baumgaertner, Petra; Speiser, Daniel E
2017-01-01
Immune cells infiltrating tumors can have an important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org). PMID:29130882
dCLIP: a computational approach for comparative CLIP-seq analyses
2014-01-01
Although comparison of RNA-protein interaction profiles across different conditions has become increasingly important to understanding the function of RNA-binding proteins (RBPs), few computational approaches have been developed for quantitative comparison of CLIP-seq datasets. Here, we present an easy-to-use command line tool, dCLIP, for quantitative comparative CLIP-seq analysis. The two-stage method implemented in dCLIP, comprising a modified MA normalization method and a hidden Markov model, is shown to effectively identify differential binding regions of RBPs in four CLIP-seq datasets generated by the HITS-CLIP, iCLIP and PAR-CLIP protocols. dCLIP is freely available at http://qbrc.swmed.edu/software/. PMID:24398258
RNA-seq mixology: designing realistic control experiments to compare protocols and analysis methods
Holik, Aliaksei Z.; Law, Charity W.; Liu, Ruijie; Wang, Zeya; Wang, Wenyi; Ahn, Jaeil; Asselin-Labat, Marie-Liesse; Smyth, Gordon K.
2017-01-01
Carefully designed control experiments provide a gold standard for benchmarking different genomics research tools. A shortcoming of many gene expression control studies is that replication involves profiling the same reference RNA sample multiple times. This leads to low, purely technical noise that is atypical of regular studies. To achieve a more realistic noise structure, we generated an RNA-sequencing mixture experiment using two cell lines of the same cancer type. Variability was added by extracting RNA from independent cell cultures and degrading particular samples. The systematic gene expression changes induced by this design allowed benchmarking of different library preparation kits (standard poly-A versus total RNA with Ribozero depletion) and analysis pipelines. Data generated using the total RNA kit had more signal for introns and various RNA classes (ncRNA, snRNA, snoRNA) and less variability after degradation. For differential expression analysis, voom with quality weights marginally outperformed other popular methods, while for differential splicing, DEXSeq was simultaneously the most sensitive and the most inconsistent method. For sample deconvolution analysis, DeMix convincingly outperformed IsoPure. Our RNA-sequencing data set provides a valuable resource for benchmarking different protocols and data pre-processing workflows. The extra noise mimics routine lab experiments more closely, ensuring that conclusions are widely applicable. PMID:27899618
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Tse -Yuan; Mehlhorn, Tonia L; Pelletier, Dale A.; ...
2016-05-31
RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, which were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X, with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology, since we hypothesized and confirmed that the two LB medium lots had different iron contents (~two-fold difference). Furthermore, this study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost-effective microbial transcriptomics study.
Manga, Punita; Klingeman, Dawn M; Lu, Tse-Yuan S; Mehlhorn, Tonia L; Pelletier, Dale A; Hauser, Loren J; Wilson, Charlotte M; Brown, Steven D
2016-01-01
RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, which were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X, with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology, since we hypothesized and confirmed that the two LB medium lots had different iron contents (~two-fold difference). This study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost-effective microbial transcriptomics study.
Potts, Anastasia H; Leng, Yuanyuan; Babitzke, Paul; Romeo, Tony
2018-03-29
The Csr global regulatory system coordinates gene expression in response to metabolic status. This system uses the RNA-binding protein CsrA to regulate gene expression by binding to transcripts of structural and regulatory genes, thus affecting their structure, stability, translation, and/or transcription elongation. CsrA activity is controlled by the sRNAs CsrB and CsrC, which sequester CsrA away from other transcripts. CsrB/C levels are partly determined by their rates of turnover, which requires CsrD to render them susceptible to RNase E cleavage. Previous epistasis analysis suggested that CsrD affects gene expression through the other Csr components, CsrB/C and CsrA. However, those conclusions were based on a limited analysis of reporters. Here, we reassessed the global behavior of the Csr circuitry using epistasis analysis with RNA-seq (Epi-seq). Under our experimental conditions, CsrD effects on mRNA levels were entirely lost in the csrA mutant and largely eliminated in a csrB/C mutant, while the majority of CsrA effects persisted in the absence of csrD; the original model therefore accounts for the global behavior of the Csr system. Our present results also reveal a more nuanced role of CsrA as the terminal regulator of the Csr system than previously recognized.
Wu, Ronghua; Sheng, Xiuzhen; Tang, Xiaoqian; Xing, Jing; Zhan, Wenbin
2018-01-01
Lymphocystis disease virus (LCDV) infection may induce a variety of host gene expression changes associated with disease development; however, our understanding of the molecular mechanisms underlying host-virus interactions is limited. In this study, RNA sequencing (RNA-seq) was employed to investigate differentially expressed genes (DEGs) in the gill of the flounder (Paralichthys olivaceus) one week after LCDV infection. Transcriptome sequencing of the gill with and without LCDV infection was performed using the Illumina HiSeq 2500 platform. In total, RNA-seq analysis generated 193,225,170 clean reads aligned to 106,293 unigenes. Among these, 1812 genes were up-regulated and 1626 genes were down-regulated after LCDV infection. DEGs related to cellular processes and metabolism predominated among those involved in LCDV infection. Further functional analysis showed that genes related to inflammation, the ubiquitin-proteasome pathway, cell proliferation, apoptosis, tumor formation, and antiviral defense were differentially expressed. Several DEGs, including β-actin, toll-like receptors, cytokine-related genes, antiviral-related genes, and apoptosis-related genes, were involved in LCDV entry and immune response. In addition, the RNA-seq data were validated by quantitative real-time PCR. As the first comprehensive gene expression study of this infection, this work provides valuable insights into the host-pathogen interaction between flounder and LCDV. PMID:29304016
Wu, Ronghua; Sheng, Xiuzhen; Tang, Xiaoqian; Xing, Jing; Zhan, Wenbin
2018-01-05
Lymphocystis disease virus (LCDV) infection may induce a variety of host gene expression changes associated with disease development; however, our understanding of the molecular mechanisms underlying host-virus interactions is limited. In this study, RNA sequencing (RNA-seq) was employed to investigate differentially expressed genes (DEGs) in the gill of the flounder (Paralichthys olivaceus) one week after LCDV infection. Transcriptome sequencing of the gill with and without LCDV infection was performed using the Illumina HiSeq 2500 platform. In total, RNA-seq analysis generated 193,225,170 clean reads aligned to 106,293 unigenes. Among these, 1812 genes were up-regulated and 1626 genes were down-regulated after LCDV infection. DEGs related to cellular processes and metabolism predominated among those involved in LCDV infection. Further functional analysis showed that genes related to inflammation, the ubiquitin-proteasome pathway, cell proliferation, apoptosis, tumor formation, and antiviral defense were differentially expressed. Several DEGs, including β-actin, toll-like receptors, cytokine-related genes, antiviral-related genes, and apoptosis-related genes, were involved in LCDV entry and immune response. In addition, the RNA-seq data were validated by quantitative real-time PCR. As the first comprehensive gene expression study of this infection, this work provides valuable insights into the host-pathogen interaction between flounder and LCDV.
Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G
2017-01-01
Measuring how gene expression changes over the course of an experiment shows how an organism responds at the molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes at the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow inexpensive data generation, accurate post-processing and normalization across samples are required to eliminate systematic noise introduced by the biochemical and/or technical processes. Existing methods either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or assume that the effects of up- and down-regulated genes cancel each other out during normalization. Here, we present a novel method, moose², which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows a set of known or experimentally validated invariant genes to be specified, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose² is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/.
The proposed RNA-seq normalization method, moose², is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variation between replicates.
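The dynamic-programming prediction of invariant genes is not reproduced here. The sketch below illustrates only the second idea described above, quadratic normalization on an invariant subset, assuming that subset is already known. It fits a quadratic mapping from one sample's log2(count + 1) values to a reference sample's values using the invariant genes alone, then applies that mapping to every gene. The function names, the log transform and the toy counts are assumptions for illustration, not moose²'s C++ code or its exact model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def fit_quadratic(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a + b*x + c*x**2 by solving the 3x3 normal equations."""
    if len(x) < 3:
        raise ValueError("a quadratic fit needs at least three points")
    s = [sum(xi ** k for xi in x) for k in range(5)]                   # sums of x^0..x^4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]  # sums of y*x^0..x^2
    # Augmented matrix [A | t] with A[i][j] = sum(x^(i+j)).
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][k] * coef[k] for k in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return coef[0], coef[1], coef[2]


def normalize_to_reference(sample: Dict[str, float], reference: Dict[str, float],
                           invariant: List[str]) -> Dict[str, float]:
    """Rescale one sample onto a reference sample (hypothetical helper).

    The mapping is fitted on log2(count + 1) values of the invariant genes only,
    then applied to every gene; results are returned on the count scale.
    """
    log_s = {g: math.log2(v + 1.0) for g, v in sample.items()}
    log_r = {g: math.log2(reference[g] + 1.0) for g in invariant}
    a, b, c = fit_quadratic([log_s[g] for g in invariant], [log_r[g] for g in invariant])
    return {g: 2.0 ** (a + b * v + c * v * v) - 1.0 for g, v in log_s.items()}


# Hypothetical counts; g1-g3 are treated as invariant, g4 is not.
sample = {"g1": 3, "g2": 7, "g3": 15, "g4": 31}
reference = {"g1": 7, "g2": 15, "g3": 31, "g4": 80}
print(normalize_to_reference(sample, reference, invariant=["g1", "g2", "g3"]))
```

In this toy case the invariant genes define the mapping log2(ref + 1) = 1 + log2(sample + 1), so g4 maps to roughly 63 counts on the reference scale and its reference value of 80 stands out as a change. A quadratic term lets the mapping bend where a single scaling factor would not.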
Guo, Yue; Su, Zheng-Yuan; Zhang, Chengyue; Gaspar, John M; Wang, Rui; Hart, Ronald P; Verzi, Michael P; Kong, Ah-Ng Tony
2017-07-01
Colorectal cancer (CRC) remains one of the leading causes of cancer-related death worldwide. Aspirin (ASA) and curcumin (CUR) are widely investigated chemopreventive candidates for CRC. However, the precise mechanisms of their action and their combinatorial effects have not been evaluated. The purpose of the present study was to determine the effects of ASA, CUR, and their combination in azoxymethane/dextran sulfate sodium (AOM/DSS)-induced colitis-accelerated colorectal cancer (CAC). We also aimed to use RNA-seq to characterize the differential gene expression profiles in AOM/DSS-induced tumors as well as in tumors modulated by ASA and CUR. Diets supplemented with 0.02% ASA, 2% CUR or 0.01% ASA + 1% CUR were given to mice from 1 week before the AOM injection until the experiment was terminated 22 weeks after AOM initiation. Our results showed that CUR had a stronger inhibitory effect on colon tumorigenesis than ASA. The combination of ASA and CUR at lower doses showed efficacy similar to that of the higher 2% dose of CUR. RNA isolated from colonic tissue from the control group and from tumor samples from the experimental groups was subjected to RNA-seq. Transcriptomic analysis suggested that the low-dose combination of ASA and CUR modulated larger gene sets than either single treatment. These differentially expressed genes were situated in several canonical pathways important in the inflammatory network and liver metastasis in CAC. We identified a small subset of genes as potential molecular targets involved in the preventive action of the ASA and CUR combination. Taken together, the current results provide the first evidence in support of the chemopreventive effect of a low-dose combination of ASA and CUR in CAC.
Moreover, the transcriptional profile obtained in our study may provide a framework for identifying the mechanisms underlying the carcinogenesis process from normal colonic tissue to tumor development as well as the cancer inhibitory effects and potential molecular targets of ASA and CUR. Copyright © 2017 Elsevier Inc. All rights reserved.
Lapierre, Pascal; Mir, Mushtaq; Chase, Michael R.; Pyle, Margaret M.; Gawande, Richa; Ahmad, Rushdy; Sarracino, David A.; Ioerger, Thomas R.; Fortune, Sarah M.; Derbyshire, Keith M.; Wade, Joseph T.; Gray, Todd A.
2015-01-01
RNA-seq technologies have provided significant insight into the transcription networks of mycobacteria. However, such studies provide no definitive information on the translational landscape. Here, we use a combination of high-throughput transcriptome and proteome-profiling approaches to more rigorously understand protein expression in two mycobacterial species. RNA-seq and ribosome profiling in Mycobacterium smegmatis, and transcription start site (TSS) mapping and N-terminal peptide mass spectrometry in Mycobacterium tuberculosis, provide complementary, empirical datasets to examine the congruence of transcription and translation in the Mycobacterium genus. We find that nearly one-quarter of mycobacterial transcripts are leaderless, lacking a 5’ untranslated region (UTR) and Shine-Dalgarno ribosome-binding site. Our data indicate that leaderless translation is a major feature of mycobacterial genomes and is comparably robust to leadered initiation. Using translational reporters to systematically probe the cis-sequence requirements of leaderless translation initiation in mycobacteria, we find that an ATG or GTG at the mRNA 5’ end is both necessary and sufficient. This criterion, together with our ribosome occupancy data, suggests that mycobacteria encode hundreds of small, unannotated proteins at the 5’ ends of transcripts. The conservation of small proteins in both mycobacterial species tested suggests that some play important roles in mycobacterial physiology. Our translational-reporter system further indicates that mycobacterial leadered translation initiation requires a Shine-Dalgarno site in the 5’ UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation. Our combined approaches provide the first comprehensive view of mycobacterial gene structures and their non-canonical mechanisms of protein expression. PMID:26536359
Shell, Scarlet S; Wang, Jing; Lapierre, Pascal; Mir, Mushtaq; Chase, Michael R; Pyle, Margaret M; Gawande, Richa; Ahmad, Rushdy; Sarracino, David A; Ioerger, Thomas R; Fortune, Sarah M; Derbyshire, Keith M; Wade, Joseph T; Gray, Todd A
2015-11-01
RNA-seq technologies have provided significant insight into the transcription networks of mycobacteria. However, such studies provide no definitive information on the translational landscape. Here, we use a combination of high-throughput transcriptome and proteome-profiling approaches to more rigorously understand protein expression in two mycobacterial species. RNA-seq and ribosome profiling in Mycobacterium smegmatis, and transcription start site (TSS) mapping and N-terminal peptide mass spectrometry in Mycobacterium tuberculosis, provide complementary, empirical datasets to examine the congruence of transcription and translation in the Mycobacterium genus. We find that nearly one-quarter of mycobacterial transcripts are leaderless, lacking a 5' untranslated region (UTR) and Shine-Dalgarno ribosome-binding site. Our data indicate that leaderless translation is a major feature of mycobacterial genomes and is comparably robust to leadered initiation. Using translational reporters to systematically probe the cis-sequence requirements of leaderless translation initiation in mycobacteria, we find that an ATG or GTG at the mRNA 5' end is both necessary and sufficient. This criterion, together with our ribosome occupancy data, suggests that mycobacteria encode hundreds of small, unannotated proteins at the 5' ends of transcripts. The conservation of small proteins in both mycobacterial species tested suggests that some play important roles in mycobacterial physiology. Our translational-reporter system further indicates that mycobacterial leadered translation initiation requires a Shine-Dalgarno site in the 5' UTR and that ATG, GTG, TTG, and ATT codons can robustly initiate translation. Our combined approaches provide the first comprehensive view of mycobacterial gene structures and their non-canonical mechanisms of protein expression.
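The reporter result above implies a simple sequence rule for leaderless transcripts: translation can initiate only if the transcript's 5' end begins with ATG or GTG. The following minimal sketch applies that rule to 5'-end sequences, assuming TSS mapping has already produced them. The helper names and sequences are hypothetical. Detecting a Shine-Dalgarno site for leadered transcripts is deliberately left out.

```python
from typing import Dict

# Start codons reported as necessary and sufficient at a leaderless 5' end.
LEADERLESS_STARTS = ("ATG", "GTG")


def leaderless_start_ok(transcript_5prime: str) -> bool:
    """True if the first three nucleotides are ATG or GTG.

    Input may be DNA or RNA; U is converted to T before the check.
    """
    seq = transcript_5prime.upper().replace("U", "T")
    return seq[:3] in LEADERLESS_STARTS


def classify(transcripts: Dict[str, str]) -> Dict[str, bool]:
    """Apply the leaderless-initiation rule to each transcript's 5' sequence."""
    return {name: leaderless_start_ok(seq) for name, seq in transcripts.items()}


# Hypothetical 5' ends, not sequences from the study.
example = {
    "t1": "ATGACCGGA",    # ATG at the 5' end
    "t2": "GUGCUAAAG",    # RNA input, GTG after conversion
    "t3": "TTGAAACGC",    # TTG initiates leadered, not leaderless, translation
    "t4": "AGGAGGTTATG",  # has a leader, so the rule does not apply
}
print(classify(example))  # {'t1': True, 't2': True, 't3': False, 't4': False}
```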
Ma, Siming; Upneja, Akhil; Galecki, Andrzej; Tsai, Yi-Miau; Burant, Charles F; Raskind, Sasha; Zhang, Quanwei; Zhang, Zhengdong D; Seluanov, Andrei; Gorbunova, Vera; Clish, Clary B; Miller, Richard A; Gladyshev, Vadim N
2016-11-22
Mammalian lifespan differs by more than 100-fold, but the mechanisms associated with such longevity differences are not understood. Here, we conducted a study on primary skin fibroblasts isolated from 16 mammalian species and maintained under identical cell culture conditions. We developed a pipeline for obtaining species-specific ortholog sequences, profiled gene expression by RNA-seq and small molecules by metabolite profiling, and identified genes and metabolites correlating with species longevity. Cells from longer-lived species up-regulated genes involved in DNA repair and glucose metabolism, down-regulated genes involved in proteolysis and protein transport, and showed high levels of amino acids but low levels of lysophosphatidylcholine and lysophosphatidylethanolamine. The amino acid patterns were recapitulated by further analyses of primate and bird fibroblasts. The study suggests that fibroblast profiling captures differences in longevity across mammals at the level of global gene expression and metabolite levels, and reveals pathways that define these differences.
Pepke, Shirley; Ver Steeg, Greg
2017-03-15
De novo inference of clinically relevant gene function relationships from tumor RNA-seq remains a challenging task. Current methods typically either partition patient samples into a few subtypes or rely upon analysis of pairwise gene correlations that will miss some groups in noisy data. Leveraging higher dimensional information can be expected to increase the power to discern targetable pathways, but this is commonly thought to be an intractable computational problem. In this work we adapt a recently developed machine learning algorithm for sensitive detection of complex gene relationships. The algorithm, CorEx, efficiently optimizes over multivariate mutual information and can be iteratively applied to generate a hierarchy of relatively independent latent factors. The learned latent factors are used to stratify patients for survival analysis with respect to both single factors and combinations. These analyses are performed and interpreted in the context of biological function annotations and protein network interactions that might be utilized to match patients to multiple therapies. Analysis of ovarian tumor RNA-seq samples demonstrates the algorithm's power to infer well over one hundred biologically interpretable gene cohorts, several times more than standard methods such as hierarchical clustering and k-means. The CorEx factor hierarchy is also informative, with related but distinct gene clusters grouped by upper nodes. Some latent factors correlate with patient survival, including one for a pathway connected with the epithelial-mesenchymal transition in breast cancer that is regulated by a microRNA that modulates epigenetics. Further, combinations of factors lead to a synergistic survival advantage in some cases. In contrast to studies that attempt to partition patients into a small number of subtypes (typically 4 or fewer) for treatment purposes, our approach utilizes subgroup information for combinatoric transcriptional phenotyping. 
Considering only the 66 gene expression groups that are found to both have significant Gene Ontology enrichment and are small enough to indicate specific drug targets implies a computational phenotype for ovarian cancer that allows for 3^66 possible patient profiles, enabling truly personalized treatment. The findings here demonstrate a new technique that sheds light on the complexity of gene expression dependencies in tumors and could eventually enable the use of patient RNA-seq profiles for selection of personalized and effective cancer treatments.
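The figure of 3^66 patient profiles fits a model in which each of the 66 retained gene-expression groups takes one of three discrete states in a given patient. The abstract does not say what those states are, so this reading is an assumption. A quick arithmetic check with a hypothetical helper shows the count is roughly 3.1 × 10^31:

```python
import math


def profile_count(n_factors: int, states_per_factor: int = 3) -> int:
    """Distinct patient profiles when every factor independently takes one of a fixed number of states."""
    return states_per_factor ** n_factors


n = profile_count(66)
print(f"3^66 = {n} (about 10^{math.log10(n):.1f})")
```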
Striking circadian neuron diversity and cycling of Drosophila alternative splicing.
Wang, Qingqing; Abruzzi, Katharine C; Rosbash, Michael; Rio, Donald C
2018-06-04
Although alternative pre-mRNA splicing (AS) significantly diversifies the neuronal proteome, the extent of AS is still unknown due in part to the large number of diverse cell types in the brain. To address this complexity issue, we used an annotation-free computational method to analyze and compare the AS profiles between small specific groups of Drosophila circadian neurons. The method, the Junction Usage Model (JUM), allows comprehensive profiling of both known and novel AS events from specific RNA-seq libraries. The results show that many diverse and novel pre-mRNA isoforms are preferentially expressed in one class of clock neuron and are also absent from the more standard Drosophila head RNA preparation. These AS events are enriched in potassium channels important for neuronal firing, and there are also cycling isoforms with no detectable underlying transcriptional oscillations. The results suggest massive AS regulation in the brain that is also likely important for circadian regulation. © 2018, Wang et al.
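JUM's own statistical model is not reproduced here. To illustrate the kind of junction-based comparison between neuron groups described above, the sketch below swaps in a generic percent-spliced-in (PSI) calculation for a simple cassette-exon event. It uses junction read counts and reports the PSI difference between two groups. The helper names, the read-support threshold and the counts are hypothetical.

```python
from typing import Optional


def cassette_psi(upstream_incl: int, downstream_incl: int, skipping: int,
                 min_reads: int = 10) -> Optional[float]:
    """Percent spliced in (as a fraction) for a cassette exon from junction reads.

    An inclusion isoform is supported by two junctions (upstream exon to cassette
    exon, cassette exon to downstream exon) and a skipping isoform by one, so the
    inclusion reads are averaged over the two junctions before comparison.
    Returns None when total junction support is zero or below min_reads.
    """
    support = upstream_incl + downstream_incl + skipping
    if support == 0 or support < min_reads:
        return None
    inclusion = (upstream_incl + downstream_incl) / 2.0
    return inclusion / (inclusion + skipping)


def delta_psi(psi_a: Optional[float], psi_b: Optional[float]) -> Optional[float]:
    """PSI difference between two cell groups, or None if either is unquantified."""
    if psi_a is None or psi_b is None:
        return None
    return psi_a - psi_b


# Hypothetical junction counts for one exon in two clock-neuron groups.
group_a = cassette_psi(40, 36, 12)  # inclusion 38 vs skipping 12 -> PSI 0.76
group_b = cassette_psi(10, 14, 48)  # inclusion 12 vs skipping 48 -> PSI 0.2
print(group_a, group_b, delta_psi(group_a, group_b))
```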
Transcription profile of boar spermatozoa as revealed by RNA-sequencing
USDA-ARS's Scientific Manuscript database
High-throughput RNA sequencing (RNA-Seq) overcomes the limitations of the current hybridization-based techniques to detect the actual pool of RNA transcripts in spermatozoa. The application of this technology in livestock can speed the discovery of potential predictors of male fertility. As a first ...
NASA Astrophysics Data System (ADS)
Arce, DP; Krsticevic, FJ; Ezpeleta, J.; Ponce, SD; Pratta, GR; Tapia, E.
2016-04-01
The small heat shock proteins (sHSPs) have been found to play a critical role under physiological stress conditions by protecting proteins from irreversible aggregation. To characterize the gene expression profile of four sHsps arranged in tandem in the genome of domesticated Solanum lycopersicum (Heinz 1706) and its wild close relative Solanum pimpinellifolium (LA1589), differential gene expression analysis using RNA-Seq was conducted at three ripening stages in the fruits of both. Gene promoter analysis was performed to explain the heterogeneous pattern of gene expression found for these tandemly duplicated sHsps. The in silico results may help refocus wet-lab analyses of the tomato sHsp protein family.
Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore
2017-01-01
Transfer RNA fragments (tRFs) are an established class of constitutive regulatory molecules that arise from precursor and mature tRNAs. RNA deep sequencing (RNA-seq) has greatly facilitated the study of tRFs. However, the repetitive nature of the tRNA templates and the idiosyncrasies of tRNA sequences necessitate methodologies that differ markedly from those used to analyze RNA-seq data when studying microRNAs (miRNAs) or messenger RNAs (mRNAs). Here we present MINTmap (for MItochondrial and Nuclear TRF mapping), a method and software package developed specifically for the quick, deterministic and exhaustive identification of tRFs in short RNA-seq datasets. In addition to identifying tRFs, MINTmap unambiguously calculates and reports both raw and normalized abundances for them. Furthermore, to ensure specificity, MINTmap identifies the subset of discovered tRFs that could originate outside of tRNA space and flags them as candidate false positives. Our comparative analysis shows that MINTmap achieves greater sensitivity and specificity than other available methods while also being exceptionally fast. The MINTmap code is available through https://github.com/TJU-CMC-Org/MINTmap/ under an open-source GNU GPL v3.0 license. PMID:28220888
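Deterministic, exhaustive tRF identification of the kind described above rests on exact matching of reads against tRNA sequence space rather than alignment with mismatches. The following minimal sketch shows that idea under simplifying assumptions: a handful of hypothetical reference sequences, plain substring search, a flag for reads that also occur verbatim outside tRNA space, and reads-per-million (RPM) normalization over the whole library. It is not MINTmap's code, its lookup tables or its exact normalization.

```python
from collections import Counter
from typing import Dict, Iterable, List


def identify_trfs(reads: Iterable[str], trna_space: List[str],
                  other_genome: List[str]) -> Dict[str, dict]:
    """Exact-match reads to tRNA sequences and report raw and RPM abundances.

    A read is a tRF candidate if it occurs verbatim inside any tRNA sequence;
    it is flagged as a possible false positive if it also occurs verbatim in
    non-tRNA sequence.
    """
    reads = list(reads)
    total = len(reads)
    counts = Counter(r for r in reads if any(r in t for t in trna_space))
    report = {}
    for frag, raw in counts.items():
        report[frag] = {
            "raw": raw,
            "rpm": raw * 1_000_000 / total,  # normalized by all reads in the library
            "outside_trna": any(frag in s for s in other_genome),
        }
    return report


# Hypothetical sequences, not real annotations.
trna_seqs = ["GCATTGGTGGTTCAGTGGTAGAATTCTCGCCT"]
other_seqs = ["TTTTGGTAGAATTCTCAAAA"]
reads_example = ["GCATTGGTGGTTCAG", "GCATTGGTGGTTCAG", "GGTAGAATTCTC", "AAAACCCCGGGG"]
report = identify_trfs(reads_example, trna_seqs, other_seqs)
for frag, info in report.items():
    print(frag, info)
```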
STAT5A and STAT5B have opposite correlations with drug response gene expression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lamba, V., E-mail: vlamba@ufl.edu; Jia, B.; Liang, F.
Introduction: STAT5A and STAT5B are important transcription factors that play a key role in regulating several physiological processes, including proliferation, survival and mediation of responses to cytokines, and in regulating gender differences in drug response genes such as the hepatic cytochrome P450s (CYPs), which are responsible for the large majority of drug metabolism reactions in the human body. STAT5A and STAT5B share a high degree of sequence homology and have been reported to have largely similar functions. Recent studies have, however, indicated that they often also have distinct and unique roles in regulating gene expression. Objective: In this study, we evaluated the association of STAT5A and STAT5B mRNA expression levels with those of several key hepatic CYPs and hepatic transcription factors (TFs), and evaluated the potential roles of STAT5A and STAT5B in mediating gender differences in these CYPs and TFs. Methods: Expression profiling for major hepatic CYP isoforms and transcription factors was performed using RNA sequencing (RNA-seq) in 102 human liver samples (57 female, 45 male). Real-time PCR gene expression data for selected CYPs and TFs were available for a subset of 50 human liver samples (25 female, 25 male) and were used to validate the RNA-seq findings. Results: STAT5A showed significant negative correlations with expression levels of multiple hepatic transcription factors (including NR1I2 and HNF4A) and drug-metabolizing enzymes (DMEs) such as CYP3A4 and CYP2C19, whereas STAT5B expression showed positive associations with several of the CYPs and TFs analyzed. As STAT5A and STAT5B have been shown to be important in regulating gender differences in CYPs, we also analyzed STAT5A and STAT5B associations with CYPs and TFs separately in males and females, and observed gender-dependent differential associations of the STATs with several CYPs and TFs. Results from the real-time PCR validation largely supported our RNA-seq findings.
Conclusions: Using both RNA sequencing and real-time PCR, we examined the association of STAT5A and STAT5B mRNA expression with CYP and TF gene expression. STAT5A showed significant negative correlations with expression levels of multiple hepatic TFs (including NR1I2 and HNF4α) and CYPs (e.g., CYP3A4, CYP2C19), whereas STAT5B expression showed positive associations with most of the CYPs/TFs analyzed. These results suggest that STAT5A and STAT5B may have distinct roles in regulating the expression of hepatic drug response genes. Further studies are needed to elucidate the roles of STAT5A and STAT5B in the regulation of CYPs/TFs and the implications of these findings.
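The associations reported above are correlations between mRNA levels across liver samples. The abstract does not say which coefficient was used, so the sketch below assumes Spearman rank correlation, a common choice for expression data, and implements it without external libraries. It computes Pearson correlation on tie-averaged ranks. The expression values are hypothetical. The same function could be applied separately to the female and male subsets.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression values across five liver samples.
stat5a = [1.0, 2.0, 3.0, 4.0, 5.0]
cyp_negative = [50.0, 40.0, 30.0, 20.0, 10.0]
cyp_positive = [10.0, 25.0, 20.0, 40.0, 60.0]
print(spearman(stat5a, cyp_negative), spearman(stat5a, cyp_positive))
```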
Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong
2018-01-04
Translation is a key regulatory step linking the transcriptome and the proteome. Two major methods of translatome investigation are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database, TranslatomeDB (http://www.translatomedb.net/), which provides a collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of the same species and type, at both the transcriptome and translatome levels. The translation indices (translation ratio, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity. All datasets were analyzed using a unified, robust, accurate and experimentally verifiable pipeline based on the FANSe3 mapping algorithm, with edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and analyze them with the identical unified pipeline. We believe that TranslatomeDB is a comprehensive platform and knowledgebase for translatome and proteome research, freeing biologists from the complex work of searching, analyzing and comparing huge sequencing datasets without requiring local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
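The translation indices above can be read as ratios of normalized abundances for the same gene. The sketch below assumes two common definitions: translation ratio as RNC-seq RPKM divided by mRNA-seq RPKM, and translational efficiency as Ribo-seq RPKM divided by mRNA-seq RPKM. TranslatomeDB's exact formulas, any pseudocounts and its elongation velocity index are not reproduced. The helper names and counts are hypothetical.

```python
from typing import Dict


def rpkm(count: int, gene_length_bp: int, library_size: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * library_size)


def translation_indices(gene_length_bp: int, counts: Dict[str, int],
                        library_sizes: Dict[str, int]) -> Dict[str, float]:
    """Translation ratio (RNC/mRNA) and translational efficiency (Ribo/mRNA) for one gene.

    counts and library_sizes are keyed by 'mrna', 'rnc' and 'ribo'.
    """
    expr = {k: rpkm(counts[k], gene_length_bp, library_sizes[k])
            for k in ("mrna", "rnc", "ribo")}
    if expr["mrna"] == 0:
        raise ValueError("mRNA-seq expression is zero; ratios are undefined")
    return {
        "translation_ratio": expr["rnc"] / expr["mrna"],
        "translational_efficiency": expr["ribo"] / expr["mrna"],
    }


# Hypothetical counts for a 2 kb gene in three libraries of 20 million reads each.
libs = {"mrna": 20_000_000, "rnc": 20_000_000, "ribo": 20_000_000}
print(translation_indices(2000, {"mrna": 400, "rnc": 600, "ribo": 200}, libs))
```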
Puławska, Joanna; Kałużna, Monika; Warabieda, Wojciech; Mikiciński, Artur
2017-11-13
Erwinia amylovora is generally considered to be a homogeneous species in terms of phenotypic and genetic features. However, strains show variation in their virulence, particularly on hosts with different susceptibility to fire blight. We applied the RNA-seq technique to elucidate transcriptome-level changes of the lowly virulent E. amylovora strain 650 during infection of shoots of susceptible (Idared) and resistant (Free Redstar) apple cultivars. The highest number of E. amylovora genes differentially expressed between the two apple genotypes was observed at 24 h after inoculation. Six days after inoculation, only a few bacterial genes were differentially expressed between the susceptible and resistant apple cultivars. Analysis of the functions of the differentially expressed genes showed that, in general, genes related to stress response and defence against toxic compounds were more highly expressed in Free Redstar. Also in this cultivar, higher expression of flagellar genes (FlaI), which are recognized as a PAMP (pathogen-associated molecular pattern) by the innate immune systems of plants, was noted. Additionally, several genes that have not yet been shown to play a role in the pathogenic abilities of E. amylovora were differentially expressed in the two apple cultivars. This RNA-seq analysis generated a novel dataset describing the transcriptional response of the lowly virulent E. amylovora strain in susceptible and resistant apple cultivars. Most genes were regulated in the same way in both apple cultivars, but some cultivar-specific responses suggest that the environment in Free Redstar is more stressful for the bacteria, which may explain their inability to infect this cultivar. Among the genes with the highest fold change in expression between experimental combinations or with the highest transcript abundance, there are many genes without ascribed functions that have never been tested for a role in pathogenicity.
Overall, this study provides the first transcriptional profile by RNA-seq of E. amylovora during infection of a host plant and insights into the transcriptional response of this pathogen in the environments of susceptible and resistant apple plants.
Kel, Ivan; Chang, Zisong; Galluccio, Nadia; Romeo, Margherita; Beretta, Stefano; Diomede, Luisa; Mezzelani, Alessandra; Milanesi, Luciano; Dieterich, Christoph; Merelli, Ivan
2016-10-18
Interpreting genome-wide association studies is difficult, as it is hard to understand how polymorphisms can affect gene regulation, in particular for trans-regulatory elements located far from the genes they control. Using RNA or protein expression data as phenotypes, it is possible to correlate their variation with specific genotypes. This technique is usually referred to as expression quantitative trait loci (eQTL) analysis, and only a few packages exist for integrating genotype patterns and expression profiles. In particular, tools are needed for genome-wide analysis of next-generation sequencing (NGS) data, which is essential for identifying eQTLs able to control a large number of genes (hotspots). Here we present SPIRE (Software for Polymorphism Identification Regulating Expression), a generic, modular and functionally highly flexible pipeline for eQTL processing. SPIRE integrates different univariate and multivariate approaches for eQTL analysis, paying particular attention to the scalability of the procedure in order to support cis- as well as trans-mapping, thus allowing the identification of hotspots in NGS data. In particular, we demonstrated how SPIRE can handle big association study datasets, reproducing published results and improving the identification of trans-eQTLs. Furthermore, we employed the pipeline to analyse novel data on the genotypes of two different C. elegans strains (N2 and Hawaii) and related miRNA expression data obtained using RNA-Seq. A miRNA regulatory hotspot was identified on chromosome 1, overlapping the transcription factor grh-1, which is known to be involved in the early phases of embryonic development of C. elegans. In a follow-up qPCR experiment we were able to verify most of the predicted eQTLs, and to show, for a novel miRNA, a significant sequence difference between the two analysed strains of C. elegans.
SPIRE is publicly available as open source software at , together with some example data, a readme file, supplementary material and a short tutorial.
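The abstract does not specify SPIRE's univariate test. To illustrate the basic univariate eQTL approach, the sketch below regresses one transcript's expression on a single marker's genotype by ordinary least squares and reports the slope and its t statistic. Genotype is coded by strain of origin (N2 = 0, Hawaii = 1), which is an assumption. The data and function name are hypothetical.

```python
import math
from typing import Sequence, Tuple


def eqtl_test(genotype: Sequence[float], expression: Sequence[float]) -> Tuple[float, float]:
    """OLS regression of expression on genotype; returns (slope, t statistic)."""
    n = len(genotype)
    if n < 3:
        raise ValueError("need at least three samples")
    mg = sum(genotype) / n
    me = sum(expression) / n
    sxx = sum((g - mg) ** 2 for g in genotype)
    if sxx == 0:
        raise ValueError("marker is monomorphic in these samples")
    sxy = sum((g - mg) * (e - me) for g, e in zip(genotype, expression))
    slope = sxy / sxx
    intercept = me - slope * mg
    # Residual sum of squares gives the standard error of the slope.
    sse = sum((e - (intercept + slope * g)) ** 2 for g, e in zip(genotype, expression))
    if sse == 0:
        return slope, math.inf
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se


# Hypothetical marker genotypes and log-expression of one miRNA in six lines.
genotype = [0, 0, 0, 1, 1, 1]
expression = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0]
print(eqtl_test(genotype, expression))
```

A genome-wide scan repeats this test for every marker against every transcript, covering both cis and trans pairs. That repetition is what makes hotspot detection computationally demanding at NGS scale.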
visnormsc: A Graphical User Interface to Normalize Single-cell RNA Sequencing Data.
Tang, Lijun; Zhou, Nan
2017-12-26
Single-cell RNA sequencing (RNA-seq) allows the analysis of gene expression at high resolution. The intrinsic limitations of this promising technology introduce technical noise into single-cell RNA-seq data, making accurate downstream inference more difficult. Normalization is a crucial step in single-cell RNA-seq data pre-processing. SCnorm is an accurate and efficient method that can be used for this purpose. An R implementation of this method is currently available. On the one hand, the R package benefits from many excellent features of R. On the other hand, it requires R programming ability, which prevents biologists who lack these skills from learning to use it quickly. To make this method more user-friendly, we developed a graphical user interface, visnormsc, for normalization of single-cell RNA-seq data. It is implemented in Python and is freely available at https://github.com/solo7773/visnormsc . Although visnormsc is based on an existing method, it contributes to the field by offering a user-friendly alternative. Its out-of-the-box and cross-platform features make visnormsc easy to learn and use. It is expected to serve biologists by simplifying single-cell RNA-seq normalization.
King, Lauren E; Love, Christopher G; Sieber, Oliver M; Faux, Maree C; Burgess, Antony W
2016-03-01
The adenomatous polyposis coli (APC) tumour suppressor gene is mutated in about 80% of colorectal cancers (CRC) Brannon et al. (2014) [1]. APC is a large multifunctional protein that regulates many biological functions including Wnt signalling (through the regulation of beta-catenin stability) Reya and Clevers (2005) [2], cell migration Kroboth et al. (2007), Sansom et al. (2004) [3], [4], mitosis Kaplan et al. (2001) [5], cell adhesion Faux et al. (2004), Carothers et al. (2001) [6], [7] and differentiation Sansom et al. (2004) [4]. Although the role of APC in CRC is often described as the deregulation of Wnt signalling, its other biological functions suggest that there are other factors at play that contribute to the onset of adenomas and the progression of CRC upon the truncation of APC. To identify genes and pathways that are dysregulated as a consequence of loss of function of APC, we compared the gene expression profiles of the APC-mutated human CRC cell line SW480 following reintroduction of wild-type APC (SW480 + APC) or empty control vector (SW480 + vector control) Faux et al. (2004). Here we describe the RNA-seq data derived for three biological replicates of parental SW480, SW480 + vector control and SW480 + APC cells, and present the bioinformatics pipeline used to test for differential gene expression and pathway enrichment analysis. A total of 1735 genes showed significant differential expression when APC was restored and were enriched for genes associated with cell polarity, Wnt signalling and the epithelial to mesenchymal transition. There was additional enrichment for genes involved in cell-cell adhesion, cell-matrix junctions, angiogenesis, axon morphogenesis and cell movement. The raw and analysed RNA-seq data have been deposited in the Gene Expression Omnibus (GEO) database under accession number GSE76307. This dataset is useful for further investigations of the impact of APC mutation on the properties of colorectal cancer cells.
RISC RNA sequencing for context-specific identification of in vivo miR targets
Matkovich, Scot J; Van Booven, Derek J; Eschenbacher, William H; Dorn, Gerald W
2010-01-01
Rationale: MicroRNAs (miRs) are expanding our understanding of cardiac disease and have the potential to transform cardiovascular therapeutics. One miR can target hundreds of individual mRNAs, but existing methodologies are not sufficient to accurately and comprehensively identify these mRNA targets in vivo. Objective: To develop methods permitting identification of in vivo miR targets in an unbiased manner, using massively parallel sequencing of mouse cardiac transcriptomes in combination with sequencing of mRNA associated with mouse cardiac RNA-induced silencing complexes (RISCs). Methods and Results: We optimized techniques for expression profiling of small amounts of RNA without introducing amplification bias, and applied this to anti-Argonaute 2 immunoprecipitated RISCs (RISC-Seq) from mouse hearts. By comparing RNA-sequencing results of cardiac RISC and transcriptome from the same individual hearts, we defined 1,645 mRNAs consistently targeted to mouse cardiac RISCs. We employed this approach in hearts overexpressing miRs from Myh6 promoter-driven precursors (programmed RISC-Seq) to identify 209 in vivo targets of miR-133a and 81 in vivo targets of miR-499. Consistent with the fact that miR-133a and miR-499 have widely differing ‘seed’ sequences and belong to different miR families, only 6 targets were common to miR-133a- and miR-499-programmed hearts. Conclusions: RISC sequencing is a highly sensitive method for general RISC profiling and individual miR target identification in biological context, and is applicable to any tissue and any disease state. Summary: MicroRNAs (miRs) are key regulators of mRNA translation in health and disease. While bioinformatic predictions suggest that a single miR may target hundreds of mRNAs, the number of experimentally verified targets of miRs is low. 
To enable comprehensive, unbiased examination of miR targets, we have performed deep RNA sequencing of cardiac transcriptomes in parallel with cardiac RNA-induced silencing complex (RISC)-associated RNAs (the RISCome), called RISC sequencing. We developed methods that did not require cross-linking of RNAs to RISCs or amplification of mRNA prior to sequencing, making it possible to rapidly perform RISC sequencing from intact tissue while avoiding amplification bias. Comparison of RISCome with transcriptome expression defined the degree of RISC enrichment for each mRNA. The majority of the mRNAs enriched in wild-type cardiac RISComes compared to transcriptomes were bioinformatically predicted to be targets of at least 1 of 139 cardiac-expressed miRs. Programming cardiomyocyte RISCs via transgenic overexpression in adult hearts of miR-133a or miR-499, two miRs that contain entirely different ‘seed’ sequences, elicited differing profiles of RISC-targeted mRNAs. Thus, RISC sequencing represents a highly sensitive method for general RISC profiling and individual miR target identification in biological context. PMID:21030712
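The abstract says that comparing the RISCome with the transcriptome "defined the degree of RISC enrichment for each mRNA" but does not give the formula. A minimal sketch, assuming (purely for illustration) that enrichment is scored as the log2 ratio of counts per million (CPM) in the RISC library over the matched transcriptome, with a pseudocount; the function names, the CPM normalization and the pseudocount are hypothetical choices, not the authors' pipeline.

```python
import math

def cpm(counts):
    """Scale raw read counts (gene -> count) to counts per million within one library."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def risc_enrichment(risc_counts, transcriptome_counts, pseudocount=1.0):
    """Hypothetical RISC enrichment score: log2(CPM in RISC / CPM in transcriptome).

    Both libraries should come from the same heart; only genes present in both
    are scored. The pseudocount avoids division by zero for unexpressed genes.
    """
    risc_cpm = cpm(risc_counts)
    tx_cpm = cpm(transcriptome_counts)
    shared = risc_cpm.keys() & tx_cpm.keys()
    return {
        gene: math.log2((risc_cpm[gene] + pseudocount) / (tx_cpm[gene] + pseudocount))
        for gene in shared
    }

# Toy counts for illustration only (not data from the study)
risc = {"geneA": 900, "geneB": 50, "geneC": 50}
transcriptome = {"geneA": 100, "geneB": 450, "geneC": 450}
scores = risc_enrichment(risc, transcriptome)
for gene in sorted(scores):
    print(gene, round(scores[gene], 2))
```

Positive scores mark mRNAs over-represented in RISC relative to their abundance in the same heart, which is the comparison the abstract describes; a real analysis would also need replicate-aware statistics.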
Transcriptome Profiling of Human FoxP3+ Regulatory T Cells
Bhairavabhotla, Ravikiran; Kim, Yong C.; Glass, Deborah D.; Escobar, Thelma M.; Patel, Mira C.; Zahr, Rami; Nguyen, Cuong K.; Kilaru, Gokhul K.; Muljo, Stefan A.; Shevach, Ethan M.
2015-01-01
The major goal of this study was to perform an in-depth characterization of the “gene signature” of human FoxP3+ T regulatory cells (Tregs). Highly purified Tregs and T conventional cells (Tconvs) from multiple healthy donors (HD), either freshly explanted or activated in vitro, were analyzed by RNA sequencing (RNA-seq), and gene expression changes were validated using the nCounter system. Additionally, we analyzed microRNA (miRNA) expression using TaqMan low-density arrays. Our results confirm previous studies demonstrating selective gene expression of FoxP3, IKZF2, and CTLA4 in Tregs. Notably, a number of previously uncharacterized genes (RTKN2, LAYN, UTS2, CSF2RB, TRIB1, F5, CECAM4, CD70, ENC1 and NKG7) were identified and validated as being differentially expressed in human Tregs. We further characterized the functional roles of RTKN2 and LAYN in in vitro human Treg suppression assays by knocking them down in Tregs and overexpressing them in Tconvs. To facilitate a better understanding of the human Treg gene expression signature, we have generated from our results a hypothetical interactome of genes and miRNAs in Tregs and Tconvs. PMID:26686412
Enyeart, Peter J; Mohr, Georg; Ellington, Andrew D; Lambowitz, Alan M
2014-01-13
Mobile group II introns are bacterial retrotransposons that combine the activities of an autocatalytic intron RNA (a ribozyme) and an intron-encoded reverse transcriptase to insert site-specifically into DNA. They recognize DNA target sites largely by base pairing of sequences within the intron RNA and achieve high DNA target specificity by using the ribozyme active site to couple correct base pairing to RNA-catalyzed intron integration. Algorithms have been developed to program the DNA target site specificity of several mobile group II introns, allowing them to be made into 'targetrons.' Targetrons function for gene targeting in a wide variety of bacteria and typically integrate at efficiencies high enough to be screened easily by colony PCR, without the need for selectable markers. Targetrons have found wide application in microbiological research, enabling gene targeting and genetic engineering of bacteria that had been intractable to other methods. Recently, a thermostable targetron has been developed for use in bacterial thermophiles, and new methods have been developed for using targetrons to position recombinase recognition sites, enabling large-scale genome-editing operations, such as deletions, inversions, insertions, and 'cut-and-pastes' (that is, translocation of large DNA segments), in a wide range of bacteria at high efficiency. Using targetrons in eukaryotes presents challenges due to the difficulties of nuclear localization and sub-optimal magnesium concentrations, although supplementation with magnesium can increase integration efficiency, and directed evolution is being employed to overcome these barriers. Finally, spurred by new methods for expressing group II intron reverse transcriptases that yield large amounts of highly active protein, thermostable group II intron reverse transcriptases from bacterial thermophiles are being used as research tools for a variety of applications, including qRT-PCR and next-generation RNA sequencing (RNA-seq). 
The high processivity and fidelity of group II intron reverse transcriptases along with their novel template-switching activity, which can directly link RNA-seq adaptor sequences to cDNAs during reverse transcription, open new approaches for RNA-seq and the identification and profiling of non-coding RNAs, with potentially wide applications in research and biotechnology.
Tian, Yao; Smith, David Roy
2016-05-01
Thousands of mitochondrial genomes have been sequenced, but comparatively few mitochondrial transcriptomes are available. This might soon change. High-throughput RNA sequencing (RNA-Seq) techniques have made it fast and cheap to generate massive amounts of mitochondrial transcriptomic data. Here, we explore the utility of RNA-Seq for assembling mitochondrial genomes and studying their expression patterns. Specifically, we investigate the mitochondrial transcriptomes of Polytomella, non-photosynthetic green algae whose mitochondrial genomes are among the smallest and most reduced in the Archaeplastida and contain fragmented rRNA-coding regions, palindromic genes, and linear chromosomes with telomeres. Isolation of whole genomic RNA from the four known Polytomella species followed by Illumina paired-end sequencing generated enough mitochondrial-derived reads to easily recover almost-entire mitochondrial genome sequences. Read-mapping and coverage statistics also gave insights into Polytomella mitochondrial transcriptional architecture, revealing polycistronic transcripts and the expression of telomeres and palindromic genes. Ultimately, RNA-Seq is a promising, cost-effective technique for studying mitochondrial genetics, but it does have drawbacks, which are discussed. One of its greatest potentials, as shown here, is that it can be used to generate near-complete mitochondrial genome sequences, which could be particularly useful where mtDNA data are lacking. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Tao, Xiang; Fang, Yang; Xiao, Yao; Jin, Yan-Ling; Ma, Xin-Rong; Zhao, Yun; He, Kai-Ze; Zhao, Hai; Wang, Hai-Yan
2013-05-08
Duckweed can thrive on anthropogenic wastewater and produce tremendous biomass. Owing to its relatively high starch and low lignin content, duckweed is a good candidate for bioethanol fermentation. Previous studies have observed that water devoid of nutrients favors starch accumulation, but the molecular mechanism remains unknown. This study globally analyzed the response to nutrient starvation in order to investigate starch accumulation in duckweed (Landoltia punctata). L. punctata was transferred from nutrient-rich solution to distilled water and sampled at different time points. Physiological measurements demonstrated that the activity of ADP-glucose pyrophosphorylase, the key enzyme of starch synthesis, as well as the starch percentage in duckweed, increased continuously under nutrient starvation. Samples collected at the 0 h, 2 h and 24 h time points were used for comparative gene expression analysis using RNA-Seq. A comprehensive transcriptome comprising 74,797 contigs was constructed by de novo assembly of the RNA-Seq reads. Gene expression profiling showed that some transcripts encoding key enzymes of starch biosynthesis were up-regulated, while transcripts encoding enzymes involved in starch consumption were down-regulated; some photosynthesis-related transcripts were down-regulated during the first 24 h, and some transporter transcripts were up-regulated within the first 2 h. Interestingly, most transcripts encoding key enzymes of flavonoid biosynthesis were highly expressed regardless of starvation, whereas transcripts encoding laccase, the last rate-limiting enzyme of lignification, showed very low expression abundance in all three samples. Our study provides comprehensive expression profiling of L. punctata under nutrient starvation, which indicates that nutrient starvation down-regulates the global metabolic status and redirects the metabolic flux of fixed CO2 into the starch synthesis branch, resulting in starch accumulation in L. punctata.
2013-01-01
Background: Duckweed can thrive on anthropogenic wastewater and produce tremendous biomass. Owing to its relatively high starch and low lignin content, duckweed is a good candidate for bioethanol fermentation. Previous studies have observed that water devoid of nutrients favors starch accumulation, but the molecular mechanism remains unknown. Results: This study globally analyzed the response to nutrient starvation in order to investigate starch accumulation in duckweed (Landoltia punctata). L. punctata was transferred from nutrient-rich solution to distilled water and sampled at different time points. Physiological measurements demonstrated that the activity of ADP-glucose pyrophosphorylase, the key enzyme of starch synthesis, as well as the starch percentage in duckweed, increased continuously under nutrient starvation. Samples collected at the 0 h, 2 h and 24 h time points were used for comparative gene expression analysis using RNA-Seq. A comprehensive transcriptome comprising 74,797 contigs was constructed by de novo assembly of the RNA-Seq reads. Gene expression profiling showed that some transcripts encoding key enzymes of starch biosynthesis were up-regulated, while transcripts encoding enzymes involved in starch consumption were down-regulated; some photosynthesis-related transcripts were down-regulated during the first 24 h, and some transporter transcripts were up-regulated within the first 2 h. Interestingly, most transcripts encoding key enzymes of flavonoid biosynthesis were highly expressed regardless of starvation, whereas transcripts encoding laccase, the last rate-limiting enzyme of lignification, showed very low expression abundance in all three samples. Conclusion: Our study provides comprehensive expression profiling of L. punctata under nutrient starvation, which indicates that nutrient starvation down-regulates the global metabolic status and redirects the metabolic flux of fixed CO2 into the starch synthesis branch, resulting in starch accumulation in L. punctata. PMID:23651472
Ma, Deyou; Yang, Hongsheng; Sun, Lina
2014-12-01
Sea cucumber (Apostichopus japonicus) is one of the most important aquaculture animals in China. Its normal body color is usually black, which suits its living environment. Juveniles obtained by crossing albino sea cucumbers segregated in body color. To document the transcriptome differences between albino associating sea cucumbers and the control, we sequenced their transcriptomes with RNA-seq. Approximately 4.790 million (M) and 4.884 M reads, 200 nt in length, were generated from the body wall of albino associating sea cucumbers and the control, respectively; from these, 9550 (46.81%) putative genes were identified. In total, 583 genes were differentially expressed between albino associating sea cucumbers and the control. Of these differentially expressed genes (DEGs), 4.8% changed more than five-fold. The expression levels of eight DEGs were confirmed by real-time PCR. The direction of change detected by real-time PCR agreed well with that detected by RNA-seq, although the magnitude of change differed for some DEGs. Four significantly enriched pathways were identified for the DEGs: phagocytosis, Staphylococcus aureus infection, ECM-receptor interaction and focal adhesion. These pathways are helpful for understanding the physiological differences between albino associating sea cucumbers and the control.
O'Hurley, Gillian; Busch, Christer; Fagerberg, Linn; Hallström, Björn M.; Stadler, Charlotte; Tolf, Anna; Lundberg, Emma; Schwenk, Jochen M.; Jirström, Karin; Bjartell, Anders; Gallagher, William M.; Uhlén, Mathias; Pontén, Fredrik
2015-01-01
To better understand prostate function and disease, it is important to define and explore the molecular constituents that signify the prostate gland. The aim of this study was to define the prostate-specific transcriptome and proteome in comparison to 26 other human tissues. Deep sequencing of mRNA (RNA-seq) and immunohistochemistry-based protein profiling were combined to identify prostate-specific gene expression patterns and to explore tissue biomarkers for potential clinical use in prostate cancer diagnostics. We identified 203 genes with elevated expression in the prostate, 22 of which showed more than five-fold higher expression levels compared to all other tissue types. In addition to previously well-known proteins, we identified two poorly characterized proteins, TMEM79 and ACOXL, with potential to differentiate between benign and cancerous prostatic glands in tissue biopsies. In conclusion, we have combined transcriptomics and antibody-based protein profiling in a genome-wide analysis to define the prostate-specific proteome and identify genes with elevated expression in the prostate. Our data provide a starting point for further functional studies to explore the molecular repertoire of normal and diseased prostate, including potential prostate cancer markers such as TMEM79 and ACOXL. PMID:26237329
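The "more than five-fold higher expression compared to all other tissue types" criterion can be made concrete as a ratio against the highest level in any other tissue. A minimal sketch, assuming a gene-by-tissue table of normalized expression values (e.g. TPM); the function names, the table layout and the example genes and values are hypothetical, and the study's exact thresholds and normalization may differ.

```python
import math

def fold_over_other_tissues(expr, target):
    """Ratio of expression in `target` to the highest level in any other tissue."""
    max_other = max(v for tissue, v in expr.items() if tissue != target)
    if max_other == 0:
        return math.inf if expr[target] > 0 else 0.0
    return expr[target] / max_other

def tissue_enriched(table, target="prostate", min_fold=5.0):
    """Genes whose expression in `target` exceeds every other tissue by more than min_fold."""
    return sorted(
        gene for gene, expr in table.items()
        if fold_over_other_tissues(expr, target) > min_fold
    )

# Hypothetical expression values for three made-up genes
table = {
    "gene1": {"prostate": 120.0, "liver": 3.0, "kidney": 10.0},  # 12-fold
    "gene2": {"prostate": 40.0, "liver": 20.0, "kidney": 5.0},   # 2-fold
    "gene3": {"prostate": 8.0, "liver": 0.0, "kidney": 0.0},     # only in prostate
}
print(tissue_enriched(table))  # ['gene1', 'gene3']
```

Comparing against the maximum of the other tissues, rather than their mean, is what makes the filter mean "higher than all other tissue types".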
Choi, Yong Jun; Song, Insun; Jin, Yilan; Jin, Hyun-Seok; Ji, Hyung Min; Jeong, Seon-Yong; Won, Ye-Yeon; Chung, Yoon-Sok
2017-10-20
Genetic alterations are major contributing factors in the development of osteoporosis. Osteoblasts and adipocytes share a common origin, mesenchymal stem cells (MSCs), and their genetic determinants might be important in the relationship between osteoporosis and obesity. In the present study, we aimed to identify differentially expressed genes (DEGs) between osteoporosis patients and normal controls using human MSCs, and to elucidate the common pathways and genes related to osteoporosis and adipogenesis. Human MSCs were obtained from the bone marrow of femurs from postmenopausal women during orthopedic surgeries. RNA sequencing (RNA-seq) was carried out using next-generation sequencing (NGS) technology, and DEGs were identified from the RNA-seq data. Ingenuity pathway analysis (IPA) was used to elucidate the common pathway related to osteoporosis and adipogenesis. Candidate genes for the common pathway were validated in other independent osteoporosis and obese subjects using RT-PCR (reverse transcription-polymerase chain reaction) analysis. Fifty-three DEGs were identified between postmenopausal osteoporosis patients and normal bone mineral density (BMD) controls. Most of the genetic changes were related to the differentiation of cells. The nuclear receptor subfamily 4 group A (NR4A) family was identified as a possible common gene family related to osteogenesis and adipogenesis. The expression level of NR4A1 mRNA was significantly higher in osteoporosis patients than in controls (p=0.018). The expression level of NR4A2 mRNA was significantly higher in obese patients than in controls (p=0.041). Some genetic changes in MSCs are involved in the pathophysiology of osteoporosis. The NR4A family might comprise common genes related to osteoporosis and obesity. Copyright © 2017 Elsevier B.V. All rights reserved.
Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao
2017-12-07
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.
RNA-Seq of Arabidopsis Pollen Uncovers Novel Transcription and Alternative Splicing
Loraine, Ann E.; McCormick, Sheila; Estrada, April; Patel, Ketan; Qin, Peng
2013-01-01
Pollen grains of Arabidopsis (Arabidopsis thaliana) contain two haploid sperm cells enclosed in a haploid vegetative cell. Upon germination, the vegetative cell extrudes a pollen tube that carries the sperm to an ovule for fertilization. Knowing the identity, relative abundance, and splicing patterns of pollen transcripts will improve our understanding of pollen and allow investigation of tissue-specific splicing in plants. Most Arabidopsis pollen transcriptome studies have used the ATH1 microarray, which does not assay splice variants and lacks specific probe sets for many genes. To investigate the pollen transcriptome, we performed high-throughput sequencing (RNA-Seq) of Arabidopsis pollen and, for comparison, seedlings. Gene expression was more diverse in seedlings, and genes involved in cell wall biogenesis were highly expressed in pollen. RNA-Seq detected at least 4,172 protein-coding genes expressed in pollen, including 289 assayed only by nonspecific probe sets. Additional exons and previously unannotated 5′ and 3′ untranslated regions for pollen-expressed genes were revealed. We detected regions in the genome not previously annotated as expressed; 14 were tested and 12 were confirmed by polymerase chain reaction. Gapped read alignments revealed 1,908 high-confidence new splicing events supported by 10 or more spliced read alignments. Alternative splicing patterns in pollen and seedlings were highly correlated. For most alternatively spliced genes, the ratio of variants in pollen and seedlings was similar, except for some encoding proteins involved in RNA splicing. This study highlights the robustness of splicing patterns in plants and the importance of ongoing annotation and visualization of RNA-Seq data using interactive tools such as Integrated Genome Browser. PMID:23590974
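The "10 or more spliced read alignments" rule for calling high-confidence new splicing events amounts to counting read support per intron and filtering. A minimal sketch, assuming each spliced alignment has already been reduced to a (chromosome, intron start, intron end) tuple and that "new" means absent from the existing annotation; the function name and input format are hypothetical, not the tooling used in the study.

```python
from collections import Counter

def novel_junctions(spliced_alignments, annotated, min_support=10):
    """Return junctions absent from `annotated` and supported by >= min_support reads.

    spliced_alignments: iterable of (chrom, intron_start, intron_end) tuples,
        one per spliced read alignment (hypothetical input format).
    annotated: set of junction tuples taken from the existing gene models.
    """
    support = Counter(spliced_alignments)
    return {j: n for j, n in support.items() if n >= min_support and j not in annotated}

# Toy alignments: one annotated junction, one weakly supported, one novel and well supported
reads = [("chr1", 1000, 1200)] * 12 + [("chr1", 1000, 1350)] * 4 + [("chr2", 500, 900)] * 10
annotated = {("chr1", 1000, 1200)}
print(novel_junctions(reads, annotated))  # {('chr2', 500, 900): 10}
```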
Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.
Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang
2015-01-01
RNA sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. By incorporating Fisher's discriminant criterion and setting the reduced dimension to two, DNMF learns two factors that approximate the original gene expression data, using the sample label information to abstract an up-regulated and a down-regulated metagene. The first factor holds every gene's weights on the two metagenes, each metagene being an additive combination of all genes, while the second factor represents the expression values of the two metagenes. In the gene ranking stage, all genes are ranked in descending order of the difference between their metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost gene ranking performance. Area Under the Curve (AUC) analysis of differential expression in two benchmarking tests on four RNA-seq data sets with similar phenotypes showed that the proposed DNMF-based gene ranking method outperforms other widely used methods. Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest that DNMF is an effective method for differential gene expression analysis and gene ranking of RNA-seq data.
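The ranking stage described above can be sketched once the first factor is available. This sketch does not implement the DNMF fit (the non-negative factorization with its Fisher discriminant term); it assumes a gene-by-2 weight matrix `W` has already been learned, with column 0 taken as the up-regulated metagene and column 1 as the down-regulated one. The function name, that column convention and the toy weights are hypothetical.

```python
def rank_genes(gene_names, W):
    """Rank genes by the difference between their weights on the two metagenes.

    W: one [w_metagene1, w_metagene2] row per gene, as produced by a two-factor
    NMF-style decomposition (the factorization itself is not implemented here).
    Genes are returned in descending order of w_metagene1 - w_metagene2.
    """
    if len(gene_names) != len(W):
        raise ValueError("need one weight row per gene")
    scores = {g: row[0] - row[1] for g, row in zip(gene_names, W)}
    return sorted(scores, key=scores.get, reverse=True)

# Toy weights for four made-up genes
genes = ["g1", "g2", "g3", "g4"]
W = [[0.9, 0.1], [0.2, 0.7], [0.5, 0.5], [0.6, 0.0]]
print(rank_genes(genes, W))  # ['g1', 'g4', 'g3', 'g2']
```

Genes at the top lean toward the first metagene and genes at the bottom toward the second, so one sorted list covers both directions of change.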
Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat
2016-12-22
The ability to sequence the transcriptomes of single cells using single-cell RNA sequencing (RNA-seq) technologies represents a paradigm shift: scientists can now investigate the complex biology of a heterogeneous population of cells, one cell at a time. However, to date there has been no suitable computational methodology for analyzing such an intricate deluge of data, in particular techniques that aid identification of the unique transcriptomic differences between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data from neocortical cells and neural progenitor cells using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). Thirty-eight key transcripts that best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers were identified using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection. These genes also possessed higher discriminative power (enhanced prediction accuracy) than those selected by commonly used statistical techniques or gene set-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks, in which novel interactions could be validated by wet-lab experimentation and serve as candidate targets for the treatment of neuronal developmental diseases. The approach reported here identifies transcripts, with reported neuronal involvement, that optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible to other single-cell RNA-seq expression profiles, such as studies of cancer progression and treatment within a highly heterogeneous tumour.
Identification of the miRNA-mRNA regulatory network of small cell osteosarcoma based on RNA-seq.
Xie, Lin; Liao, Yedan; Shen, Lida; Hu, Fengdi; Yu, Sunlin; Zhou, Yonghong; Zhang, Ya; Yang, Yihao; Li, Dongqi; Ren, Minyan; Yuan, Zhongqin; Yang, Zuozhang
2017-06-27
Small cell osteosarcoma (SCO) is a rare subtype of osteosarcoma characterized by highly aggressive progression and a poor prognosis. The miRNA and mRNA expression profiles of peripheral blood mononuclear cells (PBMCs) were obtained from 3 patients with SCO and 10 healthy individuals using high-throughput RNA sequencing. We identified 37 dysregulated miRNAs and 1636 dysregulated mRNAs in patients with SCO compared to the healthy controls. Specifically, the 37 dysregulated miRNAs consisted of 27 up-regulated and 10 down-regulated miRNAs; the 1636 dysregulated mRNAs consisted of 555 up-regulated and 1081 down-regulated mRNAs. The target genes of the miRNAs were predicted, and 1334 negative correlations between miRNAs and mRNAs were used to construct a miRNA-mRNA regulatory network. Dysregulated genes were significantly enriched in pathways related to cancer, mTOR signaling and cell cycle signaling. Specifically, hsa-miR-26b-5p, hsa-miR-221-3p and hsa-miR-125b-2-3p were significantly dysregulated miRNAs and exhibited a high degree of connectivity with target genes. Overall, quantitative real-time polymerase chain reaction measurements of the dysregulated genes in tumor tissues and peripheral blood samples of patients with SCO corroborated our bioinformatics analyses based on the PBMC expression profiles. Thus, hsa-miR-26b-5p, hsa-miR-221-3p and hsa-miR-125b-2-3p may be involved in SCO tumorigenesis.
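Building a miRNA-mRNA network from "negative correlations" between miRNAs and their predicted targets can be sketched as a filter over predicted pairs. A minimal sketch, assuming Pearson correlation across samples and a hypothetical cutoff of r <= -0.5; the abstract states neither the correlation measure nor the cutoff, and the function names, miRNA and gene names and values below are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def negative_pairs(mirna_expr, mrna_expr, predicted_targets, max_r=-0.5):
    """Keep predicted miRNA -> mRNA pairs whose expression is negatively correlated.

    mirna_expr / mrna_expr: name -> expression values across the same samples.
    predicted_targets: miRNA name -> list of predicted target mRNA names.
    max_r is a hypothetical cutoff; pairs with r <= max_r become network edges.
    """
    edges = []
    for mir, targets in predicted_targets.items():
        for gene in targets:
            if mir in mirna_expr and gene in mrna_expr:
                r = pearson(mirna_expr[mir], mrna_expr[gene])
                if r <= max_r:
                    edges.append((mir, gene, round(r, 3)))
    return edges

# Made-up values across four samples
mirna = {"miR-x": [1, 2, 3, 4]}
mrna = {"geneA": [8, 6, 4, 2], "geneB": [1, 2, 3, 4], "geneC": [5, 5, 5, 5]}
print(negative_pairs(mirna, mrna, {"miR-x": ["geneA", "geneB", "geneC"]}))
# [('miR-x', 'geneA', -1.0)]
```

Restricting to predicted targets before correlating keeps the network to pairs with both sequence-based and expression-based support, which matches the two-step description in the abstract.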
Kang, Eun Yong; Martin, Lisa J.; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J.; Shifman, Sagiv; Eskin, Eleazar
2016-01-01
The study of the genetics of gene expression is of considerable importance to understanding the nature of common, complex diseases. The most widely applied approach to identifying relationships between genetic variation and gene expression is the expression quantitative trait loci (eQTL) approach. Here, we increased the computational power of eQTL with an alternative and complementary approach based on analyzing allele specific expression (ASE). We designed a novel analytical method to identify cis-acting regulatory variants based on genome sequencing and measurements of ASE from RNA-sequencing (RNA-seq) data. We evaluated the power and resolution of our method using simulated data. We then applied the method to map regulatory variants affecting gene expression in lymphoblastoid cell lines (LCLs) from 77 unrelated northern and western European individuals (CEU), which were part of the HapMap project. A total of 2309 SNPs were identified as being associated with ASE patterns. The SNPs associated with ASE were enriched within promoter regions and were significantly more likely to signal strong evidence for a regulatory role. Finally, among the candidate regulatory SNPs, we identified 108 SNPs that were previously associated with human immune diseases. With further improvements in quantifying ASE from RNA-seq, the application of our method to other datasets is expected to accelerate our understanding of the biological basis of common diseases. PMID:27765809
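The method above maps cis-regulatory variants by combining genome sequencing with ASE measured from RNA-seq; its statistical model is not given here. As a generic building block only, the sketch below shows the standard per-site exact binomial test of allelic imbalance at one heterozygous site (null hypothesis: reference and alternative alleles are equally expressed). This is a swapped-in, textbook test, not the authors' method, and the function name is hypothetical.

```python
from math import comb

def ase_binomial_pvalue(ref_reads, alt_reads):
    """Two-sided exact binomial test of allelic balance at one heterozygous site.

    Null hypothesis: each read comes from the reference allele with probability 0.5.
    Because that null is symmetric, the two-sided p-value is twice the smaller tail.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

print(ase_binomial_pvalue(15, 15))  # balanced site
print(ase_binomial_pvalue(9, 1))    # 22/1024 = 0.021484375
```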
2010-01-01
Background: Systematic research on fish immunogenetics is indispensable for understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited deep sequencing technologies and genomic resources available for non-model fish. The newly developed Solexa/Illumina RNA-seq and Digital Gene Expression (DGE) are high-throughput sequencing approaches and powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results: RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or variable-length coding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strongly infection-responsive transcripts. The results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models, and suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion: This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. 
Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals. PMID:20707909
Tong, Ann-Jay; Kollmann, Tobias R.; Smale, Stephen T.
2015-01-01
A variety of age-related differences in the innate and adaptive immune systems have been proposed to contribute to the increased susceptibility to infection of human neonates and older adults. The emergence of RNA sequencing (RNA-seq) provides an opportunity to obtain an unbiased, comprehensive, and quantitative view of gene expression differences in defined cell types from different age groups. An examination of ex vivo human monocyte responses to lipopolysaccharide stimulation or Listeria monocytogenes infection by RNA-seq revealed extensive similarities between neonates, young adults, and older adults, with an unexpectedly small number of genes exhibiting statistically significant age-dependent differences. By examining the differentially induced genes in the context of transcription factor binding motifs and RNA-seq data sets from mutant mouse strains, a previously described deficiency in interferon response factor-3 activity could be implicated in most of the differences between newborns and young adults. Contrary to these observations, older adults exhibited elevated expression of inflammatory genes at baseline, yet the responses following stimulation correlated more closely with those observed in younger adults. Notably, major differences in the expression of constitutively expressed genes were not observed, suggesting that the age-related differences are driven by environmental influences rather than cell-autonomous differences in monocyte development. PMID:26147648
Serin, Elise A. R.; Snoek, L. B.; Nijveen, Harm; Willems, Leo A. J.; Jiménez-Gómez, Jose M.; Hilhorst, Henk W. M.; Ligterink, Wilco
2017-01-01
High-density genetic maps are essential for high-resolution mapping of quantitative traits. Here, we present a new genetic map for an Arabidopsis Bayreuth × Shahdara recombinant inbred line (RIL) population, built on RNA-seq data. RNA-seq analysis of 160 RILs of this population identified 30,049 single-nucleotide polymorphisms (SNPs) covering the whole genome. Based on a 100-kbp window SNP binning method, 1059 bin-markers were identified and physically anchored on the genome. The RNA-seq genetic map spans a total length of 471.70 centimorgans (cM), with an average marker distance of 0.45 cM and a maximum marker distance of 4.81 cM. This high-resolution genotyping revealed new recombination breakpoints in the population. To highlight the advantages of such a high-density map, we compared it with two publicly available genetic maps for the same population, comprising 69 PCR-based markers and 497 gene expression markers derived from microarray data, respectively. In this study, we show that SNP markers can effectively be derived from RNA-seq data. The new RNA-seq map closes many existing gaps in marker coverage, saturating the previously available genetic maps. Quantitative trait locus (QTL) analysis of published phenotypes using the available genetic maps showed increased QTL mapping resolution and reduced QTL confidence intervals with the RNA-seq map. The new high-density map is a valuable resource that facilitates the identification of candidate genes and map-based cloning approaches. PMID:29259624
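The 100-kbp window binning step can be illustrated with a minimal sketch. It assumes SNPs are given as (chromosome, 1-based position) pairs and simply groups them into fixed, non-overlapping 100-kbp windows; the function name bin_snps is hypothetical, and the authors' procedure (for example, how genotypes within a window are merged into one consensus bin-marker) may differ.

```python
from collections import defaultdict

WINDOW_BP = 100_000  # 100-kbp window size, as described in the abstract


def bin_snps(snps, window_bp=WINDOW_BP):
    """Group (chromosome, position) SNPs into fixed, non-overlapping windows.

    Returns a dict mapping (chromosome, window_index) -> sorted SNP positions,
    ordered by chromosome and window. Each non-empty window corresponds to one
    candidate bin-marker in this simplified scheme.
    """
    bins = defaultdict(list)
    for chrom, pos in snps:
        # 1-based positions: window 0 covers 1..100,000, window 1 covers 100,001..200,000, ...
        bins[(chrom, (pos - 1) // window_bp)].append(pos)
    return {key: sorted(positions) for key, positions in sorted(bins.items())}


if __name__ == "__main__":
    example = [("Chr1", 1_500), ("Chr1", 99_999), ("Chr1", 100_001), ("Chr2", 250_000)]
    for (chrom, idx), positions in bin_snps(example).items():
        print(chrom, idx, positions)
```

In this scheme every SNP that falls in the same window contributes to a single marker, which is why tens of thousands of SNPs can collapse into roughly a thousand bin-markers.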
Mattison, Christopher P; Rai, Ruhi; Settlage, Robert E; Hinchliffe, Doug J; Madison, Crista; Bland, John M; Brashear, Suzanne; Graham, Charles J; Tarver, Matthew R; Florane, Christopher; Bechtel, Peter J
2017-02-22
The pecan nut is a nutrient-rich part of a healthy diet, full of beneficial fatty acids and antioxidants, but it can also cause allergic reactions in people with food allergy to the nuts. The transcriptome of a developing pecan nut was characterized to identify the gene expression occurring during nut development and to highlight genes involved in fatty acid metabolism and genes that commonly act as food allergens. Pecan samples were collected at several time points during embryo development, including the water, gel, dough, and mature nut stages. Library preparation and Illumina HiSeq mRNA sequencing were performed with RNA from four time points during the growing season in August and September 2012. Sequence analysis with Trinotate software following the Trinity protocol identified 133,000 unigenes with 52,267 named transcripts and 45,882 annotated genes. A total of 27,312 genes were defined by GO annotation. Gene expression clustering analysis identified 12 different gene expression profiles, each containing a number of genes. Three pecan seed storage proteins that commonly act as allergens, Car i 1, Car i 2, and Car i 4, were significantly up-regulated during the time course. Up-regulated fatty acid metabolism genes included acyl-[ACP] desaturase and omega-6 desaturase genes involved in oleic and linoleic acid metabolism. Notably, a few of the up-regulated acyl-[ACP] desaturase and omega-6 desaturase genes have expression patterns similar to those of the allergen genes, based on gene expression clustering and qPCR analysis. These findings suggest the possibility of coordinated accumulation of lipids and allergens during pecan nut embryogenesis.
HALO--a Java framework for precise transcript half-life determination.
Friedel, Caroline C; Kaufmann, Stefanie; Dölken, Lars; Zimmer, Ralf
2010-05-01
Recent improvements in experimental technologies now allow measurement of de novo transcription and/or RNA decay at the whole-transcriptome level and determination of precise transcript half-lives. Such transcript half-lives provide important insights into the regulation of biological processes and the relative contributions of RNA decay and de novo transcription to differential gene expression. In this article, we present HALO (Half-life Organizer), the first software for the precise determination of transcript half-lives from measurements of RNA de novo transcription or decay determined with microarrays or RNA-seq. In addition, methods for quality control, filtering and normalization are supplied. HALO provides a graphical user interface, command-line tools and a well-documented Java application programming interface (API). It can thus be used both by biologists, to determine transcript half-lives quickly and reliably with the provided user interfaces, and by software developers integrating transcript half-life analysis into other gene expression profiling pipelines. Source code, executables and documentation are available at http://www.bio.ifi.lmu.de/software/halo.
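The abstract does not spell out the underlying model, but a common way to turn such measurements into half-lives assumes steady-state expression and first-order decay: after newly transcribed RNA has been labeled for a time t, the new fraction is N/T = 1 - exp(-k t), so t_1/2 = ln 2 / k = -t ln 2 / ln(1 - N/T). The sketch below implements only that textbook relation in Python; it is not HALO's Java API, the function name is hypothetical, and HALO's quality control and normalization steps are omitted.

```python
import math


def half_life_from_new_fraction(new_to_total, labeling_time):
    """Estimate a transcript half-life from the newly transcribed / total RNA ratio.

    Assumes steady-state expression and first-order decay, so that after a
    labeling period t the new fraction equals 1 - exp(-k * t) and
    t_1/2 = ln(2) / k. The result is in the same time unit as labeling_time.
    """
    if not 0.0 < new_to_total < 1.0:
        raise ValueError("new/total ratio must lie strictly between 0 and 1")
    decay_rate = -math.log(1.0 - new_to_total) / labeling_time  # k
    return math.log(2.0) / decay_rate


if __name__ == "__main__":
    # Half of the RNA is new after 60 min of labeling -> a 60-min half-life.
    print(round(half_life_from_new_fraction(0.5, 60.0), 6))
```

A larger new fraction after the same labeling time implies faster turnover: a ratio of 0.75 after 60 min corresponds to a 30-min half-life under these assumptions.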
Li, You; Heavican, Tayla B.; Vellichirammal, Neetha N.; Iqbal, Javeed
2017-01-01
Abstract The RNA-Seq technology has revolutionized transcriptome characterization not only by accurately quantifying gene expression but also by identifying novel transcripts such as chimeric fusion transcripts. ‘Fusion’ or ‘chimeric’ transcripts have improved the diagnosis and prognosis of several tumors and have led to the development of novel therapeutic regimens. Fusion transcript detection is currently accomplished by several software packages, primarily relying on sequence alignment algorithms. Aligning sequencing reads from fusion transcript loci in cancer genomes can be highly challenging because genomic alterations induce incorrect mapping, which limits the performance of alignment-based fusion transcript detection methods. Here, we developed ChimeRScope, a novel alignment-free method that accurately predicts fusion transcripts based on the gene fingerprint (k-mer) profiles of RNA-Seq paired-end reads. Results on published datasets and in-house cancer cell line datasets, followed by experimental validation, demonstrate that ChimeRScope consistently outperforms other popular methods irrespective of read length and sequencing depth. More importantly, results on our in-house datasets show that ChimeRScope is better able to identify novel fusion transcripts with potential oncogenic functions. ChimeRScope is accessible as standalone software at (https://github.com/ChimeRScope/ChimeRScope/wiki) or via the Galaxy web interface at (https://galaxy.unmc.edu/). PMID:28472320
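To make the idea of alignment-free, k-mer-based fusion evidence concrete, here is a toy sketch: each gene's sequence is reduced to its set of k-mers (its "fingerprint"), and a read pair counts as fusion-supporting when one mate matches gene A's k-mers and the other matches gene B's. The sequences, k = 5, the 0.5 hit-fraction threshold, and all function names are made-up assumptions; this is not ChimeRScope's actual scoring scheme.

```python
# Toy, made-up sequences standing in for two fusion partner genes.
GENE_A = "ATGGCGTACGTTAGC"
GENE_B = "TTGACCGATCGGATA"
K = 5  # toy k-mer size; real data would use much longer k-mers


def kmers(seq, k):
    """Return the set of all length-k substrings (the k-mer fingerprint) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hit_fraction(read, fingerprint, k):
    """Fraction of the read's k-mers that are present in a gene fingerprint."""
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    if not read_kmers:
        return 0.0
    return sum(km in fingerprint for km in read_kmers) / len(read_kmers)


def supports_fusion(mate1, mate2, fp_a, fp_b, k, min_fraction=0.5):
    """True if one mate resembles gene A and the other gene B, in either order.

    Reverse-complementing the second mate, sequencing errors, and k-mers
    shared between genes are all ignored in this toy version.
    """
    a1, b1 = hit_fraction(mate1, fp_a, k), hit_fraction(mate1, fp_b, k)
    a2, b2 = hit_fraction(mate2, fp_a, k), hit_fraction(mate2, fp_b, k)
    return (a1 >= min_fraction and b2 >= min_fraction) or (
        b1 >= min_fraction and a2 >= min_fraction
    )


if __name__ == "__main__":
    fp_a, fp_b = kmers(GENE_A, K), kmers(GENE_B, K)
    print(supports_fusion("GCGTACGTTA", "GACCGATCGG", fp_a, fp_b, K))  # True: A + B
    print(supports_fusion("GCGTACGTTA", "ATGGCGTACG", fp_a, fp_b, K))  # False: A + A
```

Because matching is done by set lookups rather than alignment, a read is never forced onto a single genomic locus, which is the property the abstract highlights for regions affected by genomic alterations.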
The fractured landscape of RNA-seq alignment: the default in our STARs.
Ballouz, Sara; Dobin, Alexander; Gingeras, Thomas R; Gillis, Jesse
2018-06-01
Many tools are available for RNA-seq alignment and expression quantification, and their comparative value is hard to establish. Benchmarking assessments often highlight methods' good performance but either focus on model data or fail to explain variation in performance. This leaves us to ask: what is the most meaningful way to assess different alignment choices? And, importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR's performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction of reads mapped or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on both technical and biological performance. Yet when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur.
Zhao, Dejian; Lin, Mingyan; Chen, Jian; Pedrosa, Erika; Hrabovsky, Anastasia; Fourcade, H. Matthew; Zheng, Deyou; Lachman, Herbert M.
2015-01-01
We are using induced pluripotent stem cell (iPSC) technology to study neuropsychiatric disorders associated with 22q11.2 microdeletions (del), the most common known schizophrenia (SZ)-associated genetic factor. Several genes in the region have been implicated; a promising candidate is DGCR8, which codes for a protein involved in microRNA (miRNA) biogenesis. We carried out miRNA expression profiling (miRNA-seq) on neurons generated from iPSCs derived from controls and from SZ patients with 22q11.2 del. Using thresholds of p<0.01 for nominal significance and a 1.5-fold difference in expression, 45 differentially expressed miRNAs were detected (13 lower in SZ and 32 higher). Of these, 6 were significantly down-regulated in patients after correction for genome-wide multiple testing (FDR<0.05), including 4 miRNAs that map to the 22q11.2 del region. In addition, the 22q11.2 neurons showed a nominally significant increase in the expression of several miRNAs previously found to be differentially expressed in autopsy samples and peripheral blood in SZ and autism spectrum disorders (e.g., miR-34, miR-4449, miR-146b-3p, and miR-23a-5p). Pathway and function analysis of predicted mRNA targets of the differentially expressed miRNAs showed enrichment for genes involved in neurological disease and psychological disorders for both up- and down-regulated miRNAs. Our findings suggest that: i. neurons with 22q11.2 del recapitulate the miRNA expression patterns expected of 22q11.2 haploinsufficiency, and ii. differentially expressed miRNAs previously identified using autopsy samples and peripheral cells, both of which have significant methodological problems, are indeed disrupted in neuropsychiatric disorders and likely have an underlying genetic basis. PMID:26173148
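The two-tier filtering described here, a nominal p < 0.01 with at least a 1.5-fold change in either direction followed by FDR < 0.05, can be sketched as below. The abstract does not name the multiple-testing procedure, so Benjamini-Hochberg adjustment is assumed as a common choice; the function names and the miRNA names and numbers in the demo are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_mirnas(results, p_cut=0.01, fold_cut=1.5, fdr_cut=0.05):
    """Split miRNAs into nominal hits and FDR-significant hits.

    results maps a name to (p_value, fold_change), where fold_change is the
    patient/control expression ratio; a 1.5-fold change counts in either direction.
    """
    names = list(results)
    qvalues = benjamini_hochberg([results[n][0] for n in names])
    nominal, significant = [], []
    for name, q in zip(names, qvalues):
        p, fold = results[name]
        if p < p_cut and (fold >= fold_cut or fold <= 1.0 / fold_cut):
            nominal.append(name)
            if q < fdr_cut:
                significant.append(name)
    return nominal, significant


if __name__ == "__main__":
    # Invented numbers: one strong hit, one nominal-only hit, twenty null miRNAs.
    demo = {"miR-x": (0.0001, 0.4), "miR-y": (0.009, 2.0)}
    demo.update({f"miR-null{i}": (0.5, 1.0) for i in range(20)})
    print(call_mirnas(demo))  # (['miR-x', 'miR-y'], ['miR-x'])
```

With many miRNAs tested, a nominally significant hit such as miR-y can fail the FDR step, which mirrors how only 6 of the 45 nominal hits in the study survived correction.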
Inertial-ordering-assisted droplet microfluidics for high-throughput single-cell RNA-sequencing.
Moon, Hui-Sung; Je, Kwanghwi; Min, Jae-Woong; Park, Donghyun; Han, Kyung-Yeon; Shin, Seung-Ho; Park, Woong-Yang; Yoo, Chang Eun; Kim, Shin-Hyun
2018-02-27
Single-cell RNA-seq reveals the cellular heterogeneity inherent in cell populations, which is very important in many clinical and research applications. Recent advances in droplet microfluidics have achieved the automatic isolation, lysis, and labeling of single cells in droplet compartments without complex instrumentation. However, two challenges remain for precise and efficient expression profiling of single cells: barcoding errors arising in the cell encapsulation process from multiple beads in a droplet, and insufficient throughput caused by the low bead concentration used to avoid multiple beads in a droplet. In this study, we developed a new droplet-based microfluidic platform that significantly improved throughput while reducing barcoding errors through deterministic encapsulation of inertially ordered beads. Highly concentrated beads containing oligonucleotide barcodes were spontaneously ordered in a spiral channel by an inertial effect and were in turn encapsulated in droplets one by one, while cells were simultaneously encapsulated in the droplets. Deterministic encapsulation of beads resulted in a high fraction of single-bead-in-a-droplet events and rare multiple-beads-in-a-droplet events even when the bead concentration was increased to 1000 μL⁻¹, which diminished barcoding errors and enabled accurate high-throughput barcoding. We successfully validated our device with single-cell RNA-seq. In addition, we found that multiple-beads-in-a-droplet events, generated using a normal Drop-Seq device with a high concentration of beads, underestimated transcript numbers and overestimated cell numbers. This accurate high-throughput platform can expand the capability and practicality of Drop-Seq in single-cell analysis.
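The trade-off between bead concentration and multiple-beads-in-a-droplet events can be seen from Poisson loading statistics, a standard assumption for random (non-ordered) encapsulation that the abstract itself does not state: with a mean of λ beads per droplet, the fractions of droplets holding 0, 1, and 2 or more beads follow the Poisson distribution. The λ values and function names below are illustrative assumptions, not the study's settings.

```python
import math


def poisson_occupancy(mean_beads):
    """Fractions of droplets with 0, exactly 1, and 2+ beads under Poisson loading."""
    p0 = math.exp(-mean_beads)
    p1 = mean_beads * math.exp(-mean_beads)
    return p0, p1, 1.0 - p0 - p1


def multi_bead_share(mean_beads):
    """Among bead-containing droplets, the share holding more than one bead."""
    p0, _, p_multi = poisson_occupancy(mean_beads)
    return p_multi / (1.0 - p0)


if __name__ == "__main__":
    for lam in (0.05, 0.1, 0.5, 1.0):
        p0, p1, pm = poisson_occupancy(lam)
        print(f"lambda={lam:<4} empty={p0:.3f} single={p1:.3f} multi={pm:.3f} "
              f"multi/occupied={multi_bead_share(lam):.3f}")
```

Under random loading, raising λ fills more droplets but also raises the multi-bead share; a droplet with two barcoded beads splits one cell's transcripts across two barcodes, consistent with the reported underestimation of transcript numbers and overestimation of cell numbers. Deterministic, inertially ordered loading is meant to escape this Poisson trade-off.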
Transcriptome Profiling of Rust Resistance in Switchgrass Using RNA-Seq Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Serba, Desalegn D.; Uppalapati, Srinivasa Rao; Mukherjee, Shreyartha; ...
2015-03-16
Switchgrass rust caused by Puccinia emaculata is a major limiting factor for switchgrass (Panicum virgatum L.) production, especially in monoculture. Natural populations of switchgrass displayed diverse reactions to P. emaculata when evaluated in a field in Ardmore, OK. To identify the genes differentially expressed during the rust infection process and the mechanisms of switchgrass rust resistance, transcriptome analysis using RNA-Seq was conducted on two pseudo-F1 parents ('PV281' and 'NFGA472') and on three moderately resistant and three susceptible progenies selected from a three-generation, four-founder switchgrass population (K5 x A4) x (AP13 x VS16). Leaf tissue was collected at 0, 24, and 60 h post-inoculation (hpi), and on average 23.5 million reads per sample were obtained from paired-end (2 x 100 bp) sequencing on the Illumina HiSeq2000 platform. Mapping of the RNA-Seq reads to the switchgrass reference genome (AP13 ver. 1.1 assembly) constructed a total of 84,209 transcripts from 98,007 gene loci across all samples. Further analysis revealed that host defense-related genes, including nucleotide binding site-leucine-rich repeat domain-containing disease resistance gene analogs, play an important role in resistance to rust infection. Rust-induced gene (RIG) transcripts inherited across generations were identified. The rust-resistant gene transcripts can be a valuable resource for developing molecular markers for rust resistance. Finally, we identified rust-resistant genotypes and gene transcripts that can expedite rust-resistant cultivar development in switchgrass.
Comparative transcript profiling of the fertile and sterile flower buds of pol CMS in B. napus.
An, Hong; Yang, Zonghui; Yi, Bin; Wen, Jing; Shen, Jinxiong; Tu, Jinxing; Ma, Chaozhi; Fu, Tingdong
2014-04-03
The Polima (pol) system of cytoplasmic male sterility (CMS) and its fertility restoration gene Rfp have been used in hybrid breeding in Brassica napus, which has greatly improved the yield of rapeseed. However, the mechanism of the male sterility transition in pol CMS remains to be determined. To investigate the transcriptome during the male sterility transition in pol CMS, a near-isogenic line (NIL) of pol CMS was constructed. The phenotypic features and sterility stage were confirmed by anatomical analysis. Subsequently, we compared the genome-wide expression profiles of fertile and sterile young flower buds by RNA-Seq. A total of 105,481,136 sequence reads were obtained. These reads were assembled into 112,770 unigenes, which composed the bud transcriptome. Among these unigenes, 72,408 (64.21%) were annotated using public protein databases and classified into functional clusters. In addition, we investigated expression changes between fertile and sterile buds; the RNA-seq data showed that 1,148 unigenes were significantly differentially expressed, mainly in metabolic and protein synthesis pathways. Additionally, some unigenes controlling anther development were dramatically down-regulated in sterile buds. These results suggest that an energy deficiency caused by orf224/atp6 may inhibit a series of genes that regulate pollen development through nuclear-mitochondrial interaction, leading to the failure of sporogenous cell differentiation and thus to the sterility of pol CMS. This study may aid detailed molecular analysis and a better understanding of pol CMS in B. napus.
Anjanappa, Ravi B; Mehta, Devang; Okoniewski, Michal J; Szabelska-Berȩsewicz, Alicja; Gruissem, Wilhelm; Vanderschuren, Hervé
2018-02-01
Cassava brown streak virus (CBSV) and Ugandan cassava brown streak virus (UCBSV) are responsible for significant cassava yield losses in eastern sub-Saharan Africa. To study the possible mechanisms of plant resistance to CBSVs, we inoculated CBSV-susceptible and CBSV-resistant cassava varieties with a mixed infection of CBSVs using top-cleft grafting. Transcriptome profiling of the two cassava varieties was performed at the earliest time point of full infection (28 days after grafting) in the susceptible scions. The expression of genes encoding proteins in RNA silencing, salicylic acid pathways and callose deposition was altered in the susceptible cassava variety, but transcriptional changes were limited in the resistant variety. In total, the expression of 585 genes was altered in the resistant variety and 1292 in the susceptible variety. Transcriptional changes led to the activation of β-1,3-glucanase enzymatic activity and a reduction in callose deposition in the susceptible cassava variety. Time course analysis also showed that CBSV replication in susceptible cassava induced a strong up-regulation of RDR1, a gene previously reported to be a susceptibility factor in other potyvirus-host pathosystems. The differences in the transcriptional responses to CBSV infection indicated that susceptibility involves the restriction of callose deposition at plasmodesmata. Aniline blue staining of callose deposits also indicated that the resistant variety displays a moderate, but significant, increase in callose deposition at the plasmodesmata. Transcriptome data suggested that resistance does not involve typical antiviral defence responses (i.e. RNA silencing and salicylic acid). A meta-analysis of the current RNA-sequencing (RNA-seq) dataset and selected potyvirus-host and virus-cassava RNA-seq datasets revealed that the conservation of the host response across pathosystems is restricted to genes involved in developmental processes. © 2017 THE AUTHORS. 
Molecular Plant Pathology published by British Society for Plant Pathology and John Wiley & Sons Ltd.
Xoca-Orozco, Luis-Ángel; Cuellar-Torres, Esther Angélica; González-Morales, Sandra; Gutiérrez-Martínez, Porfirio; López-García, Ulises; Herrera-Estrella, Luis; Vega-Arreguín, Julio; Chacón-López, Alejandra
2017-01-01
Avocado (Persea americana) is one of the most important crops in Mexico, the world's main producer, consumer, and exporter of avocado fruit. However, successful avocado commercialization is often hindered by large postharvest losses due to Colletotrichum sp., the causal agent of anthracnose. Chitosan is known to have a direct antifungal effect and also acts as an elicitor capable of stimulating a defense response in plants. However, little is known about the genes that are activated or repressed in fruits treated with chitosan. The aim of this study was to identify, by RNA-seq, the genes differentially regulated by low molecular weight chitosan in the avocado-chitosan-Colletotrichum interaction system. Samples for RNA-seq were obtained from fruits treated with chitosan, fruits inoculated with Colletotrichum, and fruits both treated with chitosan and inoculated with the fungus. Non-treated, non-inoculated fruits were also analyzed. Expression profiles showed that at early time points the fruit-chitosan system presented more differentially expressed genes than the fruit-pathogen system. Gene Ontology analysis of differentially expressed genes showed a large number of metabolic processes regulated by chitosan, including those preventing the spread of Colletotrichum. There was also a high correlation between the in silico (RNA-seq) and qPCR expression levels of several genes involved in different metabolic pathways.
Bogaert, Kenny A; Manoharan-Basil, Sheeba S; Perez, Emilie; Levine, Raphael D; Remacle, Francoise; Remacle, Claire
2018-01-01
The usual cultivation mode of the green microalga Chlamydomonas is liquid medium and light. However, the microalga can also be grown on agar plates and in darkness. Our aim is to analyze and compare gene expression of cells cultivated under these different conditions. For that purpose, RNA-seq data are obtained from Chlamydomonas samples from two different labs grown in four environmental conditions (agar@light, agar@dark, liquid@light, liquid@dark). The RNA-seq data are analyzed by surprisal analysis, which allows the simultaneous meta-analysis of all the samples. First, we identify a balance state, in which expression levels are similar in all samples irrespective of growth condition or lab of origin. In addition, our analysis identifies the additional constraints needed to quantify the deviation from the balance state. The first constraint separates the agar samples from the liquid ones; the second separates the dark samples from the light ones. The two constraints are of almost equal importance. Pathways involved in stress responses are found in the agar phenotype, while the liquid phenotype comprises ATP and NADH production pathways. Membrane remodeling is suggested in the dark phenotype, while photosynthetic pathways characterize the light phenotype. The same trends are also present when performing purely statistical analyses such as K-means clustering and differential expression analysis.
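Surprisal analysis is commonly computed by a singular value decomposition (SVD) of the genes x samples matrix of log expression levels, writing ln X_i(t) = Σ_α G_iα λ_α(t), where the α = 0 term corresponds to the balance state and the α ≥ 1 terms to the constraints. The sketch below applies that SVD formulation to a small synthetic matrix built from a shared baseline plus two sample-level effects, loosely modeled on the agar/liquid and light/dark contrasts. The data, the pseudocount of 1, and the function names are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def surprisal_analysis(counts, pseudocount=1.0):
    """SVD-based surprisal analysis of a genes x samples expression matrix.

    Decomposes ln X[i, t] = sum_alpha G[i, alpha] * lam[alpha, t]; the alpha = 0
    term plays the role of the balance state and alpha >= 1 are constraints.
    Returns (G, lam, singular_values).
    """
    log_x = np.log(np.asarray(counts, dtype=float) + pseudocount)
    u, s, vt = np.linalg.svd(log_x, full_matrices=False)
    lam = s[:, None] * vt  # lambda_alpha(t): amplitude of each term in each sample
    return u, lam, s


def simulate(seed=0, n_genes=50):
    """Invented data: shared baseline plus two sample-level effects."""
    rng = np.random.default_rng(seed)
    baseline = rng.uniform(4, 8, size=(n_genes, 1))
    medium = np.array([[1, 1, 1, 1, -1, -1, -1, -1]])  # e.g. agar vs liquid
    light = np.array([[1, 1, -1, -1, 1, 1, -1, -1]])   # e.g. light vs dark
    sig_medium = rng.normal(0, 1, size=(n_genes, 1))
    sig_light = rng.normal(0, 1, size=(n_genes, 1))
    log_levels = baseline + 0.5 * sig_medium @ medium + 0.5 * sig_light @ light
    return np.exp(log_levels) - 1.0  # undo the pseudocount used in the analysis


if __name__ == "__main__":
    G, lam, s = surprisal_analysis(simulate())
    print("relative singular values:", np.round(s[:4] / s[0], 3))
```

Only the first three singular values are non-negligible here (one balance term plus two constraints) because the synthetic data were built from exactly three components. Note that SVD does not guarantee that each constraint maps onto a single experimental factor; the second and third terms may each mix the two simulated effects.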
Livny, Jonathan; Zhou, Xiaohui; Mandlik, Anjali; Hubbard, Troy; Davis, Brigid M.; Waldor, Matthew K.
2014-01-01
Vibrio parahaemolyticus is the leading worldwide cause of seafood-associated gastroenteritis, yet little is known regarding its intraintestinal gene expression or physiology. To date, in vivo analyses have focused on identification and characterization of virulence factors—e.g. a crucial Type III secretion system (T3SS2)—rather than genome-wide analyses of in vivo biology. Here, we used RNA-Seq to profile V. parahaemolyticus gene expression in infected infant rabbits, which mimic human infection. Comparative transcriptomic analysis of V. parahaemolyticus isolated from rabbit intestines and from several laboratory conditions enabled identification of mRNAs and sRNAs induced during infection and of regulatory factors that likely control them. More than 12% of annotated V. parahaemolyticus genes are differentially expressed in the intestine, including the genes of T3SS2, which are likely induced by bile-mediated activation of the transcription factor VtrB. Our analyses also suggest that V. parahaemolyticus has access to glucose or other preferred carbon sources in vivo, but that iron is inconsistently available. The V. parahaemolyticus transcriptional response to in vivo growth is far more widespread than and largely distinct from that of V. cholerae, likely due to the distinct ways in which these diarrheal pathogens interact with and modulate the environment in the small intestine. PMID:25262354
NASA Astrophysics Data System (ADS)
Kumar, Ajay; Chawla, Vandna; Sharma, Eshita; Mahajan, Pallavi; Shankar, Ravi; Yadav, Sudesh Kumar
2016-11-17
Tea quality and yield are influenced by various factors, including developmental tissue, seasonal variation, and cultivar type. Here, the molecular basis of these factors was investigated in three tea cultivars, namely Him Sphurti (H), TV23 (T), and UPASI-9 (U), using RNA-seq. Seasonal variation in these cultivars was studied during the active (A), mid-dormant (MD), dormant (D), and mid-active (MA) stages in two developmental tissues, young and old leaves. Development appears to affect gene expression more than seasonal variation or cultivar type. Further, detailed transcript and metabolite profiling identified genes such as F3'H, F3'5'H, FLS, DFR, LAR, ANR, and ANS of catechin biosynthesis, and MXMT, SAMS, TCS, and XDH of caffeine biosynthesis/catabolism, as key regulators during development and seasonal variation among the three tea cultivars. In addition, expression analysis of genes related to phytohormones such as ABA, GA, ethylene, and auxin suggested their role in developmental tissues during seasonal variation in tea cultivars. Moreover, differential expression of genes involved in histone and DNA modification further suggests a role for epigenetic mechanisms in coordinating global gene expression during developmental and seasonal variation in tea. Our findings provide insights into global transcriptional reprogramming associated with development and seasonal variation in tea.
Iacobucci, I; Ferrarini, A; Sazzini, M; Giacomelli, E; Lonetti, A; Xumerle, L; Ferrari, A; Papayannidis, C; Malerba, G; Luiselli, D; Boattini, A; Garagnani, P; Vitale, A; Soverini, S; Pane, F; Baccarani, M; Delledonne, M; Martinelli, G
2012-01-01
Although the pathogenesis of BCR–ABL1-positive acute lymphoblastic leukemia (ALL) is mainly related to the expression of the BCR–ABL1 fusion transcript, additional cooperating genetic lesions are thought to be involved in its development and progression. Therefore, to investigate the complex landscape of mutations, changes in expression profiles, and alternative splicing (AS) events that can be observed in this disease, the leukemia transcriptome of a BCR–ABL1-positive ALL patient at diagnosis and at relapse was sequenced using a whole-transcriptome shotgun sequencing (RNA-Seq) approach. A total of 13.9 and 15.8 million sequence reads were generated from the de novo and relapsed samples, respectively, and aligned to the human genome reference sequence. This led to the identification of five validated missense mutations in genes involved in metabolic processes (DPEP1, TMEM46), transport (MVP), cell cycle regulation (ABL1), and catalytic activity (CTSZ), two of which were variants acquired at relapse. In all, 6,390 and 4,671 putative AS events were also detected, as well as expression levels for 18,315 and 18,795 genes, 28% of which were differentially expressed between the two disease phases. These data demonstrate that RNA-Seq is a suitable approach for identifying a wide spectrum of genetic alterations potentially involved in ALL. PMID:22829256
Isolation of ripening-related genes from ethylene/1-MCP treated papaya through RNA-seq.
Shen, Yan Hong; Lu, Bing Guo; Feng, Li; Yang, Fei Ying; Geng, Jiao Jiao; Ming, Ray; Chen, Xiao Jing
2017-08-31
Since papaya is a typical climacteric fruit, exogenous ethylene (ETH) application can induce premature and faster ripening, while 1-methylcyclopropene (1-MCP) slows down the ripening processes. Differential gene expression in ETH- or 1-MCP-treated papaya fruits accounts for these differences in ripening. To isolate key ripening-related genes and better understand fruit ripening mechanisms, transcriptomes of ETH-treated, 1-MCP-treated, and non-treated (Control Group, CG) papaya fruits were sequenced using the Illumina HiSeq 2500. A total of 18,648 (1-MCP), 19,093 (CG), and 15,321 (ETH) genes were detected, with the fewest genes detected in the ETH treatment. This suggests that ETH may inhibit the expression of some genes. Based on differential gene expression (DGE) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment, 53 fruit ripening-related genes were selected: 20 cell wall-related genes, 18 chlorophyll and carotenoid metabolism-related genes, four proteinases and their inhibitors, six plant hormone signal transduction pathway genes, four transcription factors, and one senescence-associated gene. Reverse transcription quantitative PCR (RT-qPCR) analyses confirmed the RNA-seq results and verified that the expression pattern of six genes is consistent with the fruit senescence process. Based on the expression profiles of genes in the carbohydrate metabolic process, chlorophyll metabolism pathway, and carotenoid metabolism pathway, the mechanism of papaya pulp softening and coloration was deduced and discussed. We illustrate that papaya fruit softening is a complex process involving cell wall hydrolases such as pectinases, cellulases, and hemicellulases. Exogenous ethylene accelerates the color change of papaya from green to yellow, likely due to the inhibition of chlorophyll biosynthesis and of the α-branch of carotenoid metabolism.
Chy-b may play an important role in the yellow color of papaya fruit. Comparing differential gene expression in ETH/1-MCP-treated papaya using RNA-seq is a sound approach to isolating ripening-related genes. The results of this study can improve our understanding of the molecular mechanism of papaya fruit ripening and reveal candidate ripening-related genes for further research.
A normalization strategy for comparing tag count data
2012-01-01
Background High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for more accurate downstream analysis of differential gene expression. A more robust normalization method is desirable for identifying true differences in tag count data. Results We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq), each with its default normalization setting and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure of both sensitivity and specificity. We found that packages using our strategy in the data normalization step performed well overall. This result was also observed for a real experimental dataset. Conclusion Our results showed that eliminating potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can be widely applied to other types of tag count data and to microarray data. PMID:22475125
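The key idea, removing potential DEGs before computing the normalization factor, can be illustrated on two libraries. This sketch is a deliberately simplified stand-in: it uses a plain mean log2 ratio of library-size-scaled counts as the normalization factor and a fixed log2 deviation cut-off as a crude "DEG test", whereas the strategy in the paper plugs into the normalization and DEG-ranking methods of packages such as edgeR, DESeq, baySeq, and NBPSeq. Function names and toy counts are hypothetical.

```python
import math
from statistics import mean


def log_ratios(counts_a, counts_b):
    """Per-gene log2 ratios of library-size-scaled counts (genes with a zero are skipped)."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    return {
        i: math.log2((a / total_a) / (b / total_b))
        for i, (a, b) in enumerate(zip(counts_a, counts_b))
        if a > 0 and b > 0
    }


def naive_factor(counts_a, counts_b):
    """Scaling factor from the mean log2 ratio over all genes (DEGs included)."""
    return 2 ** mean(log_ratios(counts_a, counts_b).values())


def deg_eliminated_factor(counts_a, counts_b, cutoff=1.0, iterations=3):
    """Re-estimate the factor after dropping genes flagged as potential DEGs."""
    ratios = log_ratios(counts_a, counts_b)
    offset = mean(ratios.values())
    for _ in range(iterations):
        # Crude stand-in for a DEG test: large deviation from the current offset.
        keep = [r for r in ratios.values() if abs(r - offset) <= cutoff]
        if not keep:
            break
        offset = mean(keep)
    return 2 ** offset


if __name__ == "__main__":
    # Toy data: 8 unchanged genes, 2 genes up-regulated eight-fold in library A.
    a = [100] * 8 + [800] * 2
    b = [100] * 10
    print(round(naive_factor(a, b), 3), round(deg_eliminated_factor(a, b), 3))
```

In this toy case the two up-regulated genes inflate library A's total, so the naive factor is pulled to about 0.63, while the DEG-eliminated factor returns 5/12 (about 0.417), the value implied by the eight unchanged genes.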
Zhang, Yanju; Lameijer, Eric-Wubbo; 't Hoen, Peter A C; Ning, Zemin; Slagboom, P Eline; Ye, Kai
2012-02-15
RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and may help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon-exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion can also detect junctions that lack known splicing motifs, which the other tools cannot find. In the two public RNA-Seq datasets, PASSion predicted ≈ 137,000 and 173,000 splicing events, respectively, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion.
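To show why reads spanning exon-exon boundaries carry the splicing information, here is a toy sketch that looks for an exact split alignment of one read against a reference string and then checks the canonical GT...AG intron motif. It is a simple exact-match search, not PASSion's pattern growth algorithm; the function name, `min_anchor` and `max_intron` are hypothetical parameters chosen for illustration.

```python
def find_junction(read, ref, min_anchor=5, max_intron=10_000):
    """Return (donor, acceptor, canonical) for the first exact split alignment.

    donor: 0-based reference position just after the left (5') block.
    acceptor: 0-based reference position where the right (3') block starts.
    canonical: True if the intron starts with GT and ends with AG.
    Returns None when no split alignment is found.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        start = ref.find(left)
        while start != -1:
            donor = start + len(left)
            # The right block must start downstream, leaving an intron of >= 1 nt.
            acc = ref.find(right, donor + 1, donor + max_intron + len(right))
            if acc != -1:
                intron = ref[donor:acc]
                canonical = intron.startswith("GT") and intron.endswith("AG")
                return donor, acc, canonical
            start = ref.find(left, start + 1)
    return None
```

A real aligner must also resolve ambiguous split points, mismatches and multiple hits, which this sketch ignores by returning the first exact match.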
Lu, Yuan; Starkey, Nicholas; Lei, Wei; Li, Jilong; Cheng, Jianlin; Folk, William R.; Lubahn, Dennis B.
2015-01-01
Sutherlandia frutescens (L) R. Br. (Sutherlandia) is a South African botanical that is traditionally used to treat a variety of health conditions, infections and diseases, including cancer. We hypothesized that Sutherlandia might act through Gli/Hedgehog (Hh) signaling in prostate cancer cells and used RNA-Seq to profile gene expression in TRAMPC2 murine prostate cancer cells with or without Sutherlandia extracts. We found that 50% of Hh-responsive genes can be repressed by Sutherlandia ethanol extract, including the canonical Hh-responsive genes Gli1 and Ptch1 as well as the newly identified Hh-responsive genes Hsd11b1 and Penk. PMID:26710108
Proteogenomic database construction driven from large scale RNA-seq data.
Woo, Sunghee; Cha, Seong Won; Merrihew, Gennifer; He, Yupeng; Castellana, Natalie; Guest, Clark; MacCoss, Michael; Bafna, Vineet
2014-01-03
The advent of inexpensive RNA-seq and other deep sequencing technologies for RNA promises to radically improve genomic annotation, providing information on transcribed regions and splicing events in a variety of cellular conditions. Using MS-based proteogenomics, many of these events can be confirmed directly at the protein level. However, integrating large amounts of redundant RNA-seq data with mass spectrometry data is a challenging problem. Our paper addresses this by constructing a compact database that contains all useful information expressed in RNA-seq reads. Applying our method to cumulative C. elegans data reduced 496.2 GB of aligned RNA-seq SAM files to a 410 MB splice graph database written in FASTA format. This corresponds to more than 1000× compression of data size, without loss of sensitivity. We performed a proteogenomics study on the custom data set using a completely automated pipeline and identified a total of 4044 novel events, including 215 novel genes, 808 novel exons, 12 alternative splicing events, 618 gene-boundary corrections, 245 exon-boundary changes, 938 frame shifts, 1166 reverse strands, and 42 translated UTRs. Our results highlight the usefulness of transcript + proteomic integration for improved genome annotations.
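Much of the compression comes from the fact that many reads redundantly support the same exons and junctions. A minimal sketch, assuming each spliced alignment is given as a sorted list of half-open (start, end) aligned blocks on one chromosome, collapses reads into unique junctions with read support plus merged covered segments. It is illustrative only and does not reproduce the authors' splice graph construction or FASTA encoding; the function name is hypothetical.

```python
from collections import Counter


def collapse_alignments(alignments):
    """Collapse redundant spliced alignments into junction counts and segments.

    alignments: list of reads, each a sorted list of half-open (start, end) blocks.
    Returns (junctions, segments): junctions maps (donor, acceptor) -> read count,
    segments is the sorted list of merged covered intervals.
    """
    junctions = Counter()
    blocks = []
    for read_blocks in alignments:
        blocks.extend(read_blocks)
        # Consecutive blocks of one read are separated by a splice junction.
        for (_, end), (start, _) in zip(read_blocks, read_blocks[1:]):
            junctions[(end, start)] += 1
    segments = []
    for start, end in sorted(blocks):
        if segments and start <= segments[-1][1]:
            segments[-1][1] = max(segments[-1][1], end)  # overlap: extend segment
        else:
            segments.append([start, end])
    return dict(junctions), [tuple(s) for s in segments]
```

Four overlapping reads, for example, reduce to two junctions and three segments, and the saving grows with sequencing depth.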
TEcandidates: Prediction of genomic origin of expressed Transposable Elements using RNA-seq data.
Valdebenito-Maturana, Braulio; Riadi, Gonzalo
2018-06-01
In recent years, Transposable Elements (TEs) have been related to gene regulation. However, estimating the origin of expression of TEs through RNA-seq is complicated by multimapping reads coming from their repetitive sequences. Current approaches that address multimapping reads focus on expression quantification rather than on finding the origin of expression. Determining the genomic origin of expressed TEs could further aid in understanding the role that TEs might have in the cell. We have developed a new pipeline called TEcandidates, which uses de novo transcriptome assembly to identify the TE instances being expressed, along with their locations, so that they can be included in downstream differential expression (DE) analysis. TEcandidates takes as input the RNA-seq data, the genome sequence and the TE annotation file, and returns the coordinates of candidate expressed TEs, the TEs that have been removed, and the genome sequence with the removed TEs masked. This masked genome is suited to including TEs in downstream expression analysis, because the ambiguity of reads coming from TEs is significantly reduced in the mapping step of the analysis. The script that runs the pipeline can be downloaded at http://www.mobilomics.org/tecandidates/downloads or http://github.com/TEcandidates/TEcandidates. Contact: griadi@utalca.cl. Supplementary data are available at Bioinformatics online.
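The masking step can be sketched as follows, assuming the genome is a plain string, TE annotations are 0-based half-open intervals keyed by a TE id, and the set of expressed candidate TEs is already known. The function name and data layout are hypothetical and are not TEcandidates' actual interface.

```python
def mask_unexpressed_tes(genome, te_annotation, expressed_ids, mask_char="N"):
    """Replace every annotated TE that is not in `expressed_ids` with `mask_char`.

    genome: chromosome sequence as a string.
    te_annotation: dict mapping TE id -> (start, end), 0-based half-open.
    Returns (masked_genome, removed_ids).
    """
    seq = list(genome)
    removed = []
    for te_id, (start, end) in sorted(te_annotation.items()):
        if te_id in expressed_ids:
            continue  # keep candidate TEs so their reads can still map uniquely
        seq[start:end] = mask_char * (end - start)
        removed.append(te_id)
    return "".join(seq), removed
```

Masking rather than deleting keeps all coordinates unchanged, so existing gene annotations remain valid against the masked genome.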
Spatial transcriptomic analysis of cryosectioned tissue samples with Geo-seq.
Chen, Jun; Suo, Shengbao; Tam, Patrick Pl; Han, Jing-Dong J; Peng, Guangdun; Jing, Naihe
2017-03-01
Conventional gene expression studies analyze either multiple cells simultaneously or single cells whose exact in vivo or in situ positions are unknown. Although cellular heterogeneity can be discerned when analyzing single cells, any spatially defined attributes that underpin the heterogeneous nature of the cells cannot be identified. Here, we describe how to use Geo-seq, a method that combines laser capture microdissection (LCM) and single-cell RNA-seq technology. Combining these two methods enables cellular heterogeneity and spatial variance to be elucidated simultaneously. The Geo-seq protocol allows transcriptome profiling of only a small number of cells while retaining their native spatial information. This protocol has wide potential applications to address biological and pathological questions of cellular properties such as prospective cell fates, biological function and the gene regulatory network. Geo-seq has been applied to investigate the spatial transcriptome of mouse early embryo, mouse brain, and pathological liver and sperm tissues. The entire protocol, from tissue collection and microdissection to sequencing, requires ∼5 d; data analysis takes another 1 or 2 weeks, depending on the amount of data and the speed of the processor.
Ma, Chuang; Wang, Xiangfeng
2012-09-01
One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Although several correlation methods are applied in biology to analyze microarray data, concerns have been raised regarding their compatibility with gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology. These concerns arise mainly because the distribution of read counts in RNA-Seq experiments differs from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, the introduction of novel methods into biology are appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method, the Gini correlation coefficient, on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson, Spearman, Kendall, and Tukey's biweight correlations. The Gini correlation method, together with the other four methods evaluated in this study, was implemented as an R package named rsgcc that biologists can use as an alternative option for clustering analyses of gene expression patterns or transcriptional network analyses.
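One common formulation of the Gini correlation between expression vectors X and Y is G(X, Y) = cov(X, rank(Y)) / cov(X, rank(X)). It uses the values of X but only the ranks of Y, so it is asymmetric: G(X, Y) and G(Y, X) can differ. A minimal standard-library sketch under that assumption follows; the rsgcc implementation may differ in details such as tie handling or how the two asymmetric coefficients are combined.

```python
def ranks(values):
    """Return 1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def _cov(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n


def gini_correlation(x, y):
    """Asymmetric Gini correlation G(X, Y) = cov(X, rank(Y)) / cov(X, rank(X))."""
    denom = _cov(x, ranks(x))
    if denom == 0:
        raise ValueError("x must not be constant")
    return _cov(x, ranks(y)) / denom
```

Because Y enters only through its ranks, any monotone transformation of Y leaves G(X, Y) unchanged: Y = X³ gives exactly 1.0, just like Y = 2X.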
Ma, Chuang; Wang, Xiangfeng
2012-01-01
One of the computational challenges in plant systems biology is to accurately infer transcriptional regulation relationships based on correlation analyses of gene expression patterns. Although several correlation methods are applied in biology to analyze microarray data, concerns have been raised regarding their compatibility with gene expression data profiled by high-throughput RNA transcriptome sequencing (RNA-Seq) technology. These concerns arise mainly because the distribution of read counts in RNA-Seq experiments differs from that of fluorescence intensities in microarray experiments. Therefore, a comprehensive evaluation of the existing correlation methods and, if necessary, the introduction of novel methods into biology are appropriate. In this study, we compared four existing correlation methods used in microarray analysis and one novel method, the Gini correlation coefficient, on previously published microarray-based and sequencing-based gene expression data in Arabidopsis (Arabidopsis thaliana) and maize (Zea mays). The comparisons were performed on more than 11,000 regulatory relationships in Arabidopsis, including 8,929 pairs of transcription factors and target genes. Our analyses pinpointed the strengths and weaknesses of each method and indicated that the Gini correlation can compensate for the shortcomings of the Pearson, Spearman, Kendall, and Tukey’s biweight correlations. The Gini correlation method, together with the other four methods evaluated in this study, was implemented as an R package named rsgcc that biologists can use as an alternative option for clustering analyses of gene expression patterns or transcriptional network analyses. PMID:22797655
Han, Kook; Tjaden, Brian; Lory, Stephen
2016-12-22
The first step in the post-transcriptional regulatory function of most bacterial small non-coding RNAs (sRNAs) is base pairing with partially complementary sequences of targeted transcripts. We present a simple method for identifying sRNA targets in vivo and defining processing sites of the regulated transcripts. The technique, referred to as global small non-coding RNA target identification by ligation and sequencing (GRIL-seq), is based on preferential ligation of sRNAs to the ends of base-paired targets in bacteria co-expressing T4 RNA ligase, followed by sequencing to identify the chimaeras. In addition to the RNA chaperone Hfq, the GRIL-seq method depends on the activity of the pyrophosphohydrolase RppH. Using PrrF1, an iron-regulated sRNA in Pseudomonas aeruginosa, we demonstrated that direct regulatory targets of this sRNA can readily be identified. Therefore, GRIL-seq represents a powerful tool not only for identifying direct targets of sRNAs in a variety of environments, but also for uncovering novel roles for sRNAs and their targets in complex regulatory networks.
Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv
2010-01-01
RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462
Handa, Yoshihiro; Nishide, Hiroyo; Takeda, Naoya; Suzuki, Yutaka; Kawaguchi, Masayoshi; Saito, Katsuharu
2015-08-01
Gene expression during arbuscular mycorrhizal development is highly orchestrated in both plants and arbuscular mycorrhizal fungi. To elucidate the gene expression profiles of the symbiotic association, we performed a digital gene expression analysis of Lotus japonicus and Rhizophagus irregularis using a HiSeq 2000 next-generation sequencer with a Cufflinks assembly and de novo transcriptome assembly. There were 3,641 genes differentially expressed during arbuscular mycorrhizal development in L. japonicus, approximately 80% of which were up-regulated. The up-regulated genes included secreted proteins, transporters, proteins involved in lipid and amino acid metabolism, ribosomes and histones. We also detected many differentially expressed genes encoding small secreted peptides and transcription factors, which may be involved in signal transduction or transcription regulation during symbiosis. Co-regulated genes between arbuscular mycorrhizal and root nodule symbiosis were not particularly abundant, but transcripts encoding membrane traffic-related proteins, transporters and iron transport-related proteins were found to be highly co-up-regulated. In transcripts of arbuscular mycorrhizal fungi, an expansion of cytochrome P450 genes was observed, which may contribute to various metabolic pathways required to accommodate roots and soil. The comprehensive gene expression data of both plants and arbuscular mycorrhizal fungi provide a powerful platform for investigating the functional and molecular mechanisms underlying arbuscular mycorrhizal symbiosis. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.
The RNASeq-er API-a gateway to systematically updated analysis of public RNA-seq data.
Petryszak, Robert; Fonseca, Nuno A; Füllgrabe, Anja; Huerta, Laura; Keays, Maria; Tang, Y Amy; Brazma, Alvis
2017-07-15
The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to give the scientific community easy access not just to raw, but also to processed RNA-Seq data. We present a Web service, built on Representational State Transfer (REST), for accessing the results of a systematically and continually updated standardized alignment, as well as gene and exon expression quantification, of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in the European Nucleotide Archive (ENA). The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1 PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api . The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library . Contact: rnaseq@ebi.ac.uk; rpetry@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
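The API serves expression matrices in several units. The sketch below shows how FPKM and TPM relate to raw counts using their standard definitions. It assumes the supplied per-gene counts sum to the total number of mapped fragments, whereas in practice that total comes from the whole library. It is not the iRAP pipeline's code.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of exon per million mapped fragments."""
    total = sum(counts)  # assumption: these counts cover all mapped fragments
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]
```

The order of operations is the key difference. TPM normalizes by length before scaling, so every sample's values sum to one million, whereas FPKM values do not have a fixed per-sample sum.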
2015-01-01
Background Intensive research based on the inverse expression relationship has been undertaken to discover the miRNA-mRNA regulatory modules involved in the infection of Hepatitis C virus (HCV), the leading cause of chronic liver diseases. However, biological studies in other fields have found that an inverse expression relationship is not the only regulatory relationship between miRNAs and their targets, and that some miRNAs can positively regulate an mRNA by binding at its 5' UTR. Results This work focuses on the detection of both inverse and positive regulatory relationships from a paired miRNA and mRNA expression data set of HCV patients through a 'change-to-change' method, which can derive connected discriminatory rules. Our study uncovered many novel miRNA-mRNA regulatory modules. In particular, it was revealed that GFRA2 is positively regulated by miR-557, miR-765 and miR-17-3p, which probably bind at different locations of the 5' UTR of this mRNA. The expression relationship between GFRA2 and any of these three miRNAs has not been studied before, although separate studies of this gene and of these miRNAs have each drawn conclusions linked to hepatocellular carcinoma. This suggests that the binding of GFRA2 mRNA with miR-557, miR-765, or miR-17-3p, or their combinations, merits further experimental investigation. We also report another mRNA, QKI, which has a strong inverse expression relationship with miR-129 and miR-493-3p, which may bind at the 3' UTR of QKI with a perfect sequence match. Furthermore, the interaction between hsa-miR-129-5p (previous ID: hsa-miR-129) and QKI is supported by CLIP-Seq data from starBase. Our method can be easily extended to the expression data analysis of other diseases. Conclusion Our rule discovery method is useful for integrating binding information and expression profiles to identify HCV miRNA-mRNA regulatory modules and can be applied to the study of the expression profiles of other complex human diseases.
PMID:25707620
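For intuition, the distinction between positive and inverse relationships can be illustrated by a heavily simplified sketch. It discretizes each molecule's change, per sample, into up or down relative to a reference value, then asks how consistently a miRNA and an mRNA change together. The (reference, observed) data layout, thresholds and function names are hypothetical. This is not the authors' 'change-to-change' rule discovery, which derives connected discriminatory rules and integrates binding information.

```python
import math


def change_direction(reference, observed, min_log2_change=1.0):
    """+1 if up, -1 if down, 0 if |log2 fold change| is below the threshold."""
    lfc = math.log2(observed / reference)
    if lfc >= min_log2_change:
        return 1
    if lfc <= -min_log2_change:
        return -1
    return 0


def classify_pair(mirna_values, mrna_values, min_support=0.8):
    """Label a miRNA-mRNA pair 'positive', 'inverse' or 'none'.

    Each argument is a list of (reference, observed) tuples, one per sample.
    Only samples in which both molecules clearly change are counted.
    """
    same = opposite = 0
    for (mr, mo), (tr, to) in zip(mirna_values, mrna_values):
        d_mirna = change_direction(mr, mo)
        d_mrna = change_direction(tr, to)
        if d_mirna == 0 or d_mrna == 0:
            continue
        if d_mirna == d_mrna:
            same += 1
        else:
            opposite += 1
    informative = same + opposite
    if informative == 0:
        return "none"
    if same / informative >= min_support:
        return "positive"
    if opposite / informative >= min_support:
        return "inverse"
    return "none"
```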
Pérez, Matías Gastón; Macchiaroli, Natalia; Lichtenstein, Gabriel; Conti, Gabriela; Asurmendi, Sebastián; Milone, Diego Humberto; Stegmayer, Georgina; Kamenetzky, Laura; Cucher, Marcela; Rosenzvit, Mara Cecilia
2017-09-01
MicroRNAs (miRNAs) are small non-coding RNAs that have emerged as important regulators of gene expression and perform critical functions in development and disease. Despite the increased interest in miRNAs from helminth parasites, no information is available on miRNAs from Taenia solium, the causative agent of cysticercosis, a neglected disease affecting millions of people worldwide. Here we performed a comprehensive analysis of miRNAs from Taenia crassiceps, a laboratory model for T. solium studies, and identified miRNAs in the T. solium genome. Moreover, we analysed the effect of praziquantel, one of the two main drugs used for cysticercosis treatment, on the miRNA expression profile of T. crassiceps cysticerci. Using small RNA-seq and two independent algorithms for miRNA prediction, as well as northern blot validation, we found transcriptional evidence of 39 miRNA loci in T. crassiceps. Since these miRNAs were mapped to the T. solium genome, they are considered common to both parasites. The miRNA expression profile of T. crassiceps was biased toward the same set of highly expressed miRNAs reported in other cestodes. We found significantly altered expression of miR-7b under praziquantel treatment. In addition, we searched for miRNAs predicted to target genes related to drug response. We performed a detailed target prediction for miR-7b and found genes related to drug action. We report an initial approach to studying the effect of sub-lethal drug treatment on miRNA expression in a cestode parasite, which provides a platform for further studies of miRNA involvement in drug effects. The results of our work could be applied to drug development and provide basic knowledge of cysticercosis and other neglected helminth infections. Copyright © 2017 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.
Trypanosoma cruzi transcriptome during axenic epimastigote growth curve
dos Santos, Cyndia Mara Bezerra; Ludwig, Adriana; Kessler, Rafael Luis; Rampazzo, Rita de Cássia Pontello; Inoue, Alexandre Haruo; Krieger, Marco Aurélio; Pavoni, Daniela Parada; Probst, Christian Macagnan
2018-01-01
BACKGROUND Trypanosoma cruzi is an important protozoan parasite and the causative agent of Chagas disease. A critical step in understanding T. cruzi biology is the study of cellular and molecular features exhibited during its growth curve. OBJECTIVES We aimed to acquire a global view of the gene expression profile of T. cruzi during epimastigote growth. METHODS RNA-Seq analysis of total and polysomal/granular RNA fractions was performed over the 10-day T. cruzi epimastigote growth curve in vitro, in addition to cell viability and cell cycle analyses. We also analysed the polysome profile and investigated the presence of granular RNA by FISH and western blotting. FINDINGS We identified 1082 differentially expressed genes (DEGs), of which 220 were modulated in both fractions. According to the modulation pattern, DEGs were grouped into 12 clusters and showed enrichment of important gene ontology (GO) terms. Moreover, we showed that by the sixth day of the growth curve, polysomal content declined greatly and the RNA granule content appeared to increase, suggesting that a portion of the mRNAs isolated from the sucrose gradient during late growth stages was associated with RNA granules and not only with polyribosomes. Furthermore, we discuss several modulated genes possibly involved in T. cruzi growth, mainly during the stationary phase, such as genes related to cell cycle, pathogenesis, metabolic processes and RNA-binding proteins. PMID:29668769
Picardi, Ernesto; Gallo, Angela; Galeano, Federica; Tomaselli, Sara; Pesole, Graziano
2012-01-01
RNA editing is a post-transcriptional process occurring in a wide range of organisms. In the human brain, A-to-I RNA editing, in which individual adenosine (A) bases in pre-mRNA are modified to yield inosine (I), is the most frequent event. By modulating gene expression, RNA editing is essential for cellular homeostasis. Indeed, its deregulation has been linked to several neurological and neurodegenerative diseases. To date, many RNA editing sites have been identified by next generation sequencing technologies employing massive transcriptome sequencing together with whole genome or exome sequencing. While genome and transcriptome reads are not always available for single individuals, RNA-Seq data are widely available in public databases and represent a relevant source of as yet unexplored RNA editing sites. In this context, we propose a simple computational strategy to identify genomic positions enriched in novel hypothetical RNA editing events by means of a new two-step mapping procedure requiring only RNA-Seq data and no a priori knowledge of RNA editing characteristics or genomic reads. We assessed the suitability of our procedure by confirming A-to-I candidates using conventional Sanger sequencing and by performing RNA-Seq as well as whole exome sequencing of human spinal cord tissue from a single individual. PMID:22957051
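Because inosine is read as guanosine during sequencing library preparation, A-to-I editing shows up in RNA-Seq alignments as A-to-G mismatches. A minimal sketch flags candidate sites from per-position base counts of aligned reads. It assumes the reference is given on the transcribed strand and that base counts are already available, and its depth and G-fraction thresholds are hypothetical. It does not reproduce the authors' two-step mapping procedure or its filters.

```python
def candidate_a_to_i_sites(ref_seq, base_counts, min_depth=10, min_g_fraction=0.1):
    """Flag reference A positions where a sufficient fraction of reads show G.

    ref_seq: reference sequence on the transcribed strand (assumption).
    base_counts: list of dicts, one per position, mapping base -> read count.
    Returns a list of (position, g_fraction) tuples, 0-based positions.
    """
    sites = []
    for pos, (ref_base, counts) in enumerate(zip(ref_seq, base_counts)):
        if ref_base != "A":
            continue
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate the editing level reliably
        g_fraction = counts.get("G", 0) / depth
        if g_fraction >= min_g_fraction:
            sites.append((pos, g_fraction))
    return sites
```

Real pipelines also need to exclude genomic SNPs, mapping artifacts and strand errors, which is why the original work validates candidates by Sanger and exome sequencing.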
Echavarría-Consuegra, Liliana; Flipse, Jacky; Fernández, Geysson Javier; Kluiver, Joost; van den Berg, Anke; Urcuqui-Inchima, Silvio; Smit, Jolanda M.
2017-01-01
Background Due to the high burden of dengue disease worldwide, a better understanding of the interactions between dengue virus (DENV) and its human host cells is of the utmost importance. Although microRNAs modulate the outcome of several viral infections, their contribution to DENV replication is poorly understood. Methods and principal findings We investigated the microRNA expression profile of primary human macrophages challenged with DENV and deciphered the contribution of microRNAs to infection. To this end, human primary macrophages were challenged with GFP-expressing DENV and sorted to differentiate between truly infected cells (DENV-positive) and DENV-exposed but non-infected cells (DENV-negative cells). The miRNAome was determined by small RNA-Seq analysis and the effect of differentially expressed microRNAs on DENV yield was examined. Five microRNAs were differentially expressed in human macrophages challenged with DENV. Of these, miR-3614-5p was found upregulated in DENV-negative cells and its overexpression reduced DENV infectivity. The cellular targets of miR-3614-5p were identified by liquid chromatography/mass spectrometry and western blot. Adenosine deaminase acting on RNA 1 (ADAR1) was identified as one of the targets of miR-3614-5p and was shown to promote DENV infectivity at early time points post-infection. Conclusion/Significance Overall, miRNAs appear to play a limited role in DENV replication in primary human macrophages. The miRNAs that were found upregulated in DENV-infected cells did not control the production of infectious virus particles. On the other hand, miR-3614-5p, which was upregulated in DENV-negative macrophages, reduced DENV infectivity and regulated ADAR1 expression, a protein that facilitates viral replication. PMID:29045406
A novel plasma circular RNA circFARSA is a potential biomarker for non-small cell lung cancer.
Hang, Dong; Zhou, Jing; Qin, Na; Zhou, Wen; Ma, Hongxia; Jin, Guangfu; Hu, Zhibin; Dai, Juncheng; Shen, Hongbing
2018-06-01
Emerging evidence indicates that circular RNAs (circRNAs) are implicated in cancer development. This study aimed to evaluate whether circulating circRNAs may serve as novel biomarkers for non-small cell lung cancer (NSCLC). We used RNA sequencing (RNA-seq) and quantitative real-time PCR to explore cancer-related circRNAs. Bioinformatics and functional analyses were performed to reveal the biological effects of circRNAs on lung cancer cells. A total of 5471 distinct circRNAs were identified by total RNA-seq, of which 185 were differentially expressed between cancerous and adjacent normal tissues. A circRNA derived from exons 5-7 of the FARSA gene, termed circFARSA, was observed to increase in cancerous tissues (P = 0.016) and was more abundant in patients' plasma than in controls (P < 0.001). Overexpression of circFARSA in the A549 cell line significantly promoted cell migration and invasion. In silico analysis suggested that circFARSA might sponge miR-330-5p and miR-326, thereby relieving their inhibitory effects on the oncogene fatty acid synthase. In summary, this study revealed the circRNA profile of NSCLC for the first time and provided evidence that plasma circFARSA is a potential noninvasive biomarker for this malignancy. © 2018 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.
Kim, Ji-Yeon; Lee, Eunjin; Park, Kyunghee; Park, Woong-Yang; Jung, Hae Hyun; Ahn, Jin Seok; Im, Young-Hyuck; Park, Yeon Hee
2017-04-25
Breast cancer (BC) has been genetically profiled through large-scale genome analyses. However, the role and clinical implications of genetic alterations in metastatic BC (MBC) have not been evaluated. Therefore, we conducted whole-exome sequencing (WES) and RNA-Seq of 37 MBC samples and targeted deep sequencing of another 29 MBCs. We evaluated somatic mutations from WES and targeted sequencing, and assessed gene expression and performed pathway analysis from RNA-Seq. In this analysis, PIK3CA was the most commonly mutated gene in estrogen receptor (ER)-positive BC, while in ER-negative BC, TP53 was the most commonly mutated gene (p = 0.018 and p < 0.001, respectively). TP53 stop-gain/stop-loss and frameshift mutations were related to low TP53 expression, whereas nonsynonymous mutations were related to high expression. The impact of TP53 mutation on clinical outcome varied with ER status. In ER-positive BCs, wild-type TP53 had a better prognosis than mutated TP53 (median overall survival (OS) (wild-type vs. mutated): 88.5 ± 54.4 vs. 32.6 ± 10.7 (months), p = 0.002). In contrast, mutated TP53 had a protective effect in ER-negative BCs (median OS: 0.10 vs. 32.6 ± 8.2, p = 0.026). However, PIK3CA mutation did not affect patient survival. In gene expression analysis, CALM1, a potential regulator of AKT, was highly expressed in PIK3CA-mutated BCs. In conclusion, TP53 mutation was associated with TP53 expression status and affected clinical outcome according to ER status in MBC. Although PIK3CA mutation was not related to survival in this study, it altered the expression of other genes and pathways, including CALM1, and may be a potential predictive marker of PI3K inhibitor effectiveness.
Kandpal, Raj P; Rajasimha, Harsha K; Brooks, Matthew J; Nellissery, Jacob; Wan, Jun; Qian, Jiang; Kern, Timothy S; Swaroop, Anand
2012-01-01
To define gene expression changes associated with diabetic retinopathy in a mouse model using next generation sequencing, and to utilize transcriptome signatures to assess molecular pathways by which pharmacological agents inhibit diabetic retinopathy. We applied a high-throughput RNA sequencing (RNA-seq) strategy using Illumina GAIIx to characterize the entire retinal transcriptome from nondiabetic and from streptozotocin-treated mice 32 weeks after induction of diabetes. Some of the diabetic mice were treated with inhibitors of receptor for advanced glycation endproducts (RAGE) and p38 mitogen-activated protein (MAP) kinase, which have previously been shown to inhibit diabetic retinopathy in rodent models. The transcripts and alternatively spliced variants were determined in all experimental groups. Next generation sequencing-based RNA-seq profiles provided comprehensive signatures of transcripts that are altered in early stages of diabetic retinopathy. These transcripts encoded proteins involved in distinct yet physiologically relevant disease-associated pathways such as inflammation, microvasculature formation, apoptosis, glucose metabolism, Wnt signaling, xenobiotic metabolism, and photoreceptor biology. Significant upregulation of crystallin transcripts was observed in diabetic animals, and this diabetes-induced upregulation was inhibited in diabetic animals treated with inhibitors of either RAGE or p38 MAP kinase. These two therapies also showed dissimilar regulation of some subsets of transcripts that included alternatively spliced versions of arrestin, neutral sphingomyelinase activation associated factor (Nsmaf), SH3-domain GRB2-like interacting protein 1 (Sgip1), and axin. Diabetes alters many transcripts in the retina, and two therapies that inhibit the vascular pathology similarly inhibit a portion of these changes, pointing to possible molecular mechanisms for their beneficial effects.
These therapies also changed the abundance of various alternatively spliced versions of signaling transcripts, suggesting a possible role of alternative splicing in disease etiology. Our studies demonstrate that RNA-seq is a comprehensive strategy for identifying disease-specific transcripts and for determining comparative profiles of molecular changes mediated by candidate drugs.
Comprehensive gene expression analysis of canine invasive urothelial bladder carcinoma by RNA-Seq.
Maeda, Shingo; Tomiyasu, Hirotaka; Tsuboi, Masaya; Inoue, Akiko; Ishihara, Genki; Uchikai, Takao; Chambers, James K; Uchida, Kazuyuki; Yonezawa, Tomohiro; Matsuki, Naoaki
2018-04-27
Invasive urothelial carcinoma (iUC) is a major cause of death in humans, and approximately 165,000 individuals succumb to this cancer annually worldwide. Comparative oncology using relevant animal models is necessary to improve our understanding of progression, diagnosis, and treatment of iUC. Companion canines are a preferred animal model of iUC due to spontaneous tumor development and similarity to human disease in terms of histopathology, metastatic behavior, and treatment response. However, comprehensive molecular characterization of canine iUC has not been well documented. In this study, we performed transcriptome analysis of tissue samples from canine iUC and normal bladders using an RNA sequencing (RNA-Seq) approach to identify key molecular pathways in canine iUC. Total RNA was extracted from bladder tissues of 11 dogs with iUC and five healthy dogs, and RNA-Seq was conducted. Ingenuity Pathway Analysis (IPA) was used to assign differentially expressed genes to known upstream regulators and functional networks. Differential gene expression analysis of the RNA-Seq data revealed 2531 differentially expressed genes, comprising 1007 upregulated and 1524 downregulated genes, in canine iUC. IPA revealed that the most activated upstream regulator was PTGER2 (encoding the prostaglandin E2 receptor EP2), which is consistent with the therapeutic efficacy of cyclooxygenase inhibitors in canine iUC. Similar to human iUC, canine iUC exhibited upregulated ERBB2 and downregulated TP53 pathways. Biological functions associated with cancer, cell proliferation, and leukocyte migration were predicted to be activated, while muscle functions were predicted to be inhibited, indicating the muscle-invasive property of the tumor.
Our data confirmed similarities in gene expression patterns between canine and human iUC and identified potential therapeutic targets (PTGER2, ERBB2, CCND1, Vegf, and EGFR), suggesting the value of naturally occurring canine iUC as a relevant animal model for human iUC.
Bayesian estimation of differential transcript usage from RNA-seq data.
Papastamoulis, Panagiotis; Rattray, Magnus
2017-11-27
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term that usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contribution of this paper is twofold: (a) to extend cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, to the DTU context, and (b) to propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model that performs fully Bayesian inference by MCMC sampling over the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model that estimates the Bayes factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods perform similarly to DRIMSeq in terms of precision/recall but offer better calibration of the false discovery rate.
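Because DTU concerns each transcript's share of its gene's expression rather than the gene total, the quantity being compared can be sketched as follows. This is a minimal illustration, assuming raw per-transcript counts and a transcript-to-gene map; it is neither cjBitSeq nor BayesDRIMSeq, and the function names are hypothetical.

```python
from collections import defaultdict

def transcript_proportions(counts, gene_of):
    """Return each transcript's share of its gene's total count (0.0 if the gene has no reads)."""
    gene_totals = defaultdict(float)
    for tx, c in counts.items():
        gene_totals[gene_of[tx]] += c
    return {
        tx: (c / gene_totals[gene_of[tx]] if gene_totals[gene_of[tx]] > 0 else 0.0)
        for tx, c in counts.items()
    }

def usage_shift(counts_a, counts_b, gene_of):
    """Change in within-gene proportion (condition B minus condition A) per transcript.

    Assumes both conditions report counts for the same set of transcripts.
    """
    pa = transcript_proportions(counts_a, gene_of)
    pb = transcript_proportions(counts_b, gene_of)
    return {tx: pb[tx] - pa[tx] for tx in pa}
```

For example, if a gene's total reads double between conditions while its dominant transcript falls from 80% to 50% of those reads, the usage shift is about -0.3 for that transcript and +0.3 for the other; a test for overall differential expression alone would register only the doubling.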
Savas, Peter; Virassamy, Balaji; Ye, Chengzhong; Salim, Agus; Mintoff, Christopher P; Caramia, Franco; Salgado, Roberto; Byrne, David J; Teo, Zhi L; Dushyanthen, Sathana; Byrne, Ann; Wein, Lironne; Luen, Stephen J; Poliness, Catherine; Nightingale, Sophie S; Skandarajah, Anita S; Gyorki, David E; Thornton, Chantel M; Beavis, Paul A; Fox, Stephen B; Darcy, Phillip K; Speed, Terence P; Mackay, Laura K; Neeson, Paul J; Loi, Sherene
2018-06-25
The quantity of tumor-infiltrating lymphocytes (TILs) in breast cancer (BC) is a robust prognostic factor for improved patient survival, particularly in triple-negative and HER2-overexpressing BC subtypes [1]. Although T cells are the predominant TIL population [2], the relationship between quantitative and qualitative differences in T cell subpopulations and patient prognosis remains unknown. We performed single-cell RNA sequencing (scRNA-seq) of 6,311 T cells isolated from human BCs and show that significant heterogeneity exists in the infiltrating T cell population. We demonstrate that BCs with a high number of TILs contained CD8+ T cells with features of tissue-resident memory T (TRM) cell differentiation and that these CD8+ TRM cells expressed high levels of immune checkpoint molecules and effector proteins. A CD8+ TRM gene signature developed from the scRNA-seq data was significantly associated with improved patient survival in early-stage triple-negative breast cancer (TNBC) and provided better prognostication than CD8 expression alone. Our data suggest that CD8+ TRM cells contribute to BC immunosurveillance and are the key targets of modulation by immune checkpoint inhibition. Further understanding of the development, maintenance and regulation of TRM cells will be crucial for successful immunotherapeutic development in BC.
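The abstract does not say how the CD8+ TRM signature was turned into a per-patient value. One common approach, assumed here purely for illustration, is to z-score each signature gene across samples and average those z-scores per sample; the resulting score could then be compared with survival. Gene names and the function name below are hypothetical.

```python
from statistics import mean, pstdev

def signature_scores(expr, signature):
    """Mean z-score of signature genes for each sample.

    expr: gene -> list of expression values, one per sample (same sample order for all genes).
    signature: iterable of gene names; genes absent from expr are skipped.
    """
    genes = [g for g in signature if g in expr]
    if not genes:
        raise ValueError("no signature genes found in expression data")
    n_samples = len(expr[genes[0]])
    zs = []
    for g in genes:
        vals = expr[g]
        mu, sd = mean(vals), pstdev(vals)
        # A gene with zero variance carries no ranking information, so its z-scores are 0.
        zs.append([(v - mu) / sd if sd > 0 else 0.0 for v in vals])
    # Average the per-gene z-scores within each sample.
    return [mean(z[i] for z in zs) for i in range(n_samples)]
```

Averaging z-scores keeps highly expressed genes from dominating the score; whether this matches the published signature's scoring would need to be checked against the paper's methods.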
Song, Wei; Jiang, Keji; Zhang, Fengying; Lin, Yu; Ma, Lingbo
2016-08-08
Acipenser baeri, a critically endangered species on the verge of extinction, is a key species for evolutionary, developmental, physiological and conservation studies and one of the most important food products worldwide. Though the transcriptome of the early development of A. baeri has been published recently, the transcriptome changes occurring in the transition from embryonic to late stages are still unknown. The aim of this work was to analyze the transcriptomes of embryonic and post-embryonic stages of A. baeri and identify differentially expressed genes (DEGs) and their expression patterns using mRNA collected from specimens at the big yolk plug, wide neural plate and 64-day-old sturgeon developmental stages for RNA-Seq. Paired-end sequencing of the transcriptome of samples of A. baeri collected at two early stages (big yolk plug (T1, 32 h after fertilization) and wide neural plate formation (T2, 45 h after fertilization)) and one late stage (T22, 64-day-old sturgeon) using the Illumina HiSeq 2000 platform generated 64,039,846, 64,635,214 and 75,293,762 clean paired-end reads for T1, T2 and T22, respectively. After quality control, the sequencing reads were de novo assembled to generate a set of 149,265 unigenes with an N50 value of 1,277 bp. Functional annotation indicated that a substantial number of these unigenes had significant similarity to proteins in public databases. Differential expression profiling allowed the identification of 2,789, 12,819 and 10,824 DEGs from the T1 vs. T2, T1 vs. T22 and T2 vs. T22 comparisons, respectively. High correlation of DEG features was recorded among early stages, while significant divergences were observed when comparing the late stage with early stages. GO and KEGG enrichment analyses revealed the biological processes, cellular components, molecular functions and metabolic pathways associated with the identified DEGs.
The qRT-PCR performed for candidate genes in specimens confirmed the validity of the RNA-seq data. This study presents, for the first time, an extensive overview of the RNA-Seq-based characterization of the early and post-embryonic developmental transcriptomes of A. baeri and provides 149,265 gene sequences that will potentially be valuable for future molecular and genetic studies in A. baeri.
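The N50 reported for the assembly is the length L such that contigs of length L or longer together cover at least half of the total assembled length. A minimal sketch of that calculation, taking a plain list of contig lengths (the function name is illustrative):

```python
def n50(lengths):
    """Contig length at which the cumulative length of the longest contigs reaches half the total."""
    if not lengths:
        raise ValueError("empty assembly")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For contigs of lengths 2 through 10 (total 54), the three longest contigs (10 + 9 + 8 = 27) first reach half the total, so the N50 is 8.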
Drug Targeting and Biomarkers in Head and Neck Cancers: Insights from Systems Biology Analyses.
Islam, Tania; Rahman, Rezanur; Gov, Esra; Turanli, Beste; Gulfidan, Gizem; Haque, Anwarul; Arga, Kazım Yalçın; Haque Mollah, Nurul
2018-06-01
Head and neck squamous cell carcinoma (HNSCC) is one of the most common cancers in the world, but robust biomarkers and diagnostics are still not available. This study provides in-depth insights from systems biology analyses to identify molecular biomarker signatures to inform systematic drug targeting in HNSCC. Gene expression profiles from tumors and normal tissues of 22 patients with histological confirmation of nonmetastatic HNSCC were subjected to integrative analyses with genome-scale biomolecular networks (i.e., protein-protein interaction and transcriptional and post-transcriptional regulatory networks). We aimed to discover molecular signatures at RNA and protein levels, which could serve as potential drug targets for therapeutic innovation in the future. Eleven proteins, 5 transcription factors, and 20 microRNAs (miRNAs) emerged as potential drug targets. The differential expression profiles of these reporter biomolecules were cross-validated by independent RNA-Seq and miRNA-Seq datasets, and the risk discrimination performance of the reporter biomolecules BLNK, CCL2, E4F1, FOSL1, ISG15, MMP9, MYCN, MYH11, miR-1252, miR-29b, miR-29c, miR-3610, miR-431, and miR-523 was also evaluated. Using the transcriptome-guided drug repositioning tool geneXpharma, several candidate drugs were repurposed, including antineoplastic agents (e.g., gemcitabine and irinotecan), antidiabetics (e.g., rosiglitazone), dermatological agents (e.g., clocortolone and acitretin), and antipsychotics (e.g., risperidone), and the binding affinities of the drugs to their potential targets were assessed using molecular docking analyses. The molecular signatures and repurposed drugs presented in this study warrant further experimental study, since they offer significant potential as biomarkers and candidate therapeutics for precision medicine approaches to the clinical management of HNSCC.
Jo, Kyuri; Kwon, Hawk-Bin; Kim, Sun
2014-06-01
Measuring expression levels of genes at the whole-genome level can be useful for many purposes, especially for revealing biological pathways underlying specific phenotype conditions. When gene expression is measured over a time period, we have opportunities to understand how organisms react to stress conditions over time. Thus many biologists routinely measure whole-genome gene expression at multiple time points. However, there are several technical difficulties in analyzing such whole-genome expression data. In addition, gene expression is now often measured by RNA sequencing rather than microarray technologies, which makes the analysis much more complicated: the process must start with mapping short reads and should end with differentially activated pathways and possibly interactions among pathways. Moreover, many useful tools for analyzing microarray gene expression data are not applicable to RNA-seq data. Thus a comprehensive package for analyzing time-series transcriptome data is much needed. In this article, we present a comprehensive package, Time-series RNA-seq Analysis Package (TRAP), integrating all necessary tasks such as mapping short reads, measuring gene expression levels, finding differentially expressed genes (DEGs), clustering and pathway analysis for time-series data in a single environment. In addition to implementing useful algorithms that are not available for RNA-seq data, we extended existing pathway analysis methods, ORA and SPIA, for time-series analysis and estimated statistical values for the combined dataset with an advanced metric. TRAP also produces a visual summary of pathway interactions. Gene expression change labeling, a practical clustering method used in TRAP, enables more accurate interpretation of the data when combined with pathway analysis. We applied our methods to a real dataset for the analysis of rice (Oryza sativa L. Japonica nipponbare) upon drought stress.
The result showed that TRAP was able to detect pathways more accurately than several existing methods. TRAP is available at http://biohealth.snu.ac.kr/software/TRAP/. Copyright © 2014 Elsevier Inc. All rights reserved.
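Over-representation analysis (ORA), one of the two pathway methods TRAP extends, is usually computed as a one-sided hypergeometric test: given the DEG list, how likely is an overlap with a pathway at least as large as the observed one? The sketch below is the standard single-comparison form only, not TRAP's time-series extension or its combined metric, and it does not cover SPIA; argument names are illustrative. It requires Python 3.8+ for `math.comb`.

```python
from math import comb

def ora_pvalue(universe_size, pathway_size, n_degs, overlap):
    """One-sided hypergeometric p-value P(X >= overlap) for a single pathway.

    universe_size: genes tested; pathway_size: genes annotated to the pathway;
    n_degs: differentially expressed genes; overlap: DEGs that fall in the pathway.
    """
    total = comb(universe_size, n_degs)
    upper = min(pathway_size, n_degs)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, n_degs - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

Using exact integer binomial coefficients avoids floating-point underflow in the individual terms; for genome-scale universes a library routine such as a hypergeometric survival function would be the usual choice.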
Mykles, Donald L; Burnett, Karen G; Durica, David S; Joyce, Blake L; McCarthy, Fiona M; Schmidt, Carl J; Stillman, Jonathon H
2016-12-01
High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the "Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology" symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved.
Singh, Garima; Roy, Jyoti; Rout, Pratiti; Mallick, Bibekanand
2018-01-01
PIWI-interacting RNAs (piRNAs), ~23-36 nucleotide-long small non-coding RNAs (sncRNAs) earlier believed to be germline-specific, have now been identified in somatic cells, including cancer cells. These sncRNAs impact critical biological processes by fine-tuning gene expression at post-transcriptional and epigenetic levels. The expression of piRNAs in ovarian cancer, the most lethal gynecologic cancer, is largely uncharted. In this study, we investigated the expression of PIWILs by qRT-PCR and western blotting and then identified piRNA transcriptomes in tissues of normal ovary and of the two most prevalent epithelial ovarian cancer subtypes, serous and endometrioid, by small RNA sequencing. We detected 219, 256 and 234 piRNAs in normal ovary, endometrioid and serous ovarian cancer samples, respectively. We observed that piRNAs are encoded from various genomic regions, with introns harboring the majority of them. Surprisingly, piRNAs originating from different genomic contexts showed varied levels of conservation across vertebrates. Functional analysis of the predicted targets of differentially expressed piRNAs revealed that these could modulate key processes and pathways involved in ovarian oncogenesis. Our study provides the first comprehensive piRNA landscape in these samples and a useful resource for further functional studies to decipher new mechanistic views of piRNA-mediated gene regulatory networks affecting ovarian oncogenesis. The RNA-seq data have been submitted to the GEO database (GSE83794).
DBATE: database of alternative transcripts expression.
Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2013-01-01
The use of high-throughput RNA sequencing technology (RNA-seq) allows whole-transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing variant-specific expression with its functional inference is still an open and difficult issue, for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotations of alternative splicing variants. We processed 13 large RNA-seq panels from healthy and diseased human tissues, reporting for each splicing variant the expression levels and the functional annotations gathered and integrated from different sources using a variant-specific annotation transfer pipeline. The possibility of performing complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values, which can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, the transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.
Han, Kook; Tjaden, Brian; Lory, Stephen
2017-01-01
The first step in the post-transcriptional regulatory function of most bacterial small non-coding RNAs (sRNAs) is base-pairing with partially complementary sequences of targeted transcripts. We present a simple method for identifying sRNA targets in vivo and defining processing sites of the regulated transcripts. The technique (referred to as GRIL-Seq) is based on preferential ligation of sRNAs to ends of base-paired targets in bacteria co-expressing T4 RNA ligase, followed by sequencing to identify the chimeras. In addition to the RNA chaperone Hfq, the GRIL-Seq method depends on the activity of the RNA pyrophosphohydrolase RppH. Using PrrF1, an iron-regulated sRNA in Pseudomonas aeruginosa, we demonstrate that direct regulatory targets of this sRNA can be readily identified. GRIL-Seq therefore represents a powerful tool not only for identifying direct targets of sRNAs in a variety of environments but also for uncovering novel roles for sRNAs and their targets in complex regulatory networks. PMID:28005055
Dissecting Immune Circuits by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq.
Jaitin, Diego Adhemar; Weiner, Assaf; Yofe, Ido; Lara-Astiaso, David; Keren-Shaul, Hadas; David, Eyal; Salame, Tomer Meir; Tanay, Amos; van Oudenaarden, Alexander; Amit, Ido
2016-12-15
In multicellular organisms, dedicated regulatory circuits control cell type diversity and responses. The crosstalk and redundancies within these circuits and substantial cellular heterogeneity pose a major research challenge. Here, we present CRISP-seq, an integrated method for massively parallel single-cell RNA sequencing (RNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-pooled screens. We show that profiling the genomic perturbation and transcriptome in the same cell enables us to simultaneously elucidate the function of multiple factors and their interactions. We applied CRISP-seq to probe regulatory circuits of innate immunity. By sampling tens of thousands of perturbed cells in vitro and in mice, we identified interactions and redundancies between developmental and signaling-dependent factors. These include opposing effects of Cebpb and Irf8 in regulating the monocyte/macrophage versus dendritic cell lineages and differential functions for Rela and Stat1/2 in monocyte versus dendritic cell responses to pathogens. This study establishes CRISP-seq as a broadly applicable, comprehensive, and unbiased approach for elucidating mammalian regulatory circuits. Copyright © 2016 Elsevier Inc. All rights reserved.
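The core idea of CRISP-seq, reading the perturbation and the transcriptome from the same cell, means each cell can be grouped by the gene its guide targets and compared with control cells. The sketch below shows only that grouping step for a single readout gene; it assumes cells have already been assigned to a guide target, that control cells carry the hypothetical label "NTC", and that a gene missing from a cell's profile counts as zero expression. It is not the paper's statistical model.

```python
from collections import defaultdict
from statistics import mean

def perturbation_effects(cells, gene, control="NTC"):
    """Mean expression of `gene` per guide target minus the mean in control cells.

    cells: iterable of (guide_target, {gene_name: expression}) pairs.
    A gene absent from a cell's dict is treated as 0.0 (e.g. a dropout).
    """
    by_target = defaultdict(list)
    for target, expr in cells:
        by_target[target].append(expr.get(gene, 0.0))
    if control not in by_target:
        raise ValueError(f"no control cells labelled {control!r}")
    baseline = mean(by_target[control])
    return {t: mean(v) - baseline for t, v in by_target.items() if t != control}
```

With tens of thousands of cells, the same grouping gives each perturbation enough cells for a meaningful average; interactions between factors would need cells carrying combinations of guides.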
Xu, Maoqi; Chen, Liang
2018-01-01
Individual sample heterogeneity is one of the biggest obstacles to biomarker identification for complex diseases such as cancers. Current statistical models for identifying differentially expressed genes between disease and control groups often overlook the substantial heterogeneity of human samples. Meanwhile, traditional nonparametric tests lose detailed information in the data and sacrifice statistical power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq) data. As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about the read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex diseases. © The Author 2016. Published by Oxford University Press. All rights reserved.
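Overdispersion means read counts vary more across replicates than a Poisson model (variance equal to the mean) allows. A common way to express the mean-variance relationship, assumed here for illustration and not necessarily the exact constraint ELTSeq imposes, is the negative binomial form Var = mu + phi * mu^2. A method-of-moments estimate of phi for one gene can be sketched as:

```python
from statistics import mean, variance

def moment_dispersion(counts):
    """Method-of-moments estimate of phi in Var = mu + phi * mu**2 for one gene.

    counts: read counts for the gene across at least two replicate samples.
    Returns 0.0 when the data show no extra-Poisson variation.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance with an n - 1 denominator
    # Negative estimates (under-dispersion) are truncated to zero.
    return max((var - mu) / mu ** 2, 0.0)
```

A gene with counts 0 and 20 has mean 10 and sample variance 200, giving phi = 1.9, far above the Poisson expectation of 0.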
Venkata Narayanan, Ishwarya; Paulsen, Michelle T.; Bedi, Karan; Berg, Nathan; Ljungman, Emily A.; Francia, Sofia; Veloso, Artur; Magnuson, Brian; di Fagagna, Fabrizio d’Adda; Wilson, Thomas E.; Ljungman, Mats
2017-01-01
In response to ionizing radiation (IR), cells activate a DNA damage response (DDR) pathway to re-program gene expression. Previous studies using total cellular RNA analyses have shown that the stress kinase ATM and the transcription factor p53 are integral components required for induction of IR-induced gene expression. These studies did not distinguish between changes in RNA synthesis and RNA turnover and did not address the role of enhancer elements in DDR-mediated transcriptional regulation. To determine the contribution of synthesis and degradation of RNA and monitor the activity of enhancer elements following exposure to IR, we used the recently developed Bru-seq, BruChase-seq and BruUV-seq techniques. Our results show that ATM and p53 regulate both RNA synthesis and stability as well as enhancer element activity following exposure to IR. Importantly, many genes in the p53-signaling pathway were coordinately up-regulated by both increased synthesis and increased RNA stability, while down-regulated genes were suppressed by either reduced synthesis or reduced stability. Our study is the first to independently assess the effects of ionizing radiation on transcription and post-transcriptional regulation in normal human cells. PMID:28256581
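Bru-seq captures RNA labeled during a short pulse (synthesis), while BruChase-seq captures labeled RNA remaining after a chase period, so the two can be compared to separate synthesis from turnover. A simple per-gene proxy for relative stability, assumed here for illustration rather than taken from the paper, is the ratio of chase to pulse signal; the input dictionaries and the pseudocount are hypothetical.

```python
def stability_ratio(pulse, chase, pseudocount=1.0):
    """Chase-to-pulse ratio per gene as a rough relative-stability proxy.

    pulse: gene -> normalized read density from Bru-seq (labelling period).
    chase: gene -> normalized read density from BruChase-seq (after the chase).
    Higher values suggest more of the newly made RNA survived the chase.
    The pseudocount avoids division by zero for genes with little signal.
    """
    return {
        g: (chase.get(g, 0.0) + pseudocount) / (pulse[g] + pseudocount)
        for g in pulse
    }
```

Comparing these ratios between irradiated and mock-treated cells would indicate whether a gene's change after IR reflects altered stability in addition to altered synthesis.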
Ayars, Michael; O’Sullivan, Eileen; Macgregor-Das, Anne; Shindo, Koji; Kim, Haeryoung; Borges, Michael; Yu, Jun; Hruban, Ralph H.; Goggins, Michael
2017-01-01
Pancreatic ductal adenocarcinoma evolves from precursor lesions, the most common of which is pancreatic intraepithelial neoplasia (PanIN). We performed RNA-sequencing analysis of laser capture microdissected PanINs and normal pancreatic duct cells to identify differentially expressed genes between PanINs and normal pancreatic duct, and between low-grade and high-grade PanINs. One of the most highly overexpressed transcripts identified in PanIN is interleukin-2 receptor subunit gamma (IL2RG) encoding the common gamma chain, IL2Rγ. CRISPR-mediated knockout of IL2RG in orthotopically implanted pancreatic cancer cells resulted in attenuated tumor growth in mice and reduced JAK3 expression in orthotopic tumors. These results indicate that IL2Rγ/JAK3 signaling contributes to pancreatic cancer cell growth in vivo. PMID:29137350
2011-01-01
Background Alfalfa, [Medicago sativa (L.) sativa], a widely grown perennial forage, has potential for development as a cellulosic ethanol feedstock. However, the genomics of alfalfa, a non-model species, is still in its infancy. The recent advent of RNA-Seq, a massively parallel sequencing method for transcriptome analysis, provides an opportunity to expand the identification of alfalfa genes and polymorphisms, and conduct in-depth transcript profiling. Results Cell walls in stems of alfalfa genotype 708 have higher cellulose and lower lignin concentrations compared to cell walls in stems of genotype 773. Using the Illumina GA-II platform, a total of 198,861,304 expressed sequence tags (ESTs, 76 bp in length) were generated from cDNA libraries derived from elongating stem (ES) and post-elongation stem (PES) internodes of 708 and 773. In addition, 341,984 ESTs were generated from ES and PES internodes of genotype 773 using the GS FLX Titanium platform. The first alfalfa (Medicago sativa) gene index (MSGI 1.0) was assembled using the Sanger ESTs available from GenBank, the GS FLX Titanium EST sequences, and the de novo assembled Illumina sequences. MSGI 1.0 contains 124,025 unique sequences including 22,729 tentative consensus sequences (TCs), 22,315 singletons and 78,981 pseudo-singletons. We identified a total of 1,294 simple sequence repeats (SSR) among the sequences in MSGI 1.0. In addition, a total of 10,826 single nucleotide polymorphisms (SNPs) were predicted between the two genotypes. Out of 55 SNPs randomly selected for experimental validation, 47 (85%) were polymorphic between the two genotypes. We also identified numerous allelic variations within each genotype. Digital gene expression analysis identified numerous candidate genes that may play a role in stem development as well as candidate genes that may contribute to the differences in cell wall composition in stems of the two genotypes.
Conclusions Our results demonstrate that RNA-Seq can be successfully used for gene identification, polymorphism detection and transcript profiling in alfalfa, a non-model, allogamous, autotetraploid species. The alfalfa gene index assembled in this study, and the SNPs, SSRs and candidate genes identified can be used to improve alfalfa as a forage crop and cellulosic feedstock. PMID:21504589
Characterization of Human Salivary Extracellular RNA by Next-generation Sequencing.
Li, Feng; Kaczor-Urbanowicz, Karolina Elżbieta; Sun, Jie; Majem, Blanca; Lo, Hsien-Chun; Kim, Yong; Koyano, Kikuye; Liu Rao, Shannon; Young Kang, So; Mi Kim, Su; Kim, Kyoung-Mee; Kim, Sung; Chia, David; Elashoff, David; Grogan, Tristan R; Xiao, Xinshu; Wong, David T W
2018-04-23
It was recently discovered that abundant and stable extracellular RNA (exRNA) species exist in bodily fluids. Saliva is an emerging biofluid for biomarker development for noninvasive detection and screening of local and systemic diseases. Use of RNA-Sequencing (RNA-Seq) to profile exRNA is rapidly growing; however, no single preparation and analysis protocol can be used for all biofluids. Specifically, RNA-Seq of saliva is particularly challenging owing to the high abundance of bacterial content and the low abundance of salivary exRNA. Given the laborious procedures needed for RNA-Seq library construction, sequencing, data storage, and data analysis, saliva-specific and optimized protocols are essential. We compared different RNA isolation methods and library construction kits for long and small RNA sequencing. The role of ribosomal RNA (rRNA) depletion was also evaluated. The miRNeasy Micro Kit (Qiagen) showed the highest total RNA yield (70.8 ng/mL cell-free saliva) and best small RNA recovery, and the NEBNext library preparation kits resulted in the highest number of detected human genes [5649-6813 at 1 read per kilobase of transcript per million mapped reads (RPKM)] and small RNAs [482-696 microRNAs (miRNAs) and 190-214 other small RNAs]. The proportion of human RNA-Seq reads was much higher in rRNA-depleted saliva samples (41%) than in samples without rRNA depletion (14%). In addition, transfer RNA (tRNA)-derived RNA fragments (tRFs), a novel class of small RNAs, were highly abundant in human saliva, specifically tRF-4 (4%) and tRF-5 (15.25%). Our results may help in selecting the best-adapted methods of RNA isolation and small and long RNA library construction for salivary exRNA studies. © 2018 American Association for Clinical Chemistry.
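RPKM normalizes a gene's read count by its length in kilobases and by the library size in millions of mapped reads, i.e. RPKM = reads x 10^9 / (length_bp x total_mapped_reads). The sketch below computes it and counts a gene as detected at RPKM >= 1; treating the abstract's "at 1 RPKM" as an inclusive threshold is an assumption, and the function names are illustrative.

```python
def rpkm(counts, lengths_bp, total_mapped):
    """Reads per kilobase of transcript per million mapped reads, per gene."""
    return {g: c * 1e9 / (lengths_bp[g] * total_mapped) for g, c in counts.items()}

def detected(rpkm_values, threshold=1.0):
    """Genes whose RPKM reaches the threshold (assumed inclusive)."""
    return sorted(g for g, v in rpkm_values.items() if v >= threshold)
```

For example, a 2,000 bp gene with 500 reads in a library of 10 million mapped reads has an RPKM of 25, while a 1,000 bp gene with 2 reads has an RPKM of 0.2 and would not be counted as detected.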
A long and abundant non-coding RNA in Lactobacillus salivarius.
Cousin, Fabien J; Lynch, Denise B; Chuat, Victoria; Bourin, Maxence J B; Casey, Pat G; Dalmasso, Marion; Harris, Hugh M B; McCann, Angela; O'Toole, Paul W
2017-09-01
Lactobacillus salivarius, found in the intestinal microbiota of humans and animals, is studied as an example of the sub-dominant intestinal commensals that may impart benefits upon their host. Strains typically harbour at least one megaplasmid that encodes functions contributing to contingency metabolism and environmental adaptation. RNA sequencing (RNA-seq) transcriptomic analysis of L. salivarius strain UCC118 identified a novel, unusually abundant long non-coding RNA (lncRNA) encoded by the megaplasmid, which represented more than 75% of the total RNA-seq reads after depletion of rRNA species. The expression level of this 520 nt lncRNA in L. salivarius UCC118 exceeded that of the 16S rRNA; it accumulated during growth, was very stable over time and was also expressed during intestinal transit in a mouse. This lncRNA sequence is specific to the L. salivarius species; however, of 45 L. salivarius genomes analysed, only 34 harboured the sequence for the lncRNA. This lncRNA was produced in 27 tested L. salivarius strains, but at strain-specific expression levels. High-level lncRNA expression correlated with high megaplasmid copy number. Transcriptome analysis of a deletion mutant lacking this lncRNA identified altered expression levels of genes in a number of pathways, but a definitive function of this new lncRNA was not identified. This lncRNA has distinctive and unique properties and suggests potential basic and applied scientific developments of this phenomenon.
McNeil, Meredith D; Bhuiyan, Shamsul A; Berkman, Paul J; Croft, Barry J; Aitken, Karen S
2018-01-01
Smut, caused by the biotrophic fungus Sporisorium scitamineum, is a major disease of cultivated sugarcane that can cause considerable yield losses. It has been suggested in the literature that there are at least two types of resistance mechanisms in sugarcane plants: an external resistance, due to chemical or physical barriers in the sugarcane bud, and an internal resistance governed by the interaction of plant and fungus within the plant tissue. Detailed molecular studies interrogating these two different resistance mechanisms in sugarcane are scarce. Here, we use light microscopy and global expression profiling with RNA-seq to investigate these mechanisms in sugarcane cultivar CP74-2005, a cultivar that possibly possesses both internal and external defence mechanisms. A total of 861 differentially expressed genes (DEGs) were identified in a comparison between infected and non-infected buds at 48 hours post-inoculation (hpi), with 457 (53%) genes successfully annotated using BLAST2GO software. These include genes involved in the phenylpropanoid pathway, cell wall biosynthesis, plant hormone signal transduction and disease resistance genes. Finally, the expression of 13 DEGs with putative roles in S. scitamineum resistance was confirmed by quantitative real-time reverse transcription PCR (qRT-PCR) analysis, and the results were consistent with the RNA-seq data. These results highlight that the early sugarcane response to S. scitamineum infection is complex and that many of the disease response genes are attenuated in sugarcane cultivar CP74-2005, while others, like genes involved in the phenylpropanoid pathway, are induced. This may point to the role of the different disease resistance mechanisms that operate in cultivars such as CP74-2005, whereby the early response is dominated by external mechanisms and, as the infection progresses, the internal mechanisms are switched on.
Identification of genes underlying resistance in sugarcane will increase our knowledge of the sugarcane-S. scitamineum interaction and facilitate the introgression of new resistance genes into commercial sugarcane cultivars.
Crowder, Camerron M; Meyer, Eli; Fan, Tung-Yung; Weis, Virginia M
2017-08-01
Reproductive timing in brooding corals has been correlated to temperature and lunar irradiance, but the mechanisms by which corals transduce these environmental variables into molecular signals are unknown. To gain insight into these processes, global gene expression profiles in the coral Pocillopora damicornis were examined (via RNA-Seq) across lunar phases and between temperature treatments, during a monthly planulation cycle. The interaction of temperature and lunar day together had the largest influence on gene expression. Mean timing of planulation, which occurred at lunar days 7.4 and 12.5 for 28- and 23°C-treated corals, respectively, was associated with an upregulation of transcripts in individual temperature treatments. Expression profiles of planulation-associated genes were compared between temperature treatments, revealing that elevated temperatures disrupted expression profiles associated with planulation. Gene functions inferred from homologous matches to online databases suggest complex neuropeptide signalling, with calcium as a central mediator, acting through tyrosine kinase and G protein-coupled receptor pathways. This work contributes to our understanding of coral reproductive physiology and the impacts of environmental variables on coral reproductive pathways. © 2017 John Wiley & Sons Ltd.
Gong, Ting; Szustakowski, Joseph D
2013-04-15
For heterogeneous tissues, gene expression measurements from mRNA-Seq data are confounded by the relative proportions of the cell types involved. In this note, we introduce an efficient pipeline, DeconRNASeq, an R package for deconvolution of heterogeneous tissues based on mRNA-Seq data. It adopts a globally optimized non-negative decomposition algorithm, solved through quadratic programming, to estimate the mixing proportions of distinct tissue types in next-generation sequencing data. We demonstrated the feasibility and validity of DeconRNASeq across a range of mixing levels and sources using mRNA-Seq data mixed in silico at known concentrations. We validated our computational approach on various benchmark data sets and found high correlation between the predicted cell proportions and the real tissue fractions. Our study provides a rigorous, quantitative and high-resolution tool as a prerequisite for using mRNA-Seq data. The modular package design allows easy deployment of custom analytical pipelines for data from other high-throughput platforms. DeconRNASeq is written in R and is freely available at http://bioconductor.org/packages. Supplementary data are available at Bioinformatics online.
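The core estimation step can be illustrated with a small sketch: find mixing proportions x that minimize the squared error between an observed mixture m and a signature matrix S times x, assuming the proportions are constrained to be non-negative and to sum to one. This is a hypothetical plain-Python illustration, not DeconRNASeq's R implementation: instead of a general quadratic-programming solver it swaps in projected gradient descent onto the probability simplex, and the helper names `deconvolve` and `project_to_simplex`, the iteration count and the toy numbers are all assumptions.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:  # holds for a prefix of the sorted values; keep the last one
            theta = t
    return [max(x - theta, 0.0) for x in v]


def deconvolve(signatures: List[List[float]], mixture: List[float],
               steps: int = 5000) -> List[float]:
    """Estimate proportions x minimizing 0.5 * ||S x - m||^2 over the simplex.

    signatures[g][k]: expression of gene g in pure tissue type k (hypothetical input).
    mixture[g]: observed expression of gene g in the heterogeneous sample.
    """
    n_genes, n_types = len(mixture), len(signatures[0])
    # The squared Frobenius norm bounds the gradient's Lipschitz constant,
    # so a step of 1/L keeps projected gradient descent stable.
    lipschitz = sum(s * s for row in signatures for s in row)
    step = 1.0 / lipschitz
    x = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(steps):
        residual = [sum(signatures[g][k] * x[k] for k in range(n_types)) - mixture[g]
                    for g in range(n_genes)]
        grad = [sum(signatures[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        x = project_to_simplex([x[k] - step * grad[k] for k in range(n_types)])
    return x


# Toy example: four genes, two pure tissue types, mixture = 0.3 * type A + 0.7 * type B.
signatures = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0], [2.0, 8.0]]
mixture = [3.7, 7.3, 5.0, 6.2]
print([round(p, 3) for p in deconvolve(signatures, mixture)])  # prints [0.3, 0.7]
```

Because the toy mixture was built as exactly 30% of the first tissue profile and 70% of the second, the solver recovers proportions close to [0.3, 0.7]. Projected gradient descent was chosen only to keep the sketch dependency-free; a dedicated QP solver would typically converge in far fewer iterations.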
Wallace, Andrew D.; Hodgson, Ernest; Roe, R. Michael
2017-01-01
While the synthesis and use of new chemical compounds are at an all-time high, studies of their potential effects on human health are quickly falling behind, and new assessment methods are needed. We chose to examine the effects of two common environmental chemicals, the insect repellent N,N-diethyl-m-toluamide (DEET) and the insecticide fluocyanobenpyrazole (fipronil), on transcript levels of long non-protein coding RNAs (lncRNAs) in primary human hepatocytes using a global RNA-Seq approach. Although lncRNAs are believed to play a critical role in numerous important biological processes, many remain uncharacterized, and their functions and modes of action remain largely unclear, especially in relation to environmental chemicals. RNA-Seq showed that 100 µM DEET significantly increased transcript levels for 2 lncRNAs and lowered transcript levels for 18 lncRNAs, while fipronil at 10 µM increased transcript levels for 76 lncRNAs and decreased levels for 193 lncRNAs. A mixture of 100 µM DEET and 10 µM fipronil increased transcript levels for 75 lncRNAs and lowered transcript levels for 258 lncRNAs. The mixture thus altered 333 lncRNAs, compared with 289 (20 + 269) for the two chemicals individually, indicating a more-than-additive effect on lncRNA transcript expression when the two chemicals were presented in combination. Differentially expressed lncRNA genes were mapped to chromosomes, analyzed by proximity to neighboring protein-coding genes, and functionally characterized via gene ontology and molecular mapping algorithms. While further testing is required to assess the organismal impact of changes in transcript levels, this initial analysis links several of the dysregulated lncRNAs to processes and pathways critical to proper cellular function, such as the innate and adaptive immune response and the p53 signaling pathway. PMID:28991164
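The proximity analysis of differentially expressed lncRNAs can be pictured as a nearest-neighbor search among protein-coding genes on the same chromosome. The sketch below is a hypothetical illustration, not the authors' pipeline: the tuple layout, the `gap` and `nearest_coding_gene` helpers, the inclusive-coordinate convention and the 10 kb search window are all assumptions, and the gene names and coordinates are made up.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end, name); inclusive coordinates are assumed.
Feature = Tuple[str, int, int, str]


def gap(a: Feature, b: Feature) -> int:
    """End-to-start coordinate difference between two features; 0 if they overlap."""
    if a[2] < b[1]:
        return b[1] - a[2]
    if b[2] < a[1]:
        return a[1] - b[2]
    return 0


def nearest_coding_gene(lnc: Feature, coding: List[Feature],
                        max_distance: int = 10_000) -> Optional[Tuple[str, int]]:
    """Closest protein-coding gene on the same chromosome within a hypothetical window."""
    hits = [(gap(lnc, g), g[3]) for g in coding
            if g[0] == lnc[0] and gap(lnc, g) <= max_distance]
    if not hits:
        return None
    distance, name = min(hits)  # ties broken alphabetically by gene name
    return name, distance


# Hypothetical protein-coding genes for illustration only.
coding_genes = [("chr1", 2500, 3000, "GENE1"), ("chr1", 100, 900, "GENE2"),
                ("chr2", 1500, 1600, "GENE3")]
print(nearest_coding_gene(("chr1", 1000, 2000, "lncA"), coding_genes))  # ('GENE2', 100)
```

A linear scan keeps the sketch short; for genome-wide lists, sorting genes by start position per chromosome (or using an interval tool) would avoid comparing every lncRNA against every gene.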
RIPiT-Seq: A high-throughput approach for footprinting RNA:protein complexes
Singh, Guramrit; Ricci, Emiliano P.; Moore, Melissa J.
2013-01-01
Development of high-throughput approaches to map the RNA interaction sites of individual RNA binding proteins (RBPs) transcriptome-wide is rapidly transforming our understanding of post-transcriptional gene regulatory mechanisms. Here we describe a ribonucleoprotein (RNP) footprinting approach we recently developed for identifying occupancy sites of both individual RBPs and multi-subunit RNP complexes. RNA:protein immunoprecipitation in tandem (RIPiT) yields highly specific RNA footprints of cellular RNPs isolated via two sequential purifications; the resulting RNA footprints can then be identified by high-throughput sequencing (Seq). RIPiT-Seq is broadly applicable to all RBPs regardless of their RNA binding mode and thus provides a means to map the RNA binding sites of RBPs with poor inherent ultraviolet (UV) crosslinkability. Further, among current high-throughput approaches, RIPiT has the unique capacity to differentiate binding sites of RNPs with overlapping protein composition. It is therefore particularly suited for studying dynamic RNP assemblages whose composition evolves as gene expression proceeds. PMID:24096052