Sample records for expression quantification 3seq

  1. Quantifying circular RNA expression from RNA-seq data using model-based framework.

    PubMed

    Li, Musheng; Xie, Xueying; Zhou, Jing; Sheng, Mengying; Yin, Xiaofeng; Ko, Eun-A; Zhou, Tong; Gu, Wanjun

    2017-07-15

    Circular RNAs (circRNAs) are a class of non-coding RNAs that are widely expressed in various cell lines and tissues of many organisms. Although the exact function of many circRNAs is largely unknown, cell type- and tissue-specific circRNA expression has implicated their crucial functions in many biological processes. Hence, quantifying circRNA expression from high-throughput RNA-seq data is becoming important. Although many model-based methods have been developed to quantify linear RNA expression from RNA-seq data, these methods are not applicable to circRNA quantification. Here, we proposed a novel strategy that transforms circular transcripts to pseudo-linear transcripts and estimates the expression values of both circular and linear transcripts using an existing model-based algorithm, Sailfish. The new strategy can accurately estimate transcript expression of both linear and circular transcripts from RNA-seq data. Several factors, such as gene length, amount of expression and the ratio of circular to linear transcripts, had impacts on quantification performance of circular transcripts. In comparison to count-based tools, the new computational framework had superior performance in estimating the amount of circRNA expression from both simulated and real ribosomal RNA-depleted (rRNA-depleted) RNA-seq datasets. In addition, considering circular transcripts in expression quantification from rRNA-depleted RNA-seq data substantially increased the accuracy of linear transcript expression estimates. Our proposed strategy was implemented in a program named Sailfish-cir. Sailfish-cir is freely available at https://github.com/zerodel/Sailfish-cir . Contact: tongz@medicine.nevada.edu or wanjun.gu@gmail.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
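The idea of turning a circular transcript into a pseudo-linear one can be sketched as follows. This is a minimal illustration, not the Sailfish-cir implementation: the abstract does not say how the transformation is done, so the choice of appending the first `read_len - 1` bases to the end of the sequence, and the names `pseudo_linear` and `kmers`, are assumptions made here for illustration.

```python
def pseudo_linear(circ_seq: str, read_len: int) -> str:
    """Return a linear sequence that also covers the back-splice junction.

    Assumption (not from the abstract): the first read_len - 1 bases are
    appended to the end, so any read of length read_len that spans the
    junction is a substring of the result.
    """
    if not circ_seq or read_len < 1:
        raise ValueError("need a non-empty sequence and read_len >= 1")
    overhang = read_len - 1
    # Repeat the circle enough times to handle circles shorter than the overhang.
    reps = overhang // len(circ_seq) + 1
    return circ_seq + (circ_seq * reps)[:overhang]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a linear sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    circ = "ACGTTG"
    linear = pseudo_linear(circ, read_len=4)
    print(linear)
    # k-mers that exist only because the junction is now represented
    print(sorted(kmers(linear, 4) - kmers(circ, 4)))
```

With a sequence extended this way, a k-mer-based quantifier that only understands linear references can see the junction-spanning k-mers that distinguish the circular transcript from its linear counterpart.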

  2. Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis.

    PubMed

    Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich

    2015-12-16

    Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications caused by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, Matthews correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods demonstrate AmpliSeq as a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variations. Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.
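Two of the summary statistics named in this abstract have short closed forms, sketched below with the standard textbook definitions. How the study defined true and false positives for differential expression is not stated in the abstract, so the inputs here are hypothetical counts and values.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is taken as 0 when a margin is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def rmsd(x: list[float], y: list[float]) -> float:
    """Root-mean-square deviation between paired values, e.g. log2 fold changes."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


if __name__ == "__main__":
    # Hypothetical differential-expression calls scored against a reference.
    print(mcc(tp=40, tn=45, fp=5, fn=10))
    # Hypothetical log2 fold changes from two platforms for the same genes.
    print(rmsd([1.0, -2.0, 0.5], [1.2, -1.7, 0.4]))
```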

  3. Quantification of differential gene expression by multiplexed targeted resequencing of cDNA

    PubMed Central

    Arts, Peer; van der Raadt, Jori; van Gestel, Sebastianus H.C.; Steehouwer, Marloes; Shendure, Jay; Hoischen, Alexander; Albers, Cornelis A.

    2017-01-01

    Whole-transcriptome or RNA sequencing (RNA-Seq) is a powerful and versatile tool for functional analysis of different types of RNA molecules, but sample reagent and sequencing cost can be prohibitive for hypothesis-driven studies where the aim is to quantify differential expression of a limited number of genes. Here we present an approach for quantification of differential mRNA expression by targeted resequencing of complementary DNA using single-molecule molecular inversion probes (cDNA-smMIPs) that enable highly multiplexed resequencing of cDNA target regions of ∼100 nucleotides and counting of individual molecules. We show that accurate estimates of differential expression can be obtained from molecule counts for hundreds of smMIPs per reaction and that smMIPs are also suitable for quantification of relative gene expression and allele-specific expression. Compared with low-coverage RNA-Seq and a hybridization-based targeted RNA-Seq method, cDNA-smMIPs are a cost-effective high-throughput tool for hypothesis-driven expression analysis in large numbers of genes (10 to 500) and samples (hundreds to thousands). PMID:28474677

  4. Dynamic expression of 3′ UTRs revealed by Poisson hidden Markov modeling of RNA-Seq: Implications in gene expression profiling

    PubMed Central

    Lu, Jun; Bushel, Pierre R.

    2013-01-01

    RNA sequencing (RNA-Seq) allows for the identification of novel exon-exon junctions and quantification of gene expression levels. We show that from RNA-Seq data one may also detect utilization of alternative polyadenylation (APA) in 3′ untranslated regions (3′ UTRs) known to play a critical role in the regulation of mRNA stability, cellular localization and translation efficiency. Given the dynamic nature of APA, it is desirable to examine the APA on a sample by sample basis. We used a Poisson hidden Markov model (PHMM) of RNA-Seq data to identify potential APA in human liver and brain cortex tissues leading to shortened 3′ UTRs. Over three hundred transcripts with shortened 3′ UTRs were detected with sensitivity >75% and specificity >60%. Tissue-specific 3′ UTR shortening was observed for 32 genes with a q-value ≤ 0.1. When compared to alternative isoforms detected by Cufflinks or MISO, our PHMM method agreed on over 100 transcripts with shortened 3′ UTRs. Given the increasing usage of RNA-Seq for gene expression profiling, using PHMM to investigate sample-specific 3′ UTR shortening could be an added benefit from this emerging technology. PMID:23845781
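A Poisson hidden Markov model segments a series of read counts into stretches with different underlying rates, which is the kind of signal a drop in coverage along a 3′ UTR would produce. The sketch below is a generic Viterbi decoder for such a model and is not the authors' PHMM: the fixed emission rates, the single `stay` probability, and the uniform initial distribution are all simplifying assumptions chosen for illustration.

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """Log probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_poisson(counts: list[int], rates: list[float], stay: float = 0.95) -> list[int]:
    """Most likely state path for counts under a Poisson HMM.

    rates[i] is the (assumed known) Poisson mean of state i; `stay` is the
    probability of remaining in the same state between adjacent positions.
    """
    n = len(rates)
    if not counts or n < 2:
        raise ValueError("need at least one count and two states")
    log_stay = math.log(stay)
    log_switch = math.log((1.0 - stay) / (n - 1))
    score = [math.log(1.0 / n) + poisson_logpmf(counts[0], r) for r in rates]
    back = []
    for c in counts[1:]:
        new_score, ptr = [], []
        for j, r in enumerate(rates):
            cands = [score[i] + (log_stay if i == j else log_switch) for i in range(n)]
            best = max(range(n), key=cands.__getitem__)
            ptr.append(best)
            new_score.append(cands[best] + poisson_logpmf(c, r))
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical per-window read counts along a 3' UTR: coverage drops midway.
    coverage = [20, 22, 19, 21, 2, 1, 3, 2]
    print(viterbi_poisson(coverage, rates=[20.0, 2.0]))
```

In this toy input the position where the decoded state switches marks a candidate shortening site; a real analysis would estimate the rates from the data rather than fix them.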

  5. RNA-Skim: a rapid method for RNA-Seq quantification at transcript level

    PubMed Central

    Zhang, Zhaojun; Wang, Wei

    2014-01-01

    Motivation: The RNA-Seq technique has been demonstrated as a revolutionary means for exploring the transcriptome because it provides deep coverage and base pair-level resolution. RNA-Seq quantification is proven to be an efficient alternative to the microarray technique in gene expression studies, and it is a critical component in RNA-Seq differential expression analysis. Most existing RNA-Seq quantification tools require the alignments of fragments to either a genome or a transcriptome, entailing a time-consuming and intricate alignment step. To improve the performance of RNA-Seq quantification, an alignment-free method, Sailfish, has been recently proposed to quantify transcript abundances using all k-mers in the transcriptome, demonstrating the feasibility of designing an efficient alignment-free method for transcriptome quantification. Even though Sailfish is substantially faster than alternative alignment-dependent methods such as Cufflinks, using all k-mers in the transcriptome quantification impedes the scalability of the method. Results: We propose a novel RNA-Seq quantification method, RNA-Skim, which partitions the transcriptome into disjoint transcript clusters based on sequence similarity, and introduces the notion of sig-mers, which are a special type of k-mers uniquely associated with each cluster. We demonstrate that the sig-mer counts within a cluster are sufficient for estimating transcript abundances with accuracy comparable with any state-of-the-art method. This enables RNA-Skim to perform transcript quantification on each cluster independently, reducing a complex optimization problem into smaller optimization tasks that can be run in parallel. As a result, RNA-Skim uses <4% of the k-mers and <10% of the CPU time required by Sailfish. It is able to finish transcriptome quantification in <10 min per sample by using just a single thread on a commodity computer, which represents a >100-fold speedup over the state-of-the-art alignment-based methods, while delivering comparable or higher accuracy. Availability and implementation: The software is available at http://www.csbio.unc.edu/rs. Contact: weiwang@cs.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24931995
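The notion of sig-mers, k-mers that occur in exactly one transcript cluster, can be illustrated with a brute-force sketch. This is not RNA-Skim's algorithm, which the abstract does not detail: the function name `sig_mers`, the dictionary-based input, and the decision to ignore reverse complements are assumptions made to keep the example short.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sig_mers(clusters: dict[str, list[str]], k: int) -> dict[str, set[str]]:
    """Map each cluster id to the k-mers found in that cluster and no other.

    clusters maps a cluster id to the transcript sequences it contains.
    """
    owners: dict[str, set[str]] = defaultdict(set)
    for cid, seqs in clusters.items():
        for seq in seqs:
            for km in kmers(seq, k):
                owners[km].add(cid)
    result: dict[str, set[str]] = {cid: set() for cid in clusters}
    for km, cids in owners.items():
        if len(cids) == 1:  # k-mer is unique to a single cluster
            result[next(iter(cids))].add(km)
    return result


if __name__ == "__main__":
    # Hypothetical toy clusters; real clusters group similar transcripts.
    toy = {"A": ["ACGTAC"], "B": ["GTACCA"]}
    for cid, mers in sig_mers(toy, k=3).items():
        print(cid, sorted(mers))
```

Because each sig-mer points to a single cluster, reads can be routed to clusters by k-mer lookup alone and each cluster can then be quantified independently, which is the property the abstract credits for the reduced k-mer set and the parallelism.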

  6. Statistical modeling of isoform splicing dynamics from RNA-seq time series data.

    PubMed

    Huang, Yuanhua; Sanguinetti, Guido

    2016-10-01

    Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet there are no methods that explicitly leverage the experimental design to improve isoform quantification. Here, we present DICEseq, the first isoform quantification method tailored to correlated RNA-seq experiments. DICEseq explicitly models the correlations between different RNA-seq experiments to aid the quantification of isoforms across experiments. Numerical experiments on simulated datasets show that DICEseq yields more accurate results than state-of-the-art methods, an advantage that can become considerable at low coverage levels. On real datasets, our results show that DICEseq provides substantially more reproducible and robust quantifications, increasing the correlation of estimates from replicate datasets by up to 10% on genes with low or moderate expression levels (bottom third of all genes). Furthermore, DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and depth of sequencing, frequently an important choice when planning experiments. Our results have strong implications for the design of RNA-seq experiments, and offer a novel tool for improved analysis of such datasets. Python code is freely available at http://diceseq.sf.net. Contact: G.Sanguinetti@ed.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  7. Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies

    PubMed Central

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance

    2013-01-01

    RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets. PMID:25937948

  8. Stormbow: A Cloud-Based Tool for Reads Mapping and Expression Quantification in Large-Scale RNA-Seq Studies.

    PubMed

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance

    2013-01-01

    RNA-Seq is becoming a promising replacement to microarrays in transcriptome profiling and differential gene expression study. Technical improvements have decreased sequencing costs and, as a result, the size and number of RNA-Seq datasets have increased rapidly. However, the increasing volume of data from large-scale RNA-Seq studies poses a practical challenge for data analysis in a local environment. To meet this challenge, we developed Stormbow, a cloud-based software package, to process large volumes of RNA-Seq data in parallel. The performance of Stormbow has been tested by practically applying it to analyse 178 RNA-Seq samples in the cloud. In our test, it took 6 to 8 hours to process an RNA-Seq sample with 100 million reads, and the average cost was $3.50 per sample. Utilizing Amazon Web Services as the infrastructure for Stormbow allows us to easily scale up to handle large datasets with on-demand computational resources. Stormbow is a scalable, cost effective, and open-source based tool for large-scale RNA-Seq data analysis. Stormbow can be freely downloaded and can be used out of box to process Illumina RNA-Seq datasets.

  9. Comparison of alternative approaches for analysing multi-level RNA-seq data

    PubMed Central

    Mohorianu, Irina; Bretman, Amanda; Smith, Damian T.; Fowler, Emily K.; Dalmay, Tamas

    2017-01-01

    RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression call in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples. The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments. PMID:28792517

  10. From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline.

    PubMed

    Chen, Yunshun; Lun, Aaron T L; Smyth, Gordon K

    2016-01-01

    In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification is conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.

  11. Streaming fragment assignment for real-time analysis of sequencing experiments

    PubMed Central

    Roberts, Adam; Pachter, Lior

    2013-01-01

    We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280
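The streaming idea, a single pass over fragments with memory that depends on the number of targets rather than the number of fragments, can be sketched as below. This is a deliberately simplified illustration and not the eXpress algorithm: it ignores target length, alignment likelihoods and any weighting schedule, and the names `stream_assign` and `prior` are assumptions introduced here.

```python
def stream_assign(fragments, targets, prior: float = 1.0) -> dict[str, float]:
    """One-pass probabilistic assignment of ambiguously mapping fragments.

    fragments is any iterable yielding, per fragment, the list of targets it
    is compatible with. Each fragment is split among its targets in
    proportion to the mass accumulated so far, then discarded.
    """
    mass = {t: prior for t in targets}
    for compatible in fragments:
        total = sum(mass[t] for t in compatible)
        # Compute all shares from the current state before updating it.
        shares = {t: mass[t] / total for t in compatible}
        for t, share in shares.items():
            mass[t] += share
    grand_total = sum(mass.values())
    return {t: m / grand_total for t, m in mass.items()}


if __name__ == "__main__":
    # Hypothetical stream: 8 fragments unique to t1, then 4 shared by t1 and t2.
    stream = [["t1"]] * 8 + [["t1", "t2"]] * 4
    print(stream_assign(iter(stream), targets=["t1", "t2"]))
```

Because the input is consumed as an iterator and only the per-target masses are kept, estimates are available at any point in the stream, which is the property that makes real-time use possible.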

  12. Time Series Expression Analyses Using RNA-seq: A Statistical Approach

    PubMed Central

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021

  13. Time series expression analyses using RNA-seq: a statistical approach.

    PubMed

    Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P

    2013-01-01

    RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.

  14. RNA-Seq for Bacterial Gene Expression.

    PubMed

    Poulsen, Line Dahl; Vinther, Jeppe

    2018-06-01

    RNA sequencing (RNA-seq) has become the preferred method for global quantification of bacterial gene expression. With the continued improvements in sequencing technology and data analysis tools, the most labor-intensive and expensive part of an RNA-seq experiment is the preparation of sequencing libraries, which is also essential for the quality of the data obtained. Here, we present a straightforward and inexpensive basic protocol for preparation of strand-specific RNA-seq libraries from bacterial RNA as well as a computational pipeline for the data analysis of sequencing reads. The protocol is based on the Illumina platform and allows easy multiplexing of samples and the removal of sequencing reads that are PCR duplicates. © 2018 by John Wiley & Sons, Inc.

  15. miR-MaGiC improves quantification accuracy for small RNA-seq.

    PubMed

    Russell, Pamela H; Vestal, Brian; Shi, Wen; Rudra, Pratyaydipta D; Dowell, Robin; Radcliffe, Richard; Saba, Laura; Kechris, Katerina

    2018-05-15

    Many tools have been developed to profile microRNA (miRNA) expression from small RNA-seq data. These tools must contend with several issues: the small size of miRNAs, the small number of unique miRNAs, the fact that similar miRNAs can be transcribed from multiple loci, and the presence of miRNA isoforms known as isomiRs. Methods failing to address these issues can return misleading information. We propose a novel quantification method designed to address these concerns. We present miR-MaGiC, a novel miRNA quantification method, implemented as a cross-platform tool in Java. miR-MaGiC performs stringent mapping to a core region of each miRNA and defines a meaningful set of target miRNA sequences by collapsing the miRNA space to "functional groups". We hypothesize that these two features, mapping stringency and collapsing, provide more optimal quantification to a more meaningful unit (i.e., miRNA family). We test miR-MaGiC and several published methods on 210 small RNA-seq libraries, evaluating each method's ability to accurately reflect global miRNA expression profiles. We define accuracy as total counts close to the total number of input reads originating from miRNAs. We find that miR-MaGiC, which incorporates both stringency and collapsing, provides the most accurate counts.

  16. Gene expression profiling of human breast tissue samples using SAGE-Seq.

    PubMed

    Wu, Zhenhua Jeremy; Meyer, Clifford A; Choudhury, Sibgat; Shipitsin, Michail; Maruyama, Reo; Bessarabova, Marina; Nikolskaya, Tatiana; Sukumar, Saraswati; Schwartzman, Armin; Liu, Jun S; Polyak, Kornelia; Liu, X Shirley

    2010-12-01

    We present a powerful application of ultra high-throughput sequencing, SAGE-Seq, for the accurate quantification of normal and neoplastic mammary epithelial cell transcriptomes. We develop data analysis pipelines that allow the mapping of sense and antisense strands of mitochondrial and RefSeq genes, the normalization between libraries, and the identification of differentially expressed genes. We find that the diversity of cancer transcriptomes is significantly higher than that of normal cells. Our analysis indicates that transcript discovery plateaus at 10 million reads/sample, and suggests a minimum desired sequencing depth around five million reads. Comparison of SAGE-Seq and traditional SAGE on normal and cancerous breast tissues reveals higher sensitivity of SAGE-Seq to detect less-abundant genes, including those encoding for known breast cancer-related transcription factors and G protein-coupled receptors (GPCRs). SAGE-Seq is able to identify genes and pathways abnormally activated in breast cancer that traditional SAGE failed to call. SAGE-Seq is a powerful method for the identification of biomarkers and therapeutic targets in human disease.

  17. Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq.

    PubMed

    Liu, Ruolin; Dickerson, Julie

    2017-11-01

    We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but utilize the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs, and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracies. Under the evaluation of a real data set, the estimated transcript expression by Strawberry has the highest correlation with Nanostring probe counts, an independent experiment measure for transcript expression. Strawberry is written in C++14, and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
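Assigning read counts from shared graph nodes to transcripts with an EM algorithm can be illustrated with the textbook mixture-proportion EM below. It is a sketch under strong simplifications, not Strawberry's latent class model: transcript length and sequencing bias are ignored, and the input format (a dictionary from a tuple of compatible transcripts to a read count) and the name `em_abundance` are assumptions made for this example.

```python
def em_abundance(class_counts: dict[tuple[str, ...], int], n_iter: int = 200) -> dict[str, float]:
    """Estimate transcript proportions from counts on shared features.

    class_counts maps a tuple of transcripts compatible with a feature
    (for example a splicing-graph node) to the number of reads on it.
    """
    transcripts = sorted({t for cls in class_counts for t in cls})
    total = sum(class_counts.values())
    if not transcripts or total == 0:
        raise ValueError("need at least one transcript and one read")
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        # E-step: split each count among compatible transcripts by current theta.
        for cls, n in class_counts.items():
            z = sum(theta[t] for t in cls)
            if z == 0.0:
                continue
            for t in cls:
                expected[t] += n * theta[t] / z
        # M-step: renormalise expected counts into proportions.
        theta = {t: expected[t] / total for t in transcripts}
    return theta


if __name__ == "__main__":
    # Hypothetical counts: 30 reads unique to t1, 10 unique to t2, 60 shared.
    counts = {("t1",): 30, ("t2",): 10, ("t1", "t2"): 60}
    print(em_abundance(counts))
```

In this toy input the shared reads end up divided in the same 3:1 ratio as the unique reads, which is the behaviour one expects from the maximum-likelihood solution when no length or bias terms are modelled.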

  18. IAOseq: inferring abundance of overlapping genes using RNA-seq data.

    PubMed

    Sun, Hong; Yang, Shuang; Tun, Liangliang; Li, Yixue

    2015-01-01

    Overlapping transcription constitutes a common mechanism for regulating gene expression. A major limitation of the overlapping transcription assays is the lack of high throughput expression data. We developed a new tool (IAOseq) that is based on reads distributions along the transcribed regions to identify the expression levels of overlapping genes from standard RNA-seq data. Compared with five commonly used quantification methods, IAOseq showed better performance in the estimation accuracy of overlapping transcription levels. For the same strand overlapping transcription, currently existing high-throughput methods are rarely available to distinguish which strand was present in the original mRNA template. The IAOseq results showed that the commonly used methods gave an average of 1.6 fold overestimation of the expression levels of same strand overlapping genes. This work provides a useful tool for mining overlapping transcription levels from standard RNA-seq libraries. IAOseq could be used to help us understand the complex regulatory mechanism mediated by overlapping transcripts. IAOseq is freely available at http://lifecenter.sgst.cn/main/en/IAO_seq.jsp.

  19. Analytical workflow profiling gene expression in murine macrophages

    PubMed Central

    Nixon, Scott E.; González-Peña, Dianelys; Lawson, Marcus A.; McCusker, Robert H.; Hernandez, Alvaro G.; O’Connor, Jason C.; Dantzer, Robert; Kelley, Keith W.

    2015-01-01

    Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution), then tested for differential transcript expression (False Discovery Rate-adjusted p-value < 0.05). Enrichment of functional categories utilized the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. Enrichment of 92 categories from Gene Ontology Biological Processes and Molecular Functions, and KEGG pathways were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides a context to the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge with high throughput RNA-Seq. PMID:25708305

  20. Unifying cancer and normal RNA sequencing data from different sources

    PubMed Central

    Wang, Qingguo; Armenia, Joshua; Zhang, Chao; Penson, Alexander V.; Reznik, Ed; Zhang, Liguo; Minet, Thais; Ochoa, Angelica; Gross, Benjamin E.; Iacobuzio-Donahue, Christine A.; Betel, Doron; Taylor, Barry S.; Gao, Jianjiong; Schultz, Nikolaus

    2018-01-01

    Driven by the recent advances of next generation sequencing (NGS) technologies and an urgent need to decode complex human diseases, a multitude of large-scale studies were conducted recently that have resulted in an unprecedented volume of whole transcriptome sequencing (RNA-seq) data, such as the Genotype Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA). While these data offer new opportunities to identify the mechanisms underlying disease, the comparison of data from different sources remains challenging, due to differences in sample and data processing. Here, we developed a pipeline that processes and unifies RNA-seq data from different studies, which includes uniform realignment, gene expression quantification, and batch effect removal. We find that uniform alignment and quantification is not sufficient when combining RNA-seq data from different sources and that the removal of other batch effects is essential to facilitate data comparison. We have processed data from GTEx and TCGA and successfully corrected for study-specific biases, enabling comparative analysis between TCGA and GTEx. The normalized datasets are available for download on figshare. PMID:29664468

  1. Mixture models reveal multiple positional bias types in RNA-Seq data and lead to accurate transcript concentration estimates.

    PubMed

    Tuerk, Andreas; Wiktorin, Gregor; Güler, Serhat

    2017-05-01

    Accuracy of transcript quantification with RNA-Seq is negatively affected by positional fragment bias. This article introduces Mix2 (rd. "mixquare"), a transcript quantification method which uses a mixture of probability distributions to model and thereby neutralize the effects of positional fragment bias. The parameters of Mix2 are trained by Expectation Maximization resulting in simultaneous transcript abundance and bias estimates. We compare Mix2 to Cufflinks, RSEM, eXpress and PennSeq, state-of-the-art quantification methods implementing some form of bias correction. On four synthetic biases we show that the accuracy of Mix2 overall exceeds the accuracy of the other methods and that its bias estimates converge to the correct solution. We further evaluate Mix2 on real RNA-Seq data from the Microarray and Sequencing Quality Control (MAQC, SEQC) Consortia. On MAQC data, Mix2 achieves improved correlation to qPCR measurements with a relative increase in R² between 4% and 50%. Mix2 also yields repeatable concentration estimates across technical replicates with a relative increase in R² between 8% and 47% and reduced standard deviation across the full concentration range. We further observe more accurate detection of differential expression with a relative increase in true positives between 74% and 378% for 5% false positives. In addition, Mix2 reveals 5 dominant biases in MAQC data deviating from the common assumption of a uniform fragment distribution. On SEQC data, Mix2 yields higher consistency between measured and predicted concentration ratios. A relative error of 20% or less is obtained for 51% of transcripts by Mix2, 40% of transcripts by Cufflinks and RSEM and 30% by eXpress. Titration order consistency is correct for 47% of transcripts for Mix2, 41% for Cufflinks and RSEM and 34% for eXpress. We further observe improved repeatability across laboratory sites with a relative increase in R² between 8% and 44% and reduced standard deviation.

  2. RNA-Seq-Based Transcript Structure Analysis with TrBorderExt.

    PubMed

    Wang, Yejun; Sun, Ming-An; White, Aaron P

    2018-01-01

    RNA-Seq has become a routine strategy for genome-wide gene expression comparisons in bacteria. Despite lower resolution in transcript border parsing compared with dRNA-Seq, TSS-EMOTE, Cappable-seq, Term-seq, and others, directional RNA-Seq still illustrates its advantages: low cost, quantification and transcript border analysis with a medium resolution (±10-20 nt). To facilitate mining of directional RNA-Seq datasets especially with respect to transcript structure analysis, we developed a tool, TrBorderExt, which can parse transcript start sites and termination sites accurately in bacteria. A detailed protocol is described in this chapter for how to use the software package step by step to identify bacterial transcript borders from raw RNA-Seq data. The package was developed with Perl and R programming languages, and is accessible freely through the website: http://www.szu-bioinf.org/TrBorderExt .

  3. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    PubMed

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. Contact: yuzhu@purdue.edu; zhaohui.qin@emory.edu. Supplementary data are available at Bioinformatics online.

  4. The RNASeq-er API-a gateway to systematically updated analysis of public RNA-seq data.

    PubMed

    Petryszak, Robert; Fonseca, Nuno A; Füllgrabe, Anja; Huerta, Laura; Keays, Maria; Tang, Y Amy; Brazma, Alvis

    2017-07-15

    The exponential growth of publicly available RNA-sequencing (RNA-Seq) data poses an increasing challenge to researchers wishing to discover, analyse and store such data, particularly those based in institutions with limited computational resources. EMBL-EBI is in an ideal position to address these challenges and to allow the scientific community easy access to not just raw, but also processed RNA-Seq data. We present a Web service to access the results of a systematically and continually updated standardized alignment as well as gene and exon expression quantification of all public bulk (and in the near future also single-cell) RNA-Seq runs in 264 species in European Nucleotide Archive, using Representational State Transfer. The RNASeq-er API (Application Programming Interface) enables ontology-powered search for and retrieval of CRAM, bigwig and bedGraph files, gene and exon expression quantification matrices (Fragments Per Kilobase Of Exon Per Million Fragments Mapped, Transcripts Per Million, raw counts) as well as sample attributes annotated with ontology terms. To date over 270 000 RNA-Seq runs in nearly 10 000 studies (1PB of raw FASTQ data) in 264 species in ENA have been processed and made available via the API. The RNASeq-er API can be accessed at http://www.ebi.ac.uk/fg/rnaseq/api . The commands used to analyse the data are available in supplementary materials and at https://github.com/nunofonseca/irap/wiki/iRAP-single-library . Contact: rnaseq@ebi.ac.uk; rpetry@ebi.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
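
    The expression units named in the abstract above (FPKM, TPM, raw counts) are related by simple formulas. The following is a minimal Python sketch of those formulas, not code from the RNASeq-er API or iRAP; the function names `fpkm` and `tpm` and the toy inputs are assumptions made for illustration.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of exon per million mapped fragments."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    # count / (length in kb) / (total fragments in millions)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised rates rescaled to sum to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    if denom == 0:
        return [0.0 for _ in counts]
    return [r * 1e6 / denom for r in rates]


if __name__ == "__main__":
    counts = [10, 20, 30]          # hypothetical raw fragment counts
    lengths = [1000, 2000, 3000]   # hypothetical feature lengths in bp
    print(fpkm(counts, lengths))
    print(tpm(counts, lengths))
```

    TPM values sum to one million within each sample, whereas the sum of FPKM values varies from sample to sample.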

  5. Gene expression distribution deconvolution in single-cell RNA sequencing.

    PubMed

    Wang, Jingshu; Huang, Mo; Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Murray, John; Raj, Arjun; Li, Mingyao; Zhang, Nancy R

    2018-06-26

    Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene's expression distribution across cells, thus allowing the assessment of the dispersion, nonzero fraction, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, we propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMI). We develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND's noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers. Copyright © 2018 the Author(s). Published by PNAS.

  6. Single-nucleus RNA-seq of differentiating human myoblasts reveals the extent of fate heterogeneity

    PubMed Central

    Zeng, Weihua; Jiang, Shan; Kong, Xiangduo; El-Ali, Nicole; Ball, Alexander R.; Ma, Christopher I-Hsing; Hashimoto, Naohiro; Yokomori, Kyoko; Mortazavi, Ali

    2016-01-01

    Myoblasts are precursor skeletal muscle cells that differentiate into fused, multinucleated myotubes. Current single-cell microfluidic methods are not optimized for capturing very large, multinucleated cells such as myotubes. To circumvent the problem, we performed single-nucleus transcriptome analysis. Using immortalized human myoblasts, we performed RNA-seq analysis of single cells (scRNA-seq) and single nuclei (snRNA-seq) and found them comparable, with a distinct enrichment for long non-coding RNAs (lncRNAs) in snRNA-seq. We then compared snRNA-seq of myoblasts before and after differentiation. We observed the presence of mononucleated cells (MNCs) that remained unfused and analyzed separately from multi-nucleated myotubes. We found that while the transcriptome profiles of myoblast and myotube nuclei are relatively homogeneous, MNC nuclei exhibited significant heterogeneity, with the majority of them adopting a distinct mesenchymal state. Primary transcripts for microRNAs (miRNAs) that participate in skeletal muscle differentiation were among the most differentially expressed lncRNAs, which we validated using NanoString. Our study demonstrates that snRNA-seq provides reliable transcriptome quantification for cells that are otherwise not amenable to current single-cell platforms. Our results further indicate that snRNA-seq has unique advantage in capturing nucleus-enriched lncRNAs and miRNA precursors that are useful in mapping and monitoring differential miRNA expression during cellular differentiation. PMID:27566152

  7. Evaluation of microRNA alignment techniques

    PubMed Central

    Kaspi, Antony; El-Osta, Assam

    2016-01-01

    Genomic alignment of small RNA (smRNA) sequences such as microRNAs poses considerable challenges due to their short length (∼21 nucleotides [nt]) as well as the large size and complexity of plant and animal genomes. While several tools have been developed for high-throughput mapping of longer mRNA-seq reads (>30 nt), there are few that are specifically designed for mapping of smRNA reads including microRNAs. The accuracy of these mappers has not been systematically determined in the case of smRNA-seq. In addition, it is unknown whether these aligners accurately map smRNA reads containing sequence errors and polymorphisms. By using simulated read sets, we determine the alignment sensitivity and accuracy of 16 short-read mappers and quantify their robustness to mismatches, indels, and nontemplated nucleotide additions. These were explored in the context of a plant genome (Oryza sativa, ∼500 Mbp) and a mammalian genome (Homo sapiens, ∼3.1 Gbp). Analysis of simulated and real smRNA-seq data demonstrates that mapper selection impacts differential expression results and interpretation. These results will inform on best practice for smRNA mapping and enable more accurate smRNA detection and quantification of expression and RNA editing. PMID:27284164

  8. Mapping RNA-seq Reads with STAR

    PubMed Central

    Dobin, Alexander; Gingeras, Thomas R.

    2015-01-01

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920

  9. Mapping RNA-seq Reads with STAR.

    PubMed

    Dobin, Alexander; Gingeras, Thomas R

    2015-09-03

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.

  10. APAtrap: identification and quantification of alternative polyadenylation sites from RNA-seq data.

    PubMed

    Ye, Congting; Long, Yuqi; Ji, Guoli; Li, Qingshun Quinn; Wu, Xiaohui

    2018-06-01

    Alternative polyadenylation (APA) has been increasingly recognized as a crucial mechanism that contributes to transcriptome diversity and gene expression regulation. As RNA-seq has become a routine protocol for transcriptome analysis, it is of great interest to leverage this unprecedented collection of RNA-seq data by new computational methods to extract and quantify APA dynamics in these transcriptomes. However, research progress in this area has been relatively limited. Conventional methods rely on either transcript assembly to determine transcript 3' ends or annotated poly(A) sites. Moreover, they can neither identify more than two poly(A) sites in a gene nor detect dynamic APA site usage considering more than two poly(A) sites. We developed an approach called APAtrap based on the mean squared error model to identify and quantify APA sites from RNA-seq data. APAtrap is capable of identifying novel 3' UTRs and 3' UTR extensions, which contributes to locating potential poly(A) sites in previously overlooked regions and improving genome annotations. APAtrap also aims to tally all potential poly(A) sites and detect genes with differential APA site usages between conditions. Extensive comparisons of APAtrap with two other recent methods, ChangePoint and DaPars, using various RNA-seq datasets from simulation studies, human and Arabidopsis demonstrate the efficacy and flexibility of APAtrap for any organism with an annotated genome. APAtrap is freely available for download at https://apatrap.sourceforge.io. Contact: liqq@xmu.edu.cn or xhuister@xmu.edu.cn. Supplementary data are available at Bioinformatics online.
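
    As a rough illustration of how a mean squared error criterion can locate a change in read coverage, the Python sketch below fits two constant segments to a coverage vector and returns the split with the lowest error. It is not APAtrap's actual algorithm; the function name `best_breakpoint` and the toy coverage values are assumptions made for this example.

```python
def best_breakpoint(coverage):
    """Return (index, mse) of the split that best separates two flat segments.

    The split index i means coverage[:i] is the upstream segment and
    coverage[i:] the downstream segment.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions")
    best_index, best_mse = None, float("inf")
    for i in range(1, n):
        left, right = coverage[:i], coverage[i:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        # Squared deviation of each position from its own segment mean.
        sse = sum((x - mean_left) ** 2 for x in left)
        sse += sum((x - mean_right) ** 2 for x in right)
        mse = sse / n
        if mse < best_mse:
            best_index, best_mse = i, mse
    return best_index, best_mse


if __name__ == "__main__":
    # Hypothetical 3' UTR coverage that drops after the fourth position.
    print(best_breakpoint([10, 10, 10, 10, 2, 2, 2]))
```

    A sharp drop in coverage along a 3' UTR is the kind of signal such a criterion would pick up as a candidate poly(A) site; handling more than one site per gene would need repeated or multi-segment fitting, which this sketch does not attempt.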

  11. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing.

    PubMed

    Marinov, Georgi K; Williams, Brian A; McCue, Ken; Schroth, Gary P; Gertz, Jason; Myers, Richard M; Wold, Barbara J

    2014-03-01

    Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30-100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.
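
    Spike-in standards of known abundance allow relative expression values to be converted into absolute molecule counts. The Python sketch below shows one common way to do this, a least-squares fit in log-log space; it is an assumption for illustration and not necessarily the procedure used by the authors. The function names and the toy numbers are hypothetical.

```python
import math


def fit_loglog(spike_expr, spike_molecules):
    """Least-squares fit of log10(molecules) = intercept + slope * log10(expression)."""
    xs = [math.log10(e) for e in spike_expr]
    ys = [math.log10(m) for m in spike_molecules]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_molecules(expr_value, slope, intercept):
    """Convert a gene's expression value to an estimated molecule count."""
    return 10 ** (intercept + slope * math.log10(expr_value))


if __name__ == "__main__":
    # Hypothetical spike-ins: measured expression vs. known molecules added.
    slope, intercept = fit_loglog([1.0, 10.0, 100.0], [2.0, 20.0, 200.0])
    print(predict_molecules(50.0, slope, intercept))
```

    Only spike-ins with non-zero measured expression can enter the fit, because the logarithm of zero is undefined.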

  12. TEcandidates: Prediction of genomic origin of expressed Transposable Elements using RNA-seq data.

    PubMed

    Valdebenito-Maturana, Braulio; Riadi, Gonzalo

    2018-06-01

    In recent years, Transposable Elements (TEs) have been related to gene regulation. However, estimating the origin of expression of TEs through RNA-seq is complicated by multimapping reads coming from their repetitive sequences. Current approaches that address multimapping reads are focused on expression quantification and not on finding the origin of expression. Addressing the genomic origin of expressed TEs could further aid in understanding the role that TEs might have in the cell. We have developed a new pipeline called TEcandidates, based on de novo transcriptome assembly to assess the instances of TEs being expressed, along with their location, to include in downstream DE analysis. TEcandidates takes as input the RNA-seq data, the genome sequence and the TE annotation file, and returns a list of coordinates of candidate TEs being expressed, the TEs that have been removed, and the genome sequence with removed TEs as masked. This masked genome is suited to include TEs in downstream expression analysis, as the ambiguity of reads coming from TEs is significantly reduced in the mapping step of the analysis. The script which runs the pipeline can be downloaded at http://www.mobilomics.org/tecandidates/downloads or http://github.com/TEcandidates/TEcandidates. Contact: griadi@utalca.cl. Supplementary data are available at Bioinformatics online.

  13. MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples.

    PubMed

    Behr, Jonas; Kahles, André; Zhong, Yi; Sreedharan, Vipin T; Drewe, Philipp; Rätsch, Gunnar

    2013-10-15

    High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques lead to significant improvements over the state-of-the-art in transcriptome reconstruction. MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.

  14. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application.

    PubMed

    D'Antonio, Mattia; D'Onorio De Meo, Paolo; Pallocca, Matteo; Picardi, Ernesto; D'Erchia, Anna Maria; Calogero, Raffaele A; Castrignanò, Tiziana; Pesole, Graziano

    2015-01-01

    The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways make RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer to the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call for statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Through a user friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user's needs.

  15. Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq

    PubMed Central

    Liu, Peng; Sanalkumar, Rajendran; Bresnick, Emery H.; Keleş, Sündüz; Dewey, Colin N.

    2016-01-01

    RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of unique sequence. To improve quantification accuracy in such difficult cases, we developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data. We found that ChIP-seq data of RNA polymerase II and histone modifications were particularly informative in this approach. In qRT-PCR validations, pRSEM was shown to be superior to competing methods in estimating relative isoform abundances within or across conditions. Data-driven simulations suggested that pRSEM has a greatly decreased false-positive rate at the expense of a small increase in false-negative rate. In aggregate, our study demonstrates that pRSEM transforms existing capacity to precisely estimate transcript abundances, especially at the isoform level. PMID:27405803
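
    The difficulty described above, reads that are compatible with several isoforms, is commonly handled with an expectation-maximization (EM) scheme that shares each ambiguous read among its candidate isoforms in proportion to the current abundance estimates. The Python sketch below shows that generic idea only; it is not pRSEM or RSEM, it uses no ChIP-seq prior, and the function name `em_abundance` and the toy data are assumptions.

```python
def em_abundance(compat, eff_lengths, n_iter=200):
    """Estimate the fraction of reads originating from each isoform.

    compat      -- one tuple per read listing the isoform indices it is compatible with
    eff_lengths -- effective length of each isoform
    """
    k = len(eff_lengths)
    theta = [1.0 / k] * k  # start from a uniform guess
    for _ in range(n_iter):
        expected = [0.0] * k
        # E-step: split each read among its compatible isoforms.
        for read in compat:
            weights = [theta[t] / eff_lengths[t] for t in read]
            norm = sum(weights)
            for t, w in zip(read, weights):
                expected[t] += w / norm
        # M-step: new read fractions are the normalised expected counts.
        total = sum(expected)
        theta = [c / total for c in expected]
    return theta


if __name__ == "__main__":
    # Hypothetical data: 30 reads unique to isoform 0, 10 unique to isoform 1,
    # and 20 reads compatible with both; both isoforms have equal length.
    reads = [(0,)] * 30 + [(1,)] * 10 + [(0, 1)] * 20
    print(em_abundance(reads, [1000.0, 1000.0]))
```

    When an isoform has no uniquely mapping reads at all, the likelihood alone may not pin down its abundance, which is the situation where an informative prior, such as the ChIP-seq signal used by pRSEM, can help.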

  16. Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq.

    PubMed

    Liu, Peng; Sanalkumar, Rajendran; Bresnick, Emery H; Keleş, Sündüz; Dewey, Colin N

    2016-08-01

    RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of unique sequence. To improve quantification accuracy in such difficult cases, we developed a novel computational method, prior-enhanced RSEM (pRSEM), which uses a complementary data type in addition to RNA-seq data. We found that ChIP-seq data of RNA polymerase II and histone modifications were particularly informative in this approach. In qRT-PCR validations, pRSEM was shown to be superior to competing methods in estimating relative isoform abundances within or across conditions. Data-driven simulations suggested that pRSEM has a greatly decreased false-positive rate at the expense of a small increase in false-negative rate. In aggregate, our study demonstrates that pRSEM transforms existing capacity to precisely estimate transcript abundances, especially at the isoform level. © 2016 Liu et al.; Published by Cold Spring Harbor Laboratory Press.

  17. Spliced synthetic genes as internal controls in RNA sequencing experiments.

    PubMed

    Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R

    2016-09-01

    RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.

  18. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

    PubMed

    Hong, Jungeui; Gresham, David

    2017-11-01

    Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
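
    The distinction drawn above, between duplicates defined by mapping coordinates alone and duplicates defined by coordinates plus UMI, can be shown with a short Python sketch. It is an illustrative assumption and not the TrUMIseq pipeline; the read tuple layout and the function name `count_unique_molecules` are hypothetical.

```python
def count_unique_molecules(reads, use_umi=True):
    """Count distinct molecules among aligned reads.

    reads   -- iterable of (chrom, pos, strand, umi) tuples
    use_umi -- if False, fall back to coordinate-only duplicate removal
    """
    seen = set()
    for chrom, pos, strand, umi in reads:
        # Two reads are PCR duplicates only if position AND UMI agree.
        key = (chrom, pos, strand, umi) if use_umi else (chrom, pos, strand)
        seen.add(key)
    return len(seen)


if __name__ == "__main__":
    reads = [
        ("chr1", 100, "+", "AAGT"),
        ("chr1", 100, "+", "AAGT"),  # same position, same UMI: PCR duplicate
        ("chr1", 100, "+", "CCTA"),  # same position, new UMI: distinct molecule
        ("chr2", 5, "-", "AAGT"),
    ]
    print(count_unique_molecules(reads, use_umi=True))
    print(count_unique_molecules(reads, use_umi=False))
```

    Coordinate-only removal collapses the third read into the first and so undercounts molecules. This sketch requires exact UMI matches and makes no allowance for sequencing errors within the UMI.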

  19. RAP: RNA-Seq Analysis Pipeline, a new cloud-based NGS web application

    PubMed Central

    2015-01-01

    Background: The study of RNA has been dramatically improved by the introduction of Next Generation Sequencing platforms allowing massive and cheap sequencing of selected RNA fractions, also providing information on strand orientation (RNA-Seq). The complexity of transcriptomes and of their regulative pathways make RNA-Seq one of the most complex fields of NGS applications, addressing several aspects of the expression process (e.g. identification and quantification of expressed genes and transcripts, alternative splicing and polyadenylation, fusion genes and trans-splicing, post-transcriptional events, etc.). Moreover, the huge volume of data generated by NGS platforms introduces unprecedented computational and technological challenges to efficiently analyze and store sequence data and results. Methods: In order to provide researchers with an effective and friendly resource for analyzing RNA-Seq data, we present here RAP (RNA-Seq Analysis Pipeline), a cloud computing web application implementing a complete but modular analysis workflow. This pipeline integrates both state-of-the-art bioinformatics tools for RNA-Seq analysis and in-house developed scripts to offer to the user a comprehensive strategy for data analysis. RAP is able to perform quality checks (adopting FastQC and NGS QC Toolkit), identify and quantify expressed genes and transcripts (with Tophat, Cufflinks and HTSeq), detect alternative splicing events (using SpliceTrap) and chimeric transcripts (with ChimeraScan). This pipeline is also able to identify splicing junctions and constitutive or alternative polyadenylation sites (implementing custom analysis modules) and call for statistically significant differences in gene and transcript expression, splicing pattern and polyadenylation site usage (using Cuffdiff2 and DESeq). Results: Through a user friendly web interface, the RAP workflow can be suitably customized by the user and it is automatically executed on our cloud computing environment. This strategy allows access to bioinformatics tools and computational resources without specific bioinformatics and IT skills. RAP provides a set of tabular and graphical results that can be helpful to browse, filter and export analyzed data, according to the user's needs. PMID:26046471

  20. Identification of Human HK Genes and Gene Expression Regulation Study in Cancer from Transcriptomics Data Analysis

    PubMed Central

    Zhang, Zhang; Liu, Jingxing; Wu, Jiayan; Yu, Jun

    2013-01-01

    The regulation of gene expression is essential for eukaryotes, as it drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms. RNA-Sequencing (RNA-Seq) provides researchers with a powerful toolbox for characterization and quantification of transcriptome. Many different human tissue/cell transcriptome datasets coming from RNA-Seq technology are available on public data resource. The fundamental issue here is how to develop an effective analysis method to estimate expression pattern similarities between different tumor tissues and their corresponding normal tissues. We define the gene expression pattern from three directions: 1) expression breadth, which reflects gene expression on/off status, and mainly concerns ubiquitously expressed genes; 2) low/high or constant/variable expression genes, based on gene expression level and variation; and 3) the regulation of gene expression at the gene structure level. The cluster analysis indicates that gene expression pattern is higher related to physiological condition rather than tissue spatial distance. Two sets of human housekeeping (HK) genes are defined according to cell/tissue types, respectively. To characterize the gene expression pattern in gene expression level and variation, we firstly apply improved K-means algorithm and a gene expression variance model. We find that cancer-associated HK genes (a HK gene is specific in cancer group, while not in normal group) are expressed higher and more variable in cancer condition than in normal condition. Cancer-associated HK genes prefer to AT-rich genes, and they are enriched in cell cycle regulation related functions and constitute some cancer signatures. The expression of large genes is also avoided in cancer group. These studies will help us understand which cell type-specific patterns of gene expression differ among different cell types, and particularly for cancer. PMID:23382867

  1. Whole-transcriptome, high-throughput RNA sequence analysis of the bovine macrophage response to Mycobacterium bovis infection in vitro.

    PubMed

    Nalpas, Nicolas C; Park, Stephen D E; Magee, David A; Taraktsoglou, Maria; Browne, John A; Conlon, Kevin M; Rue-Albrecht, Kévin; Killick, Kate E; Hokamp, Karsten; Lohan, Amanda J; Loftus, Brendan J; Gormley, Eamonn; Gordon, Stephen V; MacHugh, David E

    2013-04-08

    Mycobacterium bovis, the causative agent of bovine tuberculosis, is an intracellular pathogen that can persist inside host macrophages during infection via a diverse range of mechanisms that subvert the host immune response. In the current study, we have analysed and compared the transcriptomes of M. bovis-infected monocyte-derived macrophages (MDM) purified from six Holstein-Friesian females with the transcriptomes of non-infected control MDM from the same animals over a 24 h period using strand-specific RNA sequencing (RNA-seq). In addition, we compare gene expression profiles generated using RNA-seq with those previously generated by us using the high-density Affymetrix® GeneChip® Bovine Genome Array platform from the same MDM-extracted RNA. A mean of 7.2 million reads from each MDM sample mapped uniquely and unambiguously to single Bos taurus reference genome locations. Analysis of these mapped reads showed 2,584 genes (1,392 upregulated; 1,192 downregulated) and 757 putative natural antisense transcripts (558 upregulated; 199 downregulated) that were differentially expressed based on sense and antisense strand data, respectively (adjusted P-value ≤ 0.05). Of the differentially expressed genes, 694 were common to both the sense and antisense data sets, with the direction of expression (i.e. up- or downregulation) positively correlated for 693 genes and negatively correlated for the remaining gene. Gene ontology analysis of the differentially expressed genes revealed an enrichment of immune, apoptotic and cell signalling genes. Notably, the number of differentially expressed genes identified from RNA-seq sense strand analysis was greater than the number of differentially expressed genes detected from microarray analysis (2,584 genes versus 2,015 genes). Furthermore, our data reveal a greater dynamic range in the detection and quantification of gene transcripts for RNA-seq compared to microarray technology.
    This study highlights the value of RNA-seq in identifying novel immunomodulatory mechanisms that underlie host-mycobacterial pathogen interactions during infection, including possible complex post-transcriptional regulation of host gene expression involving antisense RNA.

  2. From root to fruit: RNA-Seq analysis shows that arbuscular mycorrhizal symbiosis may affect tomato fruit metabolism.

    PubMed

    Zouari, Inès; Salvioli, Alessandra; Chialva, Matteo; Novero, Mara; Miozzi, Laura; Tenore, Gian Carlo; Bagnaresi, Paolo; Bonfante, Paola

    2014-03-21

    Tomato (Solanum lycopersicum) establishes a beneficial symbiosis with arbuscular mycorrhizal (AM) fungi. The formation of the mycorrhizal association in the roots leads to plant-wide modulation of gene expression. To understand the systemic effect of the fungal symbiosis on the tomato fruit, we used RNA-Seq to perform global transcriptome profiling on Moneymaker tomato fruits at the turning ripening stage. Fruits were collected at 55 days after flowering, from plants colonized with Funneliformis mosseae and from control plants, which were fertilized to avoid responses related to nutrient deficiency. Transcriptome analysis identified 712 genes that are differentially expressed in fruits from mycorrhizal and control plants. Gene Ontology (GO) enrichment analysis of these genes showed 81 overrepresented functional GO classes. Up-regulated GO classes include photosynthesis, stress response, transport, amino acid synthesis and carbohydrate metabolism functions, suggesting a general impact of fungal symbiosis on primary metabolisms and, particularly, on mineral nutrition. Down-regulated GO classes include cell wall, metabolism and ethylene response pathways. Quantitative RT-PCR validated the RNA-Seq results for 12 genes out of 14 when tested at three fruit ripening stages, mature green, breaker and turning. Quantification of fruit nutraceutical and mineral contents produced values consistent with the expression changes observed by RNA-Seq analysis. This RNA-Seq profiling produced a novel data set that explores the intersection of mycorrhization and fruit development. We found that the fruits of mycorrhizal plants show two transcriptomic "signatures": genes characteristic of a climacteric fleshy fruit, and genes characteristic of mycorrhizal status, like phosphate and sulphate transporters. 
    Moreover, mycorrhizal plants under low nutrient conditions produce fruits with a nutrient content similar to those from non-mycorrhizal plants under high nutrient conditions, indicating that AM fungi can help replace exogenous fertilizer for fruit crops.

  3. Visual Display of 5p-arm and 3p-arm miRNA Expression with a Mobile Application.

    PubMed

    Pan, Chao-Yu; Kuo, Wei-Ting; Chiu, Chien-Yuan; Lin, Wen-Chang

    2017-01-01

    MicroRNAs (miRNAs) play important roles in human cancers. In previous studies, we have demonstrated that both 5p-arm and 3p-arm of mature miRNAs could be expressed from the same precursor and we further interrogated the 5p-arm and 3p-arm miRNA expression with a comprehensive arm feature annotation list. To assist biologists to visualize the differential 5p-arm and 3p-arm miRNA expression patterns, we utilized a user-friendly mobile App to display The Cancer Genome Atlas (TCGA) miRNA-Seq expression information. We have collected over 4,500 miRNA-Seq datasets from 15 TCGA cancer types and further processed them with the 5p-arm and 3p-arm annotation analysis pipeline. In order to be displayed with the RNA-Seq Viewer App, annotated 5p-arm and 3p-arm miRNA expression information and miRNA gene loci information were converted into SQLite tables. In this distinct application, for any given miRNA gene, 5p-arm miRNA is illustrated on the top of the chromosome ideogram and 3p-arm miRNA is illustrated on the bottom of the chromosome ideogram. Users can then easily interrogate the differentially expressed 5p-arm/3p-arm miRNAs with their mobile devices. This study demonstrates the feasibility and utility of RNA-Seq Viewer App in addition to mRNA-Seq data visualization.

  4. The Role of CYP3A4 mRNA Transcript with Shortened 3′-Untranslated Region in Hepatocyte Differentiation, Liver Development, and Response to Drug Induction

    PubMed Central

    Li, Dan; Gaedigk, Roger; Hart, Steven N.; Leeder, J. Steven

    2012-01-01

    Cytochrome P450 3A4 (CYP3A4) metabolizes more than 50% of prescribed drugs. The expression of CYP3A4 changes during liver development and may be affected by the administration of some drugs. Alternative mRNA transcripts occur in more than 90% of human genes and are frequently observed in cells responding to developmental and environmental signals. Different mRNA transcripts may encode functionally distinct proteins or contribute to variability of mRNA stability or protein translation efficiency. The purpose of this study was to examine expression of alternative CYP3A4 mRNA transcripts in hepatocytes in response to developmental signals and drugs. cDNA cloning and RNA sequencing (RNA-Seq) were used to identify CYP3A4 mRNA transcripts. Three transcripts were found in HepaRG cells and liver tissues: one represented a canonical mRNA with full-length 3′-untranslated region (UTR), one had a shorter 3′-UTR, and one contained partial intron-6 retention. The alternative mRNA transcripts were validated by either rapid amplification of cDNA 3′-end or endpoint polymerase chain reaction (PCR). Quantification of the transcripts by RNA-Seq and real time quantitative PCR revealed that the CYP3A4 transcript with shorter 3′-UTR was preferentially expressed in developed livers, differentiated hepatocytes, and in rifampicin- and phenobarbital-induced hepatocytes. The CYP3A4 transcript with shorter 3′-UTR was more stable and produced more protein compared with the CYP3A4 transcript with canonical 3′-UTR. We conclude that the 3′-end processing of CYP3A4 contributes to the quantitative regulation of CYP3A4 gene expression through alternative polyadenylation, which may serve as a regulatory mechanism explaining changes of CYP3A4 expression and activity during hepatocyte differentiation and liver development and in response to drug induction. PMID:21998292

  5. RNA-sequence data normalization through in silico prediction of reference genes: the bacterial response to DNA damage as case study.

    PubMed

    Berghoff, Bork A; Karlsson, Torgny; Källman, Thomas; Wagner, E Gerhart H; Grabherr, Manfred G

    2017-01-01

    Measuring how gene expression changes in the course of an experiment assesses how an organism responds on a molecular level. Sequencing of RNA molecules, and their subsequent quantification, aims to assess global gene expression changes on the RNA level (transcriptome). While advances in high-throughput RNA-sequencing (RNA-seq) technologies allow for inexpensive data generation, accurate post-processing and normalization across samples is required to eliminate any systematic noise introduced by the biochemical and/or technical processes. Existing methods thus either normalize on selected known reference genes that are invariant in expression across the experiment, assume that the majority of genes are invariant, or that the effects of up- and down-regulated genes cancel each other out during the normalization. Here, we present a novel method, moose2, which predicts invariant genes in silico through a dynamic programming (DP) scheme and applies a quadratic normalization based on this subset. The method allows for specifying a set of known or experimentally validated invariant genes, which guides the DP. We experimentally verified the predictions of this method in the bacterium Escherichia coli, and show how moose2 is able to (i) estimate the expression value distances between RNA-seq samples, (ii) reduce the variation of expression values across all samples, and (iii) to subsequently reveal new functional groups of genes during the late stages of DNA damage. We further applied the method to three eukaryotic data sets, on which its performance compares favourably to other methods. The software is implemented in C++ and is publicly available from http://grabherr.github.io/moose2/.
    The proposed RNA-seq normalization method, moose2, is a valuable alternative to existing methods, with two major advantages: (i) in silico prediction of invariant genes provides a list of potential reference genes for downstream analyses, and (ii) non-linear artefacts in RNA-seq data are handled adequately to minimize variations between replicates.
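
    As a rough illustration of the normalization idea in this abstract, the sketch below fits a quadratic curve on a set of genes assumed to be invariant and then applies that curve to every gene. It is not the moose2 implementation (which is written in C++ and predicts the invariant genes by dynamic programming); the function names, the toy values and the hand-picked invariant set are all hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c via the 3x3 normal equations."""
    # Power sums S[k] = sum(x^k) and moment sums T[k] = sum(y * x^k).
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix for the unknowns (a, b, c).
    M = [
        [S[4], S[3], S[2], T[2]],
        [S[3], S[2], S[1], T[1]],
        [S[2], S[1], S[0], T[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))


def normalize(sample, reference, invariant_idx):
    """Fit on the invariant genes only, then rescale every gene in `sample`."""
    xs = [sample[i] for i in invariant_idx]
    ys = [reference[i] for i in invariant_idx]
    a, b, c = fit_quadratic(xs, ys)
    return [a * x * x + b * x + c for x in sample]


# Toy log-expression values (hypothetical). Genes 0-4 are treated as invariant:
# for them reference = 0.1*x^2 + 0.8*x + 0.2. Gene 5 is genuinely changed.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reference = [1.1, 2.2, 3.5, 5.0, 6.7, 12.0]
normalized = normalize(sample, reference, invariant_idx=[0, 1, 2, 3, 4])
```

    After normalization the invariant genes line up with the reference, while the changed gene keeps its difference because it was excluded from the fit.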

  6. The Transcriptome of the Reference Potato Genome Solanum tuberosum Group Phureja Clone DM1-3 516R44

    PubMed Central

    Massa, Alicia N.; Childs, Kevin L.; Lin, Haining; Bryan, Glenn J.; Giuliano, Giovanni; Buell, C. Robin

    2011-01-01

    Advances in molecular breeding in potato have been limited by its complex biological system, which includes vegetative propagation, autotetraploidy, and extreme heterozygosity. The availability of the potato genome and accompanying gene complement with corresponding gene structure, location, and functional annotation are powerful resources for understanding this complex plant and advancing molecular breeding efforts. Here, we report a reference for the potato transcriptome using 32 tissues and growth conditions from the doubled monoploid Solanum tuberosum Group Phureja clone DM1-3 516R44 for which a genome sequence is available. Analysis of greater than 550 million RNA-Seq reads permitted the detection and quantification of expression levels of over 22,000 genes. Hierarchical clustering and principal component analyses captured the biological variability that accounts for gene expression differences among tissues suggesting tissue-specific gene expression, and genes with tissue or condition restricted expression. Using gene co-expression network analysis, we identified 18 gene modules that represent tissue-specific transcriptional networks of major potato organs and developmental stages. This information provides a powerful resource for potato research as well as studies on other members of the Solanaceae family. PMID:22046362

  7. Comparative Analysis of Single-Cell RNA Sequencing Methods.

    PubMed

    Ziegenhain, Christoph; Vieth, Beate; Parekh, Swati; Reinius, Björn; Guillaumet-Adkins, Amy; Smets, Martha; Leonhardt, Heinrich; Heyn, Holger; Hellmann, Ines; Enard, Wolfgang

    2017-02-16

    Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols. Copyright © 2017 Elsevier Inc. All rights reserved.
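
    The role of unique molecular identifiers (UMIs) mentioned in this abstract can be sketched as follows: reads that share a cell barcode, gene and UMI are treated as amplification copies of one molecule and counted once. This is a minimal illustration with hypothetical barcodes and no UMI error correction, not the processing pipeline of any of the compared protocols.

```python
from collections import defaultdict


def count_molecules(reads):
    """Return (read_counts, umi_counts), both keyed by (cell, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples.
    """
    read_counts = defaultdict(int)
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        read_counts[(cell, gene)] += 1
        umis_seen[(cell, gene)].add(umi)  # duplicates collapse in the set
    umi_counts = {key: len(umis) for key, umis in umis_seen.items()}
    return dict(read_counts), umi_counts


# Hypothetical reads: three of the cellA/Actb reads share one UMI.
reads = [
    ("cellA", "Actb", "AACG"),
    ("cellA", "Actb", "AACG"),  # amplification duplicate
    ("cellA", "Actb", "AACG"),  # amplification duplicate
    ("cellA", "Actb", "GGTC"),
    ("cellA", "Nanog", "TTAG"),
    ("cellB", "Actb", "AACG"),
]
read_counts, umi_counts = count_molecules(reads)
```

    Here the raw read count for Actb in cellA is inflated by duplicates, whereas the UMI count reflects the two distinct molecules that were captured.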

  8. Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications

    PubMed Central

    2011-01-01

    Background: The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. Results: Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). Conclusion: The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions. PMID:21834981
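
    The simplest estimator named in this abstract, tag counting, can be sketched as below for a whole gene body and for a 5' sub-region. The model-based, spatially weighted estimators that the study found to perform better are not reproduced. Coordinates, tag positions, the library size and the per-kilobase-per-million scaling are hypothetical choices for illustration.

```python
from bisect import bisect_left


def tag_count(sorted_tags, start, end):
    """Number of tag positions p with start <= p < end."""
    return bisect_left(sorted_tags, end) - bisect_left(sorted_tags, start)


def enrichment(sorted_tags, start, end, total_tags):
    """Tags per kilobase of window per million mapped tags in [start, end)."""
    length_kb = (end - start) / 1000
    return tag_count(sorted_tags, start, end) / length_kb / (total_tags / 1e6)


# Hypothetical tag positions on one chromosome, sorted ascending.
tags = [900, 1100, 1500, 1900, 2500, 3200, 4800, 5100]
total_tags = 2_000_000          # assumed library size
gene_start, gene_end = 1000, 5000

whole_gene = enrichment(tags, gene_start, gene_end, total_tags)
five_prime = enrichment(tags, gene_start, gene_start + 1000, total_tags)
```

    The two windows give different values for the same gene, which is why the choice of estimation window matters for the downstream predictors.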

  9. Comprehensive analysis of transcriptome variation uncovers known and novel driver events in T-cell acute lymphoblastic leukemia.

    PubMed

    Atak, Zeynep Kalender; Gianfelici, Valentina; Hulselmans, Gert; De Keersmaecker, Kim; Devasia, Arun George; Geerdens, Ellen; Mentens, Nicole; Chiaretti, Sabina; Durinck, Kaat; Uyttebroeck, Anne; Vandenberghe, Peter; Wlodarska, Iwona; Cloos, Jacqueline; Foà, Robin; Speleman, Frank; Cools, Jan; Aerts, Stein

    2013-01-01

    RNA-seq is a promising technology to re-sequence protein coding genes for the identification of single nucleotide variants (SNV), while simultaneously obtaining information on structural variations and gene expression perturbations. We asked whether RNA-seq is suitable for the detection of driver mutations in T-cell acute lymphoblastic leukemia (T-ALL). These leukemias are caused by a combination of gene fusions, over-expression of transcription factors and cooperative point mutations in oncogenes and tumor suppressor genes. We analyzed 31 T-ALL patient samples and 18 T-ALL cell lines by high-coverage paired-end RNA-seq. First, we optimized the detection of SNVs in RNA-seq data by comparing the results with exome re-sequencing data. We identified known driver genes with recurrent protein altering variations, as well as several new candidates including H3F3A, PTK2B, and STAT5B. Next, we determined accurate gene expression levels from the RNA-seq data through normalizations and batch effect removal, and used these to classify patients into T-ALL subtypes. Finally, we detected gene fusions, of which several can explain the over-expression of key driver genes such as TLX1, PLAG1, LMO1, or NKX2-1; and others result in novel fusion transcripts encoding activated kinases (SSBP2-FER and TPM3-JAK2) or involving MLLT10. In conclusion, we present novel analysis pipelines for variant calling, variant filtering, and expression normalization on RNA-seq data, and successfully applied these for the detection of translocations, point mutations, INDELs, exon-skipping events, and expression perturbations in T-ALL.

  10. The host-pathogen interaction between wheat and yellow rust induces temporally coordinated waves of gene expression.

    PubMed

    Dobon, Albor; Bunting, Daniel C E; Cabrera-Quio, Luis Enrique; Uauy, Cristobal; Saunders, Diane G O

    2016-05-20

    Understanding how plants and pathogens modulate gene expression during the host-pathogen interaction is key to uncovering the molecular mechanisms that regulate disease progression. Recent advances in sequencing technologies have provided new opportunities to decode the complexity of such interactions. In this study, we used an RNA-based sequencing approach (RNA-seq) to assess the global expression profiles of the wheat yellow rust pathogen Puccinia striiformis f. sp. tritici (PST) and its host during infection. We performed a detailed RNA-seq time-course for a susceptible and a resistant wheat host infected with PST. This study (i) defined the global gene expression profiles for PST and its wheat host, (ii) substantially improved the gene models for PST, (iii) evaluated the utility of several programmes for quantification of global gene expression for PST and wheat, and (iv) identified clusters of differentially expressed genes in the host and pathogen. By focusing on components of the defence response in susceptible and resistant hosts, we were able to visualise the effect of PST infection on the expression of various defence components and host immune receptors. Our data showed sequential, temporally coordinated activation and suppression of expression of a suite of immune-response regulators that varied between compatible and incompatible interactions. These findings provide the framework for a better understanding of how PST causes disease and support the idea that PST can suppress the expression of defence components in wheat to successfully colonize a susceptible host.

  11. iSeq: Web-Based RNA-seq Data Analysis and Visualization.

    PubMed

    Zhang, Chao; Fan, Caoqi; Gan, Jingbo; Zhu, Ping; Kong, Lei; Li, Cheng

    2018-01-01

    Transcriptome sequencing (RNA-seq) is becoming a standard experimental methodology for genome-wide characterization and quantification of transcripts at single base-pair resolution. However, downstream analysis of massive amounts of sequencing data can be prohibitively technical for wet-lab researchers. A functionally integrated and user-friendly platform is required to meet this demand. Here, we present iSeq, an R-based Web server, for RNA-seq data analysis and visualization. iSeq is a streamlined Web-based R application under the Shiny framework, featuring a simple user interface and multiple data analysis modules. Users without programming and statistical skills can analyze their RNA-seq data and construct publication-level graphs through a standardized yet customizable analytical pipeline. iSeq is accessible via Web browsers on any operating system at http://iseq.cbi.pku.edu.cn.

  12. From the viral perspective: infectious salmon anemia virus (ISAV) transcriptome during the infective process in Atlantic salmon (Salmo salar).

    PubMed

    Valenzuela-Miranda, Diego; Cabrejos, María Eugenia; Yañez, José Manuel; Gallardo-Escárate, Cristian

    2015-04-01

    Infectious salmon anemia virus (ISAV) causes a severe disease that mainly affects the Atlantic salmon (Salmo salar) aquaculture industry. Although several transcriptional studies have aimed to understand the salmon-ISAV interaction through the evaluation of host-gene transcription, none of them has focused on the viral transcriptional dynamics. For this purpose, RNA-Seq and RT-qPCR analyses were conducted in gills, liver and head-kidney of S. salar challenged by cohabitation with ISAV. The results show the time and tissue transcript patterns involved in viral expression and how the transcription levels of ISAV segments are directly linked with the protein abundance found in other viruses of the Orthomyxoviridae family. In addition, the RT-qPCR results indicate that quantification of ISAV through amplification of segment 3 would be a more sensitive approach for detection and quantification of ISAV. This study offers a more comprehensive view of the ISAV infective process and provides novel knowledge for its molecular detection. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. FineSplice, enhanced splice junction detection and quantification: a novel pipeline based on the assessment of diverse RNA-Seq alignment solutions.

    PubMed

    Gatto, Alberto; Torroja-Fungairiño, Carlos; Mazzarotto, Francesco; Cook, Stuart A; Barton, Paul J R; Sánchez-Cabo, Fátima; Lara-Pezzi, Enrique

    2014-04-01

    Alternative splicing is the main mechanism governing protein diversity. The recent developments in RNA-Seq technology have enabled the study of the global impact and regulation of this biological process. However, the lack of standardized protocols constitutes a major bottleneck in the analysis of alternative splicing. This is particularly important for the identification of exon-exon junctions, which is a critical step in any analysis workflow. Here we performed a systematic benchmarking of alignment tools to dissect the impact of design and method on the mapping, detection and quantification of splice junctions from multi-exon reads. Accordingly, we devised a novel pipeline based on TopHat2 combined with a splice junction detection algorithm, which we have named FineSplice. FineSplice allows effective elimination of spurious junction hits arising from artefactual alignments, achieving up to 99% precision in both real and simulated data sets and yielding superior F1 scores under most tested conditions. The proposed strategy conjugates an efficient mapping solution with a semi-supervised anomaly detection scheme to filter out false positives and allows reliable estimation of expressed junctions from the alignment output. Ultimately this provides more accurate information to identify meaningful splicing patterns. FineSplice is freely available at https://sourceforge.net/p/finesplice/.
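
    The precision and F1 figures quoted in this abstract are standard set-based scores. The sketch below shows how they could be computed for a set of called splice junctions against a truth set. The junction tuples and the helper name are hypothetical, and this is not FineSplice's own evaluation code.

```python
def junction_scores(predicted, truth):
    """Return (precision, recall, F1) for two collections of junctions.

    Each junction is a hashable tuple such as (chrom, donor, acceptor).
    """
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    if true_pos == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: three calls, all correct, one true junction missed.
truth = [("chr1", 100, 200), ("chr1", 300, 400),
         ("chr2", 50, 150), ("chr2", 500, 900)]
predicted = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150)]
precision, recall, f1 = junction_scores(predicted, truth)
```

    Filtering spurious junctions raises precision, but F1 only improves if true junctions are not discarded along the way, which is why both are reported.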

  14. Recommendations for Accurate Resolution of Gene and Isoform Allele-Specific Expression in RNA-Seq Data

    PubMed Central

    Wood, David L. A.; Nones, Katia; Steptoe, Anita; Christ, Angelika; Harliwong, Ivon; Newell, Felicity; Bruxner, Timothy J. C.; Miller, David; Cloonan, Nicole; Grimmond, Sean M.

    2015-01-01

    Genetic variation modulates gene expression transcriptionally or post-transcriptionally, and can profoundly alter an individual’s phenotype. Measuring allelic differential expression at heterozygous loci within an individual, a phenomenon called allele-specific expression (ASE), can assist in identifying such factors. Massively parallel DNA and RNA sequencing and advances in bioinformatic methodologies provide an outstanding opportunity to measure ASE genome-wide. In this study, matched DNA and RNA sequencing, genotyping arrays and computationally phased haplotypes were integrated to comprehensively and conservatively quantify ASE in a single human brain and liver tissue sample. We describe a methodological evaluation and assessment of common bioinformatic steps for ASE quantification, and recommend a robust approach to accurately measure SNP, gene and isoform ASE through the use of personalized haplotype genome alignment, strict alignment quality control and intragenic SNP aggregation. Our results indicate that accurate ASE quantification requires careful bioinformatic analyses and is adversely affected by sample specific alignment confounders and random sampling even at moderate sequence depths. We identified multiple known and several novel ASE genes in liver, including WDR72, DSP and UBD, as well as genes that contained ASE SNPs with imbalance direction discordant with haplotype phase, explainable by annotated transcript structure, suggesting isoform derived ASE. The methods evaluated in this study will be of use to researchers performing highly conservative quantification of ASE, and the genes and isoforms identified as ASE of interest to researchers studying those loci. PMID:25965996
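
    Intragenic SNP aggregation, one of the steps recommended in this abstract, can be illustrated by summing haplotype-assigned read counts over the heterozygous SNPs of a gene and testing the total against a 50:50 expectation. The exact binomial test used here is a common choice and an assumption of this sketch, not necessarily the statistic used by the authors. The counts are hypothetical and are assumed to be already phased.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total mass of outcomes no likelier than k."""
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def gene_ase(snp_counts):
    """Aggregate phased per-SNP (hap1_reads, hap2_reads) pairs across one gene.

    Returns (hap1 fraction, p-value). Assumes at least one read in total.
    """
    hap1 = sum(a for a, _ in snp_counts)
    hap2 = sum(b for _, b in snp_counts)
    total = hap1 + hap2
    return hap1 / total, binomial_two_sided_p(hap1, total)


# Hypothetical gene with three heterozygous SNPs, each individually shallow.
ratio, p_value = gene_ase([(9, 3), (8, 2), (7, 1)])
balanced_ratio, balanced_p = gene_ase([(5, 5)])
```

    Pooling the three SNPs gives 24 versus 6 reads, so the gene-level imbalance is tested with more reads than any single SNP provides.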

  15. Information transduction capacity reduces the uncertainties in annotation-free isoform discovery and quantification

    PubMed Central

    Deng, Yue; Bao, Feng; Yang, Yang; Ji, Xiangyang; Du, Mulong; Zhang, Zhengdong

    2017-01-01

    The automated transcript discovery and quantification of high-throughput RNA sequencing (RNA-seq) data are important tasks of next-generation sequencing (NGS) research. However, these tasks are challenging due to the uncertainties that arise in the inference of complete splicing isoform variants from partially observed short reads. Here, we address this problem by explicitly reducing the inherent uncertainties in a biological system caused by missing information. In our approach, the RNA-seq procedure for transforming transcripts into short reads is considered an information transmission process. Consequently, the data uncertainties are substantially reduced by exploiting the information transduction capacity of information theory. The experimental results obtained from the analyses of simulated datasets and RNA-seq datasets from cell lines and tissues demonstrate the advantages of our method over state-of-the-art competitors. Our algorithm is an open-source implementation of MaxInfo. PMID:28911101

  16. Identification of Prostate Cancer-Specific microDNAs

    DTIC Science & Technology

    2014-12-01

    displacement amplification (MDA). 2 adopted multiple displacement amplification (MDA) with random primers for enriched circular DNA by rolling circle ... amplification (RCA) (Fig. 1) and then amplified DNA fragments were subject to deep sequencing. Sequence NO of Reads seq 1 184 seq 2 133 seq 3 2407 seq...prostate cancer cells through multiple displacement amplification .  Clone #7 is the top candidate which has been cloned in an expression vector and it

  17. Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling.

    PubMed

    Łabaj, Paweł P; Leparc, Germán G; Linggi, Bryan E; Markillie, Lye Meng; Wiley, H Steven; Kreil, David P

    2011-07-01

    Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even smaller gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error <20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision. Contact: rnaseq10@boku.ac.at

  18. Characterization of the platelet transcriptome by RNA sequencing in patients with acute myocardial infarction

    PubMed Central

    Eicher, John D.; Wakabayashi, Yoshiyuki; Vitseva, Olga; Esa, Nada; Yang, Yanqin; Zhu, Jun; Freedman, Jane E.; McManus, David D.; Johnson, Andrew D.

    2016-01-01

    Transcripts in platelets are largely produced in precursor megakaryocytes but remain physiologically-active as platelets translate RNAs and regulate protein/RNA levels. Recent studies using transcriptome sequencing (RNA-seq) characterized the platelet transcriptome in limited numbers of non-diseased individuals. Here, we expand upon these RNA-seq studies by completing RNA-seq in platelets from 32 patients with acute myocardial infarction (MI). Our goals were to characterize the platelet transcriptome using a population of patients with acute MI and relate gene expression to platelet aggregation measures and ST-segment elevation MI (STEMI) (n=16) versus non-STEMI (NSTEMI) (n=16) subtypes. Similar to other studies, we detected 9,565 expressed transcripts, including several known platelet-enriched markers (e.g., PPBP, OST4). Our RNA-seq data strongly correlated with independently ascertained platelet expression data and showed enrichment for platelet-related pathways (e.g., wound response, hemostasis, and platelet activation), as well as actin-related and post-transcriptional processes. Several transcripts displayed suggestively higher (FBXL4, ECHDC3, KCNE1, TAOK2, AURKB, ERG, and FKBP5) and lower (MIAT, PVRL3 and PZP) expression in STEMI platelets compared to NSTEMI. We also identified transcripts correlated with platelet aggregation to TRAP (ATP6V1G2, SLC2A3), collagen (CEACAM1, ITGA2), and ADP (PDGFB, PDGFC, ST3GAL6). Our study adds to current platelet gene expression resources by providing transcriptome-wide analyses in platelets isolated from patients with acute MI. In concert with prior studies, we identify various genes for further study in regards to platelet function and acute MI. Future platelet RNA-seq studies examining more diverse sets of healthy and diseased samples will add to our understanding of platelet thrombotic and non-thrombotic functions. PMID:26367242

  19. YM500v2: a small RNA sequencing (smRNA-seq) database for human cancer miRNome research

    PubMed Central

    Cheng, Wei-Chung; Chung, I-Fang; Tsai, Cheng-Fong; Huang, Tse-Shun; Chen, Chen-Yang; Wang, Shao-Chuan; Chang, Ting-Yu; Sun, Hsing-Jen; Chao, Jeffrey Yung-Chuan; Cheng, Cheng-Chung; Wu, Cheng-Wen; Wang, Hsei-Wei

    2015-01-01

    We previously presented YM500, which is an integrated database for miRNA quantification, isomiR identification, arm switching discovery and novel miRNA prediction from 468 human smRNA-seq datasets. In this updated YM500v2 database (http://ngs.ym.edu.tw/ym500/), we focus on the cancer miRNome to make the database more disease-orientated. New miRNA-related algorithms developed after YM500 were included in YM500v2, and, more significantly, more than 8000 cancer-related smRNA-seq datasets (including those of primary tumors, paired normal tissues, PBMC, recurrent tumors, and metastatic tumors) were incorporated into YM500v2. Novel miRNAs (miRNAs not included in the miRBase R21) were not only predicted by three independent algorithms but also cleaned by a new in silico filtration strategy and validated by wetlab data such as Cross-Linked ImmunoPrecipitation sequencing (CLIP-seq) to reduce the false-positive rate. A new ‘Meta-analysis’ function additionally allows users to identify differentially expressed miRNAs and arm-switching events in real time according to user-defined sample groups and dozens of clinical criteria curated by experienced clinicians. The cancer miRNAs identified hold potential for both basic research and biotech applications. PMID:25398902

  20. Grape RNA-Seq analysis pipeline environment

    PubMed Central

    Knowles, David G.; Röder, Maik; Merkel, Angelika; Guigó, Roderic

    2013-01-01

    Motivation: The avalanche of data arriving since the development of NGS technologies has prompted the need for developing fast, accurate and easily automated bioinformatic tools capable of dealing with massive datasets. Among the most productive applications of NGS technologies is the sequencing of cellular RNA, known as RNA-Seq. Although RNA-Seq provides a dynamic range similar or superior to that of microarrays at similar or lower cost, the lack of standard and user-friendly pipelines is a bottleneck preventing RNA-Seq from becoming the standard for transcriptome analysis. Results: In this work we present a pipeline for processing and analyzing RNA-Seq data, which we have named Grape (Grape RNA-Seq Analysis Pipeline Environment). Grape supports raw sequencing reads produced by a variety of technologies, either in FASTA or FASTQ format, or as prealigned reads in SAM/BAM format. A minimal Grape configuration consists of the file location of the raw sequencing reads, the genome of the species and the corresponding gene and transcript annotation. Grape first runs a set of quality control steps, and then aligns the reads to the genome, a step that is omitted for prealigned read formats. Grape next estimates gene and transcript expression levels, calculates exon inclusion levels and identifies novel transcripts. Grape can be run on a single computer or in parallel on a computer cluster. It is distributed with specific mapping and quantification tools, but given its modular design, any tool supporting popular data interchange formats can be integrated. Availability: Grape can be obtained from the Bioinformatics and Genomics website at: http://big.crg.cat/services/grape. Contact: david.gonzalez@crg.eu or roderic.guigo@crg.eu PMID:23329413

  1. Cell culture compositions

    DOEpatents

    Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yiao, Jian

    2014-03-18

    The present invention provides a novel endoglucanase nucleic acid sequence, designated egl6 (SEQ ID NO:1 encodes the full length endoglucanase; SEQ ID NO:4 encodes the mature form), and the corresponding endoglucanase VI amino acid sequence ("EGVI"; SEQ ID NO:3 is the signal sequence; SEQ ID NO:2 is the mature sequence). The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVI, recombinant EGVI proteins and methods for producing the same.

  2. GobyWeb: Simplified Management and Analysis of Gene Expression and DNA Methylation Sequencing Data

    PubMed Central

    Dorff, Kevin C.; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien

    2013-01-01

    We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins. PMID:23936070

  3. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing

    PubMed Central

    2014-01-01

    Background: RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results: We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions: eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312

  4. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    PubMed

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  5. The fractured landscape of RNA-seq alignment: the default in our STARs.

    PubMed

    Ballouz, Sara; Dobin, Alexander; Gingeras, Thomas R; Gillis, Jesse

    2018-06-01

    Many tools are available for RNA-seq alignment and expression quantification, with comparative value being hard to establish. Benchmarking assessments often highlight methods' good performance, but are focused on either model data or fail to explain variation in performance. This leaves us to ask, what is the most meaningful way to assess different alignment choices? And importantly, where is there room for progress? In this work, we explore the answers to these two questions by performing an exhaustive assessment of the STAR aligner. We assess STAR's performance across a range of alignment parameters using common metrics, and then on biologically focused tasks. We find technical metrics such as fraction mapping or expression profile correlation to be uninformative, capturing properties unlikely to have any role in biological discovery. Surprisingly, we find that changes in alignment parameters within a wide range have little impact on both technical and biological performance. Yet, when performance finally does break, it happens in difficult regions, such as X-Y paralogs and MHC genes. We believe improved reporting by developers will help establish where results are likely to be robust or fragile, providing a better baseline to establish where methodological progress can still occur.

  6. Simultaneous Breast Expression in Breastfeeding Women Is More Efficacious Than Sequential Breast Expression

    PubMed Central

    Garbin, Catherine P.; Hartmann, Peter E.; Kent, Jacqueline C.

    2012-01-01

    Introduction: Simultaneous (SIM) breast expression saves mothers time compared with sequential (SEQ) expression, but it remains unclear whether the two methods differ in milk output efficiency and efficacy. Subjects and Methods: The Showmilk device (Medela AG, Baar, Switzerland) was used to measure milk output and milk ejection during breast expression (electric pump) in 31 Australian breastfeeding mothers of term infants (median age, 19 weeks [interquartile range, 10–33 weeks]). The order of expression type (SIM/SEQ) and breast (left/right) was randomized. Results: SIM expression yielded more milk ejections (p≤0.001) and greater amounts of milk at 2, 5, and 10 minutes (p≤0.01) and removed a greater total amount of milk (p≤0.01) and percentage of available milk (p<0.05) than SEQ expression. After SIM expression the cream content of both the overall (8.3% [p≤0.05]) and postexpression (12.6% [p≤0.001]) milk were greater. During SEQ expression, the breast expressed first had a shorter time to 50% and 80% of the total amount of milk than the breast expressed second (p≤0.05), but, overall, a similar percentage of available milk was removed from both breasts. Conclusions: SIM expression stimulated more milk ejections and was a more efficient and efficacious method of expression, yielding milk with a higher energy content. PMID:23039397

  7. eQTL Mapping Using RNA-seq Data

    PubMed Central

    Hu, Yijuan

    2012-01-01

    As RNA-seq is replacing gene expression microarrays to assess genome-wide transcription abundance, gene expression Quantitative Trait Locus (eQTL) studies using RNA-seq have emerged. RNA-seq delivers two novel features that are important for eQTL studies. First, it provides information on allele-specific expression (ASE), which is not available from gene expression microarrays. Second, it generates unprecedentedly rich data to study RNA-isoform expression. In this paper, we review current methods for eQTL mapping using ASE and discuss some future directions. We also review existing works that use RNA-seq data to study RNA-isoform expression and we discuss the gaps between these works and isoform-specific eQTL mapping. PMID:23667399
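
    The allele-specific expression (ASE) signal mentioned above can be illustrated with a minimal sketch. The review does not prescribe a particular test; the exact two-sided binomial test below is an assumption chosen for simplicity, and the function name and read counts are hypothetical. It asks whether reads at a heterozygous site split evenly between the reference and alternative alleles.

```python
from math import comb


def ase_binomial_p(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Hypothetical helper: under the null hypothesis, each read comes from the
    reference allele with probability p_ref.
    """
    n = ref_reads + alt_reads

    def pmf(k):
        return comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k)

    observed = pmf(ref_reads)
    # Sum the probability of every outcome that is no more likely than the
    # observed one (small tolerance guards against floating-point ties).
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


balanced = ase_binomial_p(5, 5)     # even split, no evidence of imbalance
skewed = ase_binomial_p(10, 0)      # all reads from one allele
```

    This sketch ignores overdispersion and mapping bias, which real ASE analyses usually need to handle, so it should be read as an illustration of the idea rather than a recommended method.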

  8. rSeqNP: a non-parametric approach for detecting differential expression and splicing from RNA-Seq data.

    PubMed

    Shi, Yang; Chinnaiyan, Arul M; Jiang, Hui

    2015-07-01

    High-throughput sequencing of transcriptomes (RNA-Seq) has become a powerful tool to study gene expression. Here we present an R package, rSeqNP, which implements a non-parametric approach to test for differential expression and splicing from RNA-Seq data. rSeqNP uses permutation tests to assess statistical significance and can be applied to a variety of experimental designs. By combining information across isoforms, rSeqNP is able to detect more differentially expressed or spliced genes from RNA-Seq data. The R package with its source code and documentation is freely available at http://www-personal.umich.edu/~jianghui/rseqnp/. Contact: jianghui@umich.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
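
    The permutation-test idea mentioned in this abstract can be sketched generically. The code below is not rSeqNP's implementation (rSeqNP is an R package, and its test statistics are not described here); it is a minimal Python illustration, with a hypothetical function name and a difference-in-means statistic assumed for simplicity, of how shuffling sample labels yields a p-value for one gene.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Hypothetical helper: group_a and group_b hold normalized expression
    values of a single gene in two conditions.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values is equivalent to permuting the labels.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


p_separated = permutation_p_value([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
p_identical = permutation_p_value([5, 5, 5], [5, 5, 5])
```

    Because no distribution is assumed for the expression values, the same procedure applies to any statistic that can be recomputed after relabeling, which is the general appeal of non-parametric testing.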

  9. Functional assessment of human enhancer activities using whole-genome STARR-sequencing.

    PubMed

    Liu, Yuwen; Yu, Shan; Dhiman, Vineet K; Brunetti, Tonya; Eckart, Heather; White, Kevin P

    2017-11-20

    Genome-wide quantification of enhancer activity in the human genome has proven to be a challenging problem. Recent efforts have led to the development of powerful tools for enhancer quantification. However, because of genome size and complexity, these tools have yet to be applied to the whole human genome. In the current study, we use a human prostate cancer cell line, LNCaP, as a model to perform whole human genome STARR-seq (WHG-STARR-seq) to reliably obtain an assessment of enhancer activity. This approach builds upon previously developed STARR-seq in the fly genome and CapSTARR-seq techniques in targeted human genomic regions. With an improved library preparation strategy, our approach greatly increases the library complexity per unit of starting material, which makes it feasible and cost-effective to explore the landscape of regulatory activity in the much larger human genome. In addition to our ability to identify active, accessible enhancers located in open chromatin regions, we can also detect sequences with the potential for enhancer activity that are located in inaccessible, closed chromatin regions. When treated with the histone deacetylase inhibitor Trichostatin A, genes near this latter class of enhancers are up-regulated, demonstrating the potential for endogenous functionality of these regulatory elements. WHG-STARR-seq provides an improved approach to current pipelines for analysis of high complexity genomes to gain a better understanding of the intricacies of transcriptional regulation.

  10. Morphological observation, RNA-Seq quantification, and expression profiling: novel insight into grafting-responsive carotenoid biosynthesis in watermelon grafted onto pumpkin rootstock.

    PubMed

    Liu, Guang; Yang, Xingping; Xu, Jinhua; Zhang, Man; Hou, Qian; Zhu, Lingli; Huang, Ying; Xiong, Aisheng

    2017-03-01

    Watermelon is an important and economical horticultural crop in China, where ~20% of the plants are grafted. The development of grafted watermelon fruit involves a diverse range of gene interactions that results in dynamic changes in fruit. However, the molecular mechanisms underlying grafting-induced fruit quality change are unclear. In the present study, we measured the lycopene content by high-performance liquid chromatography and used RNA-Seq (quantification) to perform a genome-wide transcript analysis of fruits from watermelon grafted onto pumpkin rootstock (pumpkin-grafted watermelon, PGW), self-grafted watermelon (SGW), and non-grafted watermelon (NGW). The results showed variation in the lycopene content in the flesh of PGW fruits, first increasing and then decreasing in the four stages, which was different from the trend in the flesh of NGW and SGW fruits. The transcriptome profiling data provided new information on the grafting-induced gene regulation of lycopene biosynthesis during fruit growth and development. The expression levels of 33 genes from 8 gene families (GGPS, PSY, PDS, ZDS, CRTISO, LCYb, LCYe, and CHY) related to lycopene biosynthesis, which play critical roles in fruit coloration and contribute significantly to fruit phytonutrient values, were monitored during the four periods of fruit development in watermelon. Compared with those of NGW and SGW, 14 genes were differentially expressed in PGW during fruit development, suggesting that these genes possibly help to mediate lycopene biosynthesis in grafted watermelon fruit. Our work provides some novel insights into grafting-responsive carotenoid metabolism and its potential roles during PGW fruit development and ripening. © The Author 2016. Published by Oxford University Press on behalf of the Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. All rights reserved. 
    For permissions, please e-mail: journals.permissions@oup.com.

  11. oPOSSUM-3: Advanced Analysis of Regulatory Motif Over-Representation Across Genes or ChIP-Seq Datasets

    PubMed Central

    Kwon, Andrew T.; Arenillas, David J.; Hunt, Rebecca Worsley; Wasserman, Wyeth W.

    2012-01-01

    oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca. PMID:22973536
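
    Over-representation of a binding-site motif in a gene set can be illustrated with a generic enrichment test. The abstract does not describe oPOSSUM-3's scoring, so the one-sided hypergeometric test below is an assumption used only to show the principle; the function name and counts are hypothetical.

```python
from math import comb


def motif_enrichment_p(hits_in_target, target_size, hits_in_background, background_size):
    """Upper-tail hypergeometric p-value, P(X >= hits_in_target).

    Hypothetical helper: the target set (e.g. co-expressed genes) is treated
    as a draw without replacement from the background, in which
    hits_in_background sequences contain the motif.
    """
    total = comb(background_size, target_size)
    upper = min(target_size, hits_in_background)
    non_hits = background_size - hits_in_background
    tail = sum(
        comb(hits_in_background, k) * comb(non_hits, target_size - k)
        for k in range(hits_in_target, upper + 1)
    )
    return tail / total


# Both motif-containing genes of a 10-gene background fall in a 2-gene target set.
p = motif_enrichment_p(2, 2, 2, 10)
```

    A small p-value indicates that the motif occurs in the target set more often than random sampling from the background would suggest.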

  12. oPOSSUM-3: advanced analysis of regulatory motif over-representation across genes or ChIP-Seq datasets.

    PubMed

    Kwon, Andrew T; Arenillas, David J; Worsley Hunt, Rebecca; Wasserman, Wyeth W

    2012-09-01

    oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.

  13. Identification of molecular tumor markers in renal cell carcinomas with TFE3 protein expression by RNA sequencing.

    PubMed

    Pflueger, Dorothee; Sboner, Andrea; Storz, Martina; Roth, Jasmine; Compérat, Eva; Bruder, Elisabeth; Rubin, Mark A; Schraml, Peter; Moch, Holger

    2013-11-01

    TFE3 translocation renal cell carcinoma (tRCC) is defined by chromosomal translocations involving the TFE3 transcription factor at chromosome Xp11.2. Genetically proven TFE3 tRCCs have a broad histologic spectrum with overlapping features to other renal tumor subtypes. In this study, we aimed to characterize RCC with TFE3 protein expression. Using next-generation whole transcriptome sequencing (RNA-Seq) as a discovery tool, we analyzed fusion transcripts, gene expression profile, and somatic mutations in frozen tissue of one TFE3 tRCC. By applying a computational analysis developed to call chimeric RNA molecules from paired-end RNA-Seq data, we confirmed the known TFE3 translocation. Its fusion partner SFPQ has already been described as a fusion partner in tRCCs. In addition, an RNA read-through chimera between TMED6 and COG8 as well as MET and KDR (VEGFR2) point mutations were identified. An EGFR mutation, but no chromosomal rearrangements, was identified in a control group of five clear cell RCCs (ccRCCs). The TFE3 tRCC could be clearly distinguished from the ccRCCs by RNA-Seq gene expression measurements using a previously reported tRCC gene signature. In validation experiments using reverse transcription-PCR, TMED6-COG8 chimera expression was significantly higher in nine TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in 24 ccRCCs (P < .001) and 22 papillary RCCs (P < .05-.07). Immunohistochemical analysis of selected genes from the tRCC gene signature showed significantly higher eukaryotic translation elongation factor 1 alpha 2 (EEF1A2) and Contactin 3 (CNTN3) expression in 16 TFE3 translocated and six TFE3-expressing/non-translocated RCCs than in over 200 ccRCCs (P < .0001, both).

  14. YM500v2: a small RNA sequencing (smRNA-seq) database for human cancer miRNome research.

    PubMed

    Cheng, Wei-Chung; Chung, I-Fang; Tsai, Cheng-Fong; Huang, Tse-Shun; Chen, Chen-Yang; Wang, Shao-Chuan; Chang, Ting-Yu; Sun, Hsing-Jen; Chao, Jeffrey Yung-Chuan; Cheng, Cheng-Chung; Wu, Cheng-Wen; Wang, Hsei-Wei

    2015-01-01

    We previously presented YM500, which is an integrated database for miRNA quantification, isomiR identification, arm switching discovery and novel miRNA prediction from 468 human smRNA-seq datasets. In this updated YM500v2 database (http://ngs.ym.edu.tw/ym500/), we focus on the cancer miRNome to make the database more disease-orientated. New miRNA-related algorithms developed after YM500 were included in YM500v2, and, more significantly, more than 8000 cancer-related smRNA-seq datasets (including those of primary tumors, paired normal tissues, PBMC, recurrent tumors, and metastatic tumors) were incorporated into YM500v2. Novel miRNAs (miRNAs not included in the miRBase R21) were not only predicted by three independent algorithms but also cleaned by a new in silico filtration strategy and validated by wetlab data such as Cross-Linked ImmunoPrecipitation sequencing (CLIP-seq) to reduce the false-positive rate. A new 'Meta-analysis' function additionally allows users to identify differentially expressed miRNAs and arm-switching events in real time according to user-defined sample groups and dozens of clinical criteria curated by experienced clinicians. The cancer miRNAs identified hold potential for both basic research and biotech applications. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Validation of Suitable Reference Genes for Expression Normalization in Echinococcus spp. Larval Stages

    PubMed Central

    Espínola, Sergio Martin; Ferreira, Henrique Bunselmeyer; Zaha, Arnaldo

    2014-01-01

    In recent years, a significant amount of sequence data (both genomic and transcriptomic) for Echinococcus spp. has been published, thereby facilitating the analysis of genes expressed during a specific stage or involved in parasite development. To perform a suitable gene expression quantification analysis, the use of validated reference genes is strongly recommended. Thus, the aim of this work was to identify suitable reference genes to allow reliable expression normalization for genes of interest in Echinococcus granulosus sensu stricto (s.s.) (G1) and Echinococcus ortleppi upon induction of the early pre-adult development. Untreated protoscoleces (PS) and pepsin-treated protoscoleces (PSP) from E. granulosus s.s. (G1) and E. ortleppi metacestode were used. The gene expression stability of eleven candidate reference genes (βTUB, NDUFV2, RPL13, TBP, CYP-1, RPII, EF-1α, βACT-1, GAPDH, ETIF4A-III and MAPK3) was assessed using geNorm, Normfinder, and RefFinder. Our qPCR data showed a good correlation with the recently published RNA-seq data. Regarding expression stability, EF-1α and TBP were the most stable genes for both species. Interestingly, βACT-1 (the most commonly used reference gene), and GAPDH and ETIF4A-III (previously identified as housekeeping genes) did not behave stably in our assay conditions. We propose the use of EF-1α as a reference gene for studies involving gene expression analysis in both PS and PSP experimental conditions for E. granulosus s.s. and E. ortleppi. To demonstrate its applicability, EF-1α was used as a normalizer gene in the relative quantification of transcripts from genes coding for antigen B subunits. The same EF-1α reference gene may be used in studies with other Echinococcus sensu lato species. 
    This report validates suitable reference genes for species of class Cestoda, phylum Platyhelminthes, thus providing a foundation for further validation in other epidemiologically important cestode species, such as those from the Taenia genus. PMID:25014071
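
    Relative quantification against a reference gene such as EF-1α can be sketched as follows. The abstract does not state which formula the authors used; the 2^-ΔΔCt calculation below is a widely used convention and is assumed here, together with 100% amplification efficiency. The function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt convention.

    Hypothetical helper: "treated" could be pepsin-treated protoscoleces (PSP)
    and "control" untreated protoscoleces (PS); the reference gene could be
    EF-1alpha. Assumes each PCR cycle doubles the product.
    """
    # Normalize the target gene to the reference gene within each condition.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    delta_delta = delta_treated - delta_control
    return 2.0 ** -delta_delta


# Target Ct drops by 2 cycles relative to the reference: a four-fold increase.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
```

    The calculation is only as reliable as the reference gene is stable across conditions, which is why validating the normalizer matters.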

  16. Validation of suitable reference genes for expression normalization in Echinococcus spp. larval stages.

    PubMed

    Espínola, Sergio Martin; Ferreira, Henrique Bunselmeyer; Zaha, Arnaldo

    2014-01-01

    In recent years, a significant amount of sequence data (both genomic and transcriptomic) for Echinococcus spp. has been published, thereby facilitating the analysis of genes expressed during a specific stage or involved in parasite development. To perform a suitable gene expression quantification analysis, the use of validated reference genes is strongly recommended. Thus, the aim of this work was to identify suitable reference genes to allow reliable expression normalization for genes of interest in Echinococcus granulosus sensu stricto (s.s.) (G1) and Echinococcus ortleppi upon induction of the early pre-adult development. Untreated protoscoleces (PS) and pepsin-treated protoscoleces (PSP) from E. granulosus s.s. (G1) and E. ortleppi metacestode were used. The gene expression stability of eleven candidate reference genes (βTUB, NDUFV2, RPL13, TBP, CYP-1, RPII, EF-1α, βACT-1, GAPDH, ETIF4A-III and MAPK3) was assessed using geNorm, Normfinder, and RefFinder. Our qPCR data showed a good correlation with the recently published RNA-seq data. Regarding expression stability, EF-1α and TBP were the most stable genes for both species. Interestingly, βACT-1 (the most commonly used reference gene), and GAPDH and ETIF4A-III (previously identified as housekeeping genes) did not behave stably in our assay conditions. We propose the use of EF-1α as a reference gene for studies involving gene expression analysis in both PS and PSP experimental conditions for E. granulosus s.s. and E. ortleppi. To demonstrate its applicability, EF-1α was used as a normalizer gene in the relative quantification of transcripts from genes coding for antigen B subunits. The same EF-1α reference gene may be used in studies with other Echinococcus sensu lato species. 
    This report validates suitable reference genes for species of class Cestoda, phylum Platyhelminthes, thus providing a foundation for further validation in other epidemiologically important cestode species, such as those from the Taenia genus.

  17. Digital gene expression for non-model organisms

    PubMed Central

    Hong, Lewis Z.; Li, Jun; Schmidt-Küntzel, Anne; Warren, Wesley C.; Barsh, Gregory S.

    2011-01-01

    Next-generation sequencing technologies offer new approaches for global measurements of gene expression but are mostly limited to organisms for which a high-quality assembled reference genome sequence is available. We present a method for gene expression profiling called EDGE, or EcoP15I-tagged Digital Gene Expression, based on ultra-high-throughput sequencing of 27-bp cDNA fragments that uniquely tag the corresponding gene, thereby allowing direct quantification of transcript abundance. We show that EDGE is capable of assaying for expression in >99% of genes in the genome and achieves saturation after 6–8 million reads. EDGE exhibits very little technical noise, reveals a large (10^6) dynamic range of gene expression, and is particularly suited for quantification of transcript abundance in non-model organisms where a high-quality annotated genome is not available. In a direct comparison with RNA-seq, both methods provide similar assessments of relative transcript abundance, but EDGE does better at detecting gene expression differences for poorly expressed genes and does not exhibit transcript length bias. Applying EDGE to laboratory mice, we show that a loss-of-function mutation in the melanocortin 1 receptor (Mc1r), recognized as a Mendelian determinant of yellow hair color in many different mammals, also causes reduced expression of genes involved in the interferon response. To illustrate the application of EDGE to a non-model organism, we examine skin biopsy samples from a cheetah (Acinonyx jubatus) and identify genes likely to control differences in the color of spotted versus non-spotted regions. PMID:21844123

  18. Nucleic and amino acid sequences relating to a novel transketolase, and methods for the expression thereof

    DOEpatents

    Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.

    2001-01-01

    cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (IPP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.

  19. Monoterpene synthases from common sage (Salvia officinalis)

    DOEpatents

    Croteau, Rodney Bruce; Wise, Mitchell Lynn; Katahira, Eva Joy; Savage, Thomas Jonathan

    1999-01-01

    cDNAs encoding (+)-bornyl diphosphate synthase, 1,8-cineole synthase and (+)-sabinene synthase from common sage (Salvia officinalis) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID No:1; SEQ ID No:3 and SEQ ID No:5) are provided which code for the expression of (+)-bornyl diphosphate synthase (SEQ ID No:2), 1,8-cineole synthase (SEQ ID No:4) and (+)-sabinene synthase (SEQ ID No:6), respectively, from sage (Salvia officinalis). In other aspects, replicable recombinant cloning vehicles are provided which code for (+)-bornyl diphosphate synthase, 1,8-cineole synthase or (+)-sabinene synthase, or for a base sequence sufficiently complementary to at least a portion of (+)-bornyl diphosphate synthase, 1,8-cineole synthase or (+)-sabinene synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding (+)-bornyl diphosphate synthase, 1,8-cineole synthase or (+)-sabinene synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant monoterpene synthases that may be used to facilitate their production, isolation and purification in significant amounts. Recombinant (+)-bornyl diphosphate synthase, 1,8-cineole synthase and (+)-sabinene synthase may be used to obtain expression or enhanced expression of (+)-bornyl diphosphate synthase, 1,8-cineole synthase and (+)-sabinene synthase in plants in order to enhance the production of monoterpenoids, or may be otherwise employed for the regulation or expression of (+)-bornyl diphosphate synthase, 1,8-cineole synthase and (+)-sabinene synthase, or the production of their products.

  20. Genome-Wide Profiling of Histone Modifications (H3K9me2 and H4K12ac) and Gene Expression in Rust (Uromyces appendiculatus) Inoculated Common Bean (Phaseolus vulgaris L.).

    PubMed

    Ayyappan, Vasudevan; Kalavacharla, Venu; Thimmapuram, Jyothi; Bhide, Ketaki P; Sripathi, Venkateswara R; Smolinski, Tomasz G; Manoharan, Muthusamy; Thurston, Yaqoob; Todd, Antonette; Kingham, Bruce

    2015-01-01

    Histone modifications such as methylation and acetylation play a significant role in controlling gene expression in unstressed and stressed plants. Genome-wide analysis of such stress-responsive modifications and genes in non-model crops is limited. We report the genome-wide profiling of histone methylation (H3K9me2) and acetylation (H4K12ac) in common bean (Phaseolus vulgaris L.) under rust (Uromyces appendiculatus) stress using two high-throughput approaches, chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq). ChIP-Seq analysis revealed 1,235 and 556 histone methylation and acetylation responsive genes from common bean leaves treated with the rust pathogen at 0, 12 and 84 hour-after-inoculation (hai), while RNA-Seq analysis identified 145 and 1,763 genes differentially expressed between mock-inoculated and inoculated plants. The combined ChIP-Seq and RNA-Seq analyses identified some key defense responsive genes (calmodulin, cytochrome p450, chitinase, DNA Pol II, and LRR) and transcription factors (WRKY, bZIP, MYB, HSFB3, GRAS, NAC, and NMRA) in bean-rust interaction. Differential methylation and acetylation affected a large proportion of stress-responsive genes including resistant (R) proteins, detoxifying enzymes, and genes involved in ion flux and cell death. The genes identified were functionally classified using Gene Ontology (GO) and EuKaryotic Orthologous Groups (KOGs). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified a putative pathway with ten key genes involved in plant-pathogen interactions. This first report of an integrated analysis of histone modifications and gene expression involved in the bean-rust interaction as reported here provides a comprehensive resource for other epigenomic regulation studies in non-model species under stress.

  1. Genome-Wide Profiling of Histone Modifications (H3K9me2 and H4K12ac) and Gene Expression in Rust (Uromyces appendiculatus) Inoculated Common Bean (Phaseolus vulgaris L.)

    PubMed Central

    Thimmapuram, Jyothi; Bhide, Ketaki P.; Sripathi, Venkateswara R.; Smolinski, Tomasz G.; Manoharan, Muthusamy; Thurston, Yaqoob; Todd, Antonette; Kingham, Bruce

    2015-01-01

    Histone modifications such as methylation and acetylation play a significant role in controlling gene expression in unstressed and stressed plants. Genome-wide analysis of such stress-responsive modifications and genes in non-model crops is limited. We report the genome-wide profiling of histone methylation (H3K9me2) and acetylation (H4K12ac) in common bean (Phaseolus vulgaris L.) under rust (Uromyces appendiculatus) stress using two high-throughput approaches, chromatin immunoprecipitation sequencing (ChIP-Seq) and RNA sequencing (RNA-Seq). ChIP-Seq analysis revealed 1,235 and 556 histone methylation and acetylation responsive genes from common bean leaves treated with the rust pathogen at 0, 12 and 84 hour-after-inoculation (hai), while RNA-Seq analysis identified 145 and 1,763 genes differentially expressed between mock-inoculated and inoculated plants. The combined ChIP-Seq and RNA-Seq analyses identified some key defense responsive genes (calmodulin, cytochrome p450, chitinase, DNA Pol II, and LRR) and transcription factors (WRKY, bZIP, MYB, HSFB3, GRAS, NAC, and NMRA) in bean-rust interaction. Differential methylation and acetylation affected a large proportion of stress-responsive genes including resistant (R) proteins, detoxifying enzymes, and genes involved in ion flux and cell death. The genes identified were functionally classified using Gene Ontology (GO) and EuKaryotic Orthologous Groups (KOGs). The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified a putative pathway with ten key genes involved in plant-pathogen interactions. This first report of an integrated analysis of histone modifications and gene expression involved in the bean-rust interaction as reported here provides a comprehensive resource for other epigenomic regulation studies in non-model species under stress. PMID:26167691

  2. Polyester: simulating RNA-seq datasets with differential transcript expression.

    PubMed

    Frazee, Alyssa C; Jaffe, Andrew E; Langmead, Ben; Leek, Jeffrey T

    2015-09-01

    Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) requires software tools to assess accuracy and error rate control. Since true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be utilized, either by generating costly spike-in experiments or by simulating RNA-seq data. Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads indicating isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester is a reasonable approximation to real RNA-seq data and standard differential expression workflows can recover differential expression set in the simulation by the user. Polyester is freely available from Bioconductor (http://bioconductor.org/). Contact: jtleek@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  3. Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing

    PubMed Central

    Tourlousse, Dieter M.; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro

    2017-01-01

    High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. PMID:27980100

  4. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine.

    PubMed

    Xu, Joshua; Gong, Binsheng; Wu, Leihong; Thakkar, Shraddha; Hong, Huixiao; Tong, Weida

    2016-03-15

    Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.

  5. Optimization of an RNA-Seq Differential Gene Expression Analysis Depending on Biological Replicate Number and Library Size

    PubMed Central

    Lamarre, Sophie; Frasse, Pierre; Zouine, Mohamed; Labourdette, Delphine; Sainderichin, Elise; Hu, Guojian; Le Berre-Anton, Véronique; Bouzayen, Mondher; Maza, Elie

    2018-01-01

    RNA-Seq is a widely used technology that allows an efficient genome-wide quantification of gene expression for applications such as differential expression (DE) analysis. After a brief review of the main issues, methods and tools related to the DE analysis of RNA-Seq data, this article focuses on the impact of both the replicate number and library size in such analyses. While the main drawback of previous relevant studies is the lack of generality, we conducted both an analysis of a two-condition experiment (with eight biological replicates per condition) to compare the results with previous benchmark studies, and a meta-analysis of 17 experiments with up to 18 biological conditions, eight biological replicates and 100 million (M) reads per sample. As a global trend, we concluded that the replicate number has a larger impact than the library size on the power of the DE analysis, except for low-expressed genes, for which both parameters seem to have the same impact. Our study also provides new insights for practitioners aiming to enhance their experimental designs. For instance, by analyzing both the sensitivity and specificity of the DE analysis, we showed that the optimal threshold to control the false discovery rate (FDR) is approximately 2^(-r), where r is the replicate number. Furthermore, we showed that the false positive rate (FPR) is rather well controlled by all three studied R packages: DESeq, DESeq2, and edgeR. We also analyzed the impact of both the replicate number and library size on gene ontology (GO) enrichment analysis. Interestingly, we concluded that increases in the replicate number and library size tend to enhance the sensitivity and specificity, respectively, of the GO analysis. Finally, we recommend to RNA-Seq practitioners the production of a pilot data set to strictly analyze the power of their experimental design, or the use of a public data set, which should be similar to the data set they will obtain. For individuals working on tomato research, on the basis of the meta-analysis, we recommend at least four biological replicates per condition and 20 M reads per sample to be almost sure of obtaining about 1000 DE genes if they exist. PMID:29491871
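
    The rule of thumb in the record above, an FDR threshold of roughly 2 to the power of minus r for r biological replicates, is easy to tabulate. The sketch below only evaluates that formula; the function name and the chosen replicate numbers are illustrative assumptions, and the abstract itself calls the value approximate.

```python
def fdr_threshold(replicates: int) -> float:
    """Approximate FDR threshold 2**(-r) for r biological replicates per condition."""
    if replicates < 1:
        raise ValueError("replicates must be a positive integer")
    return 2.0 ** (-replicates)

# Tabulate the threshold for a few replicate numbers.
for r in (2, 3, 4, 6, 8):
    print(f"r = {r}: FDR threshold ~ {fdr_threshold(r):.4f}")
```

    With the four replicates recommended for tomato studies, the formula gives a threshold of 0.0625, close to the conventional 0.05.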

  6. Falco: a quick and flexible single-cell RNA-seq processing framework on the cloud.

    PubMed

    Yang, Andrian; Troup, Michael; Lin, Peijie; Ho, Joshua W K

    2017-03-01

    Single-cell RNA-seq (scRNA-seq) is increasingly used in a range of biomedical studies. Nonetheless, current RNA-seq analysis tools are not specifically designed to efficiently process scRNA-seq data due to their limited scalability. Here we introduce Falco, a cloud-based framework to enable parallelization of existing RNA-seq processing pipelines using big data technologies of Apache Hadoop and Apache Spark for performing massively parallel analysis of large scale transcriptomic data. Using two public scRNA-seq datasets and two popular RNA-seq alignment/feature quantification pipelines, we show that the same processing pipeline runs 2.6-145.4 times faster using Falco than running on a highly optimized standalone computer. Falco also allows users to utilize low-cost spot instances of Amazon Web Services, providing a ∼65% reduction in cost of analysis. Falco is available via a GNU General Public License at https://github.com/VCCRI/Falco/. Contact: j.ho@victorchang.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  7. Alteration of development and gene expression induced by in ovo-nanoinjection of 3-hydroxybenzo[c]phenanthrene into Japanese medaka (Oryzias latipes) embryos.

    PubMed

    Chen, Kun; Tsutsumi, Yuki; Yoshitake, Shuhei; Qiu, Xuchun; Xu, Hai; Hashiguchi, Yasuyuki; Honda, Masato; Tashiro, Kosuke; Nakayama, Kei; Hano, Takeshi; Suzuki, Nobuo; Hayakawa, Kazuichi; Shimasaki, Yohei; Oshima, Yuji

    2017-01-01

    Benzo[c]phenanthrene (BcP) is a highly toxic polycyclic aromatic hydrocarbon (PAH) found throughout the environment. In fish, it is metabolized to 3-hydroxybenzo[c]phenanthrene (3-OHBcP). In the present study, we observed the effects of 1nM 3-OHBcP on the development and gene expression of Japanese medaka (Oryzias latipes) embryos. Embryos were nanoinjected with the chemical after fertilization. Survival, developmental stage, and heart rate of the embryos were observed, and gene expression differences were quantified by messenger RNA sequencing (mRNA-Seq). The exposure to 1nM 3-OHBcP accelerated the development of medaka embryos on the 1st, 4th, and 6th days post fertilization (dpf), and increased heart rates significantly on the 5th dpf. Physical development differences of exposed medaka embryos were consistent with the gene expression profiles of the mRNA-Seq results for the 3rd dpf, which show that the expression of 780 genes differed significantly between the solvent control and 1nM 3-OHBcP exposure groups. The obvious expression changes in the exposure group were found for genes involved in organ formation (eye, muscle, heart), energy supply (ATPase and ATP synthase), and stress-response (heat shock protein genes). The acceleration of development and increased heart rate, which were consistent with the changes in mRNA expression, suggested that 3-OHBcP affects the development of medaka embryos. Observation of developmental stages and heart rate, in ovo-nanoinjection, and mRNA-Seq may be efficient tools to evaluate the effects of chemicals on embryos. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction.

    PubMed

    Zhang, Wenqian; Yu, Ying; Hertwig, Falk; Thierry-Mieg, Jean; Zhang, Wenwei; Thierry-Mieg, Danielle; Wang, Jian; Furlanello, Cesare; Devanarayan, Viswanath; Cheng, Jie; Deng, Youping; Hero, Barbara; Hong, Huixiao; Jia, Meiwen; Li, Li; Lin, Simon M; Nikolsky, Yuri; Oberthuer, André; Qing, Tao; Su, Zhenqiang; Volland, Ruth; Wang, Charles; Wang, May D; Ai, Junmei; Albanese, Davide; Asgharzadeh, Shahab; Avigad, Smadar; Bao, Wenjun; Bessarabova, Marina; Brilliant, Murray H; Brors, Benedikt; Chierici, Marco; Chu, Tzu-Ming; Zhang, Jibin; Grundy, Richard G; He, Min Max; Hebbring, Scott; Kaufman, Howard L; Lababidi, Samir; Lancashire, Lee J; Li, Yan; Lu, Xin X; Luo, Heng; Ma, Xiwen; Ning, Baitang; Noguera, Rosa; Peifer, Martin; Phan, John H; Roels, Frederik; Rosswog, Carolina; Shao, Susan; Shen, Jie; Theissen, Jessica; Tonini, Gian Paolo; Vandesompele, Jo; Wu, Po-Yen; Xiao, Wenzhong; Xu, Joshua; Xu, Weihong; Xuan, Jiekun; Yang, Yong; Ye, Zhan; Dong, Zirui; Zhang, Ke K; Yin, Ye; Zhao, Chen; Zheng, Yuanting; Wolfinger, Russell D; Shi, Tieliu; Malkas, Linda H; Berthold, Frank; Wang, Jun; Tong, Weida; Shi, Leming; Peng, Zhiyu; Fischer, Matthias

    2015-06-25

    Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.

  9. Discovering functional modules by topic modeling RNA-Seq based toxicogenomic data.

    PubMed

    Yu, Ke; Gong, Binsheng; Lee, Mikyung; Liu, Zhichao; Xu, Joshua; Perkins, Roger; Tong, Weida

    2014-09-15

    Toxicogenomics (TGx) endeavors to elucidate the underlying molecular mechanisms through exploring gene expression profiles in response to toxic substances. Recently, RNA-Seq is increasingly regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from the complex TGx data. Considering read counts as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to a word by document matrix used in text mining. Topic modeling, which aims to discover the latent structures in text corpora, would be helpful to explore RNA-Seq based TGx data. In this study, topic modeling was applied on a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found samples treated by the compounds with the same modes of actions (MoAs) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted by gene enrichment analysis with MoAs then confirmed by compound and pathways associations mined from literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with peroxisome proliferator-activated receptors (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discover functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
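
    The word-and-document analogy in the record above can be made concrete with a short sketch. It covers only the data representation step, turning per-sample read counts into "documents" and a document-by-word count matrix that a topic model such as LDA would take as input; it does not fit a topic model. The sample names, gene identifiers and counts are hypothetical toy values, not data from the study.

```python
from collections import Counter

def sample_to_document(gene_counts):
    """Expand a {gene: read_count} profile into a bag of 'words' (gene IDs)."""
    document = []
    for gene, count in sorted(gene_counts.items()):
        document.extend([gene] * count)
    return document

# Hypothetical toy profiles: sample name -> {gene: read count}.
profiles = {
    "sample_A": {"gene_1": 2, "gene_2": 3, "gene_3": 1},
    "sample_B": {"gene_1": 1, "gene_2": 0, "gene_3": 4},
}

# One "document" per sample, where each read is one occurrence of a gene "word".
documents = {name: sample_to_document(counts) for name, counts in profiles.items()}

# Document-by-word matrix: rows are samples, columns are genes in vocabulary order.
vocabulary = sorted({gene for counts in profiles.values() for gene in counts})
matrix = [[Counter(documents[name])[gene] for gene in vocabulary]
          for name in sorted(documents)]

print(vocabulary)
print(matrix)
```

    In practice the count matrix can be passed to a topic-modeling library directly, so the explicit expansion into word lists serves mainly to show why read counts and word frequencies play the same role.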

  10. Transformation and model choice for RNA-seq co-expression analysis.

    PubMed

    Rau, Andrea; Maugis-Rabusseau, Cathy

    2018-05-01

    Although a large number of clustering algorithms have been proposed to identify groups of co-expressed genes from microarray data, the question of if and how such methods may be applied to RNA sequencing (RNA-seq) data remains unaddressed. In this work, we investigate the use of data transformations in conjunction with Gaussian mixture models for RNA-seq co-expression analyses, as well as a penalized model selection criterion to select both an appropriate transformation and number of clusters present in the data. This approach has the advantage of accounting for per-cluster correlation structures among samples, which can be strong in RNA-seq data. In addition, it provides a rigorous statistical framework for parameter estimation, an objective assessment of data transformations and number of clusters and the possibility of performing diagnostic checks on the quality and homogeneity of the identified clusters. We analyze four varied RNA-seq data sets to illustrate the use of transformations and model selection in conjunction with Gaussian mixture models. Finally, we propose a Bioconductor package coseq (co-expression of RNA-seq data) to facilitate implementation and visualization of the recommended RNA-seq co-expression analyses.
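
    The abstract above does not say which data transformations are meant. As an assumption, the sketch below uses two transformations commonly applied to proportion-like data, the arcsine square root and the logit, on per-gene expression profiles (each gene's counts divided by its total across samples). It illustrates the kind of preprocessing that can precede a Gaussian mixture model and is not the coseq implementation; the function names and the clipping constant are hypothetical.

```python
import math

def expression_profile(counts):
    """Convert one gene's counts across samples into proportions summing to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("gene has zero counts in every sample")
    return [c / total for c in counts]

def arcsine_transform(profile):
    """Arcsine square-root transform of proportions in [0, 1]."""
    return [math.asin(math.sqrt(p)) for p in profile]

def logit_transform(profile, eps=1e-6):
    """Base-2 logit transform; proportions are clipped away from 0 and 1."""
    transformed = []
    for p in profile:
        p = min(max(p, eps), 1.0 - eps)
        transformed.append(math.log2(p / (1.0 - p)))
    return transformed

# Hypothetical counts for one gene in three samples.
profile = expression_profile([10, 30, 60])
print(profile)
print(arcsine_transform(profile))
print(logit_transform(profile))
```

    Both transformations map proportions bounded in [0, 1] onto a scale where a Gaussian assumption is more plausible, which is the motivation for comparing transformations with a model selection criterion.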

  11. Biological classification with RNA-Seq data: Can alternatively spliced transcript expression enhance machine learning classifier?

    PubMed

    Johnson, Nathan T; Dhroso, Andi; Hughes, Katelyn J; Korkin, Dmitry

    2018-06-25

    The extent to which the genes are expressed in the cell can be simplistically defined as a function of one or more factors of the environment, lifestyle, and genetics. RNA sequencing (RNA-Seq) is becoming a prevalent approach to quantify gene expression, and is expected to provide better insight into a number of biological and biomedical questions, compared to the DNA microarrays. Most importantly, RNA-Seq allows expression to be quantified at the gene and alternative splicing isoform levels. However, leveraging the RNA-Seq data requires development of new data mining and analytics methods. Supervised machine learning methods are commonly used approaches for biological data analysis, and have recently gained attention for their applications to the RNA-Seq data. In this work, we assess the utility of supervised learning methods trained on RNA-Seq data for a diverse range of biological classification tasks. We hypothesize that the isoform-level expression data is more informative for biological classification tasks than the gene-level expression data. Our large-scale assessment is done through utilizing multiple datasets, organisms, lab groups, and RNA-Seq analysis pipelines. Overall, we performed and assessed 61 biological classification problems that leverage three independent RNA-Seq datasets and include over 2,000 samples that come from multiple organisms, lab groups, and RNA-Seq analyses. These 61 problems include predictions of the tissue type, sex, or age of the sample, healthy or cancerous phenotypes, and the pathological tumor stage for the samples from the cancerous tissue. For each classification problem, the performance of three normalization techniques and six machine learning classifiers was explored. We find that for every single classification problem, the isoform-based classifiers outperform or are comparable with gene expression based methods. The top-performing supervised learning techniques reached a near perfect classification accuracy, demonstrating the utility of supervised learning for RNA-Seq based data analysis. Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  12. Landscape of DNA Virus Associations across Human Malignant Cancers: Analysis of 3,775 Cases Using RNA-Seq

    PubMed Central

    Tannir, Nizar M.; Williams, Michelle D.; Chen, Yunxin; Yao, Hui; Zhang, Jianping; Thompson, Erika J.; Meric-Bernstam, Funda; Medeiros, L. Jeffrey; Weinstein, John N.

    2013-01-01

    Elucidation of tumor-DNA virus associations in many cancer types has enhanced our knowledge of fundamental oncogenesis mechanisms and provided a basis for cancer prevention initiatives. RNA-Seq is a novel tool to comprehensively assess such associations. We interrogated RNA-Seq data from 3,775 malignant neoplasms in The Cancer Genome Atlas database for the presence of viral sequences. Viral integration sites were also detected in expressed transcripts using a novel approach. The detection capacity of RNA-Seq was compared to available clinical laboratory data. Human papillomavirus (HPV) transcripts were detected using RNA-Seq analysis in head-and-neck squamous cell carcinoma, uterine endometrioid carcinoma, and squamous cell carcinoma of the lung. Detection of HPV by RNA-Seq correlated with detection by in situ hybridization and immunohistochemistry in squamous cell carcinoma tumors of the head and neck. Hepatitis B virus and Epstein-Barr virus (EBV) were detected using RNA-Seq in hepatocellular carcinoma and gastric carcinoma tumors, respectively. Integration sites of viral genes and oncogenes were detected in cancers harboring HPV or hepatitis B virus but not in EBV-positive gastric carcinoma. Integration sites of expressed viral transcripts frequently involved known coding areas of the host genome. No DNA virus transcripts were detected in acute myeloid leukemia, cutaneous melanoma, low- and high-grade gliomas of the brain, and adenocarcinomas of the breast, colon and rectum, lung, prostate, ovary, kidney, and thyroid. In conclusion, this study provides a large-scale overview of the landscape of DNA viruses in human malignant cancers. While further validation is necessary for specific cancer types, our findings highlight the utility of RNA-Seq in detecting tumor-associated DNA viruses and identifying viral integration sites that may unravel novel mechanisms of cancer pathogenesis. PMID:23740984

  13. Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing.

    PubMed

    Tourlousse, Dieter M; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro; Sekiguchi, Yuji

    2017-02-28

    High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
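
    One way to picture how staggered spike-ins allow absolute quantification, as described in the record above, is a calibration line: spike-ins added at known copy numbers are fitted against their observed read counts on a log-log scale, and the fit converts read counts of native taxa into estimated copies. This is a minimal sketch under that assumption, not the authors' procedure; the copy numbers and read counts are hypothetical.

```python
import math

def fit_loglog(spike_copies, spike_reads):
    """Least-squares fit of log10(copies) = intercept + slope * log10(reads)."""
    # Spike-ins with zero reads must be excluded before calling (log10(0) is undefined).
    xs = [math.log10(r) for r in spike_reads]
    ys = [math.log10(c) for c in spike_copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_copies(reads, slope, intercept):
    """Convert a read count into estimated copies using the calibration line."""
    return 10 ** (intercept + slope * math.log10(reads))

# Hypothetical staggered spike-in mixture: known copies added and reads observed.
spike_copies = [1e3, 1e4, 1e5, 1e6]
spike_reads = [20, 200, 2000, 20000]

slope, intercept = fit_loglog(spike_copies, spike_reads)
print(slope, intercept)
print(estimate_copies(500, slope, intercept))  # estimated copies for a taxon with 500 reads
```

    A slope far from 1 or a poor fit across the staggered concentrations would itself be a per-sample quality flag, which matches the quality-control use described in the abstract.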

  14. Advantages of RNA-seq compared to RNA microarrays for transcriptome profiling of anterior cruciate ligament tears.

    PubMed

    Rai, Muhammad Farooq; Tycksen, Eric D; Sandell, Linda J; Brophy, Robert H

    2018-01-01

    Microarrays and RNA-seq are at the forefront of high throughput transcriptome analyses. Since these methodologies are based on different principles, there are concerns about the concordance of data between the two techniques. The concordance of RNA-seq and microarrays for genome-wide analysis of differential gene expression has not been rigorously assessed in clinically derived ligament tissues. To demonstrate the concordance between RNA-seq and microarrays and to assess potential benefits of RNA-seq over microarrays, we assessed differences in transcript expression in anterior cruciate ligament (ACL) tissues based on time-from-injury. ACL remnants were collected from patients with an ACL tear at the time of ACL reconstruction. RNA prepared from torn ACL remnants was subjected to Agilent microarrays (N = 24) and RNA-seq (N = 8). The correlation of biological replicates in RNA-seq and microarrays data was similar (0.98 vs. 0.97), demonstrating that each platform has high internal reproducibility. Correlations between the RNA-seq data and the individual microarrays were low, but correlations between the RNA-seq values and the geometric mean of the microarrays values were moderate. The cross-platform concordance for differentially expressed transcripts or enriched pathways was linearly correlated (r = 0.64). RNA-Seq was superior in detecting low abundance transcripts and differentiating biologically critical isoforms. Additional independent validation of transcript expression was undertaken using microfluidic PCR for selected genes. PCR data showed 100% concordance (in expression pattern) with RNA-seq and microarrays data. These findings demonstrate that RNA-seq has advantages over microarrays for transcriptome profiling of ligament tissues when available and affordable. Furthermore, these findings are likely transferable to other musculoskeletal tissues where tissue collection is challenging and cells are in low abundance. © 2017 Orthopaedic Research Society. 
Published by Wiley Periodicals, Inc. J Orthop Res 36:484-497, 2018.

  15. YM500: a small RNA sequencing (smRNA-seq) database for microRNA research

    PubMed Central

    Cheng, Wei-Chung; Chung, I-Fang; Huang, Tse-Shun; Chang, Shih-Ting; Sun, Hsing-Jen; Tsai, Cheng-Fong; Liang, Muh-Lii; Wong, Tai-Tong; Wang, Hsei-Wei

    2013-01-01

    MicroRNAs (miRNAs) are small RNAs ∼22 nt in length that are involved in the regulation of a variety of physiological and pathological processes. Advances in high-throughput small RNA sequencing (smRNA-seq), one of the next-generation sequencing applications, have reshaped the miRNA research landscape. In this study, we established an integrative database, the YM500 (http://ngs.ym.edu.tw/ym500/), containing analysis pipelines and analysis results for 609 human and mouse smRNA-seq results, including public data from the Gene Expression Omnibus (GEO) and some private sources. YM500 collects analysis results for miRNA quantification, for isomiR identification (incl. RNA editing), for arm switching discovery, and, more importantly, for novel miRNA predictions. Wet-lab validation on >100 miRNAs confirmed high correlation between miRNA profiling and RT-qPCR results (R = 0.84). This database allows researchers to search these four different types of analysis results via our interactive web interface. YM500 allows researchers to define the criteria of isomiRs, and also integrates information from dbSNP to help researchers distinguish isomiRs from SNPs. A user-friendly interface is provided to integrate miRNA-related information and existing evidence from hundreds of sequencing datasets. The identified novel miRNAs and isomiRs hold potential for both basic research and biotech applications. PMID:23203880

  16. Missing data and technical variability in single-cell RNA-sequencing experiments.

    PubMed

    Hicks, Stephanie C; Townes, F William; Teng, Mingxiang; Irizarry, Rafael A

    2017-11-06

    Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. 
Finally, we demonstrate and discuss how batch-effects and confounded experiments can intensify the problem. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
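
    The per-cell variation in the proportion of zeros that this abstract describes can be tabulated with a few lines of code. The following is only a minimal sketch, assuming a hypothetical count matrix stored as one list of per-gene counts per cell; it is not the authors' assessment procedure.

```python
def zero_fraction_per_cell(matrix):
    """Return, for each cell, the fraction of genes with a reported count of zero.

    `matrix` is a hypothetical layout: a list of cells, each a list of
    per-gene counts of equal length.
    """
    return [sum(1 for count in cell if count == 0) / len(cell) for cell in matrix]


# Two toy cells measured over four genes.
cells = [
    [0, 0, 3, 1],
    [5, 0, 2, 2],
]
print(zero_fraction_per_cell(cells))
```

    Comparing these fractions across batches is one simple way to see whether zero rates track technical rather than biological groupings.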

  17. DEsingle for detecting three types of differential expression in single-cell RNA-seq data.

    PubMed

    Miao, Zhun; Deng, Ke; Wang, Xiaowo; Zhang, Xuegong

    2018-04-24

    The excessive zeros in single-cell RNA-seq data include "real" zeros due to the on-off nature of gene transcription in single cells and "dropout" zeros due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a Zero-Inflated Negative Binomial model to estimate the proportion of real and dropout zeros and to define and detect 3 types of DE genes in single-cell RNA-seq data with higher accuracy. The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under Bioconductor's consideration now. Contact: zhangxg@tsinghua.edu.cn. Supplementary data are available at Bioinformatics online.
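
    To make the zero-inflated negative binomial (ZINB) idea concrete, the sketch below evaluates a generic ZINB probability mass function and the share of observed zeros attributable to the zero-inflation component. The parameter names (`theta`, `mean`, `size`) and the mean/dispersion parameterization are assumptions made for illustration; the abstract does not state DEsingle's parameterization, nor which mixture component it labels "real" versus "dropout".

```python
import math


def nb_pmf(k, mean, size):
    """Negative binomial pmf with mean `mean` > 0 and dispersion `size` > 0."""
    p = size / (size + mean)
    log_pmf = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k, theta, mean, size):
    """Mixture: with probability `theta` emit a constant zero, else draw from NB."""
    inflated = theta if k == 0 else 0.0
    return inflated + (1.0 - theta) * nb_pmf(k, mean, size)


def inflated_zero_share(theta, mean, size):
    """Fraction of observed zeros that come from the zero-inflation component."""
    return theta / zinb_pmf(0, theta, mean, size)


# Illustrative values only: 30% inflation, NB mean 5, dispersion 2.
print(zinb_pmf(0, 0.3, 5.0, 2.0))
print(inflated_zero_share(0.3, 5.0, 2.0))
```

    The remaining zeros come from the negative binomial component itself, which is why a plain count of zeros cannot separate the two sources.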

  18. Early Detection of NSCLC Using Stromal Markers in Peripheral Blood

    DTIC Science & Technology

    2016-09-01

    circulating myeloid cells, flow cytometry, RNA-sequencing, expression profiling. 3. ACCOMPLISHMENTS:  What were the major goals of the project...Subtask 2: Flow cytometry sorting of circulating myeloid cells. Subtask 3: RNA-Sequencing Subtask 4: RNA-seq data analysis Subtask 5: Feasible RT-PCR...accomplished the patient recruitment, flow cytometry sorting of circulating myeloid cells, RNA-sequencing of the samples. During the RNA-seq data analysis, we

  19. Massively parallel sequencing of forensic STRs and SNPs using the Illumina® ForenSeq™ DNA Signature Prep Kit on the MiSeq FGx™ Forensic Genomics System.

    PubMed

    Guo, Fei; Yu, Jiao; Zhang, Lu; Li, Jun

    2017-11-01

    The ForenSeq™ DNA Signature Prep Kit (ForenSeq Kit) is designed to detect more than 200 forensically relevant markers in a single reaction on the MiSeq FGx™ Forensic Genomics System (MiSeq FGx System), including Amelogenin, 27 autosomal short tandem repeats (A-STRs), 7 X chromosomal STRs (X-STRs), 24 Y chromosomal STRs (Y-STRs) and 94 identity-informative single nucleotide polymorphisms (iSNPs) with the option to contain 22 phenotypic-informative SNPs (pSNPs) and 56 ancestry-informative SNPs (aSNPs). In this study, we evaluated the MiSeq FGx System on three major parts: methodological optimization (DNA extraction, sample quantification, library normalization, diluted libraries concentration, and sample-to-cell arrangement), massively parallel sequencing (MPS) performance (depth of coverage, sequence coverage ratio, and allele coverage ratio), and ForenSeq Kit characteristics (repeatability and concordance, sensitivity, mixture, stability and case-type samples). Results showed that quantitative polymerase chain reaction (qPCR)-based sample quantification and library normalization and the appropriate number of pooled libraries and concentration of diluted libraries provided a greater level of MPS performance and repeatability. Repeatable and concordant genotypes were obtained by the ForenSeq Kit. Full profiles were obtained from ≥100pg input DNA for STRs and ≥200pg for SNPs. A sample with ≥5% minor contributors was considered as a mixture by imbalanced allele coverage ratio distribution, and full profiles from minor contributors were easily detected between 9:1 and 1:9 mixtures with known reference profiles. The ForenSeq Kit tolerated considerable concentrations of inhibitors like ≤200μM hematin and ≤50μg/ml humic acid, and >56% STR profiles and >88% SNP profiles were obtained from ≥200-bp degraded samples. Also, it was adapted to case-type samples. 
As a whole, the ForenSeq Kit is a well-performing, robust, reliable, reproducible and highly informative assay, and it can fully meet requirements for human identification. Further, the sensitive QC indicator and automated sample comparison function in the ForenSeq™ Universal Analysis Software are quite helpful, so that we can concentrate on questionable genotypes and avoid tedious and time-consuming labor to maximize the time spent in data analysis. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. A structured sparse regression method for estimating isoform expression level from multi-sample RNA-seq data.

    PubMed

    Zhang, L; Liu, X J

    2016-06-03

    With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each RNA-seq sample separately, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a nonparametric model to capture the general tendency of non-uniform read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples and produced more accurate isoform expression estimates, and thus more meaningful biological interpretations.

  1. RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing.

    PubMed

    Gupta, Vikas; Estrada, April D; Blakley, Ivory; Reid, Rob; Patel, Ketan; Meyer, Mason D; Andersen, Stig Uggerhøj; Brown, Allan F; Lila, Mary Ann; Loraine, Ann E

    2015-01-01

    Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable the breeding of berry varieties with enhanced health benefits. Toward this end, we annotated a previously sequenced draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up-regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. Homology-based annotation using Blast2GO and InterPro assigned Gene Ontology terms to around 15,000 genes. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up- and down-regulation of metabolic pathway enzymes and transcriptional regulators. Analysis of RNA-seq alignments identified developmentally regulated alternative splicing, promoter use, and 3' end formation. We report genome sequence, gene models, functional annotations, and RNA-Seq expression data that provide an important new resource enabling high throughput studies in blueberry.

  2. Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.

    PubMed

    Yip, Shun H; Sham, Pak Chung; Wang, Junwen

    2018-02-21

    Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.
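
    As a rough illustration of what "highly variable" means, the sketch below ranks genes by their squared coefficient of variation (variance divided by squared mean) across cells. This naive ranking is an assumption chosen for brevity and is not one of the seven evaluated methods; those tools additionally model how variability depends on mean expression. The input layout (a dictionary of gene name to per-cell values) is hypothetical.

```python
from statistics import mean, variance


def rank_by_cv2(expression):
    """Rank genes from most to least variable by squared coefficient of variation.

    `expression` maps a gene name to its normalized expression values,
    one value per cell (at least two cells). Genes with zero mean are skipped.
    """
    scores = {}
    for gene, values in expression.items():
        m = mean(values)
        if m > 0:
            scores[gene] = variance(values) / (m * m)
    return sorted(scores, key=scores.get, reverse=True)


toy = {
    "geneA": [5, 5, 5, 5],     # constant across cells
    "geneB": [0, 10, 0, 10],   # strongly cell-to-cell variable
    "geneC": [4, 6, 4, 6],     # mildly variable
}
print(rank_by_cv2(toy))
```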

  3. Genome-wide analysis of endogenously expressed ZEB2 binding sites reveals inverse correlations between ZEB2 and GalNAc-transferase GALNT3 in human tumors.

    PubMed

    Balcik-Ercin, Pelin; Cetin, Metin; Yalim-Camci, Irem; Odabas, Gorkem; Tokay, Nurettin; Sayan, A Emre; Yagci, Tamer

    2018-03-07

    ZEB2 is a transcriptional repressor that regulates epithelial-to-mesenchymal transition (EMT) through binding to bipartite E-box motifs in gene regulatory regions. Despite the abundant presence of E-boxes within the human genome and the multiplicity of pathophysiological processes regulated during ZEB2-induced EMT, only a small fraction of ZEB2 targets has been identified so far. Hence, we explored genome-wide ZEB2 binding by chromatin immunoprecipitation-sequencing (ChIP-seq) under endogenous ZEB2 expression conditions. For ChIP-Seq we used an anti-ZEB2 monoclonal antibody, clone 6E5, in SNU398 hepatocellular carcinoma cells exhibiting a high endogenous ZEB2 expression. The ChIP-Seq targets were validated using ChIP-qPCR, whereas ZEB2-dependent expression of target genes was assessed by RT-qPCR and Western blotting in shRNA-mediated ZEB2 silenced SNU398 cells and doxycycline-induced ZEB2 overexpressing colorectal carcinoma DLD1 cells. Changes in target gene expression were also assessed using primary human tumor cDNA arrays in conjunction with RT-qPCR. Additional differential expression and correlation analyses were performed using expO and Human Protein Atlas datasets. Over 500 ChIP-Seq positive genes were annotated, and intervals related to these genes were found to include the ZEB2 binding motif CACCTG according to TOMTOM motif analysis in the MEME Suite database. Assessment of ZEB2-dependent expression of target genes in ZEB2-silenced SNU398 cells and ZEB2-induced DLD1 cells revealed that the GALNT3 gene serves as a ZEB2 target with the highest, but inversely correlated, expression level. Remarkably, GALNT3 also exhibited the highest enrichment in the ChIP-qPCR validation assays. Through the analyses of primary tumor cDNA arrays and expO datasets a significant differential expression and a significant inverse correlation between ZEB2 and GALNT3 expression were detected in most of the tumors. 
We also explored ZEB2 and GALNT3 protein expression using the Human Protein Atlas dataset and, again, observed an inverse correlation in all analyzed tumor types, except malignant melanoma. In contrast to a generally negative or weak ZEB2 expression, we found that most tumor tissues exhibited a strong or moderate GALNT3 expression. Our observation that ZEB2 negatively regulates a GalNAc-transferase (GALNT3) that is involved in O-glycosylation adds another layer of complexity to the role of ZEB2 in cancer progression and metastasis. Proteins glycosylated by GALNT3 may be exploited as novel diagnostics and/or therapeutic targets.

  4. Novel approaches for bioinformatic analysis of salivary RNA sequencing data for development.

    PubMed

    Kaczor-Urbanowicz, Karolina Elzbieta; Kim, Yong; Li, Feng; Galeev, Timur; Kitchen, Rob R; Gerstein, Mark; Koyano, Kikuye; Jeong, Sung-Hee; Wang, Xiaoyan; Elashoff, David; Kang, So Young; Kim, Su Mi; Kim, Kyoung; Kim, Sung; Chia, David; Xiao, Xinshu; Rozowsky, Joel; Wong, David T W

    2018-01-01

    Analysis of RNA sequencing (RNA-Seq) data in human saliva is challenging. Lack of standardization and unification of the bioinformatic procedures undermines saliva's diagnostic potential, which motivated us to perform this study. We applied principal pipelines for bioinformatic analysis of small RNA-Seq data of saliva of 98 healthy Korean volunteers including either direct or indirect mapping of the reads to the human genome using Bowtie1. Analysis of alignments to exogenous genomes by another pipeline revealed that almost all of the reads map to bacterial genomes. Thus, salivary exRNA has fundamental properties that warrant the design of unique additional steps while performing the bioinformatic analysis. Our pipelines can serve as potential guidelines for processing of RNA-Seq data of human saliva. Processing and analysis results of the experimental data generated by the exceRpt (v4.6.3) small RNA-seq pipeline (github.gersteinlab.org/exceRpt) are available from exRNA atlas (exrna-atlas.org). Alignment to exogenous genomes and their quantification results were used in this paper for the analyses of small RNAs of exogenous origin. Contact: dtww@ucla.edu. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  5. Understanding the molecular mechanisms underlying the effects of light intensity on flavonoid production by RNA-seq analysis in Epimedium pseudowushanense B.L.Guo

    PubMed Central

    Chen, Haimei; Guo, Baolin; Liu, Chang

    2017-01-01

    Epimedium pseudowushanense B.L.Guo, a light-demanding shade herb, is used in traditional medicine to increase libido and strengthen muscles and bones. The recognition of the health benefits of Epimedium has increased its market demand. However, its resource recycling rate is low and environmentally dependent. Furthermore, its natural sources are endangered, further increasing prices. Commercial culture can address these resource constraints. Understanding the effects of environmental factors on the production of its active components would improve the technology for cultivation and germplasm conservation. Here, we studied the effects of light intensities on the flavonoid production and revealed the molecular mechanism using RNA-seq analysis. Plants were exposed to five levels of light intensity from germination to flowering, and the flavonoid contents were measured using HPLC. Quantification of epimedin A, epimedin B, epimedin C, and icariin showed that the flavonoid contents varied with light intensity level. The largest amount of epimedin C was produced at light intensity level 4 (I4). Next, the leaves under the three light intensity levels (“L”, “M” and “H”) with the largest differences in flavonoid content were subjected to RNA-seq analysis. Transcriptome reconstruction identified 43,657 unigenes. All unigene sequences were annotated by searching against the Nr, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. In total, 4008, 5260, and 3591 significant differentially expressed genes (DEGs) were identified between the groups L vs. M, M vs. H and L vs. H, respectively. In particular, twenty-one full-length genes involved in flavonoid biosynthesis were identified. The expression levels of the flavonol synthase and chalcone synthase genes were strongly associated with light-induced flavonoid abundance, with the highest expression levels found in the H group. 
Furthermore, 65 transcription factors, including 31 FAR1, 17 MYB-related, 12 bHLH, and 5 WRKY, were differentially expressed after light induction. Finally, a model was proposed to explain the light-induced flavonoid production. This study provided valuable information to improve cultivation practices and produced the first comprehensive resource for E. pseudowushanense transcriptomes. PMID:28786984

  6. Understanding the molecular mechanisms underlying the effects of light intensity on flavonoid production by RNA-seq analysis in Epimedium pseudowushanense B.L.Guo.

    PubMed

    Pan, Junqian; Chen, Haimei; Guo, Baolin; Liu, Chang

    2017-01-01

    Epimedium pseudowushanense B.L.Guo, a light-demanding shade herb, is used in traditional medicine to increase libido and strengthen muscles and bones. The recognition of the health benefits of Epimedium has increased its market demand. However, its resource recycling rate is low and environmentally dependent. Furthermore, its natural sources are endangered, further increasing prices. Commercial culture can address these resource constraints. Understanding the effects of environmental factors on the production of its active components would improve the technology for cultivation and germplasm conservation. Here, we studied the effects of light intensities on the flavonoid production and revealed the molecular mechanism using RNA-seq analysis. Plants were exposed to five levels of light intensity from germination to flowering, and the flavonoid contents were measured using HPLC. Quantification of epimedin A, epimedin B, epimedin C, and icariin showed that the flavonoid contents varied with light intensity level. The largest amount of epimedin C was produced at light intensity level 4 (I4). Next, the leaves under the three light intensity levels ("L", "M" and "H") with the largest differences in flavonoid content were subjected to RNA-seq analysis. Transcriptome reconstruction identified 43,657 unigenes. All unigene sequences were annotated by searching against the Nr, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. In total, 4008, 5260, and 3591 significant differentially expressed genes (DEGs) were identified between the groups L vs. M, M vs. H and L vs. H, respectively. In particular, twenty-one full-length genes involved in flavonoid biosynthesis were identified. The expression levels of the flavonol synthase and chalcone synthase genes were strongly associated with light-induced flavonoid abundance, with the highest expression levels found in the H group. 
Furthermore, 65 transcription factors, including 31 FAR1, 17 MYB-related, 12 bHLH, and 5 WRKY, were differentially expressed after light induction. Finally, a model was proposed to explain the light-induced flavonoid production. This study provided valuable information to improve cultivation practices and produced the first comprehensive resource for E. pseudowushanense transcriptomes.

  7. Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs

    PubMed Central

    LeGault, Laura H.; Dewey, Colin N.

    2013-01-01

    Motivation: Alternative splicing and other processes that allow for different transcripts to be derived from the same gene are significant forces in the eukaryotic cell. RNA-Seq is a promising technology for analyzing alternative transcripts, as it does not require prior knowledge of transcript structures or genome sequences. However, analysis of RNA-Seq data in the presence of genes with large numbers of alternative transcripts is currently challenging due to efficiency, identifiability and representation issues. Results: We present RNA-Seq models and associated inference algorithms based on the concept of probabilistic splice graphs, which alleviate these issues. We prove that our models are often identifiable and demonstrate that our inference methods for quantification and differential processing detection are efficient and accurate. Availability: Software implementing our methods is available at http://deweylab.biostat.wisc.edu/psginfer. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23846746
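
    A toy example may help picture a probabilistic splice graph: exon segments are vertices, each outgoing edge carries the probability of being followed, and a transcript's probability is the product of the edge probabilities along its path. The graph layout, vertex names and weights below are invented for illustration and say nothing about how the psginfer software represents or fits its models.

```python
def enumerate_transcripts(graph, start, end):
    """Yield (path, probability) for every start-to-end path in an acyclic graph.

    `graph` maps each vertex to a dict of {next_vertex: edge_probability};
    the outgoing probabilities of each vertex are assumed to sum to 1.
    """
    stack = [([start], 1.0)]
    while stack:
        path, prob = stack.pop()
        node = path[-1]
        if node == end:
            yield path, prob
            continue
        for nxt, weight in graph.get(node, {}).items():
            stack.append((path + [nxt], prob * weight))


# Hypothetical gene with one skippable exon segment (e2).
toy_graph = {
    "start": {"e1": 1.0},
    "e1": {"e2": 0.7, "e3": 0.3},
    "e2": {"e3": 1.0},
    "e3": {"end": 1.0},
}
for path, prob in enumerate_transcripts(toy_graph, "start", "end"):
    print(" -> ".join(path), prob)
```

    Because only the edge weights are stored, a gene with many alternative transcripts needs far fewer parameters than one abundance per full-length isoform.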

  8. The development of functional mapping by three sex-related loci on the third whorl of different sex types of Carica papaya L.

    PubMed Central

    Lin, Hui-Jun; Viswanath, Kotapati Kasi; Lin, Chih-Peng; Chang, Bill Chia-Han; Chiu, Pei-Hsun; Chiu, Chan-Tai; Wang, Ren-Huang; Chin, Shih-Wen; Chen, Fure-Chyi

    2018-01-01

    Carica papaya L. is an important economic crop worldwide and is used as a model plant for sex-determination research. To study the different flower sex types, we screened sex-related genes using alternative splicing sequences (AS-seqs) from a transcriptome database of the three flower sex types, i.e., males, females, and hermaphrodites, established at 28 days before flowering using 15 bacterial artificial chromosomes (BACs) of C. papaya L. After screening, the cDNA regions of the three sex-related loci, including short vegetative phase-like (CpSVPL), the chromatin assembly factor 1 subunit A-like (CpCAF1AL), and the somatic embryogenesis receptor kinase (CpSERK), which contained eight sex-related single-nucleotide polymorphisms (SNPs) from the different sex types of C. papaya L., were genotyped using high-resolution melting (HRM). The three loci were examined regarding the profiles of the third whorl, as described below. CpSVPL, which had one SNP associated with the three sex genotypes, was highly expressed in the male and female sterile flowers (abnormal hermaphrodite flowers) that lacked the fourth whorl structure. CpCAF1AL, which had three SNPs associated with the male genotype, was highly expressed in male and normal hermaphrodite flowers, and had no AS-seqs, whereas it exhibited low expression and an AS-seq in intron 11 in abnormal hermaphrodite flowers. Conversely, carpellate flowers (abnormal hermaphrodite flowers) showed low expression of CpSVPL and AS-seqs in introns 5, 6, and 7 of CpSERK, which contained four SNPs associated with the female genotype. Specifically, the CpSERK and CpCAF1AL loci exhibited no AS-seq expression in the third whorl of the male and normal hermaphrodite flowers, respectively, and variance in the AS-seq expression of all other types of flowers. 
Functional mapping of the third whorl of normal hermaphrodites indicated no AS-seq expression in CpSERK, low CpSVPL expression, and, for CpCAF1AL, high expression and no AS-seq expression on XYh-type chromosomes. PMID:29566053

  9. Epigenetic Regulation of Hormone-dependent Plant Growth Processes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ecker, Joseph Robert

    2016-11-18

    Impact of EIN6, EEN and ethylene on the H3K27me3 dynamics in Arabidopsis: To assess the dynamic responsiveness of H3K27me3 levels to ethylene and how this might affect ethylene-induced gene expression, we plan to perform H3K27me3 ChIP-seq and RNA-seq experiments in parallel with etiolated seedlings in the absence and presence of ethylene. Further implementation of ein6, een and ein6een mutants will visualize how the H3K27me3 landscape (-/+ET) is altered when H3K27me3 demethylation and/or INO80-mediated chromatin remodeling is compromised. Additional ChIP-seq analyses with EIN6 will show if ethylene-induced H3K27me3 removal at certain genes is always accompanied by the presence of EIN6.

  10. Beta-Poisson model for single-cell RNA-seq data analyses.

    PubMed

    Vu, Trung Nghia; Wills, Quin F; Kalari, Krishna R; Niu, Nifang; Wang, Liewei; Rantalainen, Mattias; Pawitan, Yudi

    2016-07-15

    Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is a bimodality in the cellular distribution even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA-sequencing are not able to capture this property. We introduce a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. We further integrate the model into the generalized linear model framework in order to perform differential expression analyses. The whole analytical procedure is called BPSC. The results from several real single-cell RNA-seq datasets indicate that ∼90% of the transcripts are well characterized by the beta-Poisson model; the model-fit from BPSC is better than the fit of the standard gamma-Poisson model in > 80% of the transcripts. Moreover, in differential expression analyses of simulated and real datasets, BPSC performs well against edgeR, a conventional method widely used in bulk-cell RNA-sequencing data, and against scde and MAST, two recent methods specifically designed for single-cell RNA-seq data. An R package BPSC for model fitting and differential expression analyses of single-cell RNA-seq data is available under GPL-3 license at https://github.com/nghiavtr/BPSC CONTACT: yudi.pawitan@ki.se or mattias.rantalainen@ki.se Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
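
    The bimodality mentioned in this abstract can be reproduced with a small simulation of a generic beta-Poisson mixture: each cell draws an activity level from a Beta distribution, and the count is Poisson with a rate proportional to that activity. The parameter names and the two-shape-plus-scale parameterization are assumptions for illustration and may differ from BPSC's; the Poisson sampler is Knuth's multiplication method, chosen to stay within the Python standard library.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    if lam <= 0:
        return 0
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_beta_poisson(n, alpha, beta, scale, seed=0):
    """Simulate n cells: activity ~ Beta(alpha, beta), count ~ Poisson(scale * activity)."""
    rng = random.Random(seed)
    return [poisson_sample(scale * rng.betavariate(alpha, beta), rng) for _ in range(n)]


# With alpha, beta < 1 the activity piles up near 0 and 1, giving a mode at
# zero (non-expressing cells) and a second mode near `scale`.
counts = sample_beta_poisson(20000, 0.5, 0.5, 20.0, seed=1)
print("mean:", sum(counts) / len(counts))
print("fraction of zeros:", counts.count(0) / len(counts))
```

    A plain Poisson with the same mean would produce almost no zeros, which is the mismatch the mixture is meant to capture.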

  11. LocExpress: a web server for efficiently estimating expression of novel transcripts.

    PubMed

    Hou, Mei; Tian, Feng; Jiang, Shuai; Kong, Lei; Yang, Dechang; Gao, Ge

    2016-12-22

    The temporal and spatial-specific expression pattern of a transcript in multiple tissues and cell types can indicate key clues about its function. While several gene atlases are available online as pre-computed databases for known gene models, it is still challenging to get expression profiles for previously uncharacterized (i.e. novel) transcripts efficiently. Here we developed LocExpress, a web server for efficiently estimating expression of novel transcripts across multiple tissues and cell types in human (20 normal tissues/cell types and 14 cell lines) as well as in mouse (24 normal tissues/cell types and nine cell lines). As a wrapper to RNA-Seq quantification algorithm, LocExpress efficiently reduces the time cost by making abundance estimation calls increasingly within the minimum spanning bundle region of input transcripts. For a given novel gene model, such local context-oriented strategy allows LocExpress to estimate its FPKMs in hundreds of samples within minutes on a standard Linux box, making an online web server possible. To the best of our knowledge, LocExpress is the only web server to provide nearly real-time expression estimation for novel transcripts in common tissues and cell types. The server is publicly available at http://loc-express.cbi.pku.edu.cn .
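
    For readers unfamiliar with the FPKM unit reported by the server, the sketch below applies the standard definition: fragments per kilobase of transcript per million mapped fragments. The function and argument names are made up for this example, and it does not reflect how LocExpress or its underlying quantification algorithm assigns fragments to transcripts.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# 500 fragments on a 2,000 bp transcript in a library of 25 million mapped fragments.
print(fpkm(500, 2000, 25_000_000))
```

    The factor 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.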

  12. Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

    PubMed Central

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

    2015-01-01

    ABSTRACT Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644

  13. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

    PubMed

    Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

    2017-06-26

    The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB ( http://paramecium.i2bc.paris-saclay.fr ). TrUC software is freely distributed under a GNU GPL v3 licence ( https://github.com/oarnaiz/TrUC ).

  14. An Integrated Approach for RNA-seq Data Normalization.

    PubMed

    Yang, Shengping; Mercante, Donald E; Zhang, Kun; Fang, Zhide

    2016-01-01

    DNA copy number alteration is common in many cancers. Studies have shown that insertion or deletion of DNA sequences can directly alter gene expression, and significant correlation exists between DNA copy number and gene expression. Data normalization is a critical step in the analysis of gene expression generated by RNA-seq technology. Successful normalization reduces/removes unwanted nonbiological variations in the data, while keeping meaningful information intact. However, as far as we know, no attempt has been made to adjust for the variation due to DNA copy number changes in RNA-seq data normalization. In this article, we propose an integrated approach for RNA-seq data normalization. Comparisons show that the proposed normalization can improve power for downstream differentially expressed gene detection and generate more biologically meaningful results in gene profiling. In addition, our findings show that due to the effects of copy number changes, some housekeeping genes are not always suitable internal controls for studying gene expression. Using information from DNA copy number, the integrated approach is successful in reducing noise due to both biological and nonbiological causes in RNA-seq data, thus increasing the accuracy of gene profiling.

  15. expVIP: a Customizable RNA-seq Data Analysis and Visualization Platform

    PubMed Central

    2016-01-01

    The majority of transcriptome sequencing (RNA-seq) expression studies in plants remain underutilized and inaccessible due to the use of disparate transcriptome references and the lack of skills and resources to analyze and visualize these data. We have developed expVIP, an expression visualization and integration platform, which allows easy analysis of RNA-seq data combined with an intuitive and interactive interface. Users can analyze public and user-specified data sets with minimal bioinformatics knowledge using the expVIP virtual machine. This generates a custom Web browser to visualize, sort, and filter the RNA-seq data and provides outputs for differential gene expression analysis. We demonstrate expVIP’s suitability for polyploid crops and evaluate its performance across a range of biologically relevant scenarios. To exemplify its use in crop research, we developed a flexible wheat (Triticum aestivum) expression browser (www.wheat-expression.com) that can be expanded with user-generated data in a local virtual machine environment. The open-access expVIP platform will facilitate the analysis of gene expression data from a wide variety of species by enabling the easy integration, visualization, and comparison of RNA-seq data across experiments. PMID:26869702

  16. RNA-Seq profiling of single bovine oocyte transcript abundance and its modulation by cytoplasmic polyadenylation.

    PubMed

    Reyes, Juan M; Chitwood, James L; Ross, Pablo J

    2015-02-01

    Molecular changes occurring during mammalian oocyte maturation are partly regulated by cytoplasmic polyadenylation (CP) and affect oocyte quality, yet the extent of CP activity during oocyte maturation remains unknown. Single bovine oocyte RNA sequencing (RNA-Seq) was performed to examine changes in transcript abundance during in vitro oocyte maturation in cattle. Polyadenylated RNA from individual germinal-vesicle and metaphase-II oocytes was amplified and processed for Illumina sequencing, producing approximately 30 million reads per replicate for each sample type. A total of 10,494 genes were found to be expressed, of which 2,455 were differentially expressed (adjusted P < 0.05 and fold change >2) between stages, with 503 and 1,952 genes respectively increasing and decreasing in abundance. Differentially expressed genes with complete 3'-untranslated-region sequence (279 increasing and 918 decreasing in polyadenylated transcript abundance) were examined for the presence, position, and distribution of motifs mediating CP, revealing enrichment (85%) and lack thereof (18%) in up- and down-regulated genes, respectively. Examination of total and polyadenylated RNA abundance by quantitative PCR validated these RNA-Seq findings. The observed increases in polyadenylated transcript abundance within the RNA-Seq data are likely due to CP, providing novel insight into targeted transcripts and resultant differential gene expression profiles that contribute to oocyte maturation. © 2015 Wiley Periodicals, Inc.
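
    The selection rule quoted in this abstract (adjusted P < 0.05 and fold change >2) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the abstract does not say which multiple-testing correction was used, so Benjamini-Hochberg is an assumption here, and the function names and the input tuple layout are hypothetical.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract only says "adjusted P"; BH is one common choice.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_de_genes(genes, alpha=0.05, min_fold=2.0):
    """genes: list of (name, mean_stage_a, mean_stage_b, raw_p) tuples.

    Keeps genes with adjusted p < alpha and a fold change above min_fold
    in either direction, and labels the direction of change.
    """
    adjusted = benjamini_hochberg([g[3] for g in genes])
    calls = []
    for (name, a, b, _), q in zip(genes, adjusted):
        log2fc = math.log2(b / a)
        if q < alpha and abs(log2fc) > math.log2(min_fold):
            calls.append((name, "up" if log2fc > 0 else "down"))
    return calls
```

    Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically, which matches the abstract's separate counts of genes increasing and decreasing in abundance.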

  17. Full-length single-cell RNA-seq applied to a viral human cancer: applications to HPV expression and splicing analysis in HeLa S3 cells.

    PubMed

    Wu, Liang; Zhang, Xiaolong; Zhao, Zhikun; Wang, Ling; Li, Bo; Li, Guibo; Dean, Michael; Yu, Qichao; Wang, Yanhui; Lin, Xinxin; Rao, Weijian; Mei, Zhanlong; Li, Yang; Jiang, Runze; Yang, Huan; Li, Fuqiang; Xie, Guoyun; Xu, Liqin; Wu, Kui; Zhang, Jie; Chen, Jianghao; Wang, Ting; Kristiansen, Karsten; Zhang, Xiuqing; Li, Yingrui; Yang, Huanming; Wang, Jian; Hou, Yong; Xu, Xun

    2015-01-01

    Viral infection causes multiple forms of human cancer, and HPV infection is the primary factor in cervical carcinomas. Recent single-cell RNA-seq studies highlight the tumor heterogeneity present in most cancers, but virally induced tumors have not been studied. HeLa is a well characterized HPV+ cervical cancer cell line. We developed a new high throughput platform to prepare single-cell RNA on a nanoliter scale based on a customized microwell chip. Using this method, we successfully amplified full-length transcripts of 669 single HeLa S3 cells and 40 of them were randomly selected to perform single-cell RNA sequencing. Based on these data, we obtained a comprehensive understanding of the heterogeneity of HeLa S3 cells in gene expression, alternative splicing and fusions. Furthermore, we identified a high diversity of HPV-18 expression and splicing at the single-cell level. By co-expression analysis we identified 283 E6, E7 co-regulated genes, including CDC25, PCNA, PLK4, BUB1B and IRF1 known to interact with HPV viral proteins. Our results reveal the heterogeneity of a virus-infected cell line. It not only provides a transcriptome characterization of HeLa S3 cells at the single cell level, but is a demonstration of the power of single cell RNA-seq analysis of virally infected cells and cancers.

  18. BM-Map: Bayesian Mapping of Multireads for Next-Generation Sequencing Data

    PubMed Central

    Ji, Yuan; Xu, Yanxun; Zhang, Qiong; Tsui, Kam-Wah; Yuan, Yuan; Norris, Clift; Liang, Shoudan; Liang, Han

    2011-01-01

    Summary Next-generation sequencing (NGS) technology generates millions of short reads, which provide valuable information for various aspects of cellular activities and biological functions. A key step in NGS applications (e.g., RNA-Seq) is to map short reads to correct genomic locations within the source genome. While most reads are mapped to a unique location, a significant proportion of reads align to multiple genomic locations with equal or similar numbers of mismatches; these are called multireads. The ambiguity in mapping the multireads may lead to bias in downstream analyses. Currently, most practitioners discard the multireads in their analysis, resulting in a loss of valuable information, especially for the genes with similar sequences. To refine the read mapping, we develop a Bayesian model that computes the posterior probability of mapping a multiread to each competing location. The probabilities are used for downstream analyses, such as the quantification of gene expression. We show through simulation studies and RNA-Seq analysis of real life data that the Bayesian method yields better mapping than the current leading methods. We provide a C++ program for downloading that is being packaged into a user-friendly software. PMID:21517792
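
    The idea of assigning a multiread a posterior probability for each competing location can be illustrated with a toy Bayes calculation. This is a sketch under stated assumptions and not the BM-Map model itself: the prior (unique-read count plus a pseudocount), the independent per-base error likelihood, and every name and default value below are hypothetical choices made for illustration.

```python
def multiread_posteriors(candidates, error_rate=0.01, read_length=50):
    """Toy posterior over competing locations for one multiread.

    candidates: list of (location, mismatches, unique_read_count).
    Assumed prior: proportional to unique_read_count + 1 (pseudocount).
    Assumed likelihood: independent per-base errors at a fixed rate.
    """
    weights = []
    for location, mismatches, unique_count in candidates:
        likelihood = (error_rate ** mismatches) * (
            (1 - error_rate) ** (read_length - mismatches)
        )
        prior = unique_count + 1
        weights.append((location, prior * likelihood))
    total = sum(w for _, w in weights)
    # Normalize so the posteriors over all candidate locations sum to 1.
    return {location: w / total for location, w in weights}
```

    The resulting fractions can then weight a read's contribution to each gene's expression estimate instead of discarding the read.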

  19. Transcription profiling using RNA-Seq demonstrates expression differences in the body walls of juvenile albino and normal sea cucumbers Apostichopus japonicus

    NASA Astrophysics Data System (ADS)

    Ma, Deyou; Yang, Hongsheng; Sun, Lina; Chen, Muyan

    2014-01-01

    Sea cucumbers Apostichopus japonicus are one of the most important aquaculture species in China. Their normal body color is black to fit their surroundings. Wild albinos are rare and hard to breed. To understand the differences between albino and normal (control) sea cucumbers at the transcriptional level, we sequenced the transcriptomes in their body-wall tissues using RNA-Seq high-throughput sequencing. Approximately 4.876 million (M) and 4.884 M 200-nucleotide-long cDNA reads were produced in the cDNA libraries derived from the body walls of albino and control samples, respectively. A total of 9 561 (46.89%) putative genes were identified from among the RNA-Seq reads in both libraries. After filtering, 837 significantly differentially regulated genes were identified in the albino library compared with the control library, and 3.6% of the differentially expressed genes (DEGs) were found to have changed more than five-fold. The expression levels of 10 DEGs were checked by real-time PCR and the results were in full accord with the RNA-Seq expression trends, although the amplitude of the differences in expression levels was lower in all cases. A series of pathways were significantly enriched for the DEGs. These pathways were closely related to phagocytosis, the complement and coagulation cascades, apoptosis-related diseases, cytokine-cytokine receptor interaction, and cell adhesion. The differences in gene expression and enriched pathways between the albino and control sea cucumbers offer control targets for cultivating excellent albino A. japonicus strains in the future.

  20. Predicting gene regulatory networks of soybean nodulation from RNA-Seq transcriptome data.

    PubMed

    Zhu, Mingzhu; Dahmen, Jeremy L; Stacey, Gary; Cheng, Jianlin

    2013-09-22

    High-throughput RNA sequencing (RNA-Seq) is a revolutionary technique to study the transcriptome of a cell under various conditions at a systems level. Despite the wide application of RNA-Seq techniques to generate experimental data in the last few years, few computational methods are available to analyze this huge amount of transcription data. The computational methods for constructing gene regulatory networks from RNA-Seq expression data of hundreds or even thousands of genes are particularly lacking and urgently needed. We developed an automated bioinformatics method to predict gene regulatory networks from the quantitative expression values of differentially expressed genes based on RNA-Seq transcriptome data of a cell in different stages and conditions, integrating transcriptional, genomic and gene function data. We applied the method to the RNA-Seq transcriptome data generated for soybean root hair cells in three different development stages of nodulation after rhizobium infection. The method predicted a soybean nodulation-related gene regulatory network consisting of 10 regulatory modules common for all three stages, and 24, 49 and 70 modules separately for the first, second and third stage, each containing both a group of co-expressed genes and several transcription factors collaboratively controlling their expression under different conditions. 8 of 10 common regulatory modules were validated by at least two kinds of validations, such as independent DNA binding motif analysis, gene function enrichment test, and previous experimental data in the literature. We developed a computational method to reliably reconstruct gene regulatory networks from RNA-Seq transcriptome data. The method can generate valuable hypotheses for interpreting biological data and designing biological experiments such as ChIP-Seq, RNA interference, and yeast two hybrid experiments.

  1. BrAD-seq: Breath Adapter Directional sequencing: a streamlined, ultra-simple and fast library preparation protocol for strand specific mRNA library construction.

    PubMed

    Townsley, Brad T; Covington, Michael F; Ichihashi, Yasunori; Zumstein, Kristina; Sinha, Neelima R

    2015-01-01

    Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand specific RNA-seq libraries utilizing the terminal breathing of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries and can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.

  2. SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

    PubMed

    Johnson, Benjamin K; Scholz, Matthew B; Teal, Tracy K; Abramovitch, Robert B

    2016-02-04

    Many tools exist in the analysis of bacterial RNA sequencing (RNA-seq) transcriptional profiling experiments to identify differentially expressed genes between experimental conditions. Generally, the workflow includes quality control of reads, mapping to a reference, counting transcript abundance, and statistical tests for differentially expressed genes. In spite of the numerous tools developed for each component of an RNA-seq analysis workflow, easy-to-use bacterially oriented workflow applications to combine multiple tools and automate the process are lacking. With many tools to choose from for each step, the task of identifying a specific tool, adapting the input/output options to the specific use-case, and integrating the tools into a coherent analysis pipeline is not a trivial endeavor, particularly for microbiologists with limited bioinformatics experience. To make bacterial RNA-seq data analysis more accessible, we developed a Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis (SPARTA). SPARTA is a reference-based bacterial RNA-seq analysis workflow application for single-end Illumina reads. SPARTA is turnkey software that simplifies the process of analyzing RNA-seq data sets, making bacterial RNA-seq analysis a routine process that can be undertaken on a personal computer or in the classroom. The easy-to-install, complete workflow processes whole transcriptome shotgun sequencing data files by trimming reads and removing adapters, mapping reads to a reference, counting gene features, calculating differential gene expression, and, importantly, checking for potential batch effects within the data set. SPARTA outputs quality analysis reports, gene feature counts and differential gene expression tables and scatterplots. SPARTA provides an easy-to-use bacterial RNA-seq transcriptional profiling workflow to identify differentially expressed genes between experimental conditions. This software will enable microbiologists with limited bioinformatics experience to analyze their data and integrate next generation sequencing (NGS) technologies into the classroom. The SPARTA software and tutorial are available at sparta.readthedocs.org.

  3. A Guide for Designing and Analyzing RNA-Seq Data.

    PubMed

    Chatterjee, Aniruddha; Ahn, Antonio; Rodger, Euan J; Stockwell, Peter A; Eccles, Michael R

    2018-01-01

    The identity of a cell or an organism is at least in part defined by its gene expression and therefore analyzing gene expression remains one of the most frequently performed experimental techniques in molecular biology. The development of the RNA-Sequencing (RNA-Seq) method allows an unprecedented opportunity to analyze expression of protein-coding, noncoding RNA and also de novo transcript assembly of a new species or organism. However, the planning and design of RNA-Seq experiments has important implications for addressing the desired biological question and maximizing the value of the data obtained. In addition, RNA-Seq generates a huge volume of data and accurate analysis of this data involves several different steps and choices of tools. This can be challenging and overwhelming, especially for bench scientists. In this chapter, we describe an entire workflow for performing RNA-Seq experiments. We describe critical aspects of wet lab experiments such as RNA isolation, library preparation and the initial design of an experiment. Further, we provide a step-by-step description of the bioinformatics workflow for different steps involved in RNA-Seq data analysis. This includes power calculations, setting up a computational environment, acquisition and processing of publicly available data if desired, quality control measures, preprocessing steps for the raw data, differential expression analysis, and data visualization. We particularly mention important considerations for each step to provide a guide for designing and analyzing RNA-Seq data.

  4. The Vigna unguiculata Gene Expression Atlas (VuGEA) from de novo assembly and quantification of RNA-seq data provides insights into seed maturation mechanisms.

    PubMed

    Yao, Shaolun; Jiang, Chuan; Huang, Ziyue; Torres-Jerez, Ivone; Chang, Junil; Zhang, Heng; Udvardi, Michael; Liu, Renyi; Verdier, Jerome

    2016-10-01

    Legume research and cultivar development are important for sustainable food production, especially of high-protein seed. Thanks to the development of deep-sequencing technologies, crop species have been taken to the front line, even without completion of their genome sequences. Black-eyed pea (Vigna unguiculata) is a legume species widely grown in semi-arid regions, which has high potential to provide stable seed protein production in a broad range of environments, including drought conditions. The black-eyed pea reference genotype has been used to generate a gene expression atlas of the major plant tissues (i.e. leaf, root, stem, flower, pod and seed), with a developmental time series for pods and seeds. From these various organs, 27 cDNA libraries were generated and sequenced, resulting in more than one billion reads. Following filtering, these reads were de novo assembled into 36 529 transcript sequences that were annotated and quantified across the different tissues. A set of 24 866 unique transcript sequences, called Unigenes, was identified. All the information related to transcript identification, annotation and quantification were stored into a gene expression atlas webserver (http://vugea.noble.org), providing a user-friendly interface and necessary tools to analyse transcript expression in black-eyed pea organs and to compare data with other legume species. Using this gene expression atlas, we inferred details of molecular processes that are active during seed development, and identified key putative regulators of seed maturation. Additionally, we found evidence for conservation of regulatory mechanisms involving miRNA in plant tissues subjected to drought and seeds undergoing desiccation. © 2016 The Authors. The Plant Journal published by Society for Experimental Biology and John Wiley & Sons Ltd.

  5. A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae

    PubMed Central

    Nookaew, Intawat; Papini, Marta; Pornputtapong, Natapol; Scalcinati, Gionata; Fagerberg, Linn; Uhlén, Matthias; Nielsen, Jens

    2012-01-01

    RNA-seq has recently become an attractive method of choice in the studies of transcriptomes, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the Illumina platform, and to perform a cross-platform comparison based on the results obtained through Affymetrix microarray. As a case study for our work, we used the Saccharomyces cerevisiae strain CEN.PK 113-7D, grown under two different conditions (batch and chemostat). Here, we assess the influence of genetic variation on the estimation of gene expression level using three different aligners for read-mapping (Gsnap, Stampy and TopHat) on the S288c genome and the capabilities of five different statistical methods to detect differential gene expression (baySeq, Cuffdiff, DESeq, edgeR and NOISeq), and we explore the consistency between RNA-seq analysis using a reference genome and a de novo assembly approach. High reproducibility among biological replicates (correlation ≥0.99) and high consistency between the two platforms for analysis of gene expression levels (correlation ≥0.91) are reported. The results from differential gene expression identification derived from the different statistical methods, as well as their integrated analysis results based on gene ontology annotation, are in good agreement. Overall, our study provides a useful and comprehensive comparison between the two platforms (RNA-seq and microarrays) for gene expression analysis and addresses the contribution of the different steps involved in the analysis of RNA-seq data. PMID:22965124

  6. GC-Content Normalization for RNA-Seq Data

    PubMed Central

    2011-01-01

    Background Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. Results We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Conclusions Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes. PMID:22177264
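
    A within-lane, gene-level GC-content adjustment of the kind this abstract describes can be sketched as follows. The abstract does not spell out the three proposed approaches, so this is only a simplified bin-median scheme written for illustration, not the EDASeq implementation; the function name, the equal-size binning, and the default number of bins are assumptions.

```python
from statistics import median

def gc_bin_median_normalize(counts, gc_content, n_bins=4):
    """Rescale gene counts so every GC bin shares the lane-wide median.

    counts, gc_content: dicts keyed by gene name, for a single lane.
    Genes are sorted by GC content and split into equal-size bins
    (an assumed, simplified stratification).
    """
    genes = sorted(counts, key=lambda g: gc_content[g])
    positive_all = [counts[g] for g in genes if counts[g] > 0]
    if not positive_all:
        return dict(counts)  # nothing to rescale
    overall = median(positive_all)
    size = -(-len(genes) // n_bins)  # ceiling division
    normalized = {}
    for start in range(0, len(genes), size):
        bin_genes = genes[start:start + size]
        positive = [counts[g] for g in bin_genes if counts[g] > 0]
        # Bins with no expressed genes are left unscaled.
        scale = overall / median(positive) if positive else 1.0
        for g in bin_genes:
            normalized[g] = counts[g] * scale
    return normalized
```

    Because the scaling is done separately per lane, a lane-specific GC effect is removed before any between-lane normalization is applied, which is the order of operations the abstract's conclusions describe.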

  7. Identification of innate lymphoid cells in single-cell RNA-Seq data.

    PubMed

    Suffiotti, Madeleine; Carmona, Santiago J; Jandus, Camilla; Gfeller, David

    2017-07-01

    Innate lymphoid cells (ILCs) consist of natural killer (NK) cells and non-cytotoxic ILCs that are broadly classified into ILC1, ILC2, and ILC3 subtypes. These cells recently emerged as important early effectors of innate immunity for their roles in tissue homeostasis and inflammation. Over the last few years, ILCs have been extensively studied in mouse and human at the functional and molecular level, including gene expression profiling. However, sorting ILCs with flow cytometry for gene expression analysis is a delicate and time-consuming process. Here we propose and validate a novel framework for studying ILCs at the transcriptomic level using single-cell RNA-Seq data. Our approach combines unsupervised clustering and a new cell type classifier trained on mouse ILC gene expression data. We show that this approach can accurately identify different ILCs, especially ILC2 cells, in human lymphocyte single-cell RNA-Seq data. Our new model relies only on genes conserved across vertebrates, thereby making it in principle applicable in any vertebrate species. Considering the rapid increase in throughput of single-cell RNA-Seq technology, our work provides a computational framework for studying ILC2 cells in single-cell transcriptomic data and may help exploring their conservation in distant vertebrate species.

  8. RNA-seq Analysis of Early Hepatic Response to Handling and Confinement Stress in Rainbow Trout

    PubMed Central

    Liu, Sixin; Gao, Guangtu; Palti, Yniv; Cleveland, Beth M.; Weber, Gregory M.; Rexroad, Caird E.

    2014-01-01

    Fish under intensive rearing conditions experience various stressors which have negative impacts on survival, growth, reproduction and fillet quality. Identifying and characterizing the molecular mechanisms underlying stress responses will facilitate the development of strategies that aim to improve animal welfare and aquaculture production efficiency. In this study, we used RNA-seq to identify transcripts which are differentially expressed in the rainbow trout liver in response to handling and confinement stress. These stressors were selected due to their relevance in aquaculture production. Total RNA was extracted from the livers of individual fish in five tanks having eight fish each, including three tanks of fish subjected to a 3 hour handling and confinement stress and two control tanks. Equal amounts of total RNA from six individual fish were pooled by tank to create five RNA-seq libraries which were sequenced in one lane of Illumina HiSeq 2000. Three sequencing runs were conducted to obtain a total of 491,570,566 reads which were mapped onto the previously generated stress reference transcriptome to identify 316 differentially expressed transcripts (DETs). Twenty one DETs were selected for qPCR to validate the RNA-seq approach. The fold changes in gene expression identified by RNA-seq and qPCR were highly correlated (R2 = 0.88). Several gene ontology terms including transcription factor activity and biological process such as glucose metabolic process were enriched among these DETs. Pathways involved in response to handling and confinement stress were implicated by mapping the DETs to reference pathways in the KEGG database. Accession Numbers Raw RNA-seq reads have been submitted to the NCBI Short Read Archive under accession number SRP022881. Customized Perl Scripts All customized scripts described in this paper are available from Dr. Guangtu Gao or the corresponding author. PMID:24558395

  9. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

    PubMed Central

    Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior

    2012-01-01

    Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time. PMID:22383036

  10. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    PubMed Central

    2011-01-01

    Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484

  11. Gene expression and splicing alterations analyzed by high throughput RNA sequencing of chronic lymphocytic leukemia specimens.

    PubMed

    Liao, Wei; Jordaan, Gwen; Nham, Phillipp; Phan, Ryan T; Pelegrini, Matteo; Sharma, Sanjai

    2015-10-16

    To determine differentially expressed and spliced RNA transcripts in chronic lymphocytic leukemia specimens, a high throughput RNA-sequencing (HTS RNA-seq) analysis was performed. Ten CLL specimens and five normal peripheral blood CD19+ B cells were analyzed by HTS RNA-seq. The library preparation was performed with Illumina TruSeq RNA kit and analyzed by Illumina HiSeq 2000 sequencing system. An average of 48.5 million reads for B cells, and 50.6 million reads for CLL specimens were obtained with 10396 and 10448 assembled transcripts for normal B cells and primary CLL specimens respectively. With the Cuffdiff analysis, 2091 differentially expressed genes (DEG) between B cells and CLL specimens were identified based on FPKM (fragments per kilobase of transcript per million reads) and false discovery rate (FDR q < 0.05, fold change >2). Expression of selected DEGs (n = 32) with up regulated and down regulated expression in CLL from RNA-seq data were also analyzed by qRT-PCR in a test cohort of CLL specimens. Even though there was variation in fold expression of DEGs between RNA-seq and qRT-PCR, more than 90 % of analyzed genes were validated by qRT-PCR analysis. Analysis of RNA-seq data for splicing alterations in CLL and B cells was performed by Multivariate Analysis of Transcript Splicing (MATS analysis). Skipped exon was the most frequent splicing alteration in CLL specimens with 128 significant events (P-value <0.05, minimum inclusion level difference >0.1). The RNA-seq analysis of CLL specimens identifies novel DEG and alternatively spliced genes that are potential prognostic markers and therapeutic targets. High level of validation by qRT-PCR for a number of DEGs supports the accuracy of this analysis. 
Global comparison of transcriptomes of B cells, IGVH non-mutated CLL (U-CLL) and mutated CLL specimens (M-CLL) with multidimensional scaling analysis was able to segregate CLL and B cell transcriptomes but the M-CLL and U-CLL transcriptomes were indistinguishable. The analysis of HTS RNA-seq data to identify alternative splicing events and other genetic abnormalities specific to CLL is an added advantage of RNA-seq that is not feasible with other genome wide analysis.
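
    The FPKM measure and the thresholds quoted above (FDR q < 0.05, fold change > 2) can be illustrated with a minimal sketch. The function names and the example numbers are hypothetical, the q-value is assumed to be supplied by an upstream test, and Cuffdiff's own statistical model is not reproduced here.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def is_differentially_expressed(fpkm_a, fpkm_b, q_value,
                                min_fold=2.0, max_q=0.05):
    """Apply a fold-change and FDR threshold (illustrative filter only)."""
    low, high = sorted((fpkm_a, fpkm_b))
    if high == 0:
        return False  # not expressed in either condition
    fold = float("inf") if low == 0 else high / low
    return fold > min_fold and q_value < max_q


# Hypothetical gene: 500 fragments on a 2,000 bp transcript, 50 million mapped fragments.
example_fpkm = fpkm(500, 2000, 50_000_000)
```

    The filter treats up- and down-regulation symmetrically by comparing the larger FPKM value with the smaller one.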

  12. RNA-Seq workflow: gene-level exploratory analysis and differential expression

    PubMed Central

    Love, Michael I.; Anders, Simon; Kim, Vladislav; Huber, Wolfgang

    2015-01-01

    Here we walk through an end-to-end gene-level RNA-Seq differential expression workflow using Bioconductor packages. We will start from the FASTQ files, show how these were aligned to the reference genome, and prepare a count matrix which tallies the number of RNA-seq reads/fragments within each gene for each sample. We will perform exploratory data analysis (EDA) for quality assessment and to explore the relationship between samples, perform differential gene expression analysis, and visually explore the results. PMID:26674615
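
    The count matrix described above (reads/fragments tallied per gene and per sample) can be sketched as follows. This is a simplified stand-in written in Python, not the Bioconductor workflow itself; the input format (one gene ID per assigned read) and all names are assumptions made for illustration.

```python
from collections import Counter


def build_count_matrix(assignments):
    """Tally assigned reads per gene and per sample.

    assignments: dict mapping sample name -> iterable of gene IDs,
    one entry per read/fragment already assigned to a gene (hypothetical input).
    Returns (genes, samples, matrix) with matrix[i][j] = count of gene i in sample j.
    """
    per_sample = {sample: Counter(genes) for sample, genes in assignments.items()}
    samples = sorted(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    matrix = [[per_sample[s][g] for s in samples] for g in genes]
    return genes, samples, matrix


genes, samples, matrix = build_count_matrix({
    "s1": ["geneA", "geneA", "geneB"],
    "s2": ["geneB", "geneC"],
})
```

    Genes that are absent from a sample receive a count of zero, because a Counter returns 0 for missing keys.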

  13. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domains and microRNA binding annotations. We applied the novel workflow on RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post- Deep Brain Stimulation (DBS) treatment and compared to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually-exclusive exons. The PD regulated FUS and HNRNP A/B included prion-like domains regulated regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. 
Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia-nigra, compared to controls. This novel workflow allows deep multi-level inspection of RNA-Seq datasets and provides a comprehensive new resource for understanding disease transcriptome modifications in PD and other neurodegenerative diseases. PMID:24651478

  14. Identification of Reference Genes for RT-qPCR Data Normalization in Cannabis sativa Stem Tissues.

    PubMed

    Mangeot-Peter, Lauralie; Legay, Sylvain; Hausman, Jean-Francois; Esposito, Sergio; Guerriero, Gea

    2016-09-15

    Gene expression profiling via quantitative real-time PCR is a robust technique widely used in the life sciences to compare gene expression patterns in, e.g., different tissues, growth conditions, or after specific treatments. In the field of plant science, real-time PCR is the gold standard to study the dynamics of gene expression and is used to validate the results generated with high throughput techniques, e.g., RNA-Seq. An accurate relative quantification of gene expression relies on the identification of appropriate reference genes, which need to be determined for each experimental set-up used and plant tissue studied. Here, we identify suitable reference genes for expression profiling in stems of textile hemp (Cannabis sativa L.), whose tissues (isolated bast fibres and core) are characterized by remarkable differences in cell wall composition. We additionally validate the reference genes by analysing the expression of putative candidates involved in the non-oxidative phase of the pentose phosphate pathway and in the first step of the shikimate pathway. The goal is to describe the possible regulation pattern of some genes involved in the provision of the precursors needed for lignin biosynthesis in the different hemp stem tissues. The results shown here are useful to design future studies focused on gene expression analyses in hemp.

  15. Use of Partial Least Squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data.

    PubMed

    Chakraborty, Sutirtha

    2018-05-26

    RNA-Seq technology has revolutionized the face of gene expression profiling by generating read count data measuring the transcript abundances for each queried gene on multiple experimental subjects. But on the downside, the underlying technical artefacts and hidden biological profiles of the samples generate a wide variety of latent effects that may potentially distort the actual transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables and lead to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (built as an R package 'SVAPLSseq') to correct for the traces of extraneous variability in RNA-Seq data. A novel and thorough comparative analysis of the PLS based method is presented along with some of the other popularly used approaches for latent variable correction in RNA-Seq. Overall, the method is found to achieve a substantially improved estimation of the hidden effect signatures in the RNA-Seq transcriptome expression landscape compared to other available techniques. Copyright © 2017. Published by Elsevier Inc.

  16. Differential expression of Meis2, Mab21l2 and Tbx3 during limb development associated with diversification of limb morphology in mammals.

    PubMed

    Dai, Mengyao; Wang, Yao; Fang, Lu; Irwin, David M; Zhu, Tengteng; Zhang, Junpeng; Zhang, Shuyi; Wang, Zhe

    2014-01-01

    Bats are the only mammals capable of self-powered flight using wings. Differing from mouse or human limbs, four elongated digits within a broad wing membrane support the bat wing, and the foot of the bat has evolved a long calcar that spreads the interfemoral membrane. Our recent mRNA sequencing (mRNA-Seq) study found unique expression patterns for genes at the 5' end of the Hoxd gene cluster and for Tbx3 that are associated with digit elongation and wing membrane growth in bats. In this study, we focused on two additional genes, Meis2 and Mab21l2, identified from the mRNA-Seq data. Using whole-mount in situ hybridization (WISH) we validated the mRNA-Seq results for differences in the expression patterns of Meis2 and Mab21l2 between bat and mouse limbs, and further characterized the timing and location of the expression of these two genes. These analyses suggest that Meis2 may function in wing membrane growth and Mab21l2 may have a role in AP and DV axial patterning. In addition, we found that Tbx3 is uniquely expressed in the unique calcar structure found in the bat hindlimb, suggesting a role for this gene in calcar growth and elongation. Moreover, analysis of the coding sequences for Meis2, Mab21l2 and Tbx3 showed that Meis2 and Mab21l2 have high sequence identity, consistent with the functions of genes being conserved, but that Tbx3 showed accelerated evolution in bats. However, evidence for positive selection in Tbx3 was not found, which would suggest that the function of this gene has not been changed. Together, our findings support the hypothesis that the modulation of the spatiotemporal expression patterns of multiple functionally conserved genes controls limb morphology and drives morphological change in the diversification of mammalian limbs.

  17. Differential Expression of Meis2, Mab21l2 and Tbx3 during Limb Development Associated with Diversification of Limb Morphology in Mammals

    PubMed Central

    Fang, Lu; Irwin, David M.; Zhu, Tengteng; Zhang, Junpeng; Zhang, Shuyi; Wang, Zhe

    2014-01-01

    Bats are the only mammals capable of self-powered flight using wings. Differing from mouse or human limbs, four elongated digits within a broad wing membrane support the bat wing, and the foot of the bat has evolved a long calcar that spreads the interfemoral membrane. Our recent mRNA sequencing (mRNA-Seq) study found unique expression patterns for genes at the 5′ end of the Hoxd gene cluster and for Tbx3 that are associated with digit elongation and wing membrane growth in bats. In this study, we focused on two additional genes, Meis2 and Mab21l2, identified from the mRNA-Seq data. Using whole-mount in situ hybridization (WISH) we validated the mRNA-Seq results for differences in the expression patterns of Meis2 and Mab21l2 between bat and mouse limbs, and further characterized the timing and location of the expression of these two genes. These analyses suggest that Meis2 may function in wing membrane growth and Mab21l2 may have a role in AP and DV axial patterning. In addition, we found that Tbx3 is uniquely expressed in the unique calcar structure found in the bat hindlimb, suggesting a role for this gene in calcar growth and elongation. Moreover, analysis of the coding sequences for Meis2, Mab21l2 and Tbx3 showed that Meis2 and Mab21l2 have high sequence identity, consistent with the functions of genes being conserved, but that Tbx3 showed accelerated evolution in bats. However, evidence for positive selection in Tbx3 was not found, which would suggest that the function of this gene has not been changed. Together, our findings support the hypothesis that the modulation of the spatiotemporal expression patterns of multiple functionally conserved genes controls limb morphology and drives morphological change in the diversification of mammalian limbs. PMID:25166052

  18. Circular RNA profile in gliomas revealed by identification tool UROBORUS.

    PubMed

    Song, Xiaofeng; Zhang, Naibo; Han, Ping; Moon, Byoung-San; Lai, Rose K; Wang, Kai; Lu, Wange

    2016-05-19

    Recent evidence suggests that many endogenous circular RNAs (circRNAs) may play roles in biological processes. However, the expression patterns and functions of circRNAs in human diseases are not well understood. Computationally identifying circRNAs from total RNA-seq data is a primary step in studying their expression pattern and biological roles. In this work, we have developed a computational pipeline named UROBORUS to detect circRNAs in total RNA-seq data. By applying UROBORUS to RNA-seq data from 46 gliomas and normal brain samples, we detected thousands of circRNAs supported by at least two read counts, followed by successful experimental validation on 24 circRNAs from the randomly selected 27 circRNAs. UROBORUS is an efficient tool that can detect circRNAs with low expression levels in total RNA-seq without RNase R treatment. The circRNAs expression profiling revealed more than 476 circular RNAs differentially expressed in control brain tissues and gliomas. Together with parental gene expression, we found that circRNA and its parental gene have diversified expression patterns in gliomas and control brain tissues. This study establishes an efficient and sensitive approach for predicting circRNAs using total RNA-seq data. The UROBORUS pipeline can be accessed freely for non-commercial purposes at http://uroborus.openbioinformatics.org/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

    PubMed Central

    2017-01-01

    Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that errors in the UMI sequence are common and introduce network-based methods to account for these errors when identifying PCR duplicates. Using these methods, we demonstrate improved quantification accuracy both under simulated conditions and real iCLIP and single-cell RNA-seq data sets. Reproducibility between iCLIP replicates and single-cell RNA-seq clustering are both improved using our proposed network-based method, demonstrating the value of properly accounting for errors in UMIs. These methods are implemented in the open source UMI-tools software package. PMID:28100584
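
    The idea of a network-based treatment of UMI errors can be sketched in a few lines. The example below is a deliberately simplified illustration and not the UMI-tools implementation: it assumes all UMIs come from reads at one mapping position, links any two UMIs that differ by at most one base, and counts each connected component as one original molecule. The function names and the distance threshold are assumptions.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_molecules(umis, max_dist=1):
    """Estimate distinct molecules as connected components of the UMI graph."""
    nodes = sorted(set(umis))
    parent = {u: u for u in nodes}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in combinations(nodes, 2):
        if len(a) == len(b) and hamming(a, b) <= max_dist:
            parent[find(a)] = find(b)  # merge the two components
    return len({find(u) for u in nodes})


# Five distinct UMI sequences, two of which look like single-base errors.
n_molecules = count_molecules(["ATCG", "ATCA", "GGGG", "GGGT", "CCCC"])
```

    Counting unique UMI sequences would report five molecules here, whereas the graph-based count merges the likely sequencing errors and reports three.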

  20. Resources and Recommendations for Using Transcriptomics to Address Grand Challenges in Comparative Biology

    PubMed Central

    Mykles, Donald L.; Burnett, Karen G.; Durica, David S.; Joyce, Blake L.; McCarthy, Fiona M.; Schmidt, Carl J.; Stillman, Jonathon H.

    2016-01-01

    High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the “Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology” symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. PMID:27639274

  1. A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments

    PubMed Central

    2013-01-01

    Background High-throughput RNA sequencing (RNA-seq) offers unprecedented power to capture the real dynamics of gene expression. Experimental designs with extensive biological replication present a unique opportunity to exploit this feature and distinguish expression profiles with higher resolution. RNA-seq data analysis methods so far have been mostly applied to data sets with few replicates and their default settings try to provide the best performance under this constraint. These methods are based on two well-known count data distributions: the Poisson and the negative binomial. The way to properly calibrate them with large RNA-seq data sets is not trivial for the non-expert bioinformatics user. Results Here we show that expression profiles produced by extensively-replicated RNA-seq experiments lead to a rich diversity of count data distributions beyond the Poisson and the negative binomial, such as Poisson-Inverse Gaussian or Pólya-Aeppli, which can be captured by a more general family of count data distributions called the Poisson-Tweedie. The flexibility of the Poisson-Tweedie family enables a direct fitting of emerging features of large expression profiles, such as heavy-tails or zero-inflation, without the need to alter a single configuration parameter. We provide a software package for R called tweeDEseq implementing a new test for differential expression based on the Poisson-Tweedie family. Using simulations on synthetic and real RNA-seq data we show that tweeDEseq yields P-values that are equally or more accurate than competing methods under different configuration parameters. By surveying the tiny fraction of sex-specific gene expression changes in human lymphoblastoid cell lines, we also show that tweeDEseq accurately detects differentially expressed genes in a real large RNA-seq data set with improved performance and reproducibility over the previously compared methodologies. 
Finally, we compared the results with those obtained from microarrays in order to check for reproducibility. Conclusions RNA-seq data with many replicates leads to a handful of count data distributions which can be accurately estimated with the statistical model illustrated in this paper. This method provides a better fit to the underlying biological variability; this may be critical when comparing groups of RNA-seq samples with markedly different count data distributions. The tweeDEseq package forms part of the Bioconductor project and it is available for download at http://www.bioconductor.org. PMID:23965047
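
    The difference between Poisson-like and overdispersed counts mentioned above can be made concrete with a small diagnostic. This sketch does not fit the Poisson-Tweedie family and does not use the tweeDEseq package; it only computes the variance-to-mean ratio of replicate counts and a method-of-moments negative binomial dispersion, assuming the common parameterization var = mu + phi * mu^2. The function name and the example counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Return (variance/mean ratio, moment estimate of NB dispersion phi).

    A ratio near 1 is what a Poisson model expects; a ratio well above 1
    indicates overdispersion. Requires at least two replicates.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; ratio undefined")
    v = variance(counts)  # unbiased sample variance
    ratio = v / m
    phi = max(0.0, (v - m) / (m * m))  # var = mu + phi * mu^2, clipped at 0
    return ratio, phi


ratio, phi = dispersion_summary([5, 15, 10, 30])
```

    With only a handful of replicates such moment estimates are noisy, which is one reason model-based fitting is used in practice.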

  2. A multi-Poisson dynamic mixture model to cluster developmental patterns of gene expression by RNA-seq.

    PubMed

    Ye, Meixia; Wang, Zhong; Wang, Yaqun; Wu, Rongling

    2015-03-01

    Dynamic changes of gene expression reflect an intrinsic mechanism of how an organism responds to developmental and environmental signals. With the increasing availability of expression data across a time-space scale by RNA-seq, the classification of genes as per their biological function using RNA-seq data has become one of the most significant challenges in contemporary biology. Here we develop a clustering mixture model to discover distinct groups of genes expressed during a period of organ development. By integrating the density function of multivariate Poisson distribution, the model accommodates the discrete property of read counts characteristic of RNA-seq data. The temporal dependence of gene expression is modeled by the first-order autoregressive process. The model is implemented with the Expectation-Maximization algorithm and model selection to determine the optimal number of gene clusters and obtain the estimates of Poisson parameters that describe the pattern of time-dependent expression of genes from each cluster. The model has been demonstrated by analyzing a real data from an experiment aimed to link the pattern of gene expression to catkin development in white poplar. The usefulness of the model has been validated through computer simulation. The model provides a valuable tool for clustering RNA-seq data, facilitating our global view of expression dynamics and understanding of gene regulation mechanisms. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  3. Statistical inference for time course RNA-Seq data using a negative binomial mixed-effect model.

    PubMed

    Sun, Xiaoxiao; Dalpiaz, David; Wu, Di; S Liu, Jun; Zhong, Wenxuan; Ma, Ping

    2016-08-26

    Accurate identification of differentially expressed (DE) genes in time course RNA-Seq data is crucial for understanding the dynamics of transcriptional regulatory network. However, most of the available methods treat gene expressions at different time points as replicates and test the significance of the mean expression difference between treatments or conditions irrespective of time. They thus fail to identify many DE genes with different profiles across time. In this article, we propose a negative binomial mixed-effect model (NBMM) to identify DE genes in time course RNA-Seq data. In the NBMM, mean gene expression is characterized by a fixed effect, and time dependency is described by random effects. The NBMM is very flexible and can be fitted to both unreplicated and replicated time course RNA-Seq data via a penalized likelihood method. By comparing gene expression profiles over time, we further classify the DE genes into two subtypes to enhance the understanding of expression dynamics. A significance test for detecting DE genes is derived using a Kullback-Leibler distance ratio. Additionally, a significance test for gene sets is developed using a gene set score. Simulation analysis shows that the NBMM outperforms currently available methods for detecting DE genes and gene sets. Moreover, our real data analysis of fruit fly developmental time course RNA-Seq data demonstrates the NBMM identifies biologically relevant genes which are well justified by gene ontology analysis. The proposed method is powerful and efficient to detect biologically relevant DE genes and gene sets in time course RNA-Seq data.

  4. Age-Related Gene Expression Differences in Monocytes from Human Neonates, Young Adults, and Older Adults

    PubMed Central

    Tong, Ann-Jay; Kollmann, Tobias R.; Smale, Stephen T.

    2015-01-01

    A variety of age-related differences in the innate and adaptive immune systems have been proposed to contribute to the increased susceptibility to infection of human neonates and older adults. The emergence of RNA sequencing (RNA-seq) provides an opportunity to obtain an unbiased, comprehensive, and quantitative view of gene expression differences in defined cell types from different age groups. An examination of ex vivo human monocyte responses to lipopolysaccharide stimulation or Listeria monocytogenes infection by RNA-seq revealed extensive similarities between neonates, young adults, and older adults, with an unexpectedly small number of genes exhibiting statistically significant age-dependent differences. By examining the differentially induced genes in the context of transcription factor binding motifs and RNA-seq data sets from mutant mouse strains, a previously described deficiency in interferon response factor-3 activity could be implicated in most of the differences between newborns and young adults. Contrary to these observations, older adults exhibited elevated expression of inflammatory genes at baseline, yet the responses following stimulation correlated more closely with those observed in younger adults. Notably, major differences in the expression of constitutively expressed genes were not observed, suggesting that the age-related differences are driven by environmental influences rather than cell-autonomous differences in monocyte development. PMID:26147648

  5. SOX9 regulates multiple genes in chondrocytes, including genes encoding ECM proteins, ECM modification enzymes, receptors, and transporters.

    PubMed

    Oh, Chun-do; Lu, Yue; Liang, Shoudan; Mori-Akiyama, Yuko; Chen, Di; de Crombrugghe, Benoit; Yasuda, Hideyo

    2014-01-01

    The transcription factor SOX9 plays an essential role in determining the fate of several cell types and is a master factor in regulation of chondrocyte development. Our aim was to determine which genes in the genome of chondrocytes are either directly or indirectly controlled by SOX9. We used RNA-Seq to identify genes whose expression levels were affected by SOX9 and used SOX9 ChIP-Seq to identify those genes that harbor SOX9-interaction sites. For RNA-Seq, the RNA expression profile of primary Sox9flox/flox mouse chondrocytes infected with Ad-CMV-Cre was compared with that of the same cells infected with a control adenovirus. Analysis of RNA-Seq data indicated that, when the levels of Sox9 mRNA were decreased more than 8-fold by infection with Ad-CMV-Cre, 196 genes showed a decrease in expression of at least 4-fold. These included many cartilage extracellular matrix (ECM) genes and a number of genes for ECM modification enzymes (transferases), membrane receptors, transporters, and others. In ChIP-Seq, 75% of the SOX9-interaction sites had a canonical inverted repeat motif within 100 bp of the top of the peak. SOX9-interaction sites were found in 55% of the genes whose expression was decreased more than 8-fold in SOX9-depleted cells and in somewhat fewer of the genes whose expression was reduced more than 4-fold, suggesting that these are direct targets of SOX9. The combination of RNA-Seq and ChIP-Seq has provided a fuller understanding of the SOX9-controlled genetic program of chondrocytes.

  6. Spatio-temporal dynamics in global rice gene expression (Oryza sativa L.) in response to high ammonium stress.

    PubMed

    Sun, Li; Di, Dongwei; Li, Guangjie; Kronzucker, Herbert J; Shi, Weiming

    2017-05-01

    Ammonium (NH4+) is the predominant nitrogen (N) source in many natural and agricultural ecosystems, including flooded rice fields. While rice is known as an NH4+-tolerant species, it nevertheless suffers NH4+ toxicity at elevated soil concentrations. NH4+ excess rapidly leads to the disturbance of various physiological processes that ultimately inhibit shoot and root growth. However, the global transcriptomic response to NH4+ stress in rice has not been examined. In this study, we mapped the spatio-temporal specificity of gene expression profiles in rice under excess NH4+ and the changes in gene expression in root and shoot at various time points by RNA-Seq (Quantification) using Illumina HiSeq™ 2000. By comparative analysis, 307 and 675 genes were found to be up-regulated after 4 h and 12 h of NH4+ exposure in the root, respectively. In the shoot, 167 genes were up-regulated at 4 h, compared with 320 at 12 h. According to KEGG analysis, up-regulated DEGs mainly participate in phenylpropanoid (such as flavonoid) and amino acid (such as proline, cysteine, and methionine) metabolism, which is believed to improve NH4+ stress tolerance through adjustment of energy metabolism in the shoot, while defense and signaling pathways, guiding whole-plant acclimation, play the leading role in the root. We furthermore critically assessed the roles of key phytohormones, and found abscisic acid (ABA) and ethylene (ET) to be the major regulatory molecules responding to excess NH4+ and activating the MAPK (mitogen-activated protein kinase) signal-transduction pathway. Moreover, we found up-regulated hormone-associated genes are involved in regulating flavonoid biosynthesis and are regulated by tissue flavonoid accumulation. Copyright © 2017 Elsevier GmbH. All rights reserved.

  7. Technical variations in low-input RNA-seq methodologies.

    PubMed

    Bhargava, Vipul; Head, Steven R; Ordoukhanian, Phillip; Mercola, Mark; Subramaniam, Shankar

    2014-01-14

    Recent advances in RNA-seq methodologies from limiting amounts of mRNA have facilitated the characterization of rare cell-types in various biological systems. So far, however, technical variations in these methods have not been adequately characterized, vis-à-vis sensitivity, starting with reduced levels of mRNA. Here, we generated sequencing libraries from limiting amounts of mRNA using three amplification-based methods, viz. Smart-seq, DP-seq and CEL-seq, and demonstrated significant technical variations in these libraries. Reduction in mRNA levels led to inefficient amplification of the majority of low to moderately expressed transcripts. Furthermore, noise in primer hybridization and/or enzyme incorporation was magnified during the amplification step resulting in significant distortions in fold changes of the transcripts. Consequently, the majority of the differentially expressed transcripts identified were either high-expressed and/or exhibited high fold changes. High technical variations ultimately masked subtle biological differences mandating the development of improved amplification-based strategies for quantitative transcriptomics from limiting amounts of mRNA.

  8. RNA-seq Data: Challenges in and Recommendations for Experimental Design and Analysis.

    PubMed

    Williams, Alexander G; Thomas, Sean; Wyman, Stacia K; Holloway, Alisha K

    2014-10-01

    RNA-seq is widely used to determine differential expression of genes or transcripts as well as identify novel transcripts, identify allele-specific expression, and precisely measure translation of transcripts. Thoughtful experimental design and choice of analysis tools are critical to ensure high-quality data and interpretable results. Important considerations for experimental design include number of replicates, whether to collect paired-end or single-end reads, sequence length, and sequencing depth. Common analysis steps in all RNA-seq experiments include quality control, read alignment, assigning reads to genes or transcripts, and estimating gene or transcript abundance. Our aims are two-fold: to make recommendations for common components of experimental design and assess tool capabilities for each of these steps. We also test tools designed to detect differential expression, since this is the most widespread application of RNA-seq. We hope that these analyses will help guide those who are new to RNA-seq and will generate discussion about remaining needs for tool improvement and development. Copyright © 2014 John Wiley & Sons, Inc.

  9. DTWscore: differential expression and cell clustering analysis for time-series single-cell RNA-seq data.

    PubMed

    Wang, Zhuo; Jin, Shuilin; Liu, Guiyou; Zhang, Xiurui; Wang, Nan; Wu, Deliang; Hu, Yang; Zhang, Chiping; Jiang, Qinghua; Xu, Li; Wang, Yadong

    2017-05-23

    The development of single-cell RNA sequencing has enabled profound discoveries in biology, ranging from the dissection of the composition of complex tissues to the identification of novel cell types and dynamics in some specialized cellular environments. However, the large-scale generation of single-cell RNA-seq (scRNA-seq) data collected at multiple time points remains a challenge to effective measurement of gene expression patterns in transcriptome analysis. We present an algorithm based on the Dynamic Time Warping score (DTWscore) combined with time-series data, that enables the detection of gene expression changes across scRNA-seq samples and recovery of potential cell types from complex mixtures of multiple cell types. The DTWscore successfully classifies cells of different types with the most highly variable genes from time-series scRNA-seq data. The study was confined to methods that are implemented and available within the R framework. Sample datasets and R packages are available at https://github.com/xiaoxiaoxier/DTWscore .
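
    The record above builds on dynamic time warping. As a rough illustration only, the classic DTW distance between two expression time series can be sketched as follows; this is the textbook recurrence, not the DTWscore package itself, and the function name and absolute-difference cost are assumptions made for the example.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Textbook recurrence with an absolute-difference local cost; this is an
    illustrative sketch, not the scoring used by the DTWscore package.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligned with an earlier b
                cost[i][j - 1],      # b[j-1] also aligned with an earlier a
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]
```

    Two series that differ only by a stretched time point, such as [1, 2, 3] and [1, 2, 2, 3], align at zero cost, which is the property that makes warping-based scores tolerant of cells progressing at different rates.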

  10. Application of Stochastic Labeling with Random-Sequence Barcodes for Simultaneous Quantification and Sequencing of Environmental 16S rRNA Genes.

    PubMed

    Hoshino, Tatsuhiko; Inagaki, Fumio

    2017-01-01

    Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. For obtaining the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is only determined during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of 16S rRNA genes of target phylotypes can be estimated by Poisson statistics by counting random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10^3 to 5.0 × 10^4 copies of the 16S rRNA gene. Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA with NGS sequencing at one time. With this new experimental scheme in microbial ecology, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
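
    The Poisson-based estimate mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each template molecule picks one tag uniformly at random from a pool of N possible tags, so that the expected number of distinct tags seen for n molecules is N(1 - exp(-n/N)); inverting this gives n = -N ln(1 - k/N) for k observed tags. The function name and the 8-nucleotide tag length (N = 4^8) are hypothetical choices for the example and are not taken from the record.

```python
import math


def estimate_molecules(unique_tags_observed, tag_pool_size):
    """Estimate the number of labeled molecules from distinct random tags.

    Inverts k = N * (1 - exp(-n / N)), the Poisson approximation to the
    expected number of distinct tags when n molecules each draw one of N tags.
    """
    if not 0 <= unique_tags_observed < tag_pool_size:
        raise ValueError("observed tags must be in [0, tag_pool_size)")
    fraction_used = unique_tags_observed / tag_pool_size
    return -tag_pool_size * math.log(1.0 - fraction_used)


# Hypothetical tag pool: random 8-mers, i.e. 4**8 = 65536 possible tags.
POOL = 4 ** 8
print(estimate_molecules(4800, POOL))  # slightly above 4800, correcting for tag collisions
```

    The correction matters most when the number of molecules approaches the size of the tag pool; for sparse labeling the estimate is close to the raw count of distinct tags.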

  11. FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications.

    PubMed

    Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-Jie; He, Qing-Yu; Zhang, Gong

    2014-01-01

    Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm, FANSe2, with an iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than the BWT-based algorithms in the tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency to the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.

  12. Analytical performance of the ThyroSeq v3 genomic classifier for cancer diagnosis in thyroid nodules.

    PubMed

    Nikiforova, Marina N; Mercurio, Stephanie; Wald, Abigail I; Barbi de Moura, Michelle; Callenberg, Keith; Santana-Santos, Lucas; Gooding, William E; Yip, Linwah; Ferris, Robert L; Nikiforov, Yuri E

    2018-04-15

    Molecular tests have clinical utility for thyroid nodules with indeterminate fine-needle aspiration (FNA) cytology, although their performance requires further improvement. This study evaluated the analytical performance of the newly created ThyroSeq v3 test. ThyroSeq v3 is a DNA- and RNA-based next-generation sequencing assay that analyzes 112 genes for a variety of genetic alterations, including point mutations, insertions/deletions, gene fusions, copy number alterations, and abnormal gene expression, and it uses a genomic classifier (GC) to separate malignant lesions from benign lesions. It was validated in 238 tissue samples and 175 FNA samples with known surgical follow-up. Analytical performance studies were conducted. In the training tissue set of samples, ThyroSeq GC detected more than 100 genetic alterations, including BRAF, RAS, TERT, and DICER1 mutations, NTRK1/3, BRAF, and RET fusions, 22q loss, and gene expression alterations. GC cutoffs were established to distinguish cancer from benign nodules with 93.9% sensitivity, 89.4% specificity, and 92.1% accuracy. This correctly classified most papillary, follicular, and Hurthle cell lesions, medullary thyroid carcinomas, and parathyroid lesions. In the FNA validation set, the GC sensitivity was 98.0%, the specificity was 81.8%, and the accuracy was 90.9%. Analytical accuracy studies demonstrated a minimal required nucleic acid input of 2.5 ng, a 12% minimal acceptable tumor content, and reproducible test results under variable stress conditions. The ThyroSeq v3 GC analyzes 5 different classes of molecular alterations and provides high accuracy for detecting all common types of thyroid cancer and parathyroid lesions. The analytical sensitivity, specificity, and robustness of the test have been successfully validated and indicate its suitability for clinical use. Cancer 2018;124:1682-90. © 2018 American Cancer Society.
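
    For readers unfamiliar with the performance figures quoted above, the following sketch shows how sensitivity, specificity, and accuracy are conventionally derived from a confusion matrix. The counts used in the example are hypothetical and are not the ThyroSeq v3 data, which the abstract does not break down.

```python
def classifier_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true cancers called positive
        "specificity": tn / (tn + fp),   # fraction of benign nodules called negative
        "accuracy": (tp + tn) / total,   # fraction of all calls that are correct
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classifier_metrics(tp=90, fp=20, tn=80, fn=10)
```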

  13. Global analysis of epigenetic regulation of gene expression in response to drought stress in Sorghum.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reddy, Anireddy; Ben-Hur, Asa

    Abiotic stresses including drought are major limiting factors of crop yields and cause significant crop losses. Acquisition of stress tolerance to abiotic stresses requires coordinated regulation of a multitude of biochemical and physiological changes, and most of these changes depend on alterations in gene expression. The goal of this work is to perform global analysis of differential regulation of gene expression and alternative splicing, and their relationship with chromatin landscape in drought sensitive and tolerant cultivars. Our Iso-Seq study revealed transcriptome-wide full-length isoforms at an unprecedented scale with over 11000 novel splice isoforms. Additionally, we uncovered alternative polyadenylation sites of ~11000 expressed genes and many novel genes. Overall, Iso-Seq results greatly enhanced sorghum gene annotations that are not only useful in analyzing all our RNA-seq, ChIP-seq and ATAC-seq data but also serve as a great resource to the plant biology community. Our studies identified differentially expressed genes and splicing events that are correlated with the drought-resistant phenotype. An association between alternative splicing and chromatin accessibility was also revealed. Several computational tools developed here (TAPIS and iDiffIR) have been made freely available to the research community in analyzing alternative splicing and differential alternative splicing.

  14. viGEN: An Open Source Pipeline for the Detection and Quantification of Viral RNA in Human Tumors.

    PubMed

    Bhuvaneshwar, Krithika; Song, Lei; Madhavan, Subha; Gusev, Yuriy

    2018-01-01

    An estimated 17% of cancers worldwide are associated with infectious causes. The extent and biological significance of viral presence/infection in actual tumor samples is generally unknown but could be measured using human transcriptome (RNA-seq) data from tumor samples. We present an open source bioinformatics pipeline viGEN, which allows for not only the detection and quantification of viral RNA, but also variants in the viral transcripts. The pipeline includes 4 major modules: the first module aligns and filters out human RNA sequences; the second module maps and counts (remaining un-aligned) reads against reference genomes of all known and sequenced human viruses; the third module quantifies read counts at the individual viral-gene level, thus allowing for downstream differential expression analysis of viral genes between case and control groups; the fourth module calls variants in these viruses. To the best of our knowledge, there are no publicly available pipelines or packages that would provide this type of complete analysis in one open source package. In this paper, we applied the viGEN pipeline to two case studies. We first demonstrate the working of our pipeline on a large public dataset, the TCGA cervical cancer cohort. In the second case study, we performed an in-depth analysis on a small focused study of TCGA liver cancer patients. In the latter cohort, we performed viral-gene quantification, viral-variant extraction and survival analysis. This allowed us to find differentially expressed viral-transcripts and viral-variants between the groups of patients, and connect them to clinical outcome. From our analyses, we show that we were able to successfully detect the human papilloma virus among the TCGA cervical cancer patients. We compared the viGEN pipeline with two metagenomics tools and demonstrate similar sensitivity/specificity. We were also able to quantify viral-transcripts and extract viral-variants using the liver cancer dataset. The results presented corresponded with published literature in terms of rate of detection, and impact of several known variants of HBV genome. This pipeline is generalizable, and can be used to provide novel biological insights into microbial infections in complex diseases and tumorigeneses. Our viral pipeline could be used in conjunction with additional types of immuno-oncology analysis based on RNA-seq data of host RNA for cancer immunology applications. The source code, with example data and tutorial is available at: https://github.com/ICBI/viGEN/.

  15. Regulation of the Salmonella enterica std fimbrial operon by DNA adenine methylation, SeqA, and HdfR.

    PubMed

    Jakomin, Marcello; Chessa, Daniela; Bäumler, Andreas J; Casadesús, Josep

    2008-11-01

    DNA adenine methylase (dam) mutants of Salmonella enterica serovar Typhimurium grown under laboratory conditions express the std fimbrial operon, which is tightly repressed in the wild type. Here, we show that uncontrolled production of Std fimbriae in S. enterica serovar Typhimurium dam mutants contributes to attenuation in mice, as indicated by the observation that an stdA dam strain is more competitive than a dam strain upon oral infection. Dam methylation appears to regulate std transcription, rather than std mRNA stability or turnover. A genetic screen for std regulators showed that the GATC-binding protein SeqA directly or indirectly represses std expression, while the poorly characterized yifA gene product serves as an std activator. YifA encodes a putative LysR-like protein and has been renamed HdfR, like its Escherichia coli homolog. Activation of std expression by HdfR is observed only in dam and seqA backgrounds. These data suggest that HdfR directly or indirectly activates std transcription. Since SeqA is unable to bind nonmethylated DNA, it is possible that std operon derepression in dam and seqA mutants may result from unconstrained HdfR-mediated activation of std transcription. Derepression of std in dam and seqA mutants of S. enterica occurs in only a fraction of the bacterial population, suggesting the occurrence of either bistable expression or phase variation.

  16. The Model-Based Study of the Effectiveness of Reporting Lists of Small Feature Sets Using RNA-Seq Data.

    PubMed

    Kim, Eunji; Ivanov, Ivan; Hua, Jianping; Lampe, Johanna W; Hullar, Meredith Aj; Chapkin, Robert S; Dougherty, Edward R

    2017-01-01

    Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This is a model-based study that examines the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.

  17. Simultaneous isoform discovery and quantification from RNA-seq.

    PubMed

    Hiller, David; Wong, Wing Hung

    2013-05-01

    RNA sequencing is a recent technology which has seen an explosion of methods addressing all levels of analysis, from read mapping to transcript assembly to differential expression modeling. In particular the discovery of isoforms at the transcript assembly stage is a complex problem and current approaches suffer from various limitations. For instance, many approaches use graphs to construct a minimal set of isoforms which covers the observed reads, then perform a separate algorithm to quantify the isoforms, which can result in a loss of power. Current methods also use ad-hoc solutions to deal with the vast number of possible isoforms which can be constructed from a given set of reads. Finally, while the need of taking into account features such as read pairing and sampling rate of reads has been acknowledged, most existing methods do not seamlessly integrate these features as part of the model. We present Montebello, an integrated statistical approach which performs simultaneous isoform discovery and quantification by using a Monte Carlo simulation to find the most likely isoform composition leading to a set of observed reads. We compare Montebello to Cufflinks, a popular isoform discovery approach, on a simulated data set and on 46.3 million brain reads from an Illumina tissue panel. On this data set Montebello appears to offer a modest improvement over Cufflinks when considering discovery and parsimony metrics. In addition Montebello mitigates specific difficulties inherent in the Cufflinks approach. Finally, Montebello can be fine-tuned depending on the type of solution desired.

  18. Investigation of Experimental Factors That Underlie BRCA1/2 mRNA Isoform Expression Variation: Recommendations for Utilizing Targeted RNA Sequencing to Evaluate Potential Spliceogenic Variants

    PubMed Central

    Lattimore, Vanessa L.; Pearson, John F.; Currie, Margaret J.; Spurdle, Amanda B.; Robinson, Bridget A.; Walker, Logan C.

    2018-01-01

    PCR-based RNA splicing assays are commonly used in diagnostic and research settings to assess the potential effects of variants of uncertain clinical significance in BRCA1 and BRCA2. The Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium completed a multicentre investigation to evaluate differences in assay design and the integrity of published data, raising a number of methodological questions associated with cell culture conditions and PCR-based protocols. We utilized targeted RNA-seq to re-assess BRCA1 and BRCA2 mRNA isoform expression patterns in lymphoblastoid cell lines (LCLs) previously used in the multicentre ENIGMA study. Capture of the targeted cDNA sequences was carried out using 34 BRCA1 and 28 BRCA2 oligonucleotides from the Illumina Truseq Targeted RNA Expression platform. Our results show that targeted RNA-seq analysis of LCLs overcomes many of the methodology limitations associated with PCR-based assays leading us to make the following observations and recommendations: (1) technical replicates (n > 2) of variant carriers to capture methodology induced variability associated with RNA-seq assays, (2) LCLs can undergo multiple freeze/thaw cycles and can be cultured up to 2 weeks without noticeably influencing isoform expression levels, (3) nonsense-mediated decay inhibitors are essential prior to splicing assays for comprehensive mRNA isoform detection, (4) quantitative assessment of exon:exon junction levels across BRCA1 and BRCA2 can help distinguish between normal and aberrant isoform expression patterns. Experimentally derived recommendations from this study will facilitate the application of targeted RNA-seq platforms for the quantitation of BRCA1 and BRCA2 mRNA aberrations associated with sequence variants of uncertain clinical significance. PMID:29774201

  19. RNA-Seq of Arabidopsis Pollen Uncovers Novel Transcription and Alternative Splicing

    PubMed Central

    Loraine, Ann E.; McCormick, Sheila; Estrada, April; Patel, Ketan; Qin, Peng

    2013-01-01

    Pollen grains of Arabidopsis (Arabidopsis thaliana) contain two haploid sperm cells enclosed in a haploid vegetative cell. Upon germination, the vegetative cell extrudes a pollen tube that carries the sperm to an ovule for fertilization. Knowing the identity, relative abundance, and splicing patterns of pollen transcripts will improve our understanding of pollen and allow investigation of tissue-specific splicing in plants. Most Arabidopsis pollen transcriptome studies have used the ATH1 microarray, which does not assay splice variants and lacks specific probe sets for many genes. To investigate the pollen transcriptome, we performed high-throughput sequencing (RNA-Seq) of Arabidopsis pollen and seedlings for comparison. Gene expression was more diverse in seedling, and genes involved in cell wall biogenesis were highly expressed in pollen. RNA-Seq detected at least 4,172 protein-coding genes expressed in pollen, including 289 assayed only by nonspecific probe sets. Additional exons and previously unannotated 5′ and 3′ untranslated regions for pollen-expressed genes were revealed. We detected regions in the genome not previously annotated as expressed; 14 were tested and 12 were confirmed by polymerase chain reaction. Gapped read alignments revealed 1,908 high-confidence new splicing events supported by 10 or more spliced read alignments. Alternative splicing patterns in pollen and seedling were highly correlated. For most alternatively spliced genes, the ratio of variants in pollen and seedling was similar, except for some encoding proteins involved in RNA splicing. This study highlights the robustness of splicing patterns in plants and the importance of ongoing annotation and visualization of RNA-Seq data using interactive tools such as Integrated Genome Browser. PMID:23590974

  20. Investigation of Experimental Factors That Underlie BRCA1/2 mRNA Isoform Expression Variation: Recommendations for Utilizing Targeted RNA Sequencing to Evaluate Potential Spliceogenic Variants.

    PubMed

    Lattimore, Vanessa L; Pearson, John F; Currie, Margaret J; Spurdle, Amanda B; Robinson, Bridget A; Walker, Logan C

    2018-01-01

    PCR-based RNA splicing assays are commonly used in diagnostic and research settings to assess the potential effects of variants of uncertain clinical significance in BRCA1 and BRCA2. The Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium completed a multicentre investigation to evaluate differences in assay design and the integrity of published data, raising a number of methodological questions associated with cell culture conditions and PCR-based protocols. We utilized targeted RNA-seq to re-assess BRCA1 and BRCA2 mRNA isoform expression patterns in lymphoblastoid cell lines (LCLs) previously used in the multicentre ENIGMA study. Capture of the targeted cDNA sequences was carried out using 34 BRCA1 and 28 BRCA2 oligonucleotides from the Illumina Truseq Targeted RNA Expression platform. Our results show that targeted RNA-seq analysis of LCLs overcomes many of the methodology limitations associated with PCR-based assays leading us to make the following observations and recommendations: (1) technical replicates (n > 2) of variant carriers to capture methodology induced variability associated with RNA-seq assays, (2) LCLs can undergo multiple freeze/thaw cycles and can be cultured up to 2 weeks without noticeably influencing isoform expression levels, (3) nonsense-mediated decay inhibitors are essential prior to splicing assays for comprehensive mRNA isoform detection, (4) quantitative assessment of exon:exon junction levels across BRCA1 and BRCA2 can help distinguish between normal and aberrant isoform expression patterns. Experimentally derived recommendations from this study will facilitate the application of targeted RNA-seq platforms for the quantitation of BRCA1 and BRCA2 mRNA aberrations associated with sequence variants of uncertain clinical significance.

  1. A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells.

    PubMed

    Wolff, Alexander; Bayerlová, Michaela; Gaedcke, Jochen; Kube, Dieter; Beißbarth, Tim

    2018-01-01

    Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expression analysis or preprocessing, normalization and differential gene expression in case of microarray analysis, in order to give a global insight into pipeline performances. Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data. The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. Tophat's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish had only an overall mapping rate of 84.81% and 54.44%. The correlation of gene expression in microarray and RNA-Seq data was moderately worse for the patient dataset (ρ = 0.67-0.69) than for the cell line dataset (ρ = 0.87-0.88). An exception was the correlation results of Cufflinks, which were substantially lower (ρ = 0.21-0.29 and 0.34-0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified in the different pipelines and of GO-term enrichment results. In conclusion the combination of STAR aligner with HTSeq-Count followed by STAR aligner with RSEM and Sailfish generated differentially expressed genes best suited for the dataset at hand and in agreement with most of the other transcriptomics pipelines.

  2. ChloroSeq, an optimized chloroplast RNA-Seq bioinformatic pipeline, reveals remodeling of the organellar transcriptome under heat stress

    DOE PAGES

    Castandet, Benoît; Hotto, Amber M.; Strickler, Susan R.; ...

    2016-07-06

    Although RNA-Seq has revolutionized transcript analysis, organellar transcriptomes are rarely assessed even when present in published datasets. Here, we describe the development and application of a rapid and convenient method, ChloroSeq, to delineate qualitative and quantitative features of chloroplast RNA metabolism from strand-specific RNA-Seq datasets, including processing, editing, splicing, and relative transcript abundance. The use of a single experiment to analyze systematically chloroplast transcript maturation and abundance is of particular interest due to frequent pleiotropic effects observed in mutants that affect chloroplast gene expression and/or photosynthesis. To illustrate its utility, ChloroSeq was applied to published RNA-Seq datasets derived from Arabidopsis thaliana grown under control and abiotic stress conditions, where the organellar transcriptome had not been examined. The most appreciable effects were found for heat stress, which induces a global reduction in splicing and editing efficiency, and leads to increased abundance of chloroplast transcripts, including genic, intergenic, and antisense transcripts. Moreover, by concomitantly analyzing nuclear transcripts that encode chloroplast gene expression regulators from the same libraries, we demonstrate the possibility of achieving a holistic understanding of the nucleus-organelle system. In conclusion, ChloroSeq thus represents a unique method for streamlining RNA-Seq data interpretation of the chloroplast transcriptome and its regulators.

  3. ChloroSeq, an optimized chloroplast RNA-Seq bioinformatic pipeline, reveals remodeling of the organellar transcriptome under heat stress

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castandet, Benoît; Hotto, Amber M.; Strickler, Susan R.

    Although RNA-Seq has revolutionized transcript analysis, organellar transcriptomes are rarely assessed even when present in published datasets. Here, we describe the development and application of a rapid and convenient method, ChloroSeq, to delineate qualitative and quantitative features of chloroplast RNA metabolism from strand-specific RNA-Seq datasets, including processing, editing, splicing, and relative transcript abundance. The use of a single experiment to analyze systematically chloroplast transcript maturation and abundance is of particular interest due to frequent pleiotropic effects observed in mutants that affect chloroplast gene expression and/or photosynthesis. To illustrate its utility, ChloroSeq was applied to published RNA-Seq datasets derived from Arabidopsis thaliana grown under control and abiotic stress conditions, where the organellar transcriptome had not been examined. The most appreciable effects were found for heat stress, which induces a global reduction in splicing and editing efficiency, and leads to increased abundance of chloroplast transcripts, including genic, intergenic, and antisense transcripts. Moreover, by concomitantly analyzing nuclear transcripts that encode chloroplast gene expression regulators from the same libraries, we demonstrate the possibility of achieving a holistic understanding of the nucleus-organelle system. In conclusion, ChloroSeq thus represents a unique method for streamlining RNA-Seq data interpretation of the chloroplast transcriptome and its regulators.

  4. Single-Cell mRNA-Seq Using the Fluidigm C1 System and Integrated Fluidics Circuits.

    PubMed

    Gong, Haibiao; Do, Devin; Ramakrishnan, Ramesh

    2018-01-01

    Single-cell mRNA-seq is a valuable tool to dissect expression profiles and to understand the regulatory network of genes. Microfluidics is well suited for single-cell analysis owing both to the small volume of the reaction chambers and the ease of automation. Here we describe the workflow of single-cell mRNA-seq using the C1 IFC, which can isolate and process up to 96 cells. Both the on-chip procedure (lysis, reverse transcription, and preamplification PCR) and off-chip sequencing library preparation protocols are described. The workflow generates full-length mRNA information, which is more valuable than 3' end counting methods for many applications.

  5. Tubulin C-terminal Post-translational Modifications Do Not Occur in Wood Forming Tissue of Populus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, Hao; Gu, Xi; Xue, Liang-Jiao

    Cortical microtubules (MTs) are evolutionarily conserved cytoskeletal components with specialized roles in plants, including regulation of cell wall biogenesis. MT functions and dynamics are dictated by the composition of their monomeric subunits, α- (TUA) and β-tubulins (TUB), which in animals and protists are subject to both transcriptional regulation and post-translational modifications (PTM). While spatiotemporal regulation of tubulin gene expression has been reported in plants, whether and to what extent tubulin PTMs occur in these species remain poorly understood. We chose the woody perennial Populus for investigation of tubulin PTMs in this study, with a particular focus on developing xylem where high tubulin transcript levels support MT-dependent secondary cell wall deposition. Mass spectrometry and immunodetection concurred that detyrosination, non-tyrosination and glutamylation were essentially absent in tubulins isolated from wood-forming tissues of P. deltoides and P. tremula ×alba. Label-free quantification of tubulin isotypes and RNA-Seq estimation of tubulin transcript abundance were largely consistent with transcriptional regulation. However, two TUB isotypes were detected at noticeably lower levels than expected based on RNA-Seq transcript abundance in both Populus species. These findings led us to conclude that MT composition during wood formation depends exclusively on transcriptional and, to a lesser extent, translational regulation of tubulin isotypes.

  6. Tubulin C-terminal Post-translational Modifications Do Not Occur in Wood Forming Tissue of Populus

    DOE PAGES

    Hu, Hao; Gu, Xi; Xue, Liang-Jiao; ...

    2016-10-13

    Cortical microtubules (MTs) are evolutionarily conserved cytoskeletal components with specialized roles in plants, including regulation of cell wall biogenesis. MT functions and dynamics are dictated by the composition of their monomeric subunits, α- (TUA) and β-tubulins (TUB), which in animals and protists are subject to both transcriptional regulation and post-translational modifications (PTM). While spatiotemporal regulation of tubulin gene expression has been reported in plants, whether and to what extent tubulin PTMs occur in these species remain poorly understood. We chose the woody perennial Populus for investigation of tubulin PTMs in this study, with a particular focus on developing xylem where high tubulin transcript levels support MT-dependent secondary cell wall deposition. Mass spectrometry and immunodetection concurred that detyrosination, non-tyrosination and glutamylation were essentially absent in tubulins isolated from wood-forming tissues of P. deltoides and P. tremula ×alba. Label-free quantification of tubulin isotypes and RNA-Seq estimation of tubulin transcript abundance were largely consistent with transcriptional regulation. However, two TUB isotypes were detected at noticeably lower levels than expected based on RNA-Seq transcript abundance in both Populus species. These findings led us to conclude that MT composition during wood formation depends exclusively on transcriptional and, to a lesser extent, translational regulation of tubulin isotypes.

  7. Granatum: a graphical single-cell RNA-Seq analysis pipeline for genomics scientists.

    PubMed

    Zhu, Xun; Wolfgruber, Thomas K; Tasato, Austin; Arisdakessian, Cédric; Garmire, David G; Garmire, Lana X

    2017-12-05

    Single-cell RNA sequencing (scRNA-Seq) is an increasingly popular platform to study heterogeneity at the single-cell level. Computational methods to process scRNA-Seq data are not very accessible to bench scientists as they require a significant amount of bioinformatic skills. We have developed Granatum, a web-based scRNA-Seq analysis pipeline to make analysis more broadly accessible to researchers. Without a single line of programming code, users can click through the pipeline, setting parameters and visualizing results via the interactive graphical interface. Granatum conveniently walks users through various steps of scRNA-Seq analysis. It has a comprehensive list of modules, including plate merging and batch-effect removal, outlier-sample removal, gene-expression normalization, imputation, gene filtering, cell clustering, differential gene expression analysis, pathway/ontology enrichment analysis, protein network interaction visualization, and pseudo-time cell series construction. Granatum enables broad adoption of scRNA-Seq technology by empowering bench scientists with an easy-to-use graphical interface for scRNA-Seq data analysis. The package is freely available for research use at http://garmiregroup.org/granatum/app.

  8. FUNDAMENTALS OF VITAMIN D HORMONE-REGULATED GENE EXPRESSION

    PubMed Central

    Pike, J. Wesley; Meyer, Mark B.

    2014-01-01

    Initial research focused upon several known genetic targets provided early insight into the mechanism of action of the vitamin D hormone (1,25-dihydroxyvitamin D3 (1,25(OH)2D3)). Recently, however, a series of technical advances involving the coupling of chromatin immunoprecipitation (ChIP) to unbiased methodologies that initially involved tiled DNA microarrays (ChIP-chip analysis) and now Next Generation DNA Sequencing techniques (ChIP-Seq analysis) has opened new avenues of research into the mechanisms through which 1,25(OH)2D3 regulates gene expression. In this review, we summarize briefly the results of this early work and then focus on more recent studies in which ChIP-chip and ChIP-seq analyses have been used to explore the mechanisms of 1,25(OH)2D3 action on a genome-wide scale providing specific target genes as examples. The results of this work have advanced our understanding of the mechanisms involved at both genetic and epigenetic levels and have revealed a series of new principles through which the vitamin D hormone functions to control the expression of genes. PMID:24239506

  9. Epigenomic Landscape of Human Fetal Brain, Heart, and Liver.

    PubMed

    Yan, Liying; Guo, Hongshan; Hu, Boqiang; Li, Rong; Yong, Jun; Zhao, Yangyu; Zhi, Xu; Fan, Xiaoying; Guo, Fan; Wang, Xiaoye; Wang, Wei; Wei, Yuan; Wang, Yan; Wen, Lu; Qiao, Jie; Tang, Fuchou

    2016-02-26

    The epigenetic regulation of spatiotemporal gene expression is crucial for human development. Here, we present whole-genome chromatin immunoprecipitation followed by high throughput DNA sequencing (ChIP-seq) analyses of a wide variety of histone markers in the brain, heart, and liver of early human embryos shortly after their formation. We identified 40,181 active enhancers, with a large portion showing tissue-specific and developmental stage-specific patterns, pointing to their roles in controlling the ordered spatiotemporal expression of the developmental genes in early human embryos. Moreover, using sequential ChIP-seq, we showed that all three organs have hundreds to thousands of bivalent domains that are marked by both H3K4me3 and H3K27me3, probably to keep the progenitor cells in these organs ready for immediate differentiation into diverse cell types during subsequent developmental processes. Our work illustrates the potentially critical roles of tissue-specific and developmental stage-specific epigenomes in regulating the spatiotemporal expression of developmental genes during early human embryonic development. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  10. Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs

    PubMed Central

    Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv

    2010-01-01

    RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462

  11. Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing.

    PubMed

    Liu, Yu; Koyutürk, Mehmet; Maxwell, Sean; Xiang, Min; Veigl, Martina; Cooper, Richard S; Tayo, Bamidele O; Li, Li; LaFramboise, Thomas; Wang, Zhenghe; Zhu, Xiaofeng; Chance, Mark R

    2014-08-16

    Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent in reference genome. However, there is no study to investigate the distribution, evolution and functionality of those sequences in human populations. To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements. 
In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human diversity. 76% of micSeqs were confirmed by a comparative genomics approach. Fourteen micSeqs are expressed in human brain or contain TF binding regions. Some micSeqs are primate-specific, conserved and may play a role in the evolution of primates.

  12. ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

    PubMed Central

    McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.

    2013-01-01

    Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. 
ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943
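    The gene-wise adaptive-k idea described in this abstract can be illustrated with a small sketch. The code below is not the ReliefSeq implementation: it is a simplified Relief-style scorer for two classes, and the per-gene "keep the k with the highest score" rule is only one plausible reading of "adapts the number of nearest neighbors (k) for each gene". The function names, the range-normalised distance, and the toy data are all assumptions made for illustration.

```python
def relief_scores(X, y, k):
    """Relief-style importance score per feature using k nearest hits and misses.

    X is a list of samples (each a list of feature values), y the class labels.
    Simplified illustration only; not the published ReliefSeq code.
    """
    n, p = len(X), len(X[0])
    # Feature ranges, used to scale differences into [0, 1].
    spans = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(p)]

    def diff(f, a, b):
        return abs(X[a][f] - X[b][f]) / spans[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(p))

    w = [0.0] * p
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist(i, j))
        hits = [j for j in others if y[j] == y[i]][:k]
        misses = [j for j in others if y[j] != y[i]][:k]
        for f in range(p):
            # Features that differ among same-class neighbours are penalised,
            # features that differ among other-class neighbours are rewarded.
            if hits:
                w[f] -= sum(diff(f, i, j) for j in hits) / (len(hits) * n)
            if misses:
                w[f] += sum(diff(f, i, j) for j in misses) / (len(misses) * n)
    return w


def gene_wise_best_k(X, y, ks):
    """For each feature (gene), return (best score, k that achieved it)."""
    per_k = {k: relief_scores(X, y, k) for k in ks}
    return [max((per_k[k][f], k) for k in ks) for f in range(len(X[0]))]


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(relief_scores(X, y, 1))
print(gene_wise_best_k(X, y, [1, 2]))
```

    On the toy data the class-separating feature receives the higher score. A real analysis would also need the normalisation and multi-class handling that this sketch omits.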

  13. An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.

    PubMed

    Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit

    2016-05-26

    Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, a comprehensive assessment of library construction and analysis protocols for the Proton sequencing platform remains unexplored. Unlike Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads of various lengths. Whether the commonly used software handles such data satisfactorily is unknown. Using universal human reference RNA as the starting material, RNaseIII and chemical fragmentation methods in library construction gave similar results in the number of genes and junctions discovered and in the accuracy of expression level estimates. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. The unspliced aligner TMAP attained the highest mapping rate (97.27 % to genome, 86.46 % to transcriptome), though 47.83 % of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With a reference annotation guide, the mapping rate of TopHat2 significantly increased from 75.79 to 92.09 %, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier, attained results highly consistent with those of the TaqMan array and the highest sensitivity. We provided, for the first time, reference statistics of library preparation methods, gene detection and quantification and junction discovery for RNA-Seq by the Ion Proton platform. Chemical fragmentation performed equally well with the enzyme-based one. The optimal Ion Proton sequencing options and analysis software have been evaluated.

  14. Transcriptional profiling of murine osteoblast differentiation based on RNA-seq expression analyses.

    PubMed

    Khayal, Layal Abo; Grünhagen, Johannes; Provazník, Ivo; Mundlos, Stefan; Kornak, Uwe; Robinson, Peter N; Ott, Claus-Eric

    2018-04-11

    Osteoblastic differentiation is a multistep process characterized by osteogenic induction of mesenchymal stem cells, which then differentiate into proliferative pre-osteoblasts that produce copious amounts of extracellular matrix, followed by stiffening of the extracellular matrix, and matrix mineralization by hydroxylapatite deposition. Although these processes have been well characterized biologically, a detailed transcriptional analysis of murine primary calvaria osteoblast differentiation based on RNA sequencing (RNA-seq) analyses has not previously been reported. Here, we used RNA-seq to obtain expression values of 29,148 genes at four time points as murine primary calvaria osteoblasts differentiate in vitro until onset of mineralization was clearly detectable by microscopic inspection. Expression of marker genes confirmed osteogenic differentiation. We explored differential expression of 1386 protein-coding genes using unsupervised clustering and GO analyses. 100 differentially expressed lncRNAs were investigated by co-expression with protein-coding genes that are localized within the same topologically associated domain. Additionally, we monitored expression of 237 genes that are silent or active at distinct time points and compared differential exon usage. Our data represent an in-depth profiling of murine primary calvaria osteoblast differentiation by RNA-seq and contribute to our understanding of genetic regulation of this key process in osteoblast biology. Copyright © 2018 Elsevier Inc. All rights reserved.

  15. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    PubMed

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  16. Selection of reference genes for transcriptional analysis of edible tubers of potato (Solanum tuberosum L.).

    PubMed

    Mariot, Roberta Fogliatto; de Oliveira, Luisa Abruzzi; Voorhuijzen, Marleen M; Staats, Martijn; Hutten, Ronald C B; Van Dijk, Jeroen P; Kok, Esther; Frazzon, Jeverson

    2015-01-01

    Potato (Solanum tuberosum) yield has increased dramatically over the last 50 years and this has been achieved by a combination of improved agronomy and biotechnology efforts. Gene studies are taking place to improve new qualities and develop new cultivars. Reverse transcriptase quantitative polymerase chain reaction (RT-qPCR) is a benchmark analytical tool for gene expression analysis, but its accuracy is highly dependent on a reliable normalization strategy based on invariant reference genes. For this reason, the goal of this work was to select and validate reference genes for transcriptional analysis of edible tubers of potato. To do so, RT-qPCR primers were designed for ten genes with relatively stable expression in potato tubers as observed in RNA-Seq experiments. Primers were designed across exon boundaries to avoid genomic DNA contamination. Differences were observed in the ranking of candidate genes identified by the geNorm, NormFinder and BestKeeper algorithms. The ranks determined by geNorm and NormFinder were very similar and for all samples the most stable candidates were C2, exocyst complex component sec3 (SEC3) and ATCUL3/ATCUL3A/CUL3/CUL3A (CUL3A). According to BestKeeper, the importin alpha and ubiquitin-associated/ts-n genes were the most stable. Three genes were selected as reference genes for potato edible tubers in RT-qPCR studies. The first one, called C2, was selected in common by NormFinder and geNorm, the second one is SEC3, selected by NormFinder, and the third one is CUL3A, selected by geNorm. Appropriate reference genes identified in this work will help to improve the accuracy of gene expression quantification analyses by taking into account differences that may be observed in RNA quality or reverse transcription efficiency across the samples.
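    The abstract names geNorm without describing how it ranks candidates. As a rough illustration, the sketch below computes a geNorm-style stability value M for each gene: the mean, over all other genes, of the standard deviation of the pairwise log2 expression ratios across samples (lower M meaning more stable). This follows the pairwise-variation idea as it is commonly described and is not the geNorm software itself. The gene labels and expression values are made up, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics


def genorm_m(expr):
    """geNorm-style stability value M per gene (lower = more stable).

    expr maps gene name -> list of relative expression values on a linear
    scale, one value per sample, in the same sample order for every gene.
    """
    genes = list(expr)
    m = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            # Log2 ratio of gene j to gene k in each sample.
            ratios = [math.log2(a / b) for a, b in zip(expr[j], expr[k])]
            pairwise_sd.append(statistics.stdev(ratios))
        m[j] = sum(pairwise_sd) / len(pairwise_sd)
    return m


# Hypothetical values: geneA and geneB move together, geneC does not.
expr = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],
    "geneC": [1.0, 4.0, 1.0],
}
for gene, value in sorted(genorm_m(expr).items(), key=lambda kv: kv[1]):
    print(gene, round(value, 3))
```

    Genes whose ratio to the other candidates stays constant across samples receive a low M, which is why geneA and geneB rank ahead of geneC here.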

  17. A RNA-Seq Analysis of the Rat Supraoptic Nucleus Transcriptome: Effects of Salt Loading on Gene Expression

    PubMed Central

    Salinas, Yasmmyn D.; Shi, YiJun; Greenwood, Michael; Hoe, See Ziau; Murphy, David; Gainer, Harold

    2015-01-01

    Magnocellular neurons (MCNs) in the hypothalamo-neurohypophysial system (HNS) are highly specialized to release large amounts of arginine vasopressin (Avp) or oxytocin (Oxt) into the blood stream and play critical roles in the regulation of body fluid homeostasis. The MCNs are osmosensory neurons and are excited by exposure to hypertonic solutions and inhibited by hypotonic solutions. The MCNs respond to systemic hypertonic and hypotonic stimulation with large changes in the expression of their Avp and Oxt genes, and microarray studies have shown that these osmotic perturbations also cause large changes in global gene expression in the HNS. In this paper, we examine gene expression in the rat supraoptic nucleus (SON) under normosmotic and chronic salt-loading (SL) conditions for the first time using “new-generation” RNA sequencing (RNA-Seq) methods. We reliably detect 9,709 genes as present in the SON by RNA-Seq, and 552 of these genes were changed in expression as a result of chronic SL. These genes reflect diverse functions, and 42 of these are involved in either transcriptional or translational processes. In addition, we compare the SON transcriptomes resolved by RNA-Seq methods with the SON transcriptomes determined by Affymetrix microarray methods in rats under the same osmotic conditions, and find that there are 6,466 genes present in the SON that are represented in both data sets, although 1,040 of the expressed genes were found only in the microarray data, and 2,762 of the expressed genes are selectively found in the RNA-Seq data and not the microarray data. These data provide the research community a comprehensive view of the transcriptome in the SON under normosmotic conditions and the changes in specific gene expression evoked by salt loading. PMID:25897513

  18. Combining laser microdissection and RNA-seq to chart the transcriptional landscape of fungal development

    PubMed Central

    2012-01-01

    Background During sexual development, filamentous ascomycetes form complex, three-dimensional fruiting bodies for the protection and dispersal of sexual spores. Fruiting bodies contain a number of cell types not found in vegetative mycelium, and these morphological differences are thought to be mediated by changes in gene expression. However, little is known about the spatial distribution of gene expression in fungal development. Here, we used laser microdissection (LM) and RNA-seq to determine gene expression patterns in young fruiting bodies (protoperithecia) and non-reproductive mycelia of the ascomycete Sordaria macrospora. Results Quantitative analysis showed major differences in the gene expression patterns between protoperithecia and total mycelium. Among the genes strongly up-regulated in protoperithecia were the pheromone precursor genes ppg1 and ppg2. The up-regulation was confirmed by fluorescence microscopy of egfp expression under the control of ppg1 regulatory sequences. RNA-seq analysis of protoperithecia from the sterile mutant pro1 showed that many genes that are differentially regulated in these structures are under the genetic control of transcription factor PRO1. Conclusions We have generated transcriptional profiles of young fungal sexual structures using a combination of LM and RNA-seq. This allowed a high spatial resolution and sensitivity, and yielded a detailed picture of gene expression during development. Our data revealed significant differences in gene expression between protoperithecia and non-reproductive mycelia, and showed that the transcription factor PRO1 is involved in the regulation of many genes expressed specifically in sexual structures. The LM/RNA-seq approach will also be relevant to other eukaryotic systems in which multicellular development is investigated. PMID:23016559

  19. Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

    PubMed

    Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

    2017-10-06

    Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output, affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.

  20. Distinct polyadenylation landscapes of diverse human tissues revealed by a modified PA-seq strategy

    PubMed Central

    2013-01-01

    Background Polyadenylation is a key regulatory step in eukaryotic gene expression and one of the major contributors of transcriptome diversity. Aberrant polyadenylation often associates with expression defects and leads to human diseases. Results To better understand global polyadenylation regulation, we have developed a polyadenylation sequencing (PA-seq) approach. By profiling polyadenylation events in 13 human tissues, we found that alternative cleavage and polyadenylation (APA) is prevalent in both protein-coding and noncoding genes. In addition, APA usage, similar to gene expression profiling, exhibits tissue-specific signatures and is sufficient for determining tissue origin. A 3′ untranslated region shortening index (USI) was further developed for genes with tandem APA sites. Strikingly, the results showed that different tissues exhibit distinct patterns of shortening and/or lengthening of 3′ untranslated regions, suggesting the intimate involvement of APA in establishing tissue or cell identity. Conclusions This study provides a comprehensive resource to uncover regulated polyadenylation events in human tissues and to characterize the underlying regulatory mechanism. PMID:24025092

  1. ORMAN: optimal resolution of ambiguous RNA-Seq multimappings in the presence of novel isoforms.

    PubMed

    Dao, Phuong; Numanagić, Ibrahim; Lin, Yen-Yi; Hach, Faraz; Karakoc, Emre; Donmez, Nilgun; Collins, Colin; Eichler, Evan E; Sahinalp, S Cenk

    2014-03-01

    RNA-Seq technology is promising to uncover many novel alternative splicing events, gene fusions and other variations in RNA transcripts. For an accurate detection and quantification of transcripts, it is important to resolve the mapping ambiguity for those RNA-Seq reads that can be mapped to multiple loci: >17% of the reads from mouse RNA-Seq data and 50% of the reads from some plant RNA-Seq data have multiple mapping loci. In this study, we show how to resolve the mapping ambiguity in the presence of novel transcriptomic events such as exon skipping and novel indels towards accurate downstream analysis. We introduce ORMAN (Optimal Resolution of Multimapping Ambiguity of RNA-Seq Reads), which aims to compute the minimum number of potential transcript products for each gene and to assign each multimapping read to one of these transcripts based on the estimated distribution of the region covering the read. ORMAN achieves this objective through a combinatorial optimization formulation, which is solved through well-known approximation algorithms, integer linear programs and heuristics. On a simulated RNA-Seq dataset including a random subset of transcripts from the UCSC database, the performance of several state-of-the-art methods for identifying and quantifying novel transcripts, such as Cufflinks, IsoLasso and CLIIQ, is significantly improved through the use of ORMAN. Furthermore, in an experiment using real RNA-Seq reads, we show that ORMAN is able to resolve multimapping to produce coverage values that are similar to the original distribution, even in genes with highly non-uniform coverage. ORMAN is available at http://orman.sf.net

  2. TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.

    PubMed

    Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han

    2017-03-01

    High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

  3. Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications.

    PubMed

    Van den Berge, Koen; Perraudeau, Fanny; Soneson, Charlotte; Love, Michael I; Risso, Davide; Vert, Jean-Philippe; Robinson, Mark D; Dudoit, Sandrine; Clement, Lieven

    2018-02-26

    Dropout events in single-cell RNA sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial model, that identifies excess zero counts and generates gene- and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
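    The abstract does not spell out how the weights are computed. One common way to read "weights based on a zero-inflated negative binomial model" is as the posterior probability that an observed count was generated by the negative binomial (count) component rather than by the excess-zero component. The sketch below works that formula through under this assumption; the function names and parameter values are hypothetical, and in practice the mean mu, dispersion theta and zero-inflation probability pi would come from a fitted model rather than being supplied by hand.

```python
def nb_prob_zero(mu, theta):
    """P(Y = 0) for a negative binomial with mean mu and dispersion (size) theta."""
    return (theta / (theta + mu)) ** theta


def zinb_weight(count, mu, theta, pi):
    """Posterior probability that an observation belongs to the NB component.

    pi is the zero-inflation probability for this gene and cell. Non-zero
    counts can only come from the NB component, so their weight is 1.
    """
    if count > 0:
        return 1.0
    nb_zero = (1.0 - pi) * nb_prob_zero(mu, theta)
    return nb_zero / (pi + nb_zero)


# A zero in a highly expressed gene is more likely to be an excess zero,
# so it is down-weighted more strongly than a zero in a lowly expressed gene.
print(zinb_weight(0, mu=0.5, theta=1.0, pi=0.3))
print(zinb_weight(0, mu=20.0, theta=1.0, pi=0.3))
print(zinb_weight(7, mu=20.0, theta=1.0, pi=0.3))
```

    Weights of this form can then be passed as observation weights to a bulk RNA-seq differential expression method, which is the use the abstract describes.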

  4. Improved Annotation of 3′ Untranslated Regions and Complex Loci by Combination of Strand-Specific Direct RNA Sequencing, RNA-Seq and ESTs

    PubMed Central

    Song, Junfang; Duc, Céline; Storey, Kate G.; McLean, W. H. Irwin; Brown, Sara J.; Simpson, Gordon G.; Barton, Geoffrey J.

    2014-01-01

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3′ untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3′ polyadenylation sites to within +/− 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3′ UTR re-annotation (including extension of one 3′ UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data. PMID:24722185

  5. Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki)

    PubMed Central

    2013-01-01

    Background: The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. Results: We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Conclusions: Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools. PMID:24209455

  6. Design of RNA splicing analysis null models for post hoc filtering of Drosophila head RNA-Seq data with the splicing analysis kit (Spanki).

    PubMed

    Sturgill, David; Malone, John H; Sun, Xia; Smith, Harold E; Rabinow, Leonard; Samson, Marie-Laure; Oliver, Brian

    2013-11-09

    The production of multiple transcript isoforms from one gene is a major source of transcriptome complexity. RNA-Seq experiments, in which transcripts are converted to cDNA and sequenced, allow the resolution and quantification of alternative transcript isoforms. However, methods to analyze splicing are underdeveloped and errors resulting in incorrect splicing calls occur in every experiment. We used RNA-Seq data to develop sequencing and aligner error models. By applying these error models to known input from simulations, we found that errors result from false alignment to minor splice motifs and antisense strands, shifted junction positions, paralog joining, and repeat induced gaps. By using a series of quantitative and qualitative filters, we eliminated diagnosed errors in the simulation, and applied this to RNA-Seq data from Drosophila melanogaster heads. We used high-confidence junction detections to specifically interrogate local splicing differences between transcripts. This method out-performed commonly used RNA-seq methods to identify known alternative splicing events in the Drosophila sex determination pathway. We describe a flexible software package to perform these tasks called Splicing Analysis Kit (Spanki), available at http://www.cbcb.umd.edu/software/spanki. Splice-junction centric analysis of RNA-Seq data provides advantages in specificity for detection of alternative splicing. Our software provides tools to better understand error profiles in RNA-Seq data and improve inference from this new technology. The splice-junction centric approach that this software enables will provide more accurate estimates of differentially regulated splicing than current tools.

  7. Bayesian estimation of differential transcript usage from RNA-seq data.

    PubMed

    Papastamoulis, Panagiotis; Rattray, Magnus

    2017-11-27

    Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU) and targets changes in the relative within gene expression of a transcript. The contribution of this paper is to: (a) extend the use of cjBitSeq to the DTU context, a previously introduced Bayesian model which is originally designed for identifying changes in overall expression levels and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read based model and performs fully Bayesian inference by MCMC sampling on the space of latent state of each transcript per gene. BayesDRIMSeq is a count based model and estimates the Bayes Factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against the existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance with DRIMSeq in terms of precision/recall but offer better calibration of False Discovery Rate.
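
    To illustrate the quantity that differential transcript usage is concerned with, the relative within-gene expression of a transcript, the following minimal sketch computes each transcript's share of its gene's total expression in two conditions. It is not cjBitSeq, DRIMSeq or BayesDRIMSeq; the function name `transcript_usage`, the transcript and gene identifiers, and the expression values are all hypothetical.

```python
from collections import defaultdict


def transcript_usage(counts, gene_of):
    """Return each transcript's share of its gene's total expression.

    counts  : dict mapping transcript id -> expression value (hypothetical data)
    gene_of : dict mapping transcript id -> gene id
    """
    gene_totals = defaultdict(float)
    for tx, value in counts.items():
        gene_totals[gene_of[tx]] += value

    usage = {}
    for tx, value in counts.items():
        total = gene_totals[gene_of[tx]]
        # A gene with no expression gets a usage of 0.0 for every transcript.
        usage[tx] = value / total if total > 0 else 0.0
    return usage


# Hypothetical gene with two transcripts measured in two conditions.
gene_of = {"tx1": "geneA", "tx2": "geneA"}
condition_1 = {"tx1": 80.0, "tx2": 20.0}
condition_2 = {"tx1": 200.0, "tx2": 200.0}

usage_1 = transcript_usage(condition_1, gene_of)
usage_2 = transcript_usage(condition_2, gene_of)
print(usage_1)
print(usage_2)
```

    In this toy example the overall expression of the gene rises between conditions while the share of tx1 falls, which is the kind of change a DTU analysis targets and an overall-expression test would not isolate.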

  8. omiRas: a Web server for differential expression analysis of miRNAs derived from small RNA-Seq data.

    PubMed

    Müller, Sören; Rycak, Lukas; Winter, Peter; Kahl, Günter; Koch, Ina; Rotter, Björn

    2013-10-15

    Small RNA deep sequencing is widely used to characterize non-coding RNAs (ncRNAs) differentially expressed between two conditions, e.g. healthy and diseased individuals and to reveal insights into molecular mechanisms underlying condition-specific phenotypic traits. The ncRNAome is composed of a multitude of RNAs, such as transfer RNA, small nucleolar RNA and microRNA (miRNA), to name a few. Here we present omiRas, a Web server for the annotation, comparison and visualization of interaction networks of ncRNAs derived from next-generation sequencing experiments of two different conditions. The Web tool allows the user to submit raw sequencing data and results are presented as: (i) static annotation results including length distribution, mapping statistics, alignments and quantification tables for each library as well as lists of differentially expressed ncRNAs between conditions and (ii) an interactive network visualization of user-selected miRNAs and their target genes based on the combination of several miRNA-mRNA interaction databases. The omiRas Web server is implemented in Python, PostgreSQL, R and can be accessed at: http://tools.genxpro.net/omiras/.

  9. Resources and Recommendations for Using Transcriptomics to Address Grand Challenges in Comparative Biology.

    PubMed

    Mykles, Donald L; Burnett, Karen G; Durica, David S; Joyce, Blake L; McCarthy, Fiona M; Schmidt, Carl J; Stillman, Jonathon H

    2016-12-01

    High-throughput RNA sequencing (RNA-seq) technology has become an important tool for studying physiological responses of organisms to changes in their environment. De novo assembly of RNA-seq data has allowed researchers to create a comprehensive catalog of genes expressed in a tissue and to quantify their expression without a complete genome sequence. The contributions from the "Tapping the Power of Crustacean Transcriptomics to Address Grand Challenges in Comparative Biology" symposium in this issue show the successes and limitations of using RNA-seq in the study of crustaceans. In conjunction with the symposium, the Animal Genome to Phenome Research Coordination Network collated comments from participants at the meeting regarding the challenges encountered when using transcriptomics in their research. Input came from novices and experts ranging from graduate students to principal investigators. Many were unaware of the bioinformatics analysis resources currently available on the CyVerse platform. Our analysis of community responses led to three recommendations for advancing the field: (1) integration of genomic and RNA-seq sequence assemblies for crustacean gene annotation and comparative expression; (2) development of methodologies for the functional analysis of genes; and (3) information and training exchange among laboratories for transmission of best practices. The field lacks the methods for manipulating tissue-specific gene expression. The decapod crustacean research community should consider the cherry shrimp, Neocaridina denticulata, as a decapod model for the application of transgenic tools for functional genomics. This would require a multi-investigator effort. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.

  10. Whole blood transcriptional profiling comparison between different milk yield of Chinese Holstein cows using RNA-seq data.

    PubMed

    Bai, Xue; Zheng, Zhuqing; Liu, Bin; Ji, Xiaoyang; Bai, Yongsheng; Zhang, Wenguang

    2016-08-22

    The objective of this research was to investigate the variation of gene expression in the blood transcriptome profile of Chinese Holstein cows associated with milk yield traits. We used RNA-seq to generate the bovine transcriptome from the blood of 23 lactating Chinese Holstein cows with extremely high and low milk yield. A total of 100 differentially expressed genes (DEGs) (p < 0.05, FDR < 0.05) were revealed between the high and low groups. Gene ontology (GO) analysis demonstrated that the 100 DEGs were enriched in specific biological processes with regard to defense response, immune response, inflammatory response, icosanoid metabolic process, and fatty acid metabolic process (p < 0.05). The KEGG pathway analysis with 100 DEGs revealed that the most statistically significant metabolic pathway was related to the Toll-like receptor signaling pathway (p < 0.05). The expression level of four selected DEGs was analyzed by qRT-PCR, and the results indicated that the expression patterns were consistent with the deep sequencing results by RNA-Seq. Furthermore, alternative splicing analysis of 100 DEGs demonstrated that there were different splicing patterns between high and low yielders. The alternative 3' splicing site was the major splicing pattern detected in high yielders. However, in low yielders the major type was exon skipping. This study provides a non-invasive method to identify the DEGs in cattle blood using RNA-seq for milk yield. The revealed 100 DEGs between Holstein cows with extremely high and low milk yield, and immunological pathway are likely involved in milk yield trait. Finally, this study allowed us to explore associations between immune traits and production traits related to milk production.

  11. Transcriptome-Wide Identification of Reference Genes for Expression Analysis of Soybean Responses to Drought Stress along the Day.

    PubMed

    Marcolino-Gomes, Juliana; Rodrigues, Fabiana Aparecida; Fuganti-Pagliarini, Renata; Nakayama, Thiago Jonas; Ribeiro Reis, Rafaela; Bouças Farias, Jose Renato; Harmon, Frank G; Correa Molinari, Hugo Bruno; Correa Molinari, Mayla Daiane; Nepomuceno, Alexandre

    2015-01-01

    The soybean transcriptome displays strong variation along the day in optimal growth conditions and also in response to adverse circumstances, like drought stress. However, no study conducted to date has presented suitable reference genes, with stable expression along the day, for relative gene expression quantification in combined studies on drought stress and diurnal oscillations. Recently, water deficit responses have been associated with circadian clock oscillations at the transcription level, revealing the existence of hitherto unknown processes and increasing the demand for studies on plant responses to drought stress and its oscillation during the day. We performed data mining from a transcriptome-wide background using microarrays and RNA-seq databases to select an unpublished set of candidate reference genes, specifically chosen for the normalization of gene expression in studies on soybean under both drought stress and diurnal oscillations. Experimental validation and stability analysis in soybean plants submitted to drought stress and sampled during a 24 h timecourse showed that four of these newer reference genes (FYVE, NUDIX, Golgin-84 and CYST) indeed exhibited greater expression stability than the conventionally used housekeeping genes (ELF1-β and β-actin) under these conditions. We also demonstrated the effect of using reference candidate genes with different stability values to normalize the relative expression data from a drought-inducible soybean gene (DREB5) evaluated in different periods of the day.

  12. Transcript Assembly and Quantification by RNA-Seq Reveals Differentially Expressed Genes between Soft-Endocarp and Hard-Endocarp Hawthorns

    PubMed Central

    Zhang, Feng; Liu, Zhongchi; Li, Xiaoming; Li, Wenran; Ma, Yue; Li, He; Liu, Yuexue; Zhang, Zhihong

    2013-01-01

    Hawthorn (Crataegus spp.) is an important pome with a long history as a fruit, an ornamental, and a source of medicine. Fruits of hawthorn are marked by hard stony endocarps, but a hawthorn germplasm with soft and thin endocarp was found in Liaoning province of China. To elucidate the molecular mechanism underlying the soft endocarp of hawthorn, we conducted a de novo assembly of the fruit transcriptome of Crataegus pinnatifida and compared gene expression profiles between the soft-endocarp and the hard-endocarp hawthorn varieties. De novo assembly yielded 52,673 putative unigenes, 20.4% of which are longer than 1,000 bp. Among the high-quality unique sequences, 35,979 (68.3%) had at least one significant match to an existing gene model. A total of 1,218 genes, representing 2.31% of total putative unigenes, were differentially expressed between the soft-endocarp hawthorn and the hard-endocarp hawthorn. Among these differentially expressed genes, a number of lignin biosynthetic pathway genes were down-regulated while almost all the flavonoid biosynthetic pathway genes were strongly up-regulated, concomitant with the formation of soft endocarp. In addition, we have identified some MYB and NAC transcription factors that could potentially control lignin and flavonoid biosynthesis. The altered expression levels of the genes encoding lignin biosynthetic enzymes, MYB and NAC transcription factors were confirmed by quantitative RT-PCR. This is the first transcriptome analysis of Crataegus genus. The high quality ESTs generated in this study will aid future gene cloning from hawthorn. Our study provides important insights into the molecular mechanisms underlying soft endocarp formation in hawthorn. PMID:24039819

  13. EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering.

    PubMed

    Lee, Soohyun; Seo, Chae Hwa; Alver, Burak Han; Lee, Sanghyuk; Park, Peter J

    2015-09-03

    RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar.
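
    The grouping step described above, in which reads are grouped according to the set of transcripts to which they are mapped, can be sketched as follows. This covers only that grouping idea and is not the EMSAR implementation: the suffix-array index, the segmentation and the joint Poisson maximum likelihood step are omitted, and the function name `group_reads_by_transcript_set`, the read ids and the transcript ids are hypothetical.

```python
from collections import Counter


def group_reads_by_transcript_set(alignments):
    """Count reads per distinct set of transcripts they map to.

    alignments : dict mapping read id -> iterable of transcript ids
                 (hypothetical input; a real pipeline would parse alignments)
    Returns a Counter keyed by frozenset of transcript ids.
    """
    groups = Counter()
    for read_id, transcripts in alignments.items():
        key = frozenset(transcripts)
        if key:  # skip unmapped reads
            groups[key] += 1
    return groups


# Toy input: two uniquely mapped reads, two ambiguous reads, one unmapped read.
alignments = {
    "r1": ["txA"],
    "r2": ["txA", "txB"],
    "r3": ["txB", "txA"],  # same set as r2; order does not matter
    "r4": ["txB"],
    "r5": [],
}

segment_counts = group_reads_by_transcript_set(alignments)
for transcript_set, n_reads in segment_counts.items():
    print(sorted(transcript_set), n_reads)
```

    Keying on the transcript set keeps multi-mapped reads in the data rather than discarding them, which matches the abstract's statement that nearly all mapped reads are used.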

  14. Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs.

    PubMed

    Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi

    2018-02-12

    Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.

  15. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction

    PubMed Central

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K.; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G.; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H.

    2017-01-01

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. PMID:27899623

  16. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGES

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.

  17. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.

  18. Quantification of Flavin-containing Monooxygenases 1, 3, and 5 in Human Liver Microsomes by UPLC-MRM-Based Targeted Quantitative Proteomics and Its Application to the Study of Ontogeny.

    PubMed

    Chen, Yao; Zane, Nicole R; Thakker, Dhiren R; Wang, Michael Zhuo

    2016-07-01

    Flavin-containing monooxygenases (FMOs) have a significant role in the metabolism of small molecule pharmaceuticals. Among the five human FMOs, FMO1, FMO3, and FMO5 are the most relevant to hepatic drug metabolism. Although age-dependent hepatic protein expression, based on immunoquantification, has been reported previously for FMO1 and FMO3, there is very little information on hepatic FMO5 protein expression. To overcome the limitations of immunoquantification, an ultra-performance liquid chromatography (UPLC)-multiple reaction monitoring (MRM)-based targeted quantitative proteomic method was developed and optimized for the quantification of FMO1, FMO3, and FMO5 in human liver microsomes (HLM). A post-in silico product ion screening process was incorporated to verify LC-MRM detection of potential signature peptides before their synthesis. The developed method was validated by correlating marker substrate activity and protein expression in a panel of adult individual donor HLM (age 39-67 years). The mean (range) protein expression of FMO3 and FMO5 was 46 (26-65) pmol/mg HLM protein and 27 (11.5-49) pmol/mg HLM protein, respectively. To demonstrate quantification of FMO1, a panel of fetal individual donor HLM (gestational age 14-20 weeks) was analyzed. The mean (range) FMO1 protein expression was 7.0 (4.9-9.7) pmol/mg HLM protein. Furthermore, the ontogenetic protein expression of FMO5 was evaluated in fetal, pediatric, and adult HLM. The quantification of FMO proteins also was compared using two different calibration standards, recombinant proteins versus synthetic signature peptides, to assess the ratio between holoprotein versus total protein. In conclusion, a UPLC-MRM-based targeted quantitative proteomic method has been developed for the quantification of FMO enzymes in HLM. Copyright © 2016 by The American Society for Pharmacology and Experimental Therapeutics.

  19. Quantification of Flavin-containing Monooxygenases 1, 3, and 5 in Human Liver Microsomes by UPLC-MRM-Based Targeted Quantitative Proteomics and Its Application to the Study of Ontogeny

    PubMed Central

    Chen, Yao; Zane, Nicole R.; Thakker, Dhiren R.

    2016-01-01

    Flavin-containing monooxygenases (FMOs) have a significant role in the metabolism of small molecule pharmaceuticals. Among the five human FMOs, FMO1, FMO3, and FMO5 are the most relevant to hepatic drug metabolism. Although age-dependent hepatic protein expression, based on immunoquantification, has been reported previously for FMO1 and FMO3, there is very little information on hepatic FMO5 protein expression. To overcome the limitations of immunoquantification, an ultra-performance liquid chromatography (UPLC)-multiple reaction monitoring (MRM)-based targeted quantitative proteomic method was developed and optimized for the quantification of FMO1, FMO3, and FMO5 in human liver microsomes (HLM). A post-in silico product ion screening process was incorporated to verify LC-MRM detection of potential signature peptides before their synthesis. The developed method was validated by correlating marker substrate activity and protein expression in a panel of adult individual donor HLM (age 39–67 years). The mean (range) protein expression of FMO3 and FMO5 was 46 (26–65) pmol/mg HLM protein and 27 (11.5–49) pmol/mg HLM protein, respectively. To demonstrate quantification of FMO1, a panel of fetal individual donor HLM (gestational age 14–20 weeks) was analyzed. The mean (range) FMO1 protein expression was 7.0 (4.9–9.7) pmol/mg HLM protein. Furthermore, the ontogenetic protein expression of FMO5 was evaluated in fetal, pediatric, and adult HLM. The quantification of FMO proteins also was compared using two different calibration standards, recombinant proteins versus synthetic signature peptides, to assess the ratio between holoprotein versus total protein. In conclusion, a UPLC-MRM-based targeted quantitative proteomic method has been developed for the quantification of FMO enzymes in HLM. PMID:26839369

  20. CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates.

    PubMed

    Low, Joel Z B; Khang, Tsung Fei; Tammi, Martti T

    2017-12-28

    In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .

  1. FlyAtlas 2: a new version of the Drosophila melanogaster expression atlas with RNA-Seq, miRNA-Seq and sex-specific data

    PubMed Central

    Krause, Sue A; Pandit, Aniruddha; Davies, Shireen A

    2018-01-01

    FlyAtlas 2 (www.flyatlas2.org) is part successor, part complement to the FlyAtlas database and web application for studying the expression of the genes of Drosophila melanogaster in different tissues of adults and larvae. Although generated in the same lab with the same fly line raised on the same diet as FlyAtlas, the FlyAtlas2 resource employs a completely new set of expression data based on RNA-Seq, rather than microarray analysis, and so it allows the user to obtain information for the expression of different transcripts of a gene. Furthermore, the data for somatic tissues are now available for both male and female adult flies, allowing studies of sexual dimorphism. Gene coverage has been extended by the inclusion of microRNAs and many of the RNA genes included in Release 6 of the Drosophila reference genome. The web interface has been modified to accommodate the extra data, but at the same time has been adapted for viewing on small mobile devices. Users also have access to the RNA-Seq reads displayed alongside the annotated Drosophila genome in the (external) UCSC browser, and are able to link out to the previous FlyAtlas resource to compare the data obtained by RNA-Seq with that obtained using microarrays. PMID:29069479

  2. Comparison of normalization methods for differential gene expression analysis in RNA-Seq experiments

    PubMed Central

    Maza, Elie; Frasse, Pierre; Senin, Pavel; Bouzayen, Mondher; Zouine, Mohamed

    2013-01-01

    In recent years, RNA-Seq technologies became a powerful tool for transcriptome studies. However, computational methods dedicated to the analysis of high-throughput sequencing data are yet to be standardized. In particular, it is known that the choice of a normalization procedure leads to a great variability in results of differential gene expression analysis. The present study compares the most widespread normalization procedures and proposes a novel one aiming at removing an inherent bias of studied transcriptomes related to their relative size. Comparisons of the normalization procedures are performed on real and simulated data sets. Real RNA-Seq data sets analyses, performed with all the different normalization methods, show that only 50% of significantly differentially expressed genes are common. This result highlights the influence of the normalization step on the differential expression analysis. Real and simulated data sets analyses give similar results showing 3 different groups of procedures having the same behavior. The group including the novel method named “Median Ratio Normalization” (MRN) gives the lower number of false discoveries. Within this group the MRN method is less sensitive to the modification of parameters related to the relative size of transcriptomes such as the number of down- and upregulated genes and the gene expression levels. The newly proposed MRN method efficiently deals with intrinsic bias resulting from relative size of studied transcriptomes. Validation with real and simulated data sets confirmed that MRN is more consistent and robust than existing methods. PMID:26442135
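    Normalization by a median of ratios can be sketched as follows. This is the generic median-of-ratios size-factor calculation popularized by DESeq, shown only to illustrate this family of approaches. It is not the MRN procedure proposed in the study, whose exact steps the abstract does not give. The function name and input layout are assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue
        # Log of the geometric mean of this gene across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The size factor is the median ratio of a sample to the pseudo-reference.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide each count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]
```

    Because the median is taken over genes, a moderate number of strongly up- or downregulated genes does not shift the factor, which is why median-based schemes are less sensitive to differences in transcriptome composition than total-count scaling.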

  3. Mapping the CgrA regulon of Rhodospirillum centenum reveals a hierarchal network controlling Gram-negative cyst development.

    PubMed

    Dong, Qian; Fang, Mingxu; Roychowdhury, Sugata; Bauer, Carl E

    2015-12-16

    Several Gram-negative species undergo development leading to the formation of metabolically dormant, desiccation-resistant cysts. Recent analysis of cyst development has revealed that ~20% of the Rhodospirillum centenum transcriptome undergoes temporal changes in expression as cells transition from vegetative to cyst forms. It has also been established that one trigger for cyst formation is the synthesis of the signaling nucleotide 3',5'-cyclic guanosine monophosphate (cGMP), which is sensed by a homolog of the catabolite repressor protein called CgrA. CgrA in the presence of cGMP initiates a cascade of gene expression leading to the development of cysts. In this study, we have used RNA-seq and chromatin immunoprecipitation (ChIP-Seq) techniques to define the CgrA-cGMP regulon. Our results indicate that disruption of CgrA leads to altered expression of 258 genes, 131 of which have been previously reported to be involved in cyst development. ChIP-seq analysis combined with transcriptome data also demonstrates that CgrA directly regulates the expression of numerous sigma factors and transcription factors, several of which are known to be involved in cyst cell development. This analysis reveals the presence of CgrA binding sites upstream of many developmentally regulated genes, including many transcription factors and signal transduction components. CgrA thus functions as a master controller of cyst development by initiating a hierarchal cascade of downstream transcription factors that induces temporal expression of encystment genes.

  4. Cross-platform normalization of microarray and RNA-seq data for machine learning applications

    PubMed Central

    Thompson, Jeffrey A.; Tan, Jie

    2016-01-01

    Large, publicly available gene expression datasets are often analyzed with the aid of machine learning algorithms. Although RNA-seq is increasingly the technology of choice, a wealth of expression data already exist in the form of microarray data. If machine learning models built from legacy data can be applied to RNA-seq data, larger, more diverse training datasets can be created and validation can be performed on newly generated data. We developed Training Distribution Matching (TDM), which transforms RNA-seq data for use with models constructed from legacy platforms. We evaluated TDM, as well as quantile normalization, nonparanormal transformation, and a simple log2 transformation, on both simulated and biological datasets of gene expression. Our evaluation included both supervised and unsupervised machine learning approaches. We found that TDM exhibited consistently strong performance across settings and that quantile normalization also performed well in many circumstances. We also provide a TDM package for the R programming language. PMID:26844019
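    Two of the baseline transformations evaluated alongside TDM, quantile normalization and a simple log2 transformation, can be sketched in a few lines. This is not TDM itself, which the abstract does not describe in detail. The tie handling (ties broken by input order) and the pseudocount of 1 are assumptions made for this illustration.

```python
import math


def quantile_normalize(samples):
    """Force every sample to share the same empirical distribution.

    samples: list of samples, each a list of expression values with
    genes in the same order. Returns a new list of the same shape.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean across samples at each rank.
    rank_means = [
        sum(s[i] for s in sorted_samples) / len(samples) for i in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


def log2_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount keeps zero counts finite."""
    return [math.log2(v + pseudocount) for v in values]
```

    After quantile normalization every sample contains exactly the same set of values, only assigned to different genes, which is what lets a model trained on one platform see inputs on a comparable scale.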

  5. Comparison of software packages for detecting differential expression in RNA-seq studies

    PubMed Central

    Seyednasrollah, Fatemeh; Laiho, Asta

    2015-01-01

    RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. PMID:24300110

  6. Comparison of software packages for detecting differential expression in RNA-seq studies.

    PubMed

    Seyednasrollah, Fatemeh; Laiho, Asta; Elo, Laura L

    2015-01-01

    RNA-sequencing (RNA-seq) has rapidly become a popular tool to characterize transcriptomes. A fundamental research problem in many RNA-seq studies is the identification of reliable molecular markers that show differential expression between distinct sample groups. Together with the growing popularity of RNA-seq, a number of data analysis methods and pipelines have already been developed for this task. Currently, however, there is no clear consensus about the best practices yet, which makes the choice of an appropriate method a daunting task especially for a basic user without a strong statistical or computational background. To assist the choice, we perform here a systematic comparison of eight widely used software packages and pipelines for detecting differential expression between sample groups in a practical research setting and provide general guidelines for choosing a robust pipeline. In general, our results demonstrate how the data analysis tool utilized can markedly affect the outcome of the data analysis, highlighting the importance of this choice. © The Author 2013. Published by Oxford University Press.

  7. Sort-Seq Approach to Engineering a Formaldehyde-Inducible Promoter for Dynamically Regulated Escherichia coli Growth on Methanol

    PubMed Central

    2017-01-01

    Tight and tunable control of gene expression is a highly desirable goal in synthetic biology for constructing predictable gene circuits and achieving preferred phenotypes. Elucidating the sequence–function relationship of promoters is crucial for manipulating gene expression at the transcriptional level, particularly for inducible systems dependent on transcriptional regulators. Sort-seq methods employing fluorescence-activated cell sorting (FACS) and high-throughput sequencing allow for the quantitative analysis of sequence–function relationships in a robust and rapid way. Here we utilized a massively parallel sort-seq approach to analyze the formaldehyde-inducible Escherichia coli promoter (Pfrm) with single-nucleotide resolution. A library of mutated formaldehyde-inducible promoters was cloned upstream of gfp on a plasmid. The library was partitioned into bins via FACS on the basis of green fluorescent protein (GFP) expression level, and mutated promoters falling into each expression bin were identified with high-throughput sequencing. The resulting analysis identified two 19 base pair repressor binding sites, one upstream of the −35 RNA polymerase (RNAP) binding site and one overlapping with the −10 site, and assessed the relative importance of each position and base therein. Key mutations were identified for tuning expression levels and were used to engineer formaldehyde-inducible promoters with predictable activities. Engineered variants demonstrated up to 14-fold lower basal expression, 13-fold higher induced expression, and a 3.6-fold stronger response as indicated by relative dynamic range. Finally, an engineered formaldehyde-inducible promoter was employed to drive the expression of heterologous methanol assimilation genes and achieved increased biomass levels on methanol, a non-native substrate of E. coli. PMID:28463494

  8. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud

    PubMed Central

    Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.

    2015-01-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053

  9. Complementarity of SOMAscan to LC-MS/MS and RNA-seq for quantitative profiling of human embryonic and mesenchymal stem cells.

    PubMed

    Billing, Anja M; Ben Hamidane, Hisham; Bhagwat, Aditya M; Cotton, Richard J; Dib, Shaima S; Kumar, Pankaj; Hayat, Shahina; Goswami, Neha; Suhre, Karsten; Rafii, Arash; Graumann, Johannes

    2017-01-06

    Dynamic range limitations are challenging to proteomics, particularly in clinical samples. Affinity proteomics partially overcomes this, yet suffers from dependence on reagent quality. SOMAscan, an aptamer-based platform for over 1000 proteins, avoids that issue using nucleic acid binders. Targets include low expressed proteins not easily accessible by other approaches. Here we report on the potential of SOMAscan for the study of differently sourced mesenchymal stem cells (MSC) in comparison to LC-MS/MS and RNA sequencing. While targeting fewer analytes, SOMAscan displays high precision and dynamic range coverage, allowing quantification of proteins not measured by the other platforms. Expression between cell types (ESC and MSC) was compared across techniques and uncovered the expected large differences. Sourcing was investigated by comparing subtypes: bone marrow-derived, standard in clinical studies, and ESC-derived MSC, thought to hold similar potential but devoid of inter-donor variability and proliferating faster in vitro. We confirmed subtype-equivalency, as well as vesicle and extracellular matrix related processes in MSC. In contrast, the proliferative nature of ESC was captured less by SOMAscan, where nuclear proteins are underrepresented. The complementarity of SOMAscan allowed the comprehensive exploration of CD markers and signaling molecules, not readily accessible otherwise and offering unprecedented potential in subtype characterization. Mesenchymal stem cells (MSC) represent promising stem cell-derived therapeutics as indicated by their application in >500 clinical trials currently registered with the NIH. Tissue-derived MSC require invasive harvesting and imply donor-to-donor differences, to which embryonic stem cell (ESC)-derived MSC may provide an alternative and thus warrant thorough characterization. In continuation of our previous study where we compared in depth embryonic stem cells (ESC) and MSC from two sources (bone marrow and ESC-derived), we included the aptamer-based SOMAscan assay, complementing LC-MS/MS and RNA-seq data. Furthermore, SOMAscan, a targeted proteomics platform developed for analyzing clinical samples, has been benchmarked against established analytical platforms (LC-MS/MS and RNA-seq) using stem cell comparisons as a model. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  10. Epigenomic analysis in a cell-based model reveals the roles of H3K9me3 in breast cancer transformation.

    PubMed

    Li, Qing-Lan; Lei, Pin-Ji; Zhao, Quan-Yi; Li, Lianyun; Wei, Gang; Wu, Min

    2017-08-01

    Epigenetic marks are critical regulators of chromatin and gene activity. Their roles in normal physiology and disease states, including cancer development, still remain elusive. Herein, the epigenomic change of H3K9me3, as well as its potential impacts on gene activity and genome stability, was investigated in an in vitro breast cancer transformation model. The global H3K9me3 level was studied with western blotting. The distribution of H3K9me3 on chromatin and gene expression was studied with ChIP-Seq and RNA-Seq, respectively. The global H3K9me3 level decreases during transformation and its distribution on chromatin is reprogrammed. By combining with TCGA data, we identified 67 candidate oncogenes, among which five genes are totally novel. Our analysis further links H3K9me3 with transposon activity, and suggests H3K9me3 reduction increases the cell's sensitivity to DNA damage reagents. H3K9me3 reduction is possibly related with breast cancer transformation by regulating gene expression and chromatin stability during transformation.

  11. RNA-Seq and molecular docking reveal multi-level pesticide resistance in the bed bug

    PubMed Central

    2012-01-01

    Background: Bed bugs (Cimex lectularius) are hematophagous nocturnal parasites of humans that have attained high impact status due to their worldwide resurgence. The sudden and rampant resurgence of C. lectularius has been attributed to numerous factors including frequent international travel, narrower pest management practices, and insecticide resistance. Results: We performed a next-generation RNA sequencing (RNA-Seq) experiment to find differentially expressed genes between pesticide-resistant (PR) and pesticide-susceptible (PS) strains of C. lectularius. A reference transcriptome database of 51,492 expressed sequence tags (ESTs) was created by combining the databases derived from de novo assembled mRNA-Seq tags (30,404 ESTs) and our previous 454 pyrosequenced database (21,088 ESTs). The two-way GLMseq analysis revealed ~15,000 highly significant differentially expressed ESTs between the PR and PS strains. Among the top 5,000 differentially expressed ESTs, 109 putative defense genes (cuticular proteins, cytochrome P450s, antioxidant genes, ABC transporters, glutathione S-transferases, carboxylesterases and acetyl cholinesterase) involved in penetration resistance and metabolic resistance were identified. Tissue and development-specific expression of P450 CYP3 clan members showed high mRNA levels in the cuticle, Malpighian tubules, and midgut; and in early instar nymphs, respectively. Lastly, molecular modeling and docking of a candidate cytochrome P450 (CYP397A1V2) revealed the flexibility of the deduced protein to metabolize a broad range of insecticide substrates including DDT, deltamethrin, permethrin, and imidacloprid. Conclusions: We developed significant molecular resources for C. lectularius putatively involved in metabolic resistance as well as those participating in other modes of insecticide resistance. RNA-Seq profiles of PR strains combined with tissue-specific profiles and molecular docking revealed multi-level insecticide resistance in C. lectularius. Future research that is targeted towards RNA interference (RNAi) on the identified metabolic targets such as cytochrome P450s and cuticular proteins could lay the foundation for a better understanding of the genetic basis of insecticide resistance in C. lectularius. PMID:22226239

  12. Comprehensive RNA-Seq Expression Analysis of Sensory Ganglia with a Focus on Ion Channels and GPCRs in Trigeminal Ganglia

    PubMed Central

    Manteniotis, Stavros; Lehmann, Ramona; Flegel, Caroline; Vogel, Felix; Hofreuter, Adrian; Schreiner, Benjamin S. P.; Altmüller, Janine; Becker, Christian; Schöbel, Nicole; Hatt, Hanns; Gisselmann, Günter

    2013-01-01

    The specific functions of sensory systems depend on the tissue-specific expression of genes that code for molecular sensor proteins that are necessary for stimulus detection and membrane signaling. Using the Next Generation Sequencing technique (RNA-Seq), we analyzed the complete transcriptome of the trigeminal ganglia (TG) and dorsal root ganglia (DRG) of adult mice. Focusing on genes with an expression level higher than 1 FPKM (fragments per kilobase of transcript per million mapped reads), we detected the expression of 12984 genes in the TG and 13195 in the DRG. To analyze the specific gene expression patterns of the peripheral neuronal tissues, we compared their gene expression profiles with that of the liver, brain, olfactory epithelium, and skeletal muscle. The transcriptome data of the TG and DRG were scanned for virtually all known G-protein-coupled receptors (GPCRs) as well as for ion channels. The expression profile was ranked with regard to the level and specificity for the TG. In total, we detected 106 non-olfactory GPCRs and 33 ion channels that had not been previously described as expressed in the TG. To validate the RNA-Seq data, in situ hybridization experiments were performed for several of the newly detected transcripts. To identify differences in expression profiles between the sensory ganglia, the RNA-Seq data of the TG and DRG were compared. Among the differentially expressed genes (> 1 FPKM), 65 and 117 were expressed at least 10-fold higher in the TG and DRG, respectively. Our transcriptome analysis allows a comprehensive overview of all ion channels and G protein-coupled receptors that are expressed in trigeminal ganglia and provides additional approaches for the investigation of trigeminal sensing as well as for the physiological and pathophysiological mechanisms of pain. PMID:24260241
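    The FPKM threshold and the 10-fold comparison described above can be made concrete with a short sketch. The FPKM formula follows the definition given in the abstract (fragments per kilobase of transcript per million mapped reads). The helper names, the filtering logic and the example gene labels are hypothetical and are not taken from the study's pipeline.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def expressed(fpkm_by_gene, threshold=1.0):
    """Genes whose expression exceeds the FPKM threshold."""
    return {gene for gene, value in fpkm_by_gene.items() if value > threshold}


def enriched(fpkm_a, fpkm_b, min_fold=10.0, threshold=1.0):
    """Genes expressed in tissue A and at least min_fold higher than in B.

    A gene missing from tissue B is treated as having an FPKM of 0.
    """
    result = set()
    for gene, a in fpkm_a.items():
        b = fpkm_b.get(gene, 0.0)
        if a > threshold and a >= min_fold * b:
            result.add(gene)
    return result


# Made-up values for two tissues, for illustration only.
tg = {"gene1": 50.0, "gene2": 2.0, "gene3": 0.5}
drg = {"gene1": 4.0, "gene2": 30.0, "gene3": 0.4}
tg_enriched = enriched(tg, drg)    # genes at least 10-fold higher in TG
drg_enriched = enriched(drg, tg)   # genes at least 10-fold higher in DRG
```

    Applying the expression threshold before the fold-change test avoids reporting large ratios that arise only from dividing two near-zero values.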

  13. Phytophthora megakarya and P. palmivora, Causal Agents of Black Pod Rot, Induce Similar Plant Defense Responses Late during Infection of Susceptible Cacao Pods

    PubMed Central

    Ali, Shahin S.; Shao, Jonathan; Lary, David J.; Strem, Mary D.; Meinhardt, Lyndel W.; Bailey, Bryan A.

    2017-01-01

    Phytophthora megakarya (Pmeg) and Phytophthora palmivora (Ppal) cause black pod rot of Theobroma cacao L. (cacao). Of these two clade 4 species, Pmeg is more virulent and is displacing Ppal in many cacao production areas in Africa. Symptoms and species-specific sporangia production were compared when the two species were co-inoculated onto pod pieces in staggered 24 h time intervals. Pmeg sporangia were predominantly recovered from pod pieces with unwounded surfaces even when inoculated 24 h after Ppal. On wounded surfaces, sporangia of Ppal were predominantly recovered if the two species were simultaneously applied or Ppal was applied first but not if Pmeg was applied first. Pmeg demonstrated an advantage over Ppal when infecting unwounded surfaces while Ppal had the advantage when infecting wounded surfaces. RNA-Seq was carried out on RNA isolated from control and Pmeg and Ppal infected pod pieces 3 days post inoculation to assess their abilities to alter/suppress cacao defense. Expression of 4,482 and 5,264 cacao genes was altered after Pmeg and Ppal infection, respectively, with most genes responding to both species. Neural network self-organizing map analyses separated the cacao RNA-Seq gene expression profiles into 24 classes, 6 of which were largely induced in response to infection. Using KEGG analysis, subsets of genes composing interrelated pathways leading to phenylpropanoid biosynthesis, ethylene and jasmonic acid biosynthesis and action, plant defense signal transduction, and endocytosis showed induction in response to infection. A large subset of genes encoding putative Pr-proteins also showed differential expression in response to infection. A subset of 36 cacao genes was used to validate the RNA-Seq expression data and compare infection induced gene expression patterns in leaves and wounded and unwounded pod husks. Expression patterns between RNA-Seq and RT-qPCR were generally reproducible. The level and timing of altered gene expression were influenced by the tissues studied and by wounding. Although gene expression patterns were similar in these susceptible interactions, some genes did show differential expression in a Phytophthora species-dependent manner. The biggest difference was the more intense changes in expression in Ppal-inoculated wounded pod pieces, further demonstrating its rapid progression when penetrating through wounds. PMID:28261234

  14. Deciphering Transcriptional Programming during Pod and Seed Development Using RNA-Seq in Pigeonpea (Cajanus cajan).

    PubMed

    Pazhamala, Lekha T; Agarwal, Gaurav; Bajaj, Prasad; Kumar, Vinay; Kulshreshtha, Akanksha; Saxena, Rachit K; Varshney, Rajeev K

    2016-01-01

    Seed development is an important event in the plant life cycle that has interested humankind for ages, especially in crops of economic importance. Pigeonpea is an important grain legume of the semi-arid tropics, used mainly for its protein rich seeds. In order to understand the transcriptional programming during the pod and seed development, RNA-seq data was generated from embryo sac from the day of anthesis (0 DAA), seed and pod wall (5, 10, 20 and 30 DAA) of pigeonpea variety "Asha" (ICPL 87119) using Illumina HiSeq 2500. About 684 million sequencing reads have been generated from nine samples, which resulted in the identification of 27,441 expressed genes after sequence analysis. These genes have been studied for their differential expression, co-expression, and temporal and spatial gene expression. We have also used the RNA-seq data to identify important seed-specific transcription factors, biological processes and associated pathways during the seed development process in pigeonpea. The comprehensive gene expression study from flowering to mature pod development in pigeonpea would be crucial in identifying candidate genes involved in seed traits directly or indirectly related to yield and quality. The dataset will serve as an important resource for gene discovery and deciphering the molecular mechanisms underlying various seed related traits.

  15. Deciphering Transcriptional Programming during Pod and Seed Development Using RNA-Seq in Pigeonpea (Cajanus cajan)

    PubMed Central

    Pazhamala, Lekha T.; Agarwal, Gaurav; Bajaj, Prasad; Kumar, Vinay; Kulshreshtha, Akanksha; Saxena, Rachit K.; Varshney, Rajeev K.

    2016-01-01

    Seed development is an important event in the plant life cycle that has interested humankind for ages, especially in crops of economic importance. Pigeonpea is an important grain legume of the semi-arid tropics, used mainly for its protein rich seeds. In order to understand the transcriptional programming during the pod and seed development, RNA-seq data was generated from embryo sac from the day of anthesis (0 DAA), seed and pod wall (5, 10, 20 and 30 DAA) of pigeonpea variety “Asha” (ICPL 87119) using Illumina HiSeq 2500. About 684 million sequencing reads have been generated from nine samples, which resulted in the identification of 27,441 expressed genes after sequence analysis. These genes have been studied for their differential expression, co-expression, and temporal and spatial gene expression. We have also used the RNA-seq data to identify important seed-specific transcription factors, biological processes and associated pathways during the seed development process in pigeonpea. The comprehensive gene expression study from flowering to mature pod development in pigeonpea would be crucial in identifying candidate genes involved in seed traits directly or indirectly related to yield and quality. The dataset will serve as an important resource for gene discovery and deciphering the molecular mechanisms underlying various seed related traits. PMID:27760186

  16. ChIPnorm: a statistical method for normalizing and identifying differential regions in histone modification ChIP-seq libraries.

    PubMed

    Nair, Nishanth Ulhas; Sahu, Avinash Das; Bucher, Philipp; Moret, Bernard M E

    2012-01-01

    The advent of high-throughput technologies such as ChIP-seq has made possible the study of histone modifications. A problem of particular interest is the identification of regions of the genome where different cell types from the same organism exhibit different patterns of histone enrichment. This problem turns out to be surprisingly difficult, even in simple pairwise comparisons, because of the significant level of noise in ChIP-seq data. In this paper we propose a two-stage statistical method, called ChIPnorm, to normalize ChIP-seq data, and to find differential regions in the genome, given two libraries of histone modifications of different cell types. We show that the ChIPnorm method removes most of the noise and bias in the data and outperforms other normalization methods. We correlate the histone marks with gene expression data and confirm that histone modifications H3K27me3 and H3K4me3 act as respectively a repressor and an activator of genes. Compared to what was previously reported in the literature, we find that a substantially higher fraction of bivalent marks in ES cells for H3K27me3 and H3K4me3 move into a K27-only state. We find that most of the promoter regions in protein-coding genes have differential histone-modification sites. The software for this work can be downloaded from http://lcbb.epfl.ch/software.html.

  17. Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.

    PubMed

    Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho

    2015-10-28

    Recently, rapid improvements in technology and decreases in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. In this paper, we compared eight non-abundance (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq of 35- and 76-nucleotide sequences produced in the MAQC project and simulation reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq of a 35-nucleotide sequence, RPKM showed the highest correlation results, but for RNA-Seq of a 76-nucleotide sequence, it showed a lower correlation than the other methods. ERPKM did not improve on the results of RPKM. Between the two abundance estimation normalization methods, for RNA-Seq of a 35-nucleotide sequence, a higher correlation was obtained with Sailfish than with RSEM, which in turn was better than without abundance estimation methods. However, for RNA-Seq of a 76-nucleotide sequence, the results achieved by RSEM were similar to those obtained without abundance estimation methods, and were much better than those with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results. Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We also suggest ignoring the poly-A tail during differential gene expression analysis.
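    The evaluation criterion used here, the Spearman correlation between normalized RNA-Seq values and qRT-PCR values, can be computed as the Pearson correlation of ranks. The sketch below is a plain-Python illustration with average ranks for ties. The function names are hypothetical, and any data passed in would be the reader's own, not the MAQC values.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

    Because only ranks enter the calculation, any normalization that rescales a whole sample by a constant leaves the within-sample Spearman correlation unchanged, which is consistent with simple scaling methods giving similar results in such a comparison.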

  18. Single-cell mRNA sequencing identifies subclonal heterogeneity in anti-cancer drug responses of lung adenocarcinoma cells.

    PubMed

    Kim, Kyu-Tae; Lee, Hye Won; Lee, Hae-Ock; Kim, Sang Cheol; Seo, Yun Jee; Chung, Woosung; Eum, Hye Hyeon; Nam, Do-Hyun; Kim, Junhyong; Joo, Kyeung Min; Park, Woong-Yang

    2015-06-19

    Intra-tumoral genetic and functional heterogeneity correlates with cancer clinical prognoses. However, the mechanisms by which intra-tumoral heterogeneity impacts therapeutic outcome remain poorly understood. RNA sequencing (RNA-seq) of single tumor cells can provide comprehensive information about gene expression and single-nucleotide variations in individual tumor cells, which may allow for the translation of heterogeneous tumor cell functional responses into customized anti-cancer treatments. We isolated 34 patient-derived xenograft (PDX) tumor cells from a lung adenocarcinoma patient tumor xenograft. Individual tumor cells were subjected to single cell RNA-seq for gene expression profiling and expressed mutation profiling. Fifty tumor-specific single-nucleotide variations, including KRAS(G12D), were observed to be heterogeneous in individual PDX cells. Semi-supervised clustering, based on KRAS(G12D) mutant expression and a risk score representing expression of 69 lung adenocarcinoma-prognostic genes, classified PDX cells into four groups. PDX cells that survived in vitro anti-cancer drug treatment displayed transcriptome signatures consistent with the group characterized by KRAS(G12D) and low risk score. Single-cell RNA-seq on viable PDX cells identified a candidate tumor cell subgroup associated with anti-cancer drug resistance. Thus, single-cell RNA-seq is a powerful approach for identifying unique tumor cell-specific gene expression profiles which could facilitate the development of optimized clinical anti-cancer strategies.

  19. Differential Expression Profile of Chicken Embryo Fibroblast DF-1 Cells Infected with Cell-Adapted Infectious Bursal Disease Virus.

    PubMed

    Hui, Raymond K; Leung, Frederick C

    2015-01-01

    RNA-Seq was used to unveil the transcriptional profile of DF-1 cells at the early stage of caIBDV infection. Total RNAs were extracted from virus-infected cells at 0, 6 and 12 hpi. RNA-Seq datasets of respective samples mapped to 56.5-57.6% of isoforms in the reference genome Galgal4.73. At 6 hpi, 23 isoforms showed elevated expression, while 128 isoforms were up-regulated and 5 were down-regulated at 12 hpi in the virus-infected group. Besides, 10 isoforms were exclusively expressed in the virus-infected cells. Though no significant change was detected in cytokine and interferon expression levels in the first 12 hours of infection, modulations of the upstream regulators were observed. In addition to the reported regulatory factors including EIF2AK2, MX, OAS*A, GBP7 and IFIT, IBDV infection also triggered an IFIT5-IRF1/3-RSAD5 pathway in the DF-1 cells which potentially restricted the viral replication cycle in the early infection stage. Over-expression of LIPA and CH25H, together with the suppression of STARD4, LSS and AACS genes, implied a modulation of membrane fluidity and lipid raft arrangement in the infected cells. Alternative splicing of the EFR3 homolog A gene was also thought to be involved in the lipid membrane regulation, and these cumulative responses projected an inhibition of viral endocytosis. Recognition of viral RNA genomes and intermediates was presumably enhanced by the elevated levels of IFIH1, DHX58 and TRIM25 genes, which possess properties for detecting viral dsRNA. On the other hand, the caIBDV arrested the host's apoptotic process by inducing the expression of apoptosis inhibitors including NFKBIA/Z, TNFAIP2/3 and ITA in the first 12 hours of infection. In conclusion, the differential expression landscape demonstrated with RNA-Seq provides a comprehensive picture of the molecular interactions between host cells and virus at the early stage of infection.

  20. TCL1A, a Novel Transcription Factor and a Coregulator of Nuclear Factor κB p65: Single Nucleotide Polymorphism and Estrogen Dependence.

    PubMed

    Ho, Ming-Fen; Lummertz da Rocha, Edroaldo; Zhang, Cheng; Ingle, James N; Goss, Paul E; Shepherd, Lois E; Kubo, Michiaki; Wang, Liewei; Li, Hu; Weinshilboum, Richard M

    2018-06-01

    T-cell leukemia 1A (TCL1A) single-nucleotide polymorphisms (SNPs) have been associated with aromatase inhibitor-induced musculoskeletal adverse events. We previously demonstrated that TCL1A is inducible by estradiol (E2) and plays a critical role in the regulation of cytokines, chemokines, and Toll-like receptors in a TCL1A SNP genotype and estrogen-dependent fashion. Furthermore, TCL1A SNP-dependent expression phenotypes can be "reversed" by exposure to selective estrogen receptor modulators such as 4-hydroxytamoxifen (4OH-TAM). The present study was designed to comprehensively characterize the role of TCL1A in transcriptional regulation across the genome by performing RNA sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) assays with lymphoblastoid cell lines. RNA-seq identified 357 genes that were regulated in a TCL1A SNP- and E2-dependent fashion with expression patterns that were 4OH-TAM reversible. ChIP-seq for the same cells identified 57 TCL1A binding sites that could be regulated by E2 in a SNP-dependent fashion. Even more striking, nuclear factor-κB (NF-κB) p65 bound to those same DNA regions. In summary, TCL1A is a novel transcription factor with expression that is regulated in a SNP- and E2-dependent fashion, a pattern of expression that can be reversed by 4OH-TAM. Integrated RNA-seq and ChIP-seq results suggest that TCL1A also acts as a transcriptional coregulator with NF-κB p65, an important immune system transcription factor. Copyright © 2018 by The American Society for Pharmacology and Experimental Therapeutics.

  1. Transcriptomic and proteomic analysis reveals wall-associated and glucan-degrading proteins with potential roles in Phytophthora infestans sexual spore development.

    PubMed

    Niu, Xiaofan; Ah-Fong, Audrey M V; Lopez, Lilianna A; Judelson, Howard S

    2018-01-01

    Sexual reproduction remains an understudied feature of oomycete biology. To expand our knowledge of this process, we used RNA-seq and quantitative proteomics to examine matings in Phytophthora infestans. Exhibiting significant changes in mRNA abundance in three matings between different A1 and A2 strains compared to nonmating controls were 1170 genes, most being mating-induced. Rising by >10-fold in at least one cross were 455 genes, and 182 in all three crosses. Most genes had elevated expression in a self-fertile strain. Many mating-induced genes were associated with cell wall biosynthesis, which may relate to forming the thick-walled sexual spore (oospore). Several gene families were induced during mating including one encoding histidine, serine, and tyrosine-rich putative wall proteins, and another encoding prolyl hydroxylases which may strengthen the extracellular matrix. The sizes of these families vary >10-fold between Phytophthora species and one exhibits concerted evolution, highlighting two features of genome dynamics within the genus. Proteomic analyses of mature oospores and nonmating hyphae using isobaric tags for quantification identified 835 shared proteins, with 5% showing >2-fold changes in abundance between the tissues. Enriched in oospores were β-glucanases potentially involved in digesting the oospore wall during germination. Despite being dormant, oospores contained a mostly normal complement of proteins required for core cellular functions. The RNA-seq data generated here and in prior studies were used to identify new housekeeping controls for gene expression studies that are more stable than existing normalization standards. We also observed >2-fold variation in the fraction of polyA+ RNA between life stages, which should be considered when quantifying transcripts and may also be relevant to understanding translational control during development.

  2. RNA-Seq reveals MicroRNA expression signature and genetic polymorphism associated with growth and muscle quality traits in rainbow trout

    USDA-ARS's Scientific Manuscript database

    The role of microRNA expression and genetic variation in microRNA-binding sites of target genes on growth and muscle quality traits is poorly characterized. We used RNA-Seq approach to investigate their importance on 5 growth and muscle quality traits: whole body weight (WBW), muscle yield, muscle c...

  3. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space

    PubMed Central

    Karnik, Rahul; Beer, Michael A.

    2015-01-01

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs. PMID:26465884
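    The contrast the abstract draws between a full PWM and k-mer words can be illustrated with a minimal sketch of PWM scoring. This is not MotifSpec's implementation: the three-position matrix, the uniform background and the function names below are all assumptions made for illustration. Each window of a sequence is scored as the sum of per-position log-odds, and the best-scoring window is reported.

```python
import math

# Hypothetical 3-position motif: per-position base probabilities (each row sums to 1).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(window):
    """Sum of per-position log2-odds of the motif model versus background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, window))


def best_hit(sequence):
    """Return (score, offset) of the best-scoring window in the sequence."""
    width = len(PWM)
    scores = [
        (window_score(sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores)


score, offset = best_hit("TTAGCTT")
```

    Unlike an exact k-mer match, the score degrades gradually as a window departs from the consensus, which is what allows a threshold on it to be learned rather than fixed.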

  4. Identification of Predictive Cis-Regulatory Elements Using a Discriminative Objective Function and a Dynamic Search Space.

    PubMed

    Karnik, Rahul; Beer, Michael A

    2015-01-01

    The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.

  5. The destiny of the resistance/susceptibility against GCRV is controlled by epigenetic mechanisms in CIK cells.

    PubMed

    Shang, Xueying; Yang, Chunrong; Wan, Quanyuan; Rao, Youliang; Su, Jianguo

    2017-07-03

    Hemorrhagic disease caused by grass carp reovirus (GCRV) has severely threatened the grass carp (Ctenopharyngodon idella) cultivation industry. It is noteworthy that the resistance against GCRV infection was reported to be inheritable, and identified at both individual and cellular levels. Therefore, this work was inspired and dedicated to unravel the molecular mechanisms of fate decision post GCRV infection in related immune cells. Foremost, the resistant and susceptible CIK (C. idella kidney) monoclonal cells were established by single cell sorting, subculturing and infection screening successively. RNA-Seq, MeDIP-Seq and small RNA-Seq were carried out with C1 (CIK cells), R2 (resistant cells) and S3 (susceptible cells) groups. It was demonstrated that genome-wide DNA methylation, mRNA and microRNA expression levels in S3 were the highest among three groups. Transcriptome analysis elucidated that pathways associated with antioxidant activity, cell proliferation regulation, apoptosis activity and energy consuming might contribute to the decision of cell fates post infection. And a series of immune-related genes were identified differentially expressed across resistant and susceptible groups, which were negatively modulated by DNA methylation or microRNAs. To conclude, this study systematically uncovered the regulatory mechanism on the resistance from epigenetic perspective and provided potential biomarkers for future studies on resistance breeding.

  6. RNA-Seq Transcriptome Profiling of Upland Cotton (Gossypium hirsutum L.) Root Tissue under Water-Deficit Stress

    PubMed Central

    Bowman, Megan J.; Park, Wonkeun; Bauer, Philip J.; Udall, Joshua A.; Page, Justin T.; Raney, Joshua; Scheffler, Brian E.; Jones, Don. C.; Campbell, B. Todd

    2013-01-01

    An RNA-Seq experiment was performed using field grown well-watered and naturally rain fed cotton plants to identify differentially expressed transcripts under water-deficit stress. Our work constitutes the first application of the newly published diploid D5 Gossypium raimondii sequence in the study of tetraploid AD1 upland cotton RNA-seq transcriptome analysis. A total of 1,530 transcripts were differentially expressed between well-watered and water-deficit stressed root tissues, in patterns that confirm the accuracy of this technique for future studies in cotton genomics. Additionally, putative sequence based genome localization of differentially expressed transcripts detected A2 genome specific gene expression under water-deficit stress. These data will facilitate efforts to understand the complex responses governing transcriptomic regulatory mechanisms and to identify candidate genes that may benefit applied plant breeding programs. PMID:24324815

  7. A highly sensitive and accurate gene expression analysis by sequencing ("bead-seq") for a single cell.

    PubMed

    Matsunaga, Hiroko; Goto, Mari; Arikawa, Koji; Shirai, Masataka; Tsunoda, Hiroyuki; Huang, Huan; Kambara, Hideki

    2015-02-15

    Analyses of gene expressions in single cells are important for understanding detailed biological phenomena. Here, a highly sensitive and accurate method by sequencing (called "bead-seq") to obtain a whole gene expression profile for a single cell is proposed. A key feature of the method is to use a complementary DNA (cDNA) library on magnetic beads, which enables adding washing steps to remove residual reagents in a sample preparation process. By adding the washing steps, the next steps can be carried out under the optimal conditions without losing cDNAs. Error sources were carefully evaluated to conclude that the first several steps were the key steps. It is demonstrated that bead-seq is superior to the conventional methods for single-cell gene expression analyses in terms of reproducibility, quantitative accuracy, and biases caused during sample preparation and sequencing processes. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. Transcriptomic and epigenomic characterization of the developing bat wing

    PubMed Central

    Eckalbar, Walter L.; Schlebusch, Stephen A.; Mason, Mandy K.; Gill, Zoe; Parker, Ash V.; Booker, Betty M.; Nishizaki, Sierra; Muswamba-Nday, Christiane; Terhune, Elizabeth; Nevonen, Kimberly; Makki, Nadja; Friedrich, Tara; VanderMeer, Julia E.; Pollard, Katherine S.; Carbone, Lucia; Wall, Jeff D.; Illing, Nicola; Ahituv, Nadav

    2016-01-01

    Bats are the only mammals capable of powered flight, but little is known about the genetic determinants that shape their wings. Here, we generated a genome for Miniopterus natalensis and performed RNA-seq and ChIP-seq (H3K27ac, H3K27me3) on its developing forelimb and hindlimb autopods at sequential embryonic stages to decipher the molecular events that underlie bat wing development. Over 7,000 genes and several lncRNAs, including Tbx5-as1 and Hottip, were differentially expressed between forelimb, hindlimb and different stages. ChIP-seq identified thousands of regions that are differentially modified in forelimb versus hindlimb. Comparative genomics found 2,796 bat-accelerated regions within H3K27ac peaks, several of which cluster near limb-associated genes. Pathway analyses revealed multiple ribosomal proteins and known limb patterning signaling pathways as differentially regulated, and implicated increased forelimb mesenchymal condensations with differential growth. Combined, our work outlines multiple genetic components that contribute to bat wing formation, providing a genomic blueprint for this morphological innovation. PMID:27019111

  9. A network approach of gene co-expression in the zea mays/Aspergillus flavus pathosystem to map host/pathogen interaction pathways

    USDA-ARS's Scientific Manuscript database

    A gene co-expression network was generated using a dual RNA-seq study with the fungal pathogen A. flavus and its plant host Z. mays during the initial 3 days of infection. The analysis deciphered novel pathways and mapped genes of interest in both organisms during the infection. This network reveal...

  10. Transcriptome-wide selection of a reliable set of reference genes for gene expression studies in potato cyst nematodes (Globodera spp.).

    PubMed

    Sabeh, Michael; Duceppe, Marc-Olivier; St-Arnaud, Marc; Mimee, Benjamin

    2018-01-01

    Relative gene expression analyses by qRT-PCR (quantitative reverse transcription PCR) require an internal control to normalize the expression data of genes of interest and eliminate the unwanted variation introduced by sample preparation. A perfect reference gene should have a constant expression level under all the experimental conditions. However, the same few housekeeping genes selected from the literature or successfully used in previous unrelated experiments are often routinely used in new conditions without proper validation of their stability across treatments. The advent of RNA-Seq and the availability of public datasets for numerous organisms are opening the way to finding better reference genes for expression studies. Globodera rostochiensis is a plant-parasitic nematode that is particularly yield-limiting for potato. The aim of our study was to identify a reliable set of reference genes to study G. rostochiensis gene expression. Gene expression levels from an RNA-Seq database were used to identify putative reference genes and were validated with qRT-PCR analysis. Three genes, GR, PMP-3, and aaRS, were found to be very stable within the experimental conditions of this study and are proposed as reference genes for future work.

  11. Quantification of protein expression in cells and cellular subcompartments on immunohistochemical sections using a computer supported image analysis system.

    PubMed

    Braun, Martin; Kirsten, Robert; Rupp, Niels J; Moch, Holger; Fend, Falko; Wernert, Nicolas; Kristiansen, Glen; Perner, Sven

    2013-05-01

    Quantification of protein expression based on immunohistochemistry (IHC) is an important step for translational research and clinical routine. Several manual ('eyeballing') scoring systems are used in order to semi-quantify protein expression based on chromogenic intensities and distribution patterns. However, manual scoring systems are time-consuming and subject to significant intra- and interobserver variability. The aim of our study was to explore whether new image analysis software proves to be sufficient as an alternative tool to quantify protein expression. For IHC experiments, one nucleus specific marker (i.e., ERG antibody), one cytoplasmic specific marker (i.e., SLC45A3 antibody), and one marker expressed in both compartments (i.e., TMPRSS2 antibody) were chosen. Stainings were applied on TMAs, containing tumor material of 630 prostate cancer patients. A pathologist visually quantified all IHC stainings in a blinded manner, applying a four-step scoring system. For digital quantification, image analysis software (Tissue Studio v.2.1, Definiens AG, Munich, Germany) was applied to obtain a continuous spectrum of average staining intensity. For each of the three antibodies we found a strong correlation of the manual protein expression score and the score of the image analysis software. Spearman's rank correlation coefficient was 0.94, 0.92, and 0.90 for ERG, SLC45A3, and TMPRSS2, respectively (p<0.01). Our data suggest that the image analysis software Tissue Studio is a powerful tool for quantification of protein expression in IHC stainings. Further, since the digital analysis is precise and reproducible, computer supported protein quantification might help to overcome intra- and interobserver variability and increase objectivity of IHC based protein assessment.
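    The comparison reported above rests on Spearman's rank correlation between an ordinal manual score and a continuous software intensity. A minimal sketch of that statistic follows; the data values are invented for illustration and are not taken from the study. Tied values receive the mean of their ranks, which matters for a four-step score where ties are frequent.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: four-step manual scores (0-3) and software mean intensities.
manual_scores = [0, 1, 1, 2, 3, 3]
software_intensity = [0.05, 0.30, 0.22, 0.55, 0.91, 0.80]
rho = spearman(manual_scores, software_intensity)
```

    Because only ranks enter the calculation, the coefficient is unaffected by the different scales of the two scoring systems.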

  12. Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction.

    PubMed

    Schmidt, Florian; Gasparoni, Nina; Gasparoni, Gilles; Gianmoena, Kathrin; Cadenas, Cristina; Polansky, Julia K; Ebert, Peter; Nordström, Karl; Barann, Matthias; Sinha, Anupam; Fröhler, Sebastian; Xiong, Jieyi; Dehghani Amirabad, Azim; Behjati Ardakani, Fatemeh; Hutter, Barbara; Zipprich, Gideon; Felder, Bärbel; Eils, Jürgen; Brors, Benedikt; Chen, Wei; Hengstler, Jan G; Hamann, Alf; Lengauer, Thomas; Rosenstiel, Philip; Walter, Jörn; Schulz, Marcel H

    2017-01-09

    The binding and contribution of transcription factors (TF) to cell specific gene expression is often deduced from open-chromatin measurements to avoid costly TF ChIP-seq assays. Thus, it is important to develop computational methods for accurate TF binding prediction in open-chromatin regions (OCRs). Here, we report a novel segmentation-based method, TEPIC, to predict TF binding by combining sets of OCRs with position weight matrices. TEPIC can be applied to various open-chromatin data, e.g. DNaseI-seq and NOMe-seq. Additionally, Histone-Marks (HMs) can be used to identify candidate TF binding sites. TEPIC computes TF affinities and uses open-chromatin/HM signal intensity as quantitative measures of TF binding strength. Using machine learning, we find low affinity binding sites to improve our ability to explain gene expression variability compared to the standard presence/absence classification of binding sites. Further, we show that both footprints and peaks capture essential TF binding events and lead to a good prediction performance. In our application, gene-based scores computed by TEPIC with one open-chromatin assay nearly reach the quality of several TF ChIP-seq data sets. Finally, these scores correctly predict known transcriptional regulators as illustrated by the application to novel DNaseI-seq and NOMe-seq data for primary human hepatocytes and CD4+ T-cells, respectively. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets

    PubMed Central

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas

    2018-01-01

    ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. PMID:29149270

  14. 40 CFR 131.3 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Protection of Environment ENVIRONMENTAL PROTECTION AGENCY (CONTINUED) WATER PROGRAMS WATER QUALITY STANDARDS... (33 U.S.C. 1251 et seq.)). (b) Criteria are elements of State water quality standards, expressed as constituent concentrations, levels, or narrative statements, representing a quality of water that supports a...

  15. The Role of Genome Accessibility in Transcription Factor Binding in Bacteria.

    PubMed

    Gomes, Antonio L C; Wang, Harris H

    2016-04-01

    ChIP-seq enables genome-scale identification of regulatory regions that govern gene expression. However, the biological insights generated from ChIP-seq analysis have been limited to predictions of binding sites and cooperative interactions. Furthermore, ChIP-seq data often poorly correlate with in vitro measurements or predicted motifs, highlighting that binding affinity alone is insufficient to explain transcription factor (TF)-binding in vivo. One possibility is that binding sites are not equally accessible across the genome. A more comprehensive biophysical representation of TF-binding is required to improve our ability to understand, predict, and alter gene expression. Here, we show that genome accessibility is a key parameter that impacts TF-binding in bacteria. We developed a thermodynamic model that parameterizes ChIP-seq coverage in terms of genome accessibility and binding affinity. The role of genome accessibility is validated using a large-scale ChIP-seq dataset of the M. tuberculosis regulatory network. We find that accounting for genome accessibility led to a model that explains 63% of the ChIP-seq profile variance, while a model based on motif score alone explains only 35% of the variance. Moreover, our framework enables de novo ChIP-seq peak prediction and is useful for inferring TF-binding peaks in new experimental conditions by reducing the need for additional experiments. We observe that the genome is more accessible in intergenic regions, and that increased accessibility is positively correlated with gene expression and anti-correlated with distance to the origin of replication. Our biophysically motivated model provides a more comprehensive description of TF-binding in vivo from first principles towards a better representation of gene regulation in silico, with promising applications in systems biology.

  16. Analysis of Strand-Specific RNA-Seq Data Using Machine Learning Reveals the Structures of Transcription Units in Clostridium thermocellum

    DOE PAGES

    Chou, Wen-Chi; Ma, Qin; Yang, Shihui; ...

    2015-03-12

    The identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Moreover, among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.

  17. Boiler: lossy compression of RNA-seq alignments using coverage vectors

    PubMed Central

    Pritt, Jacob; Langmead, Ben

    2016-01-01

    We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantification. Boiler also allows the user to pose fast and useful queries without decompressing the entire file. Boiler is free open source software available from github.com/jpritt/boiler. PMID:27298258
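    The idea of replacing per-read records with a genomic coverage vector can be sketched as follows. This is an illustrative toy, not Boiler's actual file format or algorithm: the alignment tuples, the reference length and the run-length encoding step are assumptions chosen to show why the representation is compact and why it is lossy.

```python
from itertools import groupby


def coverage_vector(alignments, genome_length):
    """Per-base read depth from (start, end) alignments, end exclusive."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    coverage, depth = [], 0
    # Prefix-sum the difference array to obtain the depth at each base.
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def run_length_encode(vector):
    """Collapse runs of equal depth into (depth, run_length) pairs."""
    return [(depth, len(list(run))) for depth, run in groupby(vector)]


# Hypothetical alignments on a 10-base reference.
reads = [(0, 4), (2, 6), (2, 6), (8, 10)]
cov = coverage_vector(reads, 10)
rle = run_length_encode(cov)
```

    The encoded vector records how many reads cover each base but not which reads they were, so different read sets can produce the same vector. That loss is the reason the abstract mentions keeping a few empirical distributions alongside the coverage.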

  18. Improving RNA-Seq expression estimates by correcting for fragment bias

    PubMed Central

    2011-01-01

    The biochemistry of RNA-Seq library preparation results in cDNA fragments that are not uniformly distributed within the transcripts they represent. This non-uniformity must be accounted for when estimating expression levels, and we show how to perform the needed corrections using a likelihood based approach. We find improvements in expression estimates as measured by correlation with independently performed qRT-PCR and show that correction of bias leads to improved replicability of results across libraries and sequencing technologies. PMID:21410973

  19. Discovery of new candidate genes for rheumatoid arthritis through integration of genetic association data with expression pathway analysis.

    PubMed

    Shchetynsky, Klementy; Diaz-Gallo, Lina-Marcella; Folkersen, Lasse; Hensvold, Aase Haj; Catrina, Anca Irinel; Berg, Louise; Klareskog, Lars; Padyukov, Leonid

    2017-02-02

    Here we integrate verified signals from previous genetic association studies with gene expression and pathway analysis for discovery of new candidate genes and signaling networks relevant for rheumatoid arthritis (RA). RNA-sequencing (RNA-seq)-based expression analysis of 377 genes from previously verified RA-associated loci was performed in blood cells from 5 newly diagnosed, non-treated patients with RA, 7 patients with treated RA and 12 healthy controls. Differentially expressed genes sharing a similar expression pattern in treated and untreated RA sub-groups were selected for pathway analysis. A set of "connector" genes derived from pathway analysis was tested for differential expression in the initial discovery cohort and validated in blood cells from 73 patients with RA and in 35 healthy controls. There were 11 qualifying genes selected for pathway analysis and these were grouped into two evidence-based functional networks, containing 29 and 27 additional connector molecules. The expression of genes corresponding to connector molecules was then tested in the initial RNA-seq data. Differences in the expression of ERBB2, TP53 and THOP1 were similar in both treated and non-treated patients with RA and an additional nine genes were differentially expressed in at least one group of patients compared to healthy controls. The ERBB2, TP53 and THOP1 expression profile was successfully replicated in RNA-seq data from peripheral blood mononuclear cells from healthy controls and non-treated patients with RA, in an independent collection of samples. Integration of RNA-seq data with findings from association studies, and consequent pathway analysis, implicate new candidate genes, ERBB2, TP53 and THOP1, in the pathogenesis of RA.

  20. Oxycodone Self-Administration Induces Alterations in Expression of Integrin, Semaphorin and Ephrin Genes in the Mouse Striatum.

    PubMed

    Yuferov, Vadim; Zhang, Yong; Liang, Yupu; Zhao, Connie; Randesi, Matthew; Kreek, Mary J

    2018-01-01

    Oxycodone is a commonly used medication for pain and, like other short-acting MOPr agonists, is also a widely abused prescription opioid. Neurochemical and structural adaptations in brain following chronic MOPr-agonist administration are thought to underlie pathogenesis and persistence of opiate addiction. Many axon guidance molecules, such as integrins, semaphorins, and ephrins, may contribute to oxycodone-induced neuroadaptations through alterations in axon-target connections and synaptogenesis that may be implicated in the behaviors associated with opiate addiction. However, little is known about this important area. The aim of this study is to investigate alterations in expression of selected integrin, semaphorin, ephrin, netrin, and slit genes in the nucleus accumbens (NAc) and caudate putamen (CPu) of mice following extended 14-day oxycodone self-administration (SA), using RNA-seq. Methods: Total RNA from the NAc and CPu was isolated from adult male C57BL/6J mice within 1 h after the last session of oxycodone in a 14-day self-administration paradigm (4 h/day, 0.25 mg/kg/infusion, FR1) or from yoked saline controls. Gene expression was examined using RNA sequencing (RNA-seq) technology. RNA-seq libraries were prepared using Illumina's TruSeq® Stranded Total RNA LT kit. The reads were aligned to the mouse reference genome (version mm10) using STAR. DESeq2 was applied to the counts of protein coding genes to estimate the fold change between the treatment groups. A false discovery rate (FDR) threshold of q < 0.1 was used to select genes with a significant expression change. REACTOME was used to select the subset of genes related to the axon guidance pathway. Results: Among 38 known genes of the integrin, semaphorin, and ephrin gene families, RNA-seq data revealed up-regulation of six genes in the NAc: the heterodimer receptor integrins Itgal, Itgb2, and Itgam; their ligand, semaphorin Sema7a; and two semaphorin receptors, plexins Plxnd1 and Plxdc1. Eight genes were down-regulated in this region: two integrin genes, Itga3 and Itgb8; semaphorins Sema3c, Sema4g, Sema6a, and Sema6d; the semaphorin receptor neuropilin Nrp2; and the ephrin receptor Epha3. In the CPu, five axon guidance genes were differentially expressed: three integrin genes (Itgal, Itgb2, and Itga1) were up-regulated, and Itga9 and ephrin Efna3 were down-regulated. No significant alterations in expression of Netrin-1 or Slit were observed. Conclusion: We provide evidence for alterations in the expression of selective axon guidance genes in adult mouse brain following chronic self-administration of oxycodone. Further examination of oxycodone-induced changes in the expression of these specific axon guidance molecules and integrin genes in relation to behavior may provide new insights into development of addiction to oxycodone.
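
    The gene selection step in this abstract (FDR q < 0.1) can be illustrated with a minimal sketch. The abstract does not say which FDR procedure was applied; the sketch assumes the Benjamini-Hochberg adjustment, which is the default in DESeq2. The gene names and p-values below are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values; these are NOT values from the study.
genes = ["geneA", "geneB", "geneC", "geneD"]
pvalues = [0.001, 0.02, 0.04, 0.5]

qvalues = benjamini_hochberg(pvalues)
# Keep genes whose adjusted p-value passes the q < 0.1 threshold.
significant = [g for g, q in zip(genes, qvalues) if q < 0.1]
print(significant)
```

    A threshold of 0.1 means that roughly one in ten of the selected genes is expected to be a false discovery, which is why such lists are usually treated as candidates for follow-up.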

  1. Cost analysis of whole genome sequencing in German clinical practice.

    PubMed

    Plöthner, Marika; Frank, Martin; von der Schulenburg, J-Matthias Graf

    2017-06-01

    Whole genome sequencing (WGS) is an emerging tool in clinical diagnostics. However, little has been said about its procedure costs, owing to a dearth of related cost studies. This study helps fill this research gap by analyzing the execution costs of WGS within the setting of German clinical practice. First, to estimate costs, a sequencing process related to clinical practice was undertaken. Once relevant resources were identified, a quantification and monetary evaluation was conducted using data and information from expert interviews with clinical geneticists, and personnel at private enterprises and hospitals. This study focuses on identifying the costs associated with the standard sequencing process, and the procedure costs for a single WGS were analyzed on the basis of two sequencing platforms, namely HiSeq 2500 and HiSeq Xten, both by Illumina, Inc. In addition, sensitivity analyses were performed to assess the influence of various uses of sequencing platforms and various coverage values on a fixed-cost degression. In the base case scenario, which features 80 % utilization and 30-times coverage, the cost of a single WGS analysis with the HiSeq 2500 was estimated at €3858.06. The cost of sequencing materials was estimated at €2848.08; related personnel costs of €396.94 and acquisition/maintenance costs of €607.39 were also found. In comparison, the cost of sequencing that uses the latest technology (i.e., HiSeq Xten) was approximately 63 % cheaper, at €1411.20. The estimated costs of WGS currently exceed the prediction of a 'US$1000 per genome', by more than a factor of 3.8. In particular, the material costs in themselves exceed this predicted cost.
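
    The "approximately 63 % cheaper" figure can be checked directly from the two totals quoted in the abstract. The short sketch below uses only numbers stated above; the function and variable names are illustrative.

```python
# Per-genome totals quoted in the abstract (base case scenario), in euros.
COST_HISEQ_2500_EUR = 3858.06
COST_HISEQ_XTEN_EUR = 1411.20

# Itemized HiSeq 2500 components quoted in the abstract, in euros.
MATERIALS_EUR = 2848.08
PERSONNEL_EUR = 396.94
ACQUISITION_MAINTENANCE_EUR = 607.39


def relative_saving(baseline, alternative):
    """Fraction by which `alternative` is cheaper than `baseline`."""
    return (baseline - alternative) / baseline


saving = relative_saving(COST_HISEQ_2500_EUR, COST_HISEQ_XTEN_EUR)
itemized_total = MATERIALS_EUR + PERSONNEL_EUR + ACQUISITION_MAINTENANCE_EUR

print(f"HiSeq Xten saving relative to HiSeq 2500: {saving:.1%}")
print(f"Sum of itemized HiSeq 2500 components: {itemized_total:.2f} EUR")
```

    The three itemized components sum to slightly less than the quoted total, so the abstract presumably omits a small remaining cost item; that item is not specified here.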

  2. Getting the most out of RNA-seq data analysis.

    PubMed

    Khang, Tsung Fei; Lau, Ching Yee

    2015-01-01

    Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets, one representing strong and another mild biological effect size, we simulated different replicate size scenarios, and tested the performance of several commonly-used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30 and 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes at systems level analysis, as their PPV and sensitivity trade-off were superior to the other methods'. Conclusion. When biological effect size is weak, systems level investigation is not possible using RNA-seq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential, when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high confidence differentially expressed genes for targeted qPCR validation; for downstream systems level analysis, combined results from DESeq2 and edgeR are useful.
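
    Positive predictive value (PPV) and sensitivity, the two metrics this abstract trades off, are simple ratios of call counts. The sketch below shows the standard definitions; the counts are hypothetical and do not come from the paper.

```python
def positive_predictive_value(true_positives, false_positives):
    """Fraction of genes called differentially expressed that truly are."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("no genes were called, so PPV is undefined")
    return true_positives / called


def sensitivity(true_positives, false_negatives):
    """Fraction of truly differentially expressed genes that were called."""
    actual = true_positives + false_negatives
    if actual == 0:
        raise ValueError("no true positives exist, so sensitivity is undefined")
    return true_positives / actual


# Hypothetical outcome: 50 genes called, 45 of them correct, 155 true genes missed.
ppv = positive_predictive_value(true_positives=45, false_positives=5)
sens = sensitivity(true_positives=45, false_negatives=155)
print(f"PPV = {ppv:.2f}, sensitivity = {sens:.3f}")
```

    A method can score a high PPV while recovering only a small share of the truly changed genes, which is consistent with the abstract recommending high-PPV callers for targeted validation and a better PPV-sensitivity balance for systems-level work.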

  3. Identification of reference genes for quantitative expression analysis using large-scale RNA-seq data of Arabidopsis thaliana and model crop plants.

    PubMed

    Kudo, Toru; Sasaki, Yohei; Terashima, Shin; Matsuda-Imai, Noriko; Takano, Tomoyuki; Saito, Misa; Kanno, Maasa; Ozaki, Soichi; Suwabe, Keita; Suzuki, Go; Watanabe, Masao; Matsuoka, Makoto; Takayama, Seiji; Yano, Kentaro

    2016-10-13

    In quantitative gene expression analysis, normalization using a reference gene as an internal control is frequently performed for appropriate interpretation of the results. Efforts have been devoted to exploring superior novel reference genes using microarray transcriptomic data and to evaluating commonly used reference genes by targeting analysis. However, because the number of specifically detectable genes is totally dependent on probe design in the microarray analysis, exploration using microarray data may miss some of the best choices for the reference genes. Recently emerging RNA sequencing (RNA-seq) provides an ideal resource for comprehensive exploration of reference genes since this method is capable of detecting all expressed genes, in principle including even unknown genes. We report the results of a comprehensive exploration of reference genes using public RNA-seq data from plants such as Arabidopsis thaliana (Arabidopsis), Glycine max (soybean), Solanum lycopersicum (tomato) and Oryza sativa (rice). To select reference genes suitable for the broadest experimental conditions possible, candidates were surveyed by the following four steps: (1) evaluation of the basal expression level of each gene in each experiment; (2) evaluation of the expression stability of each gene in each experiment; (3) evaluation of the expression stability of each gene across the experiments; and (4) selection of top-ranked genes, after ranking according to the number of experiments in which the gene was expressed stably. Employing this procedure, 13, 10, 12 and 21 top candidates for reference genes were proposed in Arabidopsis, soybean, tomato and rice, respectively. Microarray expression data confirmed that the expression of the proposed reference genes under broad experimental conditions was more stable than that of commonly used reference genes. These novel reference genes will be useful for analyzing gene expression profiles across experiments carried out under various experimental conditions.
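
    The four-step survey described in this abstract can be sketched as a small ranking routine. This is only an illustration under stated assumptions: the abstract does not name its stability metric or cutoffs, so the coefficient of variation (CV), the thresholds `min_mean` and `max_cv`, and all gene names and expression values below are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (assumed stability metric)."""
    mu = mean(values)
    return pstdev(values) / mu if mu > 0 else float("inf")


def rank_reference_candidates(experiments, min_mean=10.0, max_cv=0.2):
    """Rank genes by the number of experiments in which they are stable.

    `experiments` maps experiment name -> {gene: [expression per sample]}.
    `min_mean` and `max_cv` are hypothetical cutoffs, not the paper's.
    """
    stable_counts = {}
    experiment_means = {}
    for samples_by_gene in experiments.values():
        for gene, values in samples_by_gene.items():
            mu = mean(values)
            experiment_means.setdefault(gene, []).append(mu)
            stable_counts.setdefault(gene, 0)
            # Steps 1 and 2: basal level and within-experiment stability.
            if mu >= min_mean and coefficient_of_variation(values) <= max_cv:
                stable_counts[gene] += 1

    def sort_key(gene):
        # Step 3: stability across experiments, used here as a tie-breaker.
        across = coefficient_of_variation(experiment_means[gene])
        # Step 4: more stable experiments first, then lower cross-experiment CV.
        return (-stable_counts[gene], across, gene)

    return sorted(stable_counts, key=sort_key)


# Hypothetical expression values for three genes in two experiments.
experiments = {
    "exp1": {"g1": [100, 105, 98], "g2": [50, 150, 20], "g3": [2, 3, 2]},
    "exp2": {"g1": [110, 102, 107], "g2": [60, 65, 58], "g3": [1, 2, 2]},
}
ranking = rank_reference_candidates(experiments)
print(ranking)
```

    Counting stable experiments, rather than averaging stability over all samples, favors genes that behave well under many different conditions, which matches the stated goal of finding references suitable for the broadest experimental conditions possible.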

  4. A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data

    PubMed Central

    Skelly, Daniel A.; Johansson, Marnie; Madeoy, Jennifer; Wakefield, Jon; Akey, Joshua M.

    2011-01-01

    Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes. PMID:21873452

  5. Scanning of Transposable Elements and Analyzing Expression of Transposase Genes of Sweet Potato [Ipomoea batatas

    PubMed Central

    Tao, Xiang; Lai, Xian-Jun; Zhang, Yi-Zheng; Tan, Xue-Mei; Wang, Haiyan

    2014-01-01

    Background: Transposable elements (TEs) are the most abundant genomic components in eukaryotes and affect the genome by their replications and movements to generate genetic plasticity. Sweet potato generally reproduces asexually, and TEs may be an important genetic factor for genome reorganization. Complete identification of TEs is essential for the study of genome evolution. However, the TEs of sweet potato are still poorly understood because of its complex hexaploid genome and difficulty in genome sequencing. The recent availability of the sweet potato transcriptome databases provides an opportunity for discovering and characterizing the expressed TEs. Methodology/Principal Findings: We first established the integrated-transcriptome database by de novo assembling four published sweet potato transcriptome databases from three cultivars in China. Using sequence-similarity search and analysis, a total of 1,405 TEs including 883 retrotransposons and 522 DNA transposons were predicted and categorized. By mapping sets of RNA-Seq raw short reads to the predicted TEs, we compared the quantities, classifications and expression activities of TEs between and within cultivars. Moreover, the differential expression of TEs in seven tissues of the Xushu 18 cultivar was analyzed by using Illumina digital gene expression (DGE) tag profiling. It was found that 417 TEs were expressed in one or more tissues and 107 in all seven tissues. Furthermore, the copy number of 11 transposase genes was determined to be 1–3 copies in the genome of sweet potato by Real-time PCR-based absolute quantification. Conclusions/Significance: Our results provide a new method for TE searching in species that have transcriptome sequences but lack genome information. The searching, identification and expression analysis of TEs will provide useful TE information in sweet potato, which is valuable for further studies of TE-mediated gene mutation and optimization in asexual reproduction. It contributes to elucidating the roles of TEs in genome evolution. PMID:24608103

  6. Experimental Design and Power Calculation for RNA-seq Experiments.

    PubMed

    Wu, Zhijin; Wu, Hao

    2016-01-01

    Power calculation is a critical component of RNA-seq experimental design. The flexibility of RNA-seq experiment and the wide dynamic range of transcription it measures make it an attractive technology for whole transcriptome analysis. These features, in addition to the high dimensionality of RNA-seq data, bring complexity in experimental design, making an analytical power calculation no longer realistic. In this chapter we review the major factors that influence the statistical power of detecting differential expression, and give examples of power assessment using the R package PROPER.

  7. Oncoprotein protein kinase

    DOEpatents

    Karin, Michael; Hibi, Masahiko; Lin, Anning

    2002-01-29

    The present invention provides an isolated polynucleotide encoding a c-Jun peptide consisting of about amino acid residues 33 to 79 as set forth in SEQ ID NO: 10 or conservative variations thereof. The invention also provides a method for producing a peptide of SEQ ID NO:1 comprising (a) culturing a host cell containing a polynucleotide encoding a c-Jun peptide consisting of about amino acid residues 33 to 79 as set forth in SEQ ID NO: 10 under conditions which allow expression of the polynucleotide; and (b) obtaining the peptide of SEQ ID NO:1.

  8. Quantification of EVI1 transcript levels in acute myeloid leukemia by RT-qPCR analysis: A study by the ALFA Group.

    PubMed

    Smol, Thomas; Nibourel, Olivier; Marceau-Renaut, Alice; Celli-Lebras, Karine; Berthon, Céline; Quesnel, Bruno; Boissel, Nicolas; Terré, Christine; Thomas, Xavier; Castaigne, Sylvie; Dombret, Hervé; Preudhomme, Claude; Renneville, Aline

    2015-12-01

    EVI1 overexpression confers poor prognosis in acute myeloid leukemia (AML). Quantification of EVI1 expression has been mainly assessed by real-time quantitative PCR (RT-qPCR) based on relative quantification of EVI1-1D splice variant. In this study, we developed a RT-qPCR assay to perform quantification of EVI1 expression covering the different splice variants. A sequence localized in EVI1 exons 14 and 15 was cloned into plasmids that were used to establish RT-qPCR standard curves. Threshold values to define EVI1 overexpression were determined using 17 bone marrow (BM) and 31 peripheral blood (PB) control samples and were set at 1% in BM and 0.5% in PB. Samples from 64 AML patients overexpressing EVI1 included in the ALFA-0701 or -0702 trials were collected at diagnosis and during follow-up (n=152). Median EVI1 expression at AML diagnosis was 23.3% in BM and 3.6% in PB. EVI1 expression levels significantly decreased between diagnostic and post-induction samples, with an average variation from 21.6% to 3.56% in BM and from 4.0% to 0.22% in PB, but did not exceed 1 log10 reduction. Our study demonstrates that the magnitude of reduction in EVI1 expression levels between AML diagnosis and follow-up is not sufficient to allow sensitive detection of minimal residual disease. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Male- and Female-Biased Gene Expression of Olfactory-Related Genes in the Antennae of Asian Corn Borer, Ostrinia furnacalis (Guenée) (Lepidoptera: Crambidae)

    PubMed Central

    Zhang, Tiantao; Coates, Brad S.; Ge, Xing; Bai, Shuxiong; He, Kanglai; Wang, Zhenying

    2015-01-01

    The Asian corn borer (ACB), Ostrinia furnacalis (Guenée), is a destructive pest insect of cultivated corn crops, for which antennal-expressed receptors are important to detect olfactory cues for mate attraction and oviposition. Few olfactory-related genes have been reported in ACB, so we sequenced and characterized the transcriptome of male and female O. furnacalis antennae. Non-normalized male and female O. furnacalis antennal cDNA libraries were sequenced on the Illumina HiSeq 2000 and assembled into a reference transcriptome. Functional gene annotations identified putative olfactory-related genes: 56 odorant receptors (ORs), 23 odorant binding proteins (OBPs), and 10 CSPs. RNA-seq estimates of gene expression respectively showed up- and down-regulation of 79 and 30 genes in female compared to male antennae, which included up-regulation of 8 ORs and 1 PBP gene in male antennae as well as 3 ORs in female antennae. Quantitative real-time RT-PCR analyses validated strong male antennal-biased expression of OfurOR3, 4, 6, 7, 8, 11, 12, 13 and 14 transcripts, whereas OfurOR17 and 18 were specifically expressed in female antennae. The sex-biased gene expression described here provides important insight into gene functionalization, and provides candidate genes putatively involved in environmental perception, host plant attraction, and mate recognition. PMID:26062030

  10. Quantitative analysis of ChIP-seq data uncovers dynamic and sustained H3K4me3 and H3K27me3 modulation in cancer cells under hypoxia.

    PubMed

    Adriaens, Michiel E; Prickaerts, Peggy; Chan-Seng-Yue, Michelle; van den Beucken, Twan; Dahlmans, Vivian E H; Eijssen, Lars M; Beck, Timothy; Wouters, Bradly G; Voncken, Jan Willem; Evelo, Chris T A

    2016-01-01

    A comprehensive assessment of the epigenetic dynamics in cancer cells is the key to understanding the molecular mechanisms underlying cancer and to improving cancer diagnostics, prognostics and treatment. By combining genome-wide ChIP-seq epigenomics and microarray transcriptomics, we studied the effects of oxygen deprivation and subsequent reoxygenation on histone 3 trimethylation of lysine 4 (H3K4me3) and lysine 27 (H3K27me3) in a breast cancer cell line, serving as a model for abnormal oxygenation in solid tumors. A priori, epigenetic markings and gene expression levels not only are expected to vary greatly between hypoxic and normoxic conditions, but also display a large degree of heterogeneity across the cell population. Where traditionally ChIP-seq data are often treated as dichotomous data, the model and experiment here necessitate a quantitative, data-driven analysis of both datasets. We first identified genomic regions with sustained epigenetic markings, which provided a sample-specific reference enabling quantitative ChIP-seq data analysis. Sustained H3K27me3 marking was located around centromeres and intergenic regions, while sustained H3K4me3 marking was associated with genes involved in RNA binding, translation and protein transport and localization. Dynamic marking with both H3K4me3 and H3K27me3 (hypoxia-induced bivalency) was found in CpG-rich regions at loci encoding factors that control developmental processes, congruent with observations in embryonic stem cells. In silico-identified epigenetically sustained and dynamic genomic regions were confirmed through ChIP-PCR in vitro, and obtained results are corroborated by published data and current insights regarding epigenetic regulation.

  11. Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data.

    PubMed

    Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong; Li, Mingyao; Zhang, Nancy R

    2017-11-02

    Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

    PubMed Central

    Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong

    2017-01-01

    Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. PMID:29036714

  13. Complex and dynamic landscape of RNA polyadenylation revealed by PAS-Seq

    PubMed Central

    Shepard, Peter J.; Choi, Eun-A; Lu, Jente; Flanagan, Lisa A.; Hertel, Klemens J.; Shi, Yongsheng

    2011-01-01

    Alternative polyadenylation (APA) of mRNAs has emerged as an important mechanism for post-transcriptional gene regulation in higher eukaryotes. Although microarrays have recently been used to characterize APA globally, they have a number of serious limitations that prevent comprehensive and highly quantitative analysis. To better characterize APA and its regulation, we have developed a deep sequencing-based method called Poly(A) Site Sequencing (PAS-Seq) for quantitatively profiling RNA polyadenylation at the transcriptome level. PAS-Seq not only accurately and comprehensively identifies poly(A) junctions in mRNAs and noncoding RNAs, but also provides quantitative information on the relative abundance of polyadenylated RNAs. PAS-Seq analyses of human and mouse transcriptomes showed that 40%–50% of all expressed genes produce alternatively polyadenylated mRNAs. Furthermore, our study detected evolutionarily conserved polyadenylation of histone mRNAs and revealed novel features of mitochondrial RNA polyadenylation. Finally, PAS-Seq analyses of mouse embryonic stem (ES) cells, neural stem/progenitor (NSP) cells, and neurons not only identified more poly(A) sites than what was found in the entire mouse EST database, but also detected significant changes in the global APA profile that lead to lengthening of 3′ untranslated regions (UTR) in many mRNAs during stem cell differentiation. Together, our PAS-Seq analyses revealed a complex landscape of RNA polyadenylation in mammalian cells and the dynamic regulation of APA during stem cell differentiation. PMID:21343387

  14. It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR.

    PubMed

    Lun, Aaron T L; Chen, Yunshun; Smyth, Gordon K

    2016-01-01

    RNA sequencing (RNA-seq) is widely used to profile transcriptional activity in biological systems. Here we present an analysis pipeline for differential expression analysis of RNA-seq experiments using the Rsubread and edgeR software packages. The basic pipeline includes read alignment and counting, filtering and normalization, modelling of biological variability and hypothesis testing. For hypothesis testing, we describe particularly the quasi-likelihood features of edgeR. Some more advanced downstream analysis steps are also covered, including complex comparisons, gene ontology enrichment analyses and gene set testing. The code required to run each step is described, along with an outline of the underlying theory. The chapter includes a case study in which the pipeline is used to study the expression profiles of mammary gland cells in virgin, pregnant and lactating mice.

  15. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control consortium

    PubMed Central

    2014-01-01

    We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed, for these and qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. PMID:25150838

  16. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.

    PubMed

    Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J

    2015-11-15

    High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.
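
    The mean-variance relation mentioned in this abstract can be made concrete with the usual negative binomial parameterization, in which the variance equals mu + dispersion * mu^2. The sketch below assumes that parameterization and plugs in the dispersion of about 0.01 reported above; the mean counts are arbitrary illustrative values.

```python
import math


def negative_binomial_variance(mean_count, dispersion=0.01):
    """Variance of a negative binomial count: mu + dispersion * mu**2."""
    return mean_count + dispersion * mean_count ** 2


def coefficient_of_variation(mean_count, dispersion=0.01):
    """Standard deviation relative to the mean for the same model."""
    return math.sqrt(negative_binomial_variance(mean_count, dispersion)) / mean_count


# Arbitrary mean read counts, from lowly to highly expressed genes.
for mu in (10, 100, 1000, 10000):
    variance = negative_binomial_variance(mu)
    cv = coefficient_of_variation(mu)
    print(f"mean={mu:>6}  variance={variance:>12.1f}  CV={cv:.3f}")
```

    At low counts the Poisson term (mu) dominates, while at high counts the CV approaches the square root of the dispersion, that is about 0.1 for a dispersion of 0.01, which corresponds to roughly 10% replicate-to-replicate variability for highly expressed genes.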

  17. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

    PubMed Central

    Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.

    2015-01-01

    Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307

  18. Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea

    DOE PAGES

    Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai; ...

    2015-10-28

    We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.
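
    The RPKM values used for validation in this abstract follow a standard definition: reads per kilobase of transcript per million mapped reads. The sketch below implements that definition; the read counts and transcript length are hypothetical and are not data from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scaling.
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical transcript: 500 reads on a 2,000 bp transcript,
# in a library with 10 million mapped reads.
value = rpkm(read_count=500, transcript_length_bp=2000, total_mapped_reads=10_000_000)
print(f"RPKM = {value:.1f}")
```

    Dividing by transcript length and library size makes values comparable between genes of different lengths and between libraries of different depths, which is what allows a comparison against qRT-PCR measurements.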

  19. Strand-Specific RNA-Seq Analyses of Fruiting Body Development in Coprinopsis cinerea

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muraguchi, Hajime; Umezawa, Kiwamu; Niikura, Mai

    We report that the basidiomycete fungus Coprinopsis cinerea is an important model system for multicellular development. Fruiting bodies of C. cinerea are typical mushrooms, which can be produced synchronously on defined media in the laboratory. To investigate the transcriptome in detail during fruiting body development, high-throughput sequencing (RNA-seq) was performed using cDNA libraries strand-specifically constructed from 13 points (stages/tissues) with two biological replicates. The reads were aligned to 14,245 predicted transcripts, and counted for forward and reverse transcripts. Differentially expressed genes (DEGs) between two adjacent points and between vegetative mycelium and each point were detected by Tag Count Comparison (TCC). To validate RNA-seq data, expression levels of selected genes were compared using RPKM values in RNA-seq data and qRT-PCR data, and DEGs detected in microarray data were examined in MA plots of RNA-seq data by TCC. We discuss events deduced from GO analysis of DEGs. In addition, we uncovered both transcription factor candidates and antisense transcripts that are likely to be involved in developmental regulation for fruiting.

  20. Differential responsiveness of Holstein and Angus dermal fibroblasts to LPS challenge occurs without major differences in the methylome.

    PubMed

    Benjamin, Aimee L; Green, Benjamin B; Crooker, Brian A; McKay, Stephanie D; Kerr, David E

    2016-03-24

    We have previously found substantial animal-to-animal and age-dependent variation in the response of Holstein fibroblast cultures challenged with LPS. To expand on this finding, fibroblast cultures were established from dairy (Holstein) and beef (Angus) cattle and challenged with LPS to examine breed-dependent differences in the innate immune response. Global gene expression was measured by RNA-Seq, while an epigenetic basis for expression differences was examined by methylated CpG island recovery assay sequencing (MIRA-Seq) analysis. The Holstein breed displayed a more robust response to LPS than the Angus breed based on RNA-Seq analysis of cultures challenged with LPS for 0, 2, and 8 h. Several immune-associated genes were expressed at greater levels (FDR < 0.05) in Holstein cultures including TLR4 at all time points and a number of pro-inflammatory genes such as IL8, CCL20, CCL5, and TNF following LPS exposure. Despite extensive breed differences in the transcriptome, MIRA-Seq unveiled relatively similar patterns of genome-wide DNA methylation between breeds, with an overall hypomethylation of gene promoters. However, by examining the genome in 3Kb windows, 49 regions of differential methylation were discovered between Holstein and Angus fibroblasts, and two of these regions fell within the promoter region (-2500 to +500 bp of the transcription start site) of the genes NTRK2 and ADAMTS5. Fibroblasts isolated from Holstein cattle display a more robust response to LPS in comparison to cultures from Angus cattle. Different selection strategies and management practices exist between these two breeds that likely give rise to genetic and epigenetic factors contributing to the different immune response phenotypes.

  1. mQTL-seq delineates functionally relevant candidate gene harbouring a major QTL regulating pod number in chickpea

    PubMed Central

    Das, Shouvik; Singh, Mohar; Srivastava, Rishi; Bajaj, Deepak; Saxena, Maneesha S.; Rana, Jai C.; Bansal, Kailash C.; Tyagi, Akhilesh K.; Parida, Swarup K.

    2016-01-01

    The present study used a whole-genome, NGS resequencing-based mQTL-seq (multiple QTL-seq) strategy in two inter-specific mapping populations (Pusa 1103 × ILWC 46 and Pusa 256 × ILWC 46) to scan the major genomic region(s) underlying QTL(s) governing pod number trait in chickpea. Essentially, the whole-genome resequencing of low and high pod number-containing parental accessions and homozygous individuals (constituting bulks) from each of these two mapping populations discovered >8 million high-quality homozygous SNPs with respect to the reference kabuli chickpea. The functional significance of the physically mapped SNPs was apparent from the identified 2,264 non-synonymous and 23,550 regulatory SNPs, with 8–10% of these SNPs-carrying genes corresponding to transcription factors and disease resistance-related proteins. The utilization of these mined SNPs in Δ (SNP index)-led QTL-seq analysis and their correlation between two mapping populations based on mQTL-seq, narrowed down two (CaqaPN4.1: 867.8 kb and CaqaPN4.2: 1.8 Mb) major genomic regions harbouring robust pod number QTLs into the high-resolution short QTL intervals (CaqbPN4.1: 637.5 kb and CaqbPN4.2: 1.28 Mb) on chickpea chromosome 4. The integration of mQTL-seq-derived one novel robust QTL with QTL region-specific association analysis delineated the regulatory (C/T) and coding (C/A) SNPs-containing one pentatricopeptide repeat (PPR) gene at a major QTL region regulating pod number in chickpea. This target gene exhibited anther, mature pollen and pod-specific expression, including pronounced higher up-regulated (∼3.5-folds) transcript expression in high pod number-containing parental accessions and homozygous individuals of two mapping populations especially during pollen and pod development. 
The proposed mQTL-seq-driven combinatorial strategy has profound efficacy in rapid genome-wide scanning of potential candidate gene(s) underlying trait-associated high-resolution robust QTL(s), thereby expediting genomics-assisted breeding and genetic enhancement of crop plants, including chickpea. PMID:26685680
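
    The Δ (SNP index) statistic mentioned in the abstract above is commonly computed per SNP as the difference between the SNP index of two bulks, where the SNP index is the fraction of reads carrying the non-reference allele. The following Python sketch illustrates that arithmetic under stated assumptions: the read counts are hypothetical, and subtracting the low bulk from the high bulk is an assumed convention rather than a detail confirmed by the abstract.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("read depth must be positive")
    return alt_reads / total_reads


def delta_snp_index(high_bulk, low_bulk):
    """Difference in SNP index between bulks (assumed order: high minus low)."""
    return snp_index(*high_bulk) - snp_index(*low_bulk)


# Hypothetical sites: name -> ((alt, depth) in high bulk, (alt, depth) in low bulk)
snps = {
    "snp_1": ((38, 40), (3, 42)),   # alleles segregate with the bulks
    "snp_2": ((20, 41), (21, 40)),  # no difference between bulks
}

deltas = {name: delta_snp_index(high, low) for name, (high, low) in snps.items()}
```

    Sites with a Δ (SNP index) near +1 or -1 are candidates for linkage to the trait, while values near 0 indicate no association. A real analysis would also apply depth filters and sliding-window averaging, which this sketch omits.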

  2. Transcriptome Analysis of Orbital Adipose Tissue in Active Thyroid Eye Disease Using Next Generation RNA Sequencing Technology

    PubMed Central

    Lee, Bradford W.; Kumar, Virender B.; Biswas, Pooja; Ko, Audrey C.; Alameddine, Ramzi M.; Granet, David B.; Ayyagari, Radha; Kikkawa, Don O.; Korn, Bobby S.

    2018-01-01

    Objective: This study utilized Next Generation Sequencing (NGS) to identify differentially expressed transcripts in orbital adipose tissue from patients with active Thyroid Eye Disease (TED) versus healthy controls. Method: This prospective, case-control study enrolled three patients with severe, active thyroid eye disease undergoing orbital decompression, and three healthy controls undergoing routine eyelid surgery with removal of orbital fat. RNA Sequencing (RNA-Seq) was performed on freshly obtained orbital adipose tissue from study patients to analyze the transcriptome. Bioinformatics analysis was performed to determine pathways and processes enriched for the differential expression profile. Quantitative Reverse Transcriptase-Polymerase Chain Reaction (qRT-PCR) was performed to validate the differential expression of selected genes identified by RNA-Seq. Results: RNA-Seq identified 328 differentially expressed genes associated with active thyroid eye disease, many of which were responsible for mediating inflammation, cytokine signaling, adipogenesis, IGF-1 signaling, and glycosaminoglycan binding. The IL-5 and chemokine signaling pathways were highly enriched, and very-low-density-lipoprotein receptor activity and statin medications were implicated as having a potential role in TED. Conclusion: This study is the first to use RNA-Seq technology to elucidate differential gene expression associated with active, severe TED. This study suggests a transcriptional basis for the role of statins in modulating differentially expressed genes that mediate the pathogenesis of thyroid eye disease. Furthermore, the identification of genes with altered levels of expression in active, severe TED may inform the molecular pathways central to this clinical phenotype and guide the development of novel therapeutic agents. PMID:29760827

  3. PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.

    PubMed

    Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng

    2018-05-01

    The single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) based on the Pacific Biosciences (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Thus, comprehensive tools for Iso-Seq bioinformatics analysis are extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI, to analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs) comprehensively. PRAPI is capable of combining Iso-Seq full-length isoforms with short read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq), for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI. Contact: lfgu@fafu.edu.cn.

  4. Selecting between-sample RNA-Seq normalization methods from the perspective of their assumptions.

    PubMed

    Evans, Ciaran; Hardin, Johanna; Stoebel, Daniel M

    2017-02-27

    RNA-Seq is a widely used method for studying the behavior of genes under different biological conditions. An essential step in an RNA-Seq study is normalization, in which raw data are adjusted to account for factors that prevent direct comparison of expression measures. Errors in normalization can have a significant impact on downstream analysis, such as inflated false positives in differential expression analysis. An underemphasized feature of normalization is the assumptions on which the methods rely and how the validity of these assumptions can have a substantial impact on the performance of the methods. In this article, we explain how assumptions provide the link between raw RNA-Seq read counts and meaningful measures of gene expression. We examine normalization methods from the perspective of their assumptions, as an understanding of methodological assumptions is necessary for choosing methods appropriate for the data at hand. Furthermore, we discuss why normalization methods perform poorly when their assumptions are violated and how this causes problems in subsequent analysis. To analyze a biological experiment, researchers must select a normalization method with assumptions that are met and that produces a meaningful measure of expression for the given experiment. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
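
    The point about assumptions in the abstract above can be illustrated with two common between-sample normalizations: total-count scaling, which assumes that total expression is the same across samples, and a median-of-ratios size factor, which assumes that most genes are not differentially expressed. The Python sketch below is an illustration only and is not taken from the article. The counts are hypothetical, with one gene strongly up-regulated in the second sample.

```python
import math
from statistics import median


def total_count_factors(samples):
    """Size factors proportional to each sample's total read count."""
    totals = [sum(sample) for sample in samples]
    mean_total = sum(totals) / len(totals)
    return [total / mean_total for total in totals]


def median_of_ratios_factors(samples):
    """Size factors as the median ratio of each gene to its geometric mean."""
    n_genes = len(samples[0])
    geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in samples]
        if all(v > 0 for v in values):
            geo_means.append(math.exp(sum(math.log(v) for v in values) / len(values)))
        else:
            geo_means.append(None)  # genes with a zero count are skipped
    factors = []
    for sample in samples:
        ratios = [sample[g] / geo_means[g] for g in range(n_genes) if geo_means[g] is not None]
        factors.append(median(ratios))
    return factors


# Hypothetical counts for five genes; only the last gene differs (5-fold up in B).
sample_a = [100, 200, 300, 400, 1000]
sample_b = [100, 200, 300, 400, 5000]

tc = total_count_factors([sample_a, sample_b])
mor = median_of_ratios_factors([sample_a, sample_b])
```

    Here the total-count factors differ threefold between the samples, so the four unchanged genes would appear down-regulated in sample B after scaling. The median-of-ratios factors are both close to 1 because that method's assumption holds for this data.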

  5. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads.

    PubMed

    Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi

    2018-03-09

    High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.
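
    The conversion of reads to UMI counts described in the abstract above can be sketched as counting distinct (cell barcode, gene, UMI) combinations, so that PCR duplicates of one molecule are counted once. The Python sketch below is a simplified illustration and not the Quartz-Seq2 pipeline. The barcodes, gene names and UMI sequences are hypothetical, and UMI sequencing-error correction is deliberately omitted.

```python
from collections import defaultdict


def umi_counts(reads):
    """Collapse reads to per-(cell, gene) counts of distinct UMIs.

    reads: iterable of (cell_barcode, gene, umi) tuples.
    """
    molecules = set(reads)  # duplicates of the same molecule collapse here
    counts = defaultdict(int)
    for cell, gene, _umi in molecules:
        counts[(cell, gene)] += 1
    return dict(counts)


# Hypothetical assigned reads
reads = [
    ("cell1", "geneA", "AAAC"), ("cell1", "geneA", "AAAC"), ("cell1", "geneA", "GGTA"),
    ("cell1", "geneB", "CCGT"),
    ("cell2", "geneA", "AAAC"), ("cell2", "geneA", "AAAC"), ("cell2", "geneA", "AAAC"),
    ("cell2", "geneB", "TTAG"), ("cell2", "geneB", "TTAG"), ("cell2", "geneB", "ACGT"),
]

counts = umi_counts(reads)
# Fraction of reads that end up as distinct UMI counts
conversion_rate = sum(counts.values()) / len(reads)
```

    The conversion rate computed this way is the quantity the abstract reports as 30-50% for Quartz-Seq2; the toy value here has no bearing on that figure.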

  6. Elucidating the 16S rRNA 3' boundaries and defining optimal SD/aSD pairing in Escherichia coli and Bacillus subtilis using RNA-Seq data.

    PubMed

    Wei, Yulong; Silke, Jordan R; Xia, Xuhua

    2017-12-15

    Bacterial translation initiation is influenced by base pairing between the Shine-Dalgarno (SD) sequence in the 5' UTR of mRNA and the anti-SD (aSD) sequence at the free 3' end of the 16S rRNA (3' TAIL) due to: 1) the SD/aSD sequence binding location and 2) SD/aSD binding affinity. In order to understand what makes an SD/aSD interaction optimal, we must define: 1) terminus of the 3' TAIL and 2) extent of the core aSD sequence within the 3' TAIL. Our approach to characterize these components in Escherichia coli and Bacillus subtilis involves 1) mapping the 3' boundary of the mature 16S rRNA using high-throughput RNA sequencing (RNA-Seq), and 2) identifying the segment within the 3' TAIL that is strongly preferred in SD/aSD pairing. Using RNA-Seq data, we resolve previous discrepancies in the reported 3' TAIL in B. subtilis and recovered the established 3' TAIL in E. coli. Furthermore, we extend previous studies to suggest that both highly and lowly expressed genes favor SD sequences with intermediate binding affinity, but this trend is exclusive to SD sequences that complement the core aSD sequences defined herein.

  7. Model-based clustering for RNA-seq data.

    PubMed

    Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P

    2014-01-15

    RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
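
    To make the expectation-maximization idea in the abstract above concrete, the following Python sketch fits a two-component Poisson mixture to one-dimensional counts. It is a deliberately simplified stand-in, not the MBCluster.Seq implementation: the published models work on gene profiles across treatments and include stochastic EM variants, none of which is reproduced here. The counts and starting values are hypothetical.

```python
import math

EPS = 1e-12  # floor that keeps log() and divisions well defined


def poisson_log_pmf(k, lam):
    """Log probability of count k under a Poisson distribution with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def em_poisson_mixture(counts, lambdas, weights, n_iter=50):
    """Fit a Poisson mixture by EM; returns (means, weights, hard labels)."""
    lambdas, weights = list(lambdas), list(weights)
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each count
        resp = []
        for k in counts:
            logs = [math.log(w) + poisson_log_pmf(k, lam)
                    for w, lam in zip(weights, lambdas)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update mixing weights and component means
        for j in range(len(lambdas)):
            n_j = max(sum(r[j] for r in resp), EPS)
            weights[j] = max(n_j / len(counts), EPS)
            lambdas[j] = max(sum(r[j] * k for r, k in zip(resp, counts)) / n_j, EPS)
    labels = [max(range(len(lambdas)), key=lambda j: r[j]) for r in resp]
    return lambdas, weights, labels


# Hypothetical counts from a low-expression and a high-expression group
counts = [1, 2, 0, 1, 3, 2, 1, 20, 22, 19, 25, 21]
lambdas, weights, labels = em_poisson_mixture(counts, [2.0, 15.0], [0.5, 0.5])
```

    On this well-separated toy data the fitted means settle near the two group averages, and the labels split the counts into the low and high groups.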

  8. ANISEED 2017: extending the integrated ascidian database to the exploration and evolutionary comparison of genome-scale datasets.

    PubMed

    Brozovic, Matija; Dantec, Christelle; Dardaillon, Justine; Dauga, Delphine; Faure, Emmanuel; Gineste, Mathieu; Louis, Alexandra; Naville, Magali; Nitta, Kazuhiro R; Piette, Jacques; Reeves, Wendy; Scornavacca, Céline; Simion, Paul; Vincentelli, Renaud; Bellec, Maelle; Aicha, Sameh Ben; Fagotto, Marie; Guéroult-Bellone, Marion; Haeussler, Maximilian; Jacox, Edwin; Lowe, Elijah K; Mendez, Mickael; Roberge, Alexis; Stolfi, Alberto; Yokomori, Rui; Brown, C Titus; Cambillau, Christian; Christiaen, Lionel; Delsuc, Frédéric; Douzery, Emmanuel; Dumollard, Rémi; Kusakabe, Takehiro; Nakai, Kenta; Nishida, Hiroki; Satou, Yutaka; Swalla, Billie; Veeman, Michael; Volff, Jean-Nicolas; Lemaire, Patrick

    2018-01-04

    ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Replicates, read numbers, and other important experimental design considerations for microbial RNA-seq identified using Bacillus thuringiensis datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Tse-Yuan; Mehlhorn, Tonia L; Pelletier, Dale A.

    RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, which were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). Furthermore, this study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost-effective microbial transcriptomics study.

  10. Replicates, read numbers, and other important experimental design considerations for microbial RNA-seq identified using Bacillus thuringiensis datasets

    DOE PAGES

    Lu, Tse-Yuan; Mehlhorn, Tonia L; Pelletier, Dale A.; ...

    2016-05-31

    RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, which were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). Furthermore, this study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost-effective microbial transcriptomics study.

  11. Replicates, Read Numbers, and Other Important Experimental Design Considerations for Microbial RNA-seq Identified Using Bacillus thuringiensis Datasets.

    PubMed

    Manga, Punita; Klingeman, Dawn M; Lu, Tse-Yuan S; Mehlhorn, Tonia L; Pelletier, Dale A; Hauser, Loren J; Wilson, Charlotte M; Brown, Steven D

    2016-01-01

    RNA-seq is being used increasingly for gene expression studies and it is revolutionizing the fields of genomics and transcriptomics. However, the field of RNA-seq analysis is still evolving. Therefore, we specifically designed this study to contain large numbers of reads and four biological replicates per condition so we could alter these parameters and assess their impact on differential expression results. Bacillus thuringiensis strains ATCC10792 and CT43 were grown in two Luria broth medium lots on four dates and transcriptomics data were generated using one lane of sequence output from an Illumina HiSeq2000 instrument for each of the 32 samples, which were then analyzed using DESeq2. Genome coverages across samples ranged from 87 to 465X with medium lots and culture dates identified as major variation sources. Significantly differentially expressed genes (5% FDR, two-fold change) were detected for cultures grown using different medium lots and between different dates. The highly differentially expressed iron acquisition and metabolism genes were a likely consequence of differing amounts of iron in the two media lots. Indeed, in this study RNA-seq was a tool for predictive biology since we hypothesized and confirmed the two LB medium lots had different iron contents (~two-fold difference). This study shows that the noise in data can be controlled and minimized with appropriate experimental design and by having the appropriate number of replicates and reads for the system being studied. We outline parameters for an efficient and cost-effective microbial transcriptomics study.
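
    The significance criterion quoted in the abstract above (5% FDR, two-fold change) can be sketched as a Benjamini-Hochberg adjustment of p-values followed by a filter on the absolute log2 fold change. The Python sketch below is plain illustrative code and not DESeq2 itself. The gene names, p-values and fold changes are hypothetical, and treating "two-fold" as |log2 fold change| >= 1 is an assumption about how the threshold was applied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical results: gene -> (raw p-value, log2 fold change)
genes = {
    "gene_1": (0.0001, 2.5),
    "gene_2": (0.004, -1.3),
    "gene_3": (0.03, 0.4),
    "gene_4": (0.2, 1.8),
    "gene_5": (0.0005, 1.1),
}

names = list(genes)
padj = dict(zip(names, benjamini_hochberg([genes[g][0] for g in names])))

significant = sorted(
    g for g in names
    if padj[g] < 0.05 and abs(genes[g][1]) >= 1.0  # 5% FDR and two-fold change
)
```

    In this toy table gene_3 passes the FDR cutoff but fails the fold-change filter, and gene_4 shows the reverse, so only the remaining three genes are retained.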

  12. RNA-seq based transcriptomic map reveals new insights into mouse salivary gland development and maturation.

    PubMed

    Gluck, Christian; Min, Sangwon; Oyelakin, Akinsola; Smalley, Kirsten; Sinha, Satrajit; Romano, Rose-Anne

    2016-11-16

    Mouse models have served a valuable role in deciphering various facets of Salivary Gland (SG) biology, from normal developmental programs to diseased states. To facilitate such studies, gene expression profiling maps have been generated for various stages of SG organogenesis. However these prior studies fall short of capturing the transcriptional complexity due to the limited scope of gene-centric microarray-based technology. Compared to microarray, RNA-sequencing (RNA-seq) offers unbiased detection of novel transcripts, broader dynamic range and high specificity and sensitivity for detection of genes, transcripts, and differential gene expression. Although RNA-seq data, particularly under the auspices of the ENCODE project, have covered a large number of biological specimens, studies on the SG have been lacking. To better appreciate the wide spectrum of gene expression profiles, we isolated RNA from mouse submandibular salivary glands at different embryonic and adult stages. In parallel, we processed RNA-seq data for 24 organs and tissues obtained from the mouse ENCODE consortium and calculated the average gene expression values. To identify molecular players and pathways likely to be relevant for SG biology, we performed functional gene enrichment analysis, network construction and hierarchal clustering of the RNA-seq datasets obtained from different stages of SG development and maturation, and other mouse organs and tissues. Our bioinformatics-based data analysis not only reaffirmed known modulators of SG morphogenesis but revealed novel transcription factors and signaling pathways unique to mouse SG biology and function. Finally we demonstrated that the unique SG gene signature obtained from our mouse studies is also well conserved and can demarcate features of the human SG transcriptome that is different from other tissues. 
Our RNA-seq based Atlas has revealed a high-resolution cartographic view of the dynamic transcriptomic landscape of the mouse SG at various stages. These RNA-seq datasets will complement pre-existing microarray based datasets, including the Salivary Gland Molecular Anatomy Project by offering a broader systems-biology based perspective rather than the classical gene-centric view. Ultimately such resources will be valuable in providing a useful toolkit to better understand how the diverse cell population of the SG are organized and controlled during development and differentiation.

  13. A comprehensive simulation study on classification of RNA-Seq data.

    PubMed

    Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet

    2017-01-01

    RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (eg. microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size, differential-expression rate and decreasing the dispersion parameter and number of groups lead to an increase in classification accuracy. Similar with differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. 
We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.

  14. Genome-wide binding site analysis of FAR-RED ELONGATED HYPOCOTYL3 reveals its novel function in Arabidopsis development.

    PubMed

    Ouyang, Xinhao; Li, Jigang; Li, Gang; Li, Bosheng; Chen, Beibei; Shen, Huaishun; Huang, Xi; Mo, Xiaorong; Wan, Xiangyuan; Lin, Rongcheng; Li, Shigui; Wang, Haiyang; Deng, Xing Wang

    2011-07-01

    FAR-RED ELONGATED HYPOCOTYL3 (FHY3) and its homolog FAR-RED IMPAIRED RESPONSE1 (FAR1), two transposase-derived transcription factors, are key components in phytochrome A signaling and the circadian clock. Here, we use chromatin immunoprecipitation-based sequencing (ChIP-seq) to identify 1559 and 1009 FHY3 direct target genes in darkness (D) and far-red (FR) light conditions, respectively, in the Arabidopsis thaliana genome. FHY3 preferentially binds to promoters through the FHY3/FAR1 binding motif (CACGCGC). Interestingly, FHY3 also binds to two motifs in the 178-bp Arabidopsis centromeric repeats. Comparison between the ChIP-seq and microarray data indicates that FHY3 quickly regulates the expression of 197 and 86 genes in D and FR, respectively. FHY3 also coregulates a number of common target genes with PHYTOCHROME INTERACTING FACTOR 3-LIKE5 and ELONGATED HYPOCOTYL5. Moreover, we uncover a role for FHY3 in controlling chloroplast development by directly activating the expression of ACCUMULATION AND REPLICATION OF CHLOROPLASTS5, whose product is a structural component of the latter stages of chloroplast division in Arabidopsis. Taken together, our data suggest that FHY3 regulates multiple facets of plant development, thus providing insights into its functions beyond light and circadian pathways.

  15. Integrated RNA-Seq and sRNA-Seq Analysis Identifies Chilling and Freezing Responsive Key Molecular Players and Pathways in Tea Plant (Camellia sinensis)

    PubMed Central

    Zheng, Chao; Zhao, Lei; Wang, Yu; Shen, Jiazhi; Zhang, Yinfei; Jia, Sisi; Li, Yusheng; Ding, Zhaotang

    2015-01-01

    Tea [Camellia sinensis (L) O. Kuntze, Theaceae] is one of the most popular non-alcoholic beverages worldwide. Cold stress is one of the most severe abiotic stresses that limit tea plants’ growth, survival and geographical distribution. However, the genetic regulatory network and signaling pathways involved in cold stress responses in tea plants remain unearthed. Using RNA-Seq, DGE and sRNA-Seq technologies, we performed an integrative analysis of miRNA and mRNA expression profiling and their regulatory network of tea plants under chilling (4℃) and freezing (-5℃) stress. Differentially expressed (DE) miRNA and mRNA profiles were obtained based on fold change analysis, miRNAs and target mRNAs were found to show both coherent and incoherent relationships in the regulatory network. Furthermore, we compared several key pathways (e.g., ‘Photosynthesis’), GO terms (e.g., ‘response to karrikin’) and transcriptional factors (TFs, e.g., DREB1b/CBF1) which were identified as involved in the early chilling and/or freezing response of tea plants. Intriguingly, we found that karrikins, a new group of plant growth regulators, and β-primeverosidase (BPR), a key enzyme functionally relevant with the formation of tea aroma might play an important role in both early chilling and freezing response of tea plants. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-Seq and sRNA-Seq analysis. This is the first study to simultaneously profile the expression patterns of both miRNAs and mRNAs on a genome-wide scale to elucidate the molecular mechanisms of early responses of tea plants to cold stress. In addition to gaining a deeper insight into the cold resistant characteristics of tea plants, we provide a good case study to analyse mRNA/miRNA expression and profiling of non-model plant species using next-generation sequencing technology. PMID:25901577

  16. Integrated RNA-Seq and sRNA-Seq Analysis Identifies Chilling and Freezing Responsive Key Molecular Players and Pathways in Tea Plant (Camellia sinensis).

    PubMed

    Zheng, Chao; Zhao, Lei; Wang, Yu; Shen, Jiazhi; Zhang, Yinfei; Jia, Sisi; Li, Yusheng; Ding, Zhaotang

    2015-01-01

    Tea [Camellia sinensis (L) O. Kuntze, Theaceae] is one of the most popular non-alcoholic beverages worldwide. Cold stress is one of the most severe abiotic stresses that limit tea plants' growth, survival and geographical distribution. However, the genetic regulatory network and signaling pathways involved in cold stress responses in tea plants remain unearthed. Using RNA-Seq, DGE and sRNA-Seq technologies, we performed an integrative analysis of miRNA and mRNA expression profiling and their regulatory network of tea plants under chilling (4℃) and freezing (-5℃) stress. Differentially expressed (DE) miRNA and mRNA profiles were obtained based on fold change analysis, miRNAs and target mRNAs were found to show both coherent and incoherent relationships in the regulatory network. Furthermore, we compared several key pathways (e.g., 'Photosynthesis'), GO terms (e.g., 'response to karrikin') and transcriptional factors (TFs, e.g., DREB1b/CBF1) which were identified as involved in the early chilling and/or freezing response of tea plants. Intriguingly, we found that karrikins, a new group of plant growth regulators, and β-primeverosidase (BPR), a key enzyme functionally relevant with the formation of tea aroma might play an important role in both early chilling and freezing response of tea plants. Quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) analysis further confirmed the results from RNA-Seq and sRNA-Seq analysis. This is the first study to simultaneously profile the expression patterns of both miRNAs and mRNAs on a genome-wide scale to elucidate the molecular mechanisms of early responses of tea plants to cold stress. In addition to gaining a deeper insight into the cold resistant characteristics of tea plants, we provide a good case study to analyse mRNA/miRNA expression and profiling of non-model plant species using next-generation sequencing technology.

  17. Dysregulated microRNA Activity in Shwachman-Diamond Syndrome

    DTIC Science & Technology

    2016-09-01

    define transcriptional signatures of bone marrow failure in SDS using single cell RNA-seq of patient cells. We will analyze these datasets to test the...microRNA expression profiles from HSPCs to be overlaid onto mRNA profiles. Subject terms: single cell RNA-seq; bone marrow failure; hematopoiesis...myelopoiesis; targeted RNA-seq

  18. 5' Rapid Amplification of cDNA Ends and Illumina MiSeq Reveals B Cell Receptor Features in Healthy Adults, Adults With Chronic HIV-1 Infection, Cord Blood, and Humanized Mice.

    PubMed

    Waltari, Eric; Jia, Manxue; Jiang, Caroline S; Lu, Hong; Huang, Jing; Fernandez, Cristina; Finzi, Andrés; Kaufmann, Daniel E; Markowitz, Martin; Tsuji, Moriya; Wu, Xueling

    2018-01-01

    Using 5' rapid amplification of cDNA ends, Illumina MiSeq, and basic flow cytometry, we systematically analyzed the expressed B cell receptor (BCR) repertoire in 14 healthy adult PBMCs, 5 HIV-1+ adult PBMCs, 5 cord blood samples, and 3 HIS-CD4/B mice, examining the full-length variable region of μ, γ, α, κ, and λ chains for V-gene usage, somatic hypermutation (SHM), and CDR3 length. Adding to the known repertoire of healthy adults, Illumina MiSeq consistently detected small fractions of reads with high mutation frequencies including hypermutated μ reads, and reads with long CDR3s. Additionally, the less studied IgA repertoire displayed similar characteristics to that of IgG. Compared to healthy adults, the five HIV-1 chronically infected adults displayed elevated mutation frequencies for all μ, γ, α, κ, and λ chains examined and slightly longer CDR3 lengths for γ, α, and λ. To evaluate the reconstituted human BCR sequences in a humanized mouse model, we analyzed cord blood and HIS-CD4/B mice, which all lacked the typical SHM seen in the adult reference. Furthermore, MiSeq revealed identical unmutated IgM sequences derived from separate cell aliquots, thus for the first time demonstrating rare clonal members of unmutated IgM B cells by sequencing.

  19. Digital gene expression approach over multiple RNA-Seq data sets to detect neoblast transcriptional changes in Schmidtea mediterranea.

    PubMed

    Rodríguez-Esteban, Gustavo; González-Sastre, Alejandro; Rojo-Laguna, José Ignacio; Saló, Emili; Abril, Josep F

    2015-05-08

    The freshwater planarian Schmidtea mediterranea is recognised as a valuable model for research into adult stem cells and regeneration. With the advent of the high-throughput sequencing technologies, it has become feasible to undertake detailed transcriptional analysis of its unique stem cell population, the neoblasts. Nonetheless, a reliable reference for this type of studies is still lacking. Taking advantage of digital gene expression (DGE) sequencing technology we compare all the available transcriptomes for S. mediterranea and improve their annotation. These results are accessible via web for the community of researchers. Using the quantitative nature of DGE, we describe the transcriptional profile of neoblasts and present 42 new neoblast genes, including several cancer-related genes and transcription factors. Furthermore, we describe in detail the Smed-meis-like gene and the three Nuclear Factor Y subunits Smed-nf-YA, Smed-nf-YB-2 and Smed-nf-YC. DGE is a valuable tool for gene discovery, quantification and annotation. The application of DGE in S. mediterranea confirms the planarian stem cells or neoblasts as a complex population of pluripotent and multipotent cells regulated by a mixture of transcription factors and cancer-related genes.

  20. Comprehensive analysis of RNA-seq data reveals the complexity of the transcriptome in Brassica rapa.

    PubMed

    Tong, Chaobo; Wang, Xiaowu; Yu, Jingyin; Wu, Jian; Li, Wanshun; Huang, Junyan; Dong, Caihua; Hua, Wei; Liu, Shengyi

    2013-10-07

    The species Brassica rapa (2n=20, AA) is an important vegetable and oilseed crop, and serves as an excellent model for genomic and evolutionary research in Brassica species. With the availability of whole genome sequence of B. rapa, it is essential to further determine the activity of all functional elements of the B. rapa genome and explore the transcriptome on a genome-wide scale. Here, RNA-seq data was employed to provide a genome-wide transcriptional landscape and characterization of the annotated and novel transcripts and alternative splicing events across tissues. RNA-seq reads were generated using the Illumina platform from six different tissues (root, stem, leaf, flower, silique and callus) of the B. rapa accession Chiifu-401-42, the same line used for whole genome sequencing. First, these data detected the widespread transcription of the B. rapa genome, leading to the identification of numerous novel transcripts and definition of 5'/3' UTRs of known genes. Second, 78.8% of the total annotated genes were detected as expressed and 45.8% were constitutively expressed across all tissues. We further defined several groups of genes: housekeeping genes, tissue-specific expressed genes and co-expressed genes across tissues, which will serve as a valuable repository for future crop functional genomics research. Third, alternative splicing (AS) is estimated to occur in more than 29.4% of intron-containing B. rapa genes, and 65% of them were commonly detected in more than two tissues. Interestingly, genes with a high rate of AS were over-represented in GO categories relating to transcriptional regulation and signal transduction, suggesting that AS may play an important regulatory role in these genes. Further, we observed that intron retention (IR) is predominant in the AS events and seems to occur preferentially in genes with short introns. The high-resolution RNA-seq analysis provides a global transcriptional landscape as a complement to the B. rapa genome sequence, which will advance our understanding of the dynamics and complexity of the B. rapa transcriptome. The atlas of gene expression in different tissues will be useful for accelerating research on functional genomics and genome evolution in Brassica species.
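
    The "expressed" versus "constitutively expressed" distinction reported above can be illustrated with a minimal sketch. The expression threshold, the gene names and the per-tissue values below are assumptions made for illustration only; the abstract does not state the cutoff the authors used.

```python
def expression_summary(expr_by_gene, threshold=1.0):
    """Return (fraction expressed in any tissue, fraction expressed in all tissues).

    expr_by_gene maps a gene id to a dict of tissue -> expression value.
    The threshold is a hypothetical cutoff, not one taken from the study.
    """
    expressed = [
        gene for gene, tissues in expr_by_gene.items()
        if any(value >= threshold for value in tissues.values())
    ]
    constitutive = [
        gene for gene in expressed
        if all(value >= threshold for value in expr_by_gene[gene].values())
    ]
    total = len(expr_by_gene)
    return len(expressed) / total, len(constitutive) / total


# Toy data: four hypothetical genes measured in three tissues.
example = {
    "gene1": {"root": 5.0, "stem": 3.2, "leaf": 8.1},  # expressed everywhere
    "gene2": {"root": 4.0, "stem": 0.0, "leaf": 0.2},  # root only
    "gene3": {"root": 0.0, "stem": 0.1, "leaf": 0.0},  # not detected
    "gene4": {"root": 1.5, "stem": 2.0, "leaf": 1.1},  # expressed everywhere
}
fractions = expression_summary(example)
```

    On the toy data, three of four genes pass the cutoff in at least one tissue and two pass it in every tissue.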

  1. Comparative Genomics as a Foundation for Evo-Devo Studies in Birds.

    PubMed

    Grayson, Phil; Sin, Simon Y W; Sackton, Timothy B; Edwards, Scott V

    2017-01-01

    Developmental genomics is a rapidly growing field, and high-quality genomes are a useful foundation for comparative developmental studies. A high-quality genome forms an essential reference onto which the data from numerous assays and experiments, including ChIP-seq, ATAC-seq, and RNA-seq, can be mapped. A genome also streamlines and simplifies the development of primers used to amplify putative regulatory regions for enhancer screens, cDNA probes for in situ hybridization, microRNAs (miRNAs) or short hairpin RNAs (shRNA) for RNA interference (RNAi) knockdowns, mRNAs for misexpression studies, and even guide RNAs (gRNAs) for CRISPR knockouts. Finally, much can be gleaned from comparative genomics alone, including the identification of highly conserved putative regulatory regions. This chapter provides an overview of laboratory and bioinformatics protocols for DNA extraction, library preparation, library quantification, and genome assembly, from fresh or frozen tissue to a draft avian genome. Generating a high-quality draft genome can provide a developmental research group with excellent resources for their study organism, opening the doors to many additional assays and experiments.

  2. MAJIQ-SPEL: Web-tool to interrogate classical and complex splicing variations from RNA-Seq data.

    PubMed

    Green, Christopher J; Gazzara, Matthew R; Barash, Yoseph

    2017-09-11

    Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret, and experimentally validate. To address these challenges we developed MAJIQ-SPEL, a web-tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of gene isoforms associated with those. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex, non-binary, splicing variations. Using a matching primer design algorithm it also suggests possible primers to users for experimental validation by RT-PCR and displays those, along with the matching protein domains affected by the LSV, on UCSC Genome Browser for further downstream analysis. Program and code will be available at http://majiq.biociphers.org/majiq-spel. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  3. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

    PubMed Central

    Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
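
    The counting principle described above (unique barcode pairs per cDNA rather than raw reads) can be sketched as follows. This is an illustrative toy, not the authors' software: the read layout, the function name and the example sequences are assumptions, and no barcode error correction is attempted.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique (5' barcode, 3' barcode) pairs observed per cDNA.

    reads is an iterable of (cdna, barcode_5p, barcode_3p) tuples,
    a hypothetical layout chosen for this sketch.
    """
    barcodes_seen = defaultdict(set)
    for cdna, barcode_5p, barcode_3p in reads:
        # Reads sharing both barcodes are treated as PCR copies of one molecule.
        barcodes_seen[cdna].add((barcode_5p, barcode_3p))
    return {cdna: len(pairs) for cdna, pairs in barcodes_seen.items()}


reads = [
    ("geneA", "AAC", "GTT"),
    ("geneA", "AAC", "GTT"),  # same barcode pair: amplification duplicate
    ("geneA", "CCA", "TGG"),  # different pair: a second original molecule
    ("geneB", "AAC", "GTT"),
]
digital_counts = count_molecules(reads)
```

    Here geneA has three reads but only two distinct barcode pairs, so its digital count is two molecules.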

  4. Mining the archives: a cross-platform analysis of gene ...

    EPA Pesticide Factsheets

    Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for genomic research into the molecular basis of disease. However, use of FFPE samples in gene expression studies has been limited by technical challenges resulting from degradation of nucleic acids. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues using two DNA microarray protocols and two whole transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other three methods by having the highest correlations of differentially expressed genes (DEGs) and best overlap of pathways between FRO and FFPE groups. We next tested the effect of sample time in formalin (18 hours or 3 weeks) on gene expression profiles. Hierarchical clustering of the datasets indicated that test article treatment, and not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18-hour and 3-week FFPE samples compared to FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of age in FFPE block on genomic profiles. RNA-seq analysis of 8-, 19-, and 26-year-old control blocks using the ribo-depletion protocol resulted in comparable quality metrics, inc

  5. Global gene expression analysis using RNA-seq uncovered a new role for SR1/CAMTA3 transcription factor in salt stress

    PubMed Central

    Prasad, Kasavajhala V. S. K.; Abdel-Hameed, Amira A. E.; Xing, Denghui; Reddy, Anireddy S. N.

    2016-01-01

    Abiotic and biotic stresses cause significant yield losses in all crops. Acquisition of stress tolerance in plants requires rapid reprogramming of gene expression. SR1/CAMTA3, a member of signal responsive transcription factors (TFs), functions both as a positive and a negative regulator of biotic stress responses and as a positive regulator of cold stress-induced gene expression. Using high throughput RNA-seq, we identified ~3000 SR1-regulated genes. Promoters of about 60% of the differentially expressed genes have a known DNA binding site for SR1, suggesting that they are likely direct targets. Gene ontology analysis of SR1-regulated genes confirmed previously known functions of SR1 and uncovered a potential role for this TF in salt stress. Our results showed that SR1 mutant is more tolerant to salt stress than the wild type and complemented line. Improved tolerance of sr1 seedlings to salt is accompanied with the induction of salt-responsive genes. Furthermore, ChIP-PCR results showed that SR1 binds to promoters of several salt-responsive genes. These results suggest that SR1 acts as a negative regulator of salt tolerance by directly repressing the expression of salt-responsive genes. Overall, this study identified SR1-regulated genes globally and uncovered a previously uncharacterized role for SR1 in salt stress response. PMID:27251464

  6. Genome characterization of the selected long- and short-sleep mouse lines.

    PubMed

    Dowell, Robin; Odell, Aaron; Richmond, Phillip; Malmer, Daniel; Halper-Stromberg, Eitan; Bennett, Beth; Larson, Colin; Leach, Sonia; Radcliffe, Richard A

    2016-12-01

    The Inbred Long- and Short-Sleep (ILS, ISS) mouse lines were selected for differences in acute ethanol sensitivity using the loss of righting response (LORR) as the selection trait. The lines show an over tenfold difference in LORR and, along with a recombinant inbred panel derived from them (the LXS), have been widely used to dissect the genetic underpinnings of acute ethanol sensitivity. Here we have sequenced the genomes of the ILS and ISS to investigate the DNA variants that contribute to their sensitivity difference. We identified ~2.7 million high-confidence SNPs and small indels and ~7000 structural variants between the lines; variants were found to occur in 6382 annotated genes. Using a hidden Markov model, we were able to reconstruct the genome-wide ancestry patterns of the eight inbred progenitor strains from which the ILS and ISS were derived, and found that quantitative trait loci that have been mapped for LORR were slightly enriched for DNA variants. Finally, by mapping and quantifying RNA-seq reads from the ILS and ISS to their strain-specific genomes rather than to the reference genome, we found a substantial improvement in a differential expression analysis between the lines. This work will help in identifying and characterizing the DNA sequence variants that contribute to the difference in ethanol sensitivity between the ILS and ISS and will also aid in accurate quantification of RNA-seq data generated from the LXS RIs.

  7. How to design a single-cell RNA-sequencing experiment: pitfalls, challenges and perspectives.

    PubMed

    Dal Molin, Alessandra; Di Camillo, Barbara

    2018-01-31

    The sequencing of the transcriptome of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types in heterogeneous cell populations or for the study of stochastic gene expression. In recent years, various experimental methods and computational tools for analysing single-cell RNA-sequencing data have been proposed. However, most of them are tailored to different experimental designs or biological questions, and in many cases, their performance has not been benchmarked yet, thus increasing the difficulty for a researcher to choose the optimal single-cell transcriptome sequencing (scRNA-seq) experiment and analysis workflow. In this review, we aim to provide an overview of the current available experimental and computational methods developed to handle single-cell RNA-sequencing data and, based on their peculiarities, we suggest possible analysis frameworks depending on specific experimental designs. Together, we propose an evaluation of challenges and open questions and future perspectives in the field. In particular, we go through the different steps of scRNA-seq experimental protocols such as cell isolation, messenger RNA capture, reverse transcription, amplification and use of quantitative standards such as spike-ins and Unique Molecular Identifiers (UMIs). We then analyse the current methodological challenges related to preprocessing, alignment, quantification, normalization, batch effect correction and methods to control for confounding effects. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. Digital gene expression analysis with sample multiplexing and PCR duplicate detection: A straightforward protocol.

    PubMed

    Rozenberg, Andrey; Leese, Florian; Weiss, Linda C; Tollrian, Ralph

    2016-01-01

    Tag-Seq is a high-throughput approach used for discovering SNPs and characterizing gene expression. In comparison to RNA-Seq, Tag-Seq eases data processing and allows detection of rare mRNA species using only one tag per transcript molecule. However, reduced library complexity raises the issue of PCR duplicates, which distort gene expression levels. Here we present a novel Tag-Seq protocol that uses the least biased methods for RNA library preparation combined with a novel approach for joint PCR template and sample labeling. In our protocol, input RNA is fragmented by hydrolysis, and poly(A)-bearing RNAs are selected and directly ligated to mixed DNA-RNA P5 adapters. The P5 adapters contain i5 barcodes composed of sample-specific (moderately) degenerate base regions (mDBRs), which later allow detection of PCR duplicates. The P7 adapter is attached via reverse transcription with individual i7 barcodes added during the amplification step. The resulting libraries can be sequenced on an Illumina sequencer. After sample demultiplexing and PCR duplicate removal with a free software tool we designed, the data are ready for downstream analysis. Our protocol was tested on RNA samples from predator-induced and control Daphnia microcrustaceans.

  9. An atlas of gene expression and gene co-regulation in the human retina.

    PubMed

    Pinelli, Michele; Carissimo, Annamaria; Cutillo, Luisa; Lai, Ching-Hung; Mutarelli, Margherita; Moretti, Maria Nicoletta; Singh, Marwah Veer; Karali, Marianthi; Carrella, Diego; Pizzo, Mariateresa; Russo, Francesco; Ferrari, Stefano; Ponzin, Diego; Angelini, Claudia; Banfi, Sandro; di Bernardo, Diego

    2016-07-08

    The human retina is a specialized tissue involved in light stimulus transduction. Despite its unique biology, an accurate reference transcriptome is still missing. Here, we performed gene expression analysis (RNA-seq) of 50 retinal samples from non-visually impaired post-mortem donors. We identified novel transcripts with high confidence (Observed Transcriptome (ObsT)) and quantified the expression level of known transcripts (Reference Transcriptome (RefT)). The ObsT included 77 623 transcripts (23 960 genes) covering 137 Mb (35 Mb new transcribed genome). Most of the transcripts (92%) were multi-exonic: 81% with known isoforms, 16% with new isoforms and 3% belonging to new genes. The RefT included 13 792 genes across 94 521 known transcripts. Mitochondrial genes were among the most highly expressed, accounting for about 10% of the reads. Of all the protein-coding genes in Gencode, 65% are expressed in the retina. We exploited inter-individual variability in gene expression to infer a gene co-expression network and to identify genes specifically expressed in photoreceptor cells. We experimentally validated the photoreceptors localization of three genes in human retina that had not been previously reported. RNA-seq data and the gene co-expression network are available online (http://retina.tigem.it). © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Clinical implications of genomic profiles in metastatic breast cancer with a focus on TP53 and PIK3CA, the most frequently mutated genes.

    PubMed

    Kim, Ji-Yeon; Lee, Eunjin; Park, Kyunghee; Park, Woong-Yang; Jung, Hae Hyun; Ahn, Jin Seok; Im, Young-Hyuck; Park, Yeon Hee

    2017-04-25

    Breast cancer (BC) has been genetically profiled through large-scale genome analyses. However, the role and clinical implications of genetic alterations in metastatic BC (MBC) have not been evaluated. Therefore, we conducted whole-exome sequencing (WES) and RNA-Seq of 37 MBC samples and targeted deep sequencing of another 29 MBCs. We evaluated somatic mutations from WES and targeted sequencing and assessed gene expression and performed pathway analysis from RNA-Seq. In this analysis, PIK3CA was the most commonly mutated gene in estrogen receptor (ER)-positive BC, while in ER-negative BC, TP53 was the most commonly mutated gene (p = 0.018 and p < 0.001, respectively). TP53 stopgain/loss and frameshift mutations were related to low expression of TP53; in contrast, nonsynonymous mutations were related to high expression. The impact of TP53 mutation on clinical outcome varied with regard to ER status. In ER-positive BCs, wild type TP53 had a better prognosis than mutated TP53 (median overall survival (OS) (wild type vs. mutated): 88.5 ± 54.4 vs. 32.6 ± 10.7 (months), p = 0.002). In contrast, mutated TP53 had a protective effect in ER-negative BCs (median OS: 0.10 vs. 32.6 ± 8.2, p = 0.026). However, PIK3CA mutation did not affect patient survival. In gene expression analysis, CALM1, a potential regulator of AKT, was highly expressed in PIK3CA-mutated BCs. In conclusion, mutation of TP53 was associated with expression status and affected clinical outcome according to ER status in MBC. Although mutation of PIK3CA was not related to survival in this study, mutation of PIK3CA altered the expression of other genes and pathways including CALM1 and may be a potential predictive marker of PI3K inhibitor effectiveness.

  11. De Novo Characterization of the Mung Bean Transcriptome and Transcriptomic Analysis of Adventitious Rooting in Seedlings Using RNA-Seq

    PubMed Central

    Li, Shi-Weng; Shi, Rui-Fang; Leng, Yan

    2015-01-01

    Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change>2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stage, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of the 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. 
The GO enrichment, pathway mapping, and gene expression profiles reveal molecular traits for root induction and initiation. This study provides a platform for functional genomic research with this species. PMID:26177103
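
    The fold change>2 criterion used above to call up- and down-regulated unigenes can be sketched as below. The pseudocount, the function name and the example values are assumptions added for illustration; the abstract does not describe how zero counts or normalization were handled.

```python
def classify_fold_change(control, treated, threshold=2.0, pseudocount=1.0):
    """Label a gene 'up', 'down' or 'unchanged' from two expression values.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in the control sample.
    """
    ratio = (treated + pseudocount) / (control + pseudocount)
    if ratio > threshold:
        return "up"
    if ratio < 1.0 / threshold:
        return "down"
    return "unchanged"


labels = [
    classify_fold_change(10, 50),  # ratio 51/11, above 2
    classify_fold_change(50, 10),  # ratio 11/51, below 1/2
    classify_fold_change(10, 12),  # ratio 13/11, within bounds
]
```

    A symmetric cutoff (greater than 2 or less than 1/2) is one common reading of "fold change>2"; a real analysis would normally also apply a statistical test.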

  12. De Novo Characterization of the Mung Bean Transcriptome and Transcriptomic Analysis of Adventitious Rooting in Seedlings Using RNA-Seq.

    PubMed

    Li, Shi-Weng; Shi, Rui-Fang; Leng, Yan

    2015-01-01

    Adventitious rooting is the most important mechanism underlying vegetative propagation and an important strategy for plant propagation under environmental stress. The present study was conducted to obtain transcriptomic data and examine gene expression using RNA-Seq and bioinformatics analysis, thereby providing a foundation for understanding the molecular mechanisms controlling adventitious rooting. Three cDNA libraries constructed from mRNA samples from mung bean hypocotyls during adventitious rooting were sequenced. These three samples generated a total of 73 million, 60 million, and 59 million 100-bp reads, respectively. These reads were assembled into 78,697 unigenes with an average length of 832 bp, totaling 65 Mb. The unigenes were aligned against six public protein databases, and 29,029 unigenes (36.77%) were annotated using BLASTx. Among them, 28,225 (35.75%) and 28,119 (35.62%) unigenes had homologs in the TrEMBL and NCBI non-redundant (Nr) databases, respectively. Of these unigenes, 21,140 were assigned to gene ontology classes, and a total of 11,990 unigenes were classified into 25 KOG functional categories. A total of 7,357 unigenes were annotated to 4,524 KOs, and 4,651 unigenes were mapped onto 342 KEGG pathways using BLAST comparison against the KEGG database. A total of 11,717 unigenes were differentially expressed (fold change>2) during the root induction stage, with 8,772 unigenes down-regulated and 2,945 unigenes up-regulated. A total of 12,737 unigenes were differentially expressed during the root initiation stage, with 9,303 unigenes down-regulated and 3,434 unigenes up-regulated. A total of 5,334 unigenes were differentially expressed between the root induction and initiation stage, with 2,167 unigenes down-regulated and 3,167 unigenes up-regulated. qRT-PCR validation of the 39 genes with known functions indicated a strong correlation (92.3%) with the RNA-Seq data. 
The GO enrichment, pathway mapping, and gene expression profiles reveal molecular traits for root induction and initiation. This study provides a platform for functional genomic research with this species.

  13. An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study.

    PubMed

    Wang, Zichen; Ma'ayan, Avi

    2016-01-01

    RNA-seq analysis is becoming a standard method for global gene expression profiling. However, open and standard pipelines to perform RNA-seq analysis by non-experts remain challenging due to the large size of the raw data files and the hardware requirements for running the alignment step. Here we introduce a reproducible open source RNA-seq pipeline delivered as an IPython notebook and a Docker image. The pipeline uses state-of-the-art tools and can run on various platforms with minimal configuration overhead. The pipeline enables the extraction of knowledge from typical RNA-seq studies by generating interactive principal component analysis (PCA) and hierarchical clustering (HC) plots, performing enrichment analyses against over 90 gene set libraries, and obtaining lists of small molecules that are predicted to either mimic or reverse the observed changes in mRNA expression. We apply the pipeline to a recently published RNA-seq dataset collected from human neuronal progenitors infected with the Zika virus (ZIKV). In addition to confirming the presence of cell cycle genes among the genes that are downregulated by ZIKV, our analysis uncovers significant overlap with upregulated genes that when knocked out in mice induce defects in brain morphology. This result potentially points to the molecular processes associated with the microcephaly phenotype observed in newborns from pregnant mothers infected with the virus. In addition, our analysis predicts small molecules that can either mimic or reverse the expression changes induced by ZIKV. The IPython notebook and Docker image are freely available at: http://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb and https://hub.docker.com/r/maayanlab/zika/.

  14. Human primitive brain displays negative mitochondrial-nuclear expression correlation of respiratory genes.

    PubMed

    Barshad, Gilad; Blumberg, Amit; Cohen, Tal; Mishmar, Dan

    2018-06-14

    Oxidative phosphorylation (OXPHOS), a fundamental energy source in all human tissues, requires interactions between mitochondrial (mtDNA)- and nuclear (nDNA)-encoded protein subunits. Although such interactions are fundamental to OXPHOS, bi-genomic coregulation is poorly understood. To address this question, we analyzed ∼8500 RNA-seq experiments from 48 human body sites. Despite well-known variation in mitochondrial activity, quantity, and morphology, we found overall positive mtDNA-nDNA OXPHOS genes' co-expression across human tissues. Nevertheless, negative mtDNA-nDNA gene expression correlation was identified in the hypothalamus, basal ganglia, and amygdala (subcortical brain regions, collectively termed the "primitive" brain). Single-cell RNA-seq analysis of mouse and human brains revealed that this phenomenon is evolutionarily conserved, and both are influenced by brain cell types (involving excitatory/inhibitory neurons and nonneuronal cells) and by their spatial brain location. As the "primitive" brain is highly oxidative, we hypothesized that such negative mtDNA-nDNA co-expression likely controls for the high mtDNA transcript levels, which enforce tight OXPHOS regulation, rather than rewiring toward glycolysis. Accordingly, we found "primitive" brain-specific up-regulation of lactate dehydrogenase B (LDHB), which associates with high OXPHOS activity, at the expense of LDHA, which promotes glycolysis. Analyses of co-expression, DNase-seq, and ChIP-seq experiments revealed candidate RNA-binding proteins and CEBPB as the best regulatory candidates to explain these phenomena. Finally, cross-tissue expression analysis unearthed tissue-dependent splice variants and OXPHOS subunit paralogs and allowed revising the list of canonical OXPHOS transcripts. Taken together, our analysis provides a comprehensive view of mito-nuclear gene co-expression across human tissues and provides overall insights into the bi-genomic regulation of mitochondrial activities. © 2018 Barshad et al.; Published by Cold Spring Harbor Laboratory Press.

  15. RNA-seq reveals more consistent reference genes for gene expression studies in human non-melanoma skin cancers

    PubMed Central

    Tan, Jean-Marie; Payne, Elizabeth J.; Lin, Lynlee L.; Sinnya, Sudipta; Raphael, Anthony P.; Lambie, Duncan; Frazer, Ian H.; Dinger, Marcel E.; Soyer, H. Peter

    2017-01-01

    Identification of appropriate reference genes (RGs) is critical to accurate data interpretation in quantitative real-time PCR (qPCR) experiments. In this study, we have utilised next generation RNA sequencing (RNA-seq) to analyse the transcriptome of a panel of non-melanoma skin cancer lesions, identifying genes that are consistently expressed across all samples. Genes encoding ribosomal proteins were amongst the most stable in this dataset. Validation of this RNA-seq data was examined using qPCR to confirm the suitability of a set of highly stable genes for use as qPCR RGs. These genes will provide a valuable resource for the normalisation of qPCR data for the analysis of non-melanoma skin cancer. PMID:28852586
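
    One simple way to look for "consistently expressed" genes in an expression table is to rank genes by their coefficient of variation (CV) across samples. The sketch below assumes that approach for illustration; the abstract does not say which stability metric the authors used, and the gene labels and values are hypothetical.

```python
from statistics import mean, pstdev


def rank_by_stability(expression):
    """Rank genes from most to least stable by coefficient of variation.

    expression maps a gene label to a list of normalized values, one per
    sample. Genes with a mean of zero are skipped because CV is undefined.
    """
    cv = {
        gene: pstdev(values) / mean(values)
        for gene, values in expression.items()
        if mean(values) > 0
    }
    return sorted(cv, key=cv.get)


# Hypothetical normalized expression across four lesions.
expression = {
    "stable_gene": [100.0, 102.0, 98.0, 101.0],
    "variable_gene": [10.0, 200.0, 50.0, 5.0],
    "silent_gene": [0.0, 0.0, 0.0, 0.0],
}
ranking = rank_by_stability(expression)
```

    Candidates from the top of such a ranking would still need qPCR validation, as the abstract describes.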

  16. The power and promise of RNA-seq in ecology and evolution.

    PubMed

    Todd, Erica V; Black, Michael A; Gemmell, Neil J

    2016-03-01

    Reference is regularly made to the power of new genomic sequencing approaches. Using powerful technology, however, is not the same as having the necessary power to address a research question with statistical robustness. In the rush to adopt new and improved genomic research methods, limitations of technology and experimental design may be initially neglected. Here, we review these issues with regard to RNA sequencing (RNA-seq). RNA-seq adds large-scale transcriptomics to the toolkit of ecological and evolutionary biologists, enabling differential gene expression (DE) studies in nonmodel species without the need for prior genomic resources. High biological variance is typical of field-based gene expression studies and means that larger sample sizes are often needed to achieve the same degree of statistical power as clinical studies based on data from cell lines or inbred animal models. Sequencing costs have plummeted, yet RNA-seq studies still underutilize biological replication. Finite research budgets force a trade-off between sequencing effort and replication in RNA-seq experimental design. However, clear guidelines for negotiating this trade-off, while taking into account study-specific factors affecting power, are currently lacking. Study designs that prioritize sequencing depth over replication fail to capitalize on the power of RNA-seq technology for DE inference. Significant recent research effort has gone into developing statistical frameworks and software tools for power analysis and sample size calculation in the context of RNA-seq DE analysis. We synthesize progress in this area and derive an accessible rule-of-thumb guide for designing powerful RNA-seq experiments relevant in eco-evolutionary and clinical settings alike. © 2016 John Wiley & Sons Ltd.

  17. An empirical strategy to detect bacterial transcript structure from directional RNA-seq transcriptome data.

    PubMed

    Wang, Yejun; MacKenzie, Keith D; White, Aaron P

    2015-05-07

    As sequencing costs are being lowered continuously, RNA-seq has gradually been adopted as the first choice for comparative transcriptome studies with bacteria. Unlike microarrays, RNA-seq can directly detect cDNA derived from mRNA transcripts at a single nucleotide resolution. Not only does this allow researchers to determine the absolute expression level of genes, but it also conveys information about transcript structure. Few automatic software tools have yet been established to investigate large-scale RNA-seq data for bacterial transcript structure analysis. In this study, 54 directional RNA-seq libraries from Salmonella serovar Typhimurium (S. Typhimurium) 14028s were examined for potential relationships between read mapping patterns and transcript structure. We developed an empirical method, combined with statistical tests, to automatically detect key transcript features, including transcriptional start sites (TSSs), transcriptional termination sites (TTSs) and operon organization. Using our method, we obtained 2,764 TSSs and 1,467 TTSs for 1331 and 844 different genes, respectively. Identification of TSSs facilitated further discrimination of 215 putative sigma 38 regulons and 863 potential sigma 70 regulons. Combining the TSSs and TTSs with intergenic distance and co-expression information, we comprehensively annotated the operon organization in S. Typhimurium 14028s. Our results show that directional RNA-seq can be used to detect transcriptional borders at an acceptable resolution of ±10-20 nucleotides. Technical limitations of the RNA-seq procedure may prevent single nucleotide resolution. The automatic transcript border detection methods, statistical models and operon organization pipeline that we have described could be widely applied to RNA-seq studies in other bacteria. Furthermore, the TSSs, TTSs, operons, promoters and untranslated regions that we have defined for S. Typhimurium 14028s may constitute valuable resources that can be used for comparative analyses with other Salmonella serotypes.

  18. An interdomain network: the endobacterium of a mycorrhizal fungus promotes antioxidative responses in both fungal and plant hosts.

    PubMed

    Vannini, Candida; Carpentieri, Andrea; Salvioli, Alessandra; Novero, Mara; Marsoni, Milena; Testa, Lorenzo; de Pinto, Maria Concetta; Amoresano, Angela; Ortolani, Francesca; Bracale, Marcella; Bonfante, Paola

    2016-07-01

    Arbuscular mycorrhizal fungi (AMF) are obligate plant biotrophs that may contain endobacteria in their cytoplasm. Genome sequencing of Candidatus Glomeribacter gigasporarum revealed a reduced genome and dependence on the fungal host. RNA-seq analysis of the AMF Gigaspora margarita in the presence and absence of the endobacterium indicated that endobacteria have an important role in the fungal pre-symbiotic phase by enhancing fungal bioenergetic capacity. To improve the understanding of fungal-endobacterial interactions, iTRAQ (isobaric tags for relative and absolute quantification) quantitative proteomics was used to identify differentially expressed proteins in G. margarita germinating spores with endobacteria (B+), without endobacteria in the cured line (B-) and after application of the synthetic strigolactone GR24. Proteomic, transcriptomic and biochemical data identified several fungal and bacterial proteins involved in interspecies interactions. Endobacteria influenced fungal growth, calcium signalling and metabolism. The greatest effects were on fungal primary metabolism and respiration, which was 50% higher in B+ than in B-. A shift towards pentose phosphate metabolism was detected in B-. Quantification of carbonylated proteins indicated that the B- line had higher oxidative stress levels, which were also observed in two host plants. This study shows that endobacteria generate a complex interdomain network that affects AMF and fungal-plant interactions. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  19. Detection and Analysis of Circular RNAs by RT-PCR.

    PubMed

    Panda, Amaresh C; Gorospe, Myriam

    2018-03-20

    Gene expression in eukaryotic cells is tightly regulated at the transcriptional and posttranscriptional levels. Posttranscriptional processes, including pre-mRNA splicing, mRNA export, mRNA turnover, and mRNA translation, are controlled by RNA-binding proteins (RBPs) and noncoding (nc)RNAs. The vast family of ncRNAs comprises diverse regulatory RNAs, such as microRNAs and long noncoding (lnc)RNAs, but also the poorly explored class of circular (circ)RNAs. Although first discovered more than three decades ago by electron microscopy, only the advent of high-throughput RNA-sequencing (RNA-seq) and the development of innovative bioinformatic pipelines have begun to allow the systematic identification of circRNAs (Szabo and Salzman, 2016; Panda et al., 2017b; Panda et al., 2017c). However, the validation of true circRNAs identified by RNA sequencing requires other molecular biology techniques including reverse transcription (RT) followed by conventional or quantitative (q) polymerase chain reaction (PCR), and Northern blot analysis (Jeck and Sharpless, 2014). RT-qPCR analysis of circular RNAs using divergent primers has been widely used for the detection, validation, and sometimes quantification of circRNAs (Abdelmohsen et al., 2015 and 2017; Panda et al., 2017b). As detailed here, divergent primers designed to span the circRNA backsplice junction sequence can specifically amplify the circRNAs and not the counterpart linear RNA. In sum, RT-PCR analysis using divergent primers allows direct detection and quantification of circRNAs.
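
    The reason a junction-spanning amplicon is specific to the circular form can be shown with plain string handling. This is a minimal sketch, not the primer-design procedure of the protocol: the sequence and the flank length are hypothetical, and real primer design involves further criteria (melting temperature, amplicon size, and so on) that are not modelled here.

```python
def backsplice_junction(circ_seq, flank):
    """Sequence spanning the backsplice junction of a circRNA.

    `circ_seq` is the circularised exon sequence written 5'->3' from the
    acceptor side; the junction joins its 3' end back to its 5' start.
    """
    if flank <= 0 or flank > len(circ_seq):
        raise ValueError("flank must be between 1 and len(circ_seq)")
    return circ_seq[-flank:] + circ_seq[:flank]


# Hypothetical circularised sequence (18 nt) used only for illustration.
circ_seq = "ATGGCCTTAGGCTAACGT"
junction = backsplice_junction(circ_seq, flank=5)

# The linear counterpart contains the same exon sequence once, end to end,
# so the junction sequence is absent from it.
in_linear = junction in circ_seq

# Reading around the circle is equivalent to reading the sequence twice,
# so the junction sequence is present in the circular template.
in_circular = junction in (circ_seq + circ_seq)
```

    An amplicon that requires the junction sequence can therefore only arise from the circular template, which is the specificity argument made in the abstract.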

  20. Introduction to Single-Cell RNA Sequencing.

    PubMed

    Olsen, Thale Kristin; Baryawno, Ninib

    2018-04-01

    During the last decade, high-throughput sequencing methods have revolutionized the entire field of biology. The opportunity to study entire transcriptomes in great detail using RNA sequencing (RNA-seq) has fueled many important discoveries and is now a routine method in biomedical research. However, RNA-seq is typically performed in "bulk," and the data represent an average of gene expression patterns across thousands to millions of cells; this might obscure biologically relevant differences between cells. Single-cell RNA-seq (scRNA-seq) represents an approach to overcome this problem. By isolating single cells, capturing their transcripts, and generating sequencing libraries in which the transcripts are mapped to individual cells, scRNA-seq allows assessment of fundamental biological properties of cell populations and biological systems at unprecedented resolution. Here, we present the most common scRNA-seq protocols in use today and the basics of data analysis and discuss factors that are important to consider before planning and designing an scRNA-seq project. © 2018 by John Wiley & Sons, Inc.

  1. MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies.

    PubMed

    Kumar, Pankaj; Halama, Anna; Hayat, Shahina; Billing, Anja M; Gupta, Manish; Yousri, Noha A; Smith, Gregory M; Suhre, Karsten

    2015-01-01

    The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), Biosample, Bioprojects, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe "MetaRNA-Seq," a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.

  2. Liver Transcriptome and miRNA Analysis of Silver Carp (Hypophthalmichthys molitrix) Intraperitoneally Injected With Microcystin-LR

    PubMed Central

    Qu, Xiancheng; Hu, Menghong; Shang, Yueyong; Pan, Lisha; Jia, Peixuan; Fu, Chunxue; Liu, Qigen; Wang, Youji

    2018-01-01

    Next-generation sequencing was used to analyze the effects of toxic microcystin-LR (MC-LR) on silver carp (Hypophthalmichthys molitrix). Silver carp were intraperitoneally injected with MC-LR, and RNA-seq and miRNA-seq in the liver were analyzed at 0.25, 0.5, and 1 h. The expression of glutathione S-transferase (GST), which acts as a marker gene for MC-LR, was tested to determine the earliest time point at which GST transcription was initiated in the liver tissues of the MC-LR-treated silver carp. Hepatic RNA-seq/miRNA-seq analysis and data integration analysis were conducted with reference to the identified time point. Quantitative PCR (qPCR) was performed to detect the expression of the following genes at the three time points: heme oxygenase 1 (HO-1), interleukin-10 receptor 1 (IL-10R1), apolipoprotein A-I (apoA-I), and heme binding protein 2 (HBP2). Results showed that the liver GST expression was remarkably decreased at 0.25 h (P < 0.05). RNA-seq at this time point revealed that the liver tissue contained 97,505 unigenes, including 184 significantly different unigenes and 75 unknown genes. Gene Ontology (GO) term enrichment analysis suggested that 35 of the 145 enriched GO terms were significantly enriched and mainly related to the immune system regulation network. KEGG pathway enrichment analysis showed that 18 of the 189 pathways were significantly enriched, and the most significant was a ribosome pathway containing 77 differentially expressed genes. miRNA-seq analysis indicated that the longest miRNA had 22 nucleotides (nt), followed by 21 and 23 nt. A total of 286 known miRNAs, 332 known miRNA precursor sequences, and 438 new miRNAs were predicted. A total of 1,048,575 mRNA–miRNA interaction sites were obtained, and 21,252 and 21,241 target genes were respectively predicted in known and new miRNAs. qPCR revealed that HO-1, IL-10R1, apoA-I, and HBP2 were significantly differentially expressed and might play important roles in the toxicity and liver detoxification of MC-LR in fish. These results were consistent with those of high-throughput sequencing, thereby verifying the accuracy of our sequencing data. RNA-seq and miRNA-seq analyses of silver carp liver injected with MC-LR provided valuable and new insights into the toxic effects of MC-LR and the antitoxic mechanisms of MC-LR in fish. The RNA/miRNA data are available from the NCBI database under registration no. SRP075165. PMID:29692738

  3. Examination of Csr regulatory circuitry using epistasis analysis with RNA-seq (Epi-seq) confirms that CsrD affects gene expression via CsrA, CsrB and CsrC.

    PubMed

    Potts, Anastasia H; Leng, Yuanyuan; Babitzke, Paul; Romeo, Tony

    2018-03-29

    The Csr global regulatory system coordinates gene expression in response to metabolic status. This system utilizes the RNA binding protein CsrA to regulate gene expression by binding to transcripts of structural and regulatory genes, thus affecting their structure, stability, translation, and/or transcription elongation. CsrA activity is controlled by sRNAs, CsrB and CsrC, which sequester CsrA away from other transcripts. CsrB/C levels are partly determined by their rates of turnover, which requires CsrD to render them susceptible to RNase E cleavage. Previous epistasis analysis suggested that CsrD affects gene expression through the other Csr components, CsrB/C and CsrA. However, those conclusions were based on a limited analysis of reporters. Here, we reassessed the global behavior of the Csr circuitry using epistasis analysis with RNA seq (Epi-seq). Because CsrD effects on mRNA levels were entirely lost in the csrA mutant and largely eliminated in a csrB/C mutant under our experimental conditions, while the majority of CsrA effects persisted in the absence of csrD, the original model accounts for the global behavior of the Csr system. Our present results also reflect a more nuanced role of CsrA as terminal regulator of the Csr system than has been recognized.

  4. Transcriptome Analysis of Flounder (Paralichthys olivaceus) Gill in Response to Lymphocystis Disease Virus (LCDV) Infection: Novel Insights into Fish Defense Mechanisms

    PubMed Central

    Wu, Ronghua; Sheng, Xiuzhen; Tang, Xiaoqian; Xing, Jing; Zhan, Wenbin

    2018-01-01

    Lymphocystis disease virus (LCDV) infection may induce a variety of host gene expression changes associated with disease development; however, our understanding of the molecular mechanisms underlying host-virus interactions is limited. In this study, RNA sequencing (RNA-seq) was employed to investigate differentially expressed genes (DEGs) in the gill of the flounder (Paralichthys olivaceus) at one week post LCDV infection. Transcriptome sequencing of the gill with and without LCDV infection was performed using the Illumina HiSeq 2500 platform. In total, RNA-seq analysis generated 193,225,170 clean reads aligned with 106,293 unigenes. Among them, 1812 genes were up-regulated and 1626 genes were down-regulated after LCDV infection. The DEGs related to cellular process and metabolism occupied the dominant position involved in the LCDV infection. A further function analysis demonstrated that the genes related to inflammation, the ubiquitin-proteasome pathway, cell proliferation, apoptosis, tumor formation, and anti-viral defense showed a differential expression. Several DEGs including β actin, toll-like receptors, cytokine-related genes, antiviral related genes, and apoptosis related genes were involved in LCDV entry and immune response. In addition, RNA-seq data was validated by quantitative real-time PCR. For the first time, the comprehensive gene expression study provided valuable insights into the host-pathogen interaction between flounder and LCDV. PMID:29304016

  5. Transcriptome Analysis of Flounder (Paralichthys olivaceus) Gill in Response to Lymphocystis Disease Virus (LCDV) Infection: Novel Insights into Fish Defense Mechanisms.

    PubMed

    Wu, Ronghua; Sheng, Xiuzhen; Tang, Xiaoqian; Xing, Jing; Zhan, Wenbin

    2018-01-05

    Lymphocystis disease virus (LCDV) infection may induce a variety of host gene expression changes associated with disease development; however, our understanding of the molecular mechanisms underlying host-virus interactions is limited. In this study, RNA sequencing (RNA-seq) was employed to investigate differentially expressed genes (DEGs) in the gill of the flounder (Paralichthys olivaceus) at one week post LCDV infection. Transcriptome sequencing of the gill with and without LCDV infection was performed using the Illumina HiSeq 2500 platform. In total, RNA-seq analysis generated 193,225,170 clean reads aligned with 106,293 unigenes. Among them, 1812 genes were up-regulated and 1626 genes were down-regulated after LCDV infection. The DEGs related to cellular process and metabolism occupied the dominant position involved in the LCDV infection. A further function analysis demonstrated that the genes related to inflammation, the ubiquitin-proteasome pathway, cell proliferation, apoptosis, tumor formation, and anti-viral defense showed a differential expression. Several DEGs including β actin, toll-like receptors, cytokine-related genes, antiviral related genes, and apoptosis related genes were involved in LCDV entry and immune response. In addition, RNA-seq data was validated by quantitative real-time PCR. For the first time, the comprehensive gene expression study provided valuable insights into the host-pathogen interaction between flounder and LCDV.

  6. DrImpute: imputing dropout events in single cell RNA sequencing data.

    PubMed

    Gong, Wuming; Kwak, Il-Youp; Pota, Pruthvi; Koyano-Nakagawa, Naoko; Garry, Daniel J

    2018-06-08

    The single cell RNA sequencing (scRNA-seq) technique began a new era by allowing the observation of gene expression at the single cell level. However, there is also a large amount of technical and biological noise. Because of the low number of RNA transcriptomes and the stochastic nature of the gene expression pattern, there is a high chance of missing nonzero entries as zero, which are called dropout events. We develop DrImpute to impute dropout events in scRNA-seq data. We show that DrImpute has significantly better performance on the separation of the dropout zeros from true zeros than existing imputation algorithms. We also demonstrate that DrImpute can significantly improve the performance of existing tools for clustering, visualization and lineage reconstruction of nine published scRNA-seq datasets. DrImpute can serve as a very useful addition to the currently existing statistical tools for single cell RNA-seq analysis. DrImpute is implemented in R and is available at https://github.com/gongx030/DrImpute .

  7. Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.

    PubMed

    Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A

    2018-04-24

    mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.

  8. Disambiguate: An open-source application for disambiguating two species in next generation sequencing data from grafted samples.

    PubMed

    Ahdesmäki, Miika J; Gray, Simon R; Johnson, Justin H; Lai, Zhongwu

    2016-01-01

    Grafting of cell lines and primary tumours is a crucial step in the drug development process between cell line studies and clinical trials. Disambiguate is a program for computationally separating the sequencing reads of two species derived from grafted samples. Disambiguate operates on DNA or RNA-seq alignments to the two species and separates the components at very high sensitivity and specificity as illustrated in artificially mixed human-mouse samples. This allows for maximum recovery of data from target tumours for more accurate variant calling and gene expression quantification. Given that no general use open source algorithm accessible to the bioinformatics community exists for the purposes of separating the two species data, the proposed Disambiguate tool presents a novel approach and improvement to performing sequence analysis of grafted samples. Both Python and C++ implementations are available and they are integrated into several open and closed source pipelines. Disambiguate is open source and is freely available at https://github.com/AstraZeneca-NGS/disambiguate.
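
    The idea of separating reads by comparing their alignments to the two species can be sketched as follows. This is an illustration under stated assumptions, not Disambiguate's actual algorithm or interface: the abstract does not describe the decision rule, so the "higher alignment score wins, ties are ambiguous" rule, the function name and the read identifiers are all hypothetical.

```python
def assign_reads(scores_a, scores_b):
    """Assign each read to species "A", species "B", or "ambiguous".

    `scores_a` and `scores_b` map read names to an alignment score against
    each species' reference; a read missing from a dict did not align there.
    """
    assignment = {}
    for read in sorted(scores_a.keys() | scores_b.keys()):
        a = scores_a.get(read)
        b = scores_b.get(read)
        if b is None:
            assignment[read] = "A"          # aligned to species A only
        elif a is None:
            assignment[read] = "B"          # aligned to species B only
        elif a > b:
            assignment[read] = "A"
        elif b > a:
            assignment[read] = "B"
        else:
            assignment[read] = "ambiguous"  # equally good in both species
    return assignment


# Hypothetical alignment scores (e.g. human = A, mouse = B).
human_scores = {"read1": 60, "read2": 40, "read3": 55}
mouse_scores = {"read2": 58, "read3": 55, "read4": 50}
result = assign_reads(human_scores, mouse_scores)
```

    Reads labelled "ambiguous" would be set aside rather than counted towards either species in downstream variant calling or expression quantification.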

  9. Whole transcriptome profiling of taste bud cells.

    PubMed

    Sukumaran, Sunil K; Lewandowski, Brian C; Qin, Yumei; Kotha, Ramana; Bachmanov, Alexander A; Margolskee, Robert F

    2017-08-08

    Analysis of single-cell RNA-Seq data can provide insights into the specific functions of individual cell types that compose complex tissues. Here, we examined gene expression in two distinct subpopulations of mouse taste cells: Tas1r3-expressing type II cells and physiologically identified type III cells. Our RNA-Seq libraries met high quality control standards and accurately captured differential expression of marker genes for type II (e.g. the Tas1r genes, Plcb2, Trpm5) and type III (e.g. Pkd2l1, Ncam, Snap25) taste cells. Bioinformatics analysis showed that genes regulating responses to stimuli were up-regulated in type II cells, while pathways related to neuronal function were up-regulated in type III cells. We also identified highly expressed genes and pathways associated with chemotaxis and axon guidance, providing new insights into the mechanisms underlying integration of new taste cells into the taste bud. We validated our results by immunohistochemically confirming expression of selected genes encoding synaptic (Cplx2 and Pclo) and semaphorin signalling pathway (Crmp2, PlexinB1, Fes and Sema4a) components. The approach described here could provide a comprehensive map of gene expression for all taste cell subpopulations and will be particularly relevant for cell types in taste buds and other tissues that can be identified only by physiological methods.

  10. UVB-induced gene expression in the skin of Xiphophorus maculatus Jp 163 B

    PubMed Central

    Yang, Kuan; Boswell, Mikki; Walter, Dylan J.; Downs, Kevin P.; Gaston-Pravia, Kimberly; Garcia, Tzintzuni; Shen, Yingjia; Mitchell, David L.; Walter, Ronald B.

    2014-01-01

    Xiphophorus fish and interspecies hybrids represent long-standing models to study the genetics underlying spontaneous and induced tumorigenesis. The recent release of the Xiphophorus maculatus genome sequence will allow global genetic regulation studies of genes involved in the inherited susceptibility to UVB-induced melanoma within select backcross hybrids. As a first step toward this goal, we report results of an RNA-Seq approach to identify genes and pathways showing modulated transcription within the skin of X. maculatus Jp 163 B upon UVB exposure. X. maculatus Jp 163 B were exposed to various doses of UVB followed by RNA-Seq analysis at each dose to investigate overall gene expression in each sample. A total of 357 genes with a minimum expression change of 4-fold (p-adj < 0.05) were identified as responsive to UVB. The molecular genetic response of Xiphophorus skin to UVB exposure permitted assessment of: (1) the basal expression level of each transcript for each skin sample, (2) the changes in expression levels for each gene in the transcriptome upon exposure to increasing doses of UVB, and (3) clusters of genes that exhibit similar patterns of change in expression upon UVB exposure. These data provide a foundation for understanding the molecular genetic response of fish skin to UVB exposure. PMID:24556253
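
    The selection criterion quoted in this record (at least a 4-fold change with p-adj < 0.05) can be written as a small filter. This is a sketch only: the function name, the expression values and the adjusted p-values are hypothetical, and treating the 4-fold threshold as applying in both directions (up or down) is an assumption.

```python
import math


def is_responsive(control, treated, padj, min_fold=4.0, alpha=0.05):
    """True if expression changed at least `min_fold` in either direction
    and the adjusted p-value is below `alpha`."""
    if control <= 0 or treated <= 0:
        return False  # fold change is undefined without positive expression
    log2_fc = math.log2(treated / control)
    return abs(log2_fc) >= math.log2(min_fold) and padj < alpha


# Hypothetical (control, UVB-treated, adjusted p-value) tuples per gene.
genes = {
    "gene_up": (10.0, 40.0, 0.01),      # 4-fold up, significant
    "gene_down": (40.0, 10.0, 0.01),    # 4-fold down, significant
    "gene_small": (10.0, 30.0, 0.001),  # only 3-fold
    "gene_noisy": (10.0, 80.0, 0.2),    # 8-fold but not significant
}
responsive = sorted(name for name, args in genes.items() if is_responsive(*args))
```

    Working on the log2 scale makes the threshold symmetric, so a 4-fold decrease is treated the same way as a 4-fold increase.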

  11. Octopus-toolkit: a workflow to automate mining of public epigenomic and transcriptomic next-generation sequencing data

    PubMed Central

    Kim, Taemook; Seo, Hogyu David; Hennighausen, Lothar; Lee, Daeyoup

    2018-01-01

    Octopus-toolkit is a stand-alone application for retrieving and processing large sets of next-generation sequencing (NGS) data with a single step. Octopus-toolkit is an automated set-up-and-analysis pipeline utilizing the Aspera, SRA Toolkit, FastQC, Trimmomatic, HISAT2, STAR, Samtools, and HOMER applications. All the applications are installed on the user's computer when the program starts. Upon the installation, it can automatically retrieve original files of various epigenomic and transcriptomic data sets, including ChIP-seq, ATAC-seq, DNase-seq, MeDIP-seq, MNase-seq and RNA-seq, from the gene expression omnibus data repository. The downloaded files can then be sequentially processed to generate BAM and BigWig files, which are used for advanced analyses and visualization. Currently, it can process NGS data from popular model genomes such as human (Homo sapiens), mouse (Mus musculus), dog (Canis lupus familiaris), plant (Arabidopsis thaliana), zebrafish (Danio rerio), fruit fly (Drosophila melanogaster), worm (Caenorhabditis elegans), and budding yeast (Saccharomyces cerevisiae) genomes. With the processed files from Octopus-toolkit, the meta-analysis of various data sets, motif searches for DNA-binding proteins, and the identification of differentially expressed genes and/or protein-binding sites can be easily conducted with few commands by users. Overall, Octopus-toolkit facilitates the systematic and integrative analysis of available epigenomic and transcriptomic NGS big data. PMID:29420797

  12. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs

    PubMed Central

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore

    2017-01-01

    Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659

  13. Gene Expression Profiling of Liver Cancer Stem Cells by RNA-Sequencing

    PubMed Central

    Lam, Chi Tat; Ng, Michael N. P.; Yu, Wan Ching; Lau, Joyce; Wan, Timothy; Wang, Xiaoqi; Yan, Zhixiang; Liu, Hang; Fan, Sheung Tat

    2012-01-01

    Background: Accumulating evidence supports that tumor growth and cancer relapse are driven by cancer stem cells. Our previous work has demonstrated the existence of CD90+ liver cancer stem cells (CSCs) in hepatocellular carcinoma (HCC). Nevertheless, the characteristics of these cells are still poorly understood. In this study, we employed a more sensitive RNA-sequencing (RNA-Seq) to compare the gene expression profiling of CD90+ cells sorted from tumor (CD90+CSCs) with parallel non-tumorous liver tissues (CD90+NTSCs) and elucidate the roles of putative target genes in hepatocarcinogenesis. Methodology/Principal Findings: CD90+ cells were sorted respectively from tumor and adjacent non-tumorous human liver tissues using fluorescence-activated cell sorting. The amplified RNAs of CD90+ cells from 3 HCC patients were subjected to RNA-Seq analysis. A differential gene expression profile was established between CD90+CSCs and CD90+NTSCs, and validated by quantitative real-time PCR (qRT-PCR) on the same set of amplified RNAs, and further confirmed in an independent cohort of 12 HCC patients. Five hundred genes were differentially expressed (119 up-regulated and 381 down-regulated genes) between CD90+CSCs and CD90+NTSCs. Gene ontology analysis indicated that the over-expressed genes in CD90+CSCs were associated with inflammation, drug resistance and lipid metabolism. Among the differentially expressed genes, glypican-3 (GPC3), a member of glypican family, was markedly elevated in CD90+CSCs compared to CD90+NTSCs. Immunohistochemistry demonstrated that GPC3 was highly expressed in forty-two human liver tumor tissues but absent in adjacent non-tumorous liver tissues. Flow cytometry indicated that GPC3 was highly expressed in liver CD90+CSCs and mature cancer cells in liver cancer cell lines and human liver tumor tissues. Furthermore, GPC3 expression was positively correlated with the number of CD90+CSCs in liver tumor tissues. Conclusions/Significance: The identified genes, such as GPC3 that are distinctly expressed in liver CD90+CSCs, may be promising gene candidates for HCC therapy without inducing damages to normal liver stem cells. PMID:22606345

  14. Whole-Exome Sequencing in a South American Cohort Links ALDH1A3, FOXN1 and Retinoic Acid Regulation Pathways to Autism Spectrum Disorders.

    PubMed

    Moreno-Ramos, Oscar A; Olivares, Ana María; Haider, Neena B; de Autismo, Liga Colombiana; Lattig, María Claudia

    2015-01-01

    Autism spectrum disorders (ASDs) are a range of complex neurodevelopmental conditions principally characterized by dysfunctions linked to mental development. Previous studies have shown that more than 1000 genes are likely involved in ASD, expressed mainly in the brain and highly interconnected. We applied whole exome sequencing in Colombian-South American trios. Two novel missense SNVs were found in the same child: ALDH1A3 (RefSeq NM_000693: c.1514T>C (p.I505T)) and FOXN1 (RefSeq NM_003593: c.146C>T (p.S49L)). Gene expression studies reveal that Aldh1a3 and Foxn1 are expressed in ~E13.5 mouse embryonic brain, as well as in adult piriform cortex (PC; ~P30). Conserved Retinoic Acid Response Elements (RAREs) upstream of human ALDH1A3 and FOXN1 and in mouse Aldh1a3 and Foxn1 genes were revealed using bioinformatic approximation. Chromatin immunoprecipitation (ChIP) assay using Retinoic Acid Receptor B (Rarb) as the immunoprecipitation target suggests RA regulation of Aldh1a3 and Foxn1 in mice. Our results frame a possible link of RA regulation in brain to ASD etiology, and a feasible non-additive effect of two apparently unrelated variants in ALDH1A3 and FOXN1, recognizing that every result given by next generation sequencing should be cautiously analyzed, as it might be an incidental finding.

  15. High-throughput full-length single-cell mRNA-seq of rare cells.

    PubMed

    Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X

    2017-01-01

    Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single cell suspensions with cell densities on the order of 10^7 per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data is typically only used for gene expression analysis of transcriptomic data, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells), and mutational profile (H1650 versus H1975 cells), in a single sequencing run.
    This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.

  16. Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data.

    PubMed

    Teng, Mingxiang; Irizarry, Rafael A

    2017-11-01

    The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics' public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To account for this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories. © 2017 Teng and Irizarry; Published by Cold Spring Harbor Laboratory Press.
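
    As a rough illustration of how a GC effect on coverage might first be spotted, the sketch below computes the GC fraction of fixed windows and averages coverage within GC strata. It is a diagnostic only and is not the statistical model of the paper; the function names, the window sequences, the coverage values and the choice of four strata are all assumptions made for this example.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) in seq that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def mean_coverage_by_gc(windows, coverage, n_strata=4):
    """Average coverage of windows grouped into equal-width GC strata.

    Returns one value per stratum, or None for a stratum with no windows.
    """
    sums = [0.0] * n_strata
    counts = [0] * n_strata
    for seq, cov in zip(windows, coverage):
        # A GC fraction of exactly 1.0 is placed in the top stratum.
        k = min(int(gc_fraction(seq) * n_strata), n_strata - 1)
        sums[k] += cov
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Hypothetical toy data: four 8-bp windows and their read coverage.
windows = ["ATATATAT", "ATGCATAT", "GCGCATGC", "GCGCGCGC"]
coverage = [10, 12, 30, 44]
strata_means = mean_coverage_by_gc(windows, coverage)
```

    A trend of rising mean coverage across strata in a control sample would hint at GC bias; in a ChIP sample the same trend is confounded with true binding, which is the difficulty the abstract describes.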

  17. A non-parametric peak calling algorithm for DamID-Seq.

    PubMed

    Li, Renhua; Hempel, Leonie U; Jiang, Tingbo

    2015-01-01

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
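
    The four steps listed in the abstract can be sketched roughly as follows. This is not the NPPC implementation: the binning of reads, the function names, the pseudocount, the minimum-count filter and the fixed fold-change cutoff (used here as a stand-in for the unspecified statistical threshold) are all assumptions chosen to keep the example small.

```python
import random
from collections import Counter


def bootstrap_background(control_read_bins, n_bins, n_boot=200, seed=0):
    """Step 1: resample control reads with replacement and average per-bin counts."""
    rng = random.Random(seed)
    totals = [0.0] * n_bins
    n_reads = len(control_read_bins)
    for _ in range(n_boot):
        resampled = rng.choices(control_read_bins, k=n_reads)
        for bin_index, count in Counter(resampled).items():
            totals[bin_index] += count
    return [t / n_boot for t in totals]


def call_peaks(sample_counts, control_read_bins,
               min_count=5, fold_cutoff=3.0, pseudo=1.0):
    """Steps 2 to 4: scale the background, compute fold changes, filter, threshold."""
    n_bins = len(sample_counts)
    background = bootstrap_background(control_read_bins, n_bins)
    # Step 2: scale the control to the sample's library size.
    scale = sum(sample_counts) / max(len(control_read_bins), 1)
    peaks = []
    for i, count in enumerate(sample_counts):
        # Step 3: drop bins with too few reads to be informative.
        if count < min_count:
            continue
        fold = (count + pseudo) / (background[i] * scale + pseudo)
        # Step 4: a fixed cutoff stands in for a significance threshold.
        if fold >= fold_cutoff:
            peaks.append((i, fold))
    return peaks


# Hypothetical toy data: five bins, a flat control, one enriched sample bin.
control_reads = [b for b in range(5) for _ in range(5)]  # 5 reads per bin
sample_counts = [4, 5, 60, 6, 5]
peaks = call_peaks(sample_counts, control_reads)
```

    With these toy numbers only the enriched bin (index 2) should pass both the filter and the cutoff.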

  18. A Single-Cell Approach to the Elusive Latent Human Cytomegalovirus Transcriptome.

    PubMed

    Goodrum, Felicia; McWeeney, Shannon

    2018-06-12

    Herpesvirus latency has been difficult to understand molecularly due to low levels of viral genomes and gene expression. In the case of the betaherpesvirus human cytomegalovirus (HCMV), this is further complicated by the heterogeneity inherent to hematopoietic subpopulations harboring genomes and, as a consequence, the various patterns of infection that simultaneously exist in a host, ranging from latent to lytic. Single-cell RNA sequencing (scRNA-seq) provides tremendous potential in measuring the gene expression profiles of heterogeneous cell populations for a wide range of applications, including in studies of cancer, immunology, and infectious disease. A recent study by Shnayder et al. (mBio 9:e00013-18, 2018, https://doi.org/10.1128/mBio.00013-18) utilized scRNA-seq to define transcriptomal characteristics of HCMV latency. They conclude that latency-associated gene expression is similar to the late lytic viral program but at lower levels of expression. The study highlights the numerous challenges, from the definition of latency to the analysis of scRNA-seq, that exist in defining a latent transcriptome. Copyright © 2018 Goodrum and McWeeney.

  19. Boiler: lossy compression of RNA-seq alignments using coverage vectors.

    PubMed

    Pritt, Jacob; Langmead, Ben

    2016-09-19

    We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantification. Boiler also allows the user to pose fast and useful queries without decompressing the entire file. Boiler is free open source software available from github.com/jpritt/boiler. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
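
    To make the idea of a genomic coverage vector concrete, here is a minimal sketch that collapses alignments into per-base depth and then run-length encodes the result. It is not Boiler's code or file format; the half-open interval convention, the function names and the use of run-length encoding are assumptions for illustration only.

```python
def coverage_vector(alignments, genome_length):
    """Per-base read depth from half-open [start, end) alignment intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1   # depth rises where a read begins
        diff[end] -= 1     # and falls just past where it ends
    coverage = []
    depth = 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


# Hypothetical toy alignments on a 10-bp reference.
reads = [(0, 4), (2, 6), (5, 8)]
cov = coverage_vector(reads, 10)
runs = run_length_encode(cov)
```

    Note that the vector alone no longer says which read covered which base. That is the per-read information a lossy scheme gives up, and the abstract indicates Boiler keeps a few empirical distributions alongside the vector to recover the most relevant part of it.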

  20. Switchgrass ubiquitin promoter (PVUBI2) and uses thereof

    DOEpatents

    Stewart, C. Neal; Mann, David George James

    2013-12-10

    The subject application provides polynucleotides, compositions thereof and methods for regulating gene expression in a plant. Polynucleotides disclosed herein comprise novel sequences for a promoter isolated from Panicum virgatum (switchgrass) that initiates transcription of an operably linked nucleotide sequence. Thus, various embodiments of the invention comprise the nucleotide sequence of SEQ ID NO: 2 or fragments thereof comprising nucleotides 1 to 692 of SEQ ID NO: 2 that are capable of driving the expression of an operably linked nucleic acid sequence.

  1. De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq

    PubMed Central

    2010-01-01

    Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097

  2. Comparative Analyses of H3K4 and H3K27 Trimethylations Between the Mouse Cerebrum and Testis

    PubMed Central

    Cui, Peng; Liu, Wanfei; Zhao, Yuhui; Lin, Qiang; Zhang, Daoyong; Ding, Feng; Xin, Chengqi; Zhang, Zhang; Song, Shuhui; Sun, Fanglin; Yu, Jun; Hu, Songnian

    2012-01-01

    The global features of H3K4 and H3K27 trimethylations (H3K4me3 and H3K27me3) have been well studied in recent years, but most of these studies were performed in mammalian cell lines. In this work, we generated the genome-wide maps of H3K4me3 and H3K27me3 of mouse cerebrum and testis using ChIP-seq and their high-coverage transcriptomes using ribominus RNA-seq with SOLiD technology. We examined the global patterns of H3K4me3 and H3K27me3 in both tissues and found that modifications are closely-associated with tissue-specific expression, function and development. Moreover, we revealed that H3K4me3 and H3K27me3 rarely occur in silent genes, which contradicts the findings in previous studies. Finally, we observed that bivalent domains, with both H3K4me3 and H3K27me3, existed ubiquitously in both tissues and demonstrated an invariable preference for the regulation of developmentally-related genes. However, the bivalent domains tend towards a “winner-takes-all” approach to regulate the expression of associated genes. We also verified the above results in mouse ES cells. As expected, the results in ES cells are consistent with those in cerebrum and testis. In conclusion, we present two very important findings. One is that H3K4me3 and H3K27me3 rarely occur in silent genes. The other is that bivalent domains may adopt a “winner-takes-all” principle to regulate gene expression. PMID:22768982

  3. Transcriptome analysis reveals self-incompatibility in the tea plant (Camellia sinensis) might be under gametophytic control.

    PubMed

    Zhang, Cheng-Cai; Wang, Li-Yuan; Wei, Kang; Wu, Li-Yun; Li, Hai-Lin; Zhang, Fen; Cheng, Hao; Ni, De-Jiang

    2016-05-17

    Self-incompatibility (SI) is under genetic control and prevents inbreeding depression in angiosperms. SI mechanisms are quite complicated and still poorly understood in many plants. Tea (Camellia sinensis L.), which belongs to the family Theaceae, exhibits high levels of SI and high heterozygosity. Uncovering the molecular basis of SI of the tea plant may enhance breeding and simplify genomics research for the whole family. The growth of pollen tubes following selfing and crossing was observed using fluorescence microscopy. Self-pollen tubes grew more slowly than those in cross treatments from 24 h to 72 h after pollination. RNA-seq was employed to explore the molecular mechanisms of SI and to identify SI-related genes in C. sinensis. Self and cross-pollinated styles were collected at 24 h, 48 h and 72 h after pollination. Six RNA-seq libraries (SP24, SP48, SP72, CP24, CP48 and CP72; SP = self-pollinated, CP = cross-pollinated) were constructed and separately sequenced. In total, 299.327 million raw reads were generated. Following assembly, 63,762 unigenes were identified, and 27,264 (42.76 %) unigenes were annotated in five public databases: NR, KOG, KEGG, Swiss-Prot and GO. To identify SI-related genes, the fragments per kb per million mapped reads (FPKM) values of each unigene were evaluated. Comparisons of CP24 vs. SP24, CP48 vs. SP48 and CP72 vs. SP72 revealed differential expression of 3,182, 3,575 and 3,709 genes, respectively. Consequently, several ubiquitin-mediated proteolysis, Ca(2+) signaling, apoptosis and defense-associated genes were obtained. The temporal expression pattern of genes following CP and SP was analyzed; 6 peroxidase, 1 polyphenol oxidase and 7 salicylic acid biosynthetic process-related genes were identified. The RNA-seq data were validated by qRT-PCR of 15 unigenes. Finally, a unigene (CL25983Contig1) with strong homology to the S-RNase was analyzed.
    It was mainly expressed in styles, with dramatically higher expression in self-pollinated versus cross-pollinated tissues at 24 h post-pollination. The present study reports the transcriptome of styles after cross- and self-pollination in tea and offers novel insights into the molecular mechanism behind SI in C. sinensis. We believe that this RNA-seq dataset will be useful for improvement in C. sinensis as well as other plants in the Theaceae family.
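
    The FPKM measure mentioned in the abstract follows the standard definition: fragments mapped to a transcript, divided by the transcript length in kilobases and by the total mapped fragments in millions. A minimal sketch, with a hypothetical function name and made-up example counts:

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp per kb) * 1e6 (fragments per million)
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical example: 500 fragments on a 2,000-bp unigene
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2000, 20_000_000)
```

    Because the value is normalized by both length and library size, unigenes of different lengths and libraries of different depths (such as the six SP and CP libraries) can be compared on one scale.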

  4. Decreased expression of SFRP2 promotes development of the pituitary corticotroph adenoma by upregulating Wnt signaling

    PubMed Central

    Sun, Yuhao; Pan, Sijian; Gu, Changwei; Chen, Xiao; Wang, Weiqing; Ning, Guang; Bian, Liuguan; Sun, Qingfang

    2018-01-01

    Cushing's disease is primarily caused by pituitary adrenocorticotropin-secreting adenoma. However, its pathogenesis has remained obscure. In the present study, whole transcriptome analysis was performed by RNA sequencing (RNA-Seq) and expression of secreted frizzled-related protein 2 (SFRP2) was decreased in corticotroph tumors compared with normal pituitary glands. Furthermore, the RNA-Seq results were validated and the expression of SFRP2 in tumor tissues was analyzed by comparing another cohort of 23 patients with Cushing's disease and 3 normal human pituitary samples using reverse transcription-quantitative polymerase chain reaction, western blot and immunohistochemistry staining. Clinically, there was an association between lower SFRP2 expression and aggressive adenoma characteristics, including larger size and invasiveness. Conversely, SFRP2 overexpression reduced the ability of AtT20 cells to proliferate and migrate, and reduced production of the adrenocorticotrophic hormone in vitro. Mechanistically, overexpressed SFRP2 reduced the level of β-catenin in the cytoplasm and nucleus, and decreased Wnt signaling activity in AtT20 cells. Therefore, SFRP2 appears to act as a tumor suppressor in Cushing's disease by regulating the activity of the Wnt signaling pathway. PMID:29620167

  5. DEIsoM: a hierarchical Bayesian model for identifying differentially expressed isoforms using biological replicates

    PubMed Central

    Peng, Hao; Yang, Yifan; Zhe, Shandian; Wang, Jian; Gribskov, Michael; Qi, Yuan

    2017-01-01

    Abstract Motivation High-throughput mRNA sequencing (RNA-Seq) is a powerful tool for quantifying gene expression. Identification of transcript isoforms that are differentially expressed in different conditions, such as in patients and healthy subjects, can provide insights into the molecular basis of diseases. Current transcript quantification approaches, however, do not take advantage of the shared information in the biological replicates, potentially decreasing sensitivity and accuracy. Results We present a novel hierarchical Bayesian model called Differentially Expressed Isoform detection from Multiple biological replicates (DEIsoM) for identifying differentially expressed (DE) isoforms from multiple biological replicates representing two conditions, e.g. multiple samples from healthy and diseased subjects. DEIsoM first estimates isoform expression within each condition by (1) capturing common patterns from sample replicates while allowing individual differences, and (2) modeling the uncertainty introduced by ambiguous read mapping in each replicate. Specifically, we introduce a Dirichlet prior distribution to capture the common expression pattern of replicates from the same condition, and treat the isoform expression of individual replicates as samples from this distribution. Ambiguous read mapping is modeled as a multinomial distribution, and ambiguous reads are assigned to the most probable isoform in each replicate. Additionally, DEIsoM couples an efficient variational inference and a post-analysis method to improve the accuracy and speed of identification of DE isoforms over alternative methods. Application of DEIsoM to a hepatocellular carcinoma (HCC) dataset identifies biologically relevant DE isoforms. The relevance of these genes/isoforms to HCC is supported by principal component analysis (PCA), read coverage visualization, and the biological literature.
    Availability and implementation The software is available at https://github.com/hao-peng/DEIsoM Contact pengh@alumni.purdue.edu Supplementary information Supplementary data are available at Bioinformatics online. PMID:28595376

  6. Position-specific binding of FUS to nascent RNA regulates mRNA length

    PubMed Central

    Masuda, Akio; Takeda, Jun-ichi; Okuno, Tatsuya; Okamoto, Takaaki; Ohkawara, Bisei; Ito, Mikako; Ishigaki, Shinsuke; Sobue, Gen

    2015-01-01

    More than half of all human genes produce prematurely terminated polyadenylated short mRNAs. However, the underlying mechanisms remain largely elusive. CLIP-seq (cross-linking immunoprecipitation [CLIP] combined with deep sequencing) of FUS (fused in sarcoma) in neuronal cells showed that FUS is frequently clustered around an alternative polyadenylation (APA) site of nascent RNA. ChIP-seq (chromatin immunoprecipitation [ChIP] combined with deep sequencing) of RNA polymerase II (RNAP II) demonstrated that FUS stalls RNAP II and prematurely terminates transcription. When an APA site is located upstream of an FUS cluster, FUS enhances polyadenylation by recruiting CPSF160 and up-regulates the alternative short transcript. In contrast, when an APA site is located downstream from an FUS cluster, polyadenylation is not activated, and the RNAP II-suppressing effect of FUS leads to down-regulation of the alternative short transcript. CAGE-seq (cap analysis of gene expression [CAGE] combined with deep sequencing) and PolyA-seq (a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts) revealed that position-specific regulation of mRNA lengths by FUS is operational in two-thirds of transcripts in neuronal cells, with enrichment in genes involved in synaptic activities. PMID:25995189

  7. Broad distribution spectrum from Gaussian to power law appears in stochastic variations in RNA-seq data.

    PubMed

    Awazu, Akinori; Tanabe, Takahiro; Kamitani, Mari; Tezuka, Ayumi; Nagano, Atsushi J

    2018-05-29

    Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution, although the physiological basis of this assumption remains unclear. In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21-27 replicates), and the characteristics of gene-dependent empirical probability density function (ePDF) profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, various types of ePDF of gene expression levels were obtained that were classified as Gaussian, power law-like containing a long tail, or intermediate. These ePDF profiles were well fitted with a Gauss-power mixing distribution function derived from a simple model of a stochastic transcriptional network containing a feedback loop. The fitting function suggested that gene expression levels with long-tailed ePDFs would be strongly influenced by feedback regulation. Furthermore, the features of gene expression levels are correlated with their functions, with the levels of essential genes tending to follow a Gaussian-like ePDF while those of genes encoding nucleic acid-binding proteins and transcription factors exhibit long-tailed ePDF.

  8. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data

    PubMed Central

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie

    2018-01-01

    Abstract Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of the same species and type, both on transcriptome and translatome levels. The translation indices (translation ratio, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, freeing biologists from complex searching, analyzing and comparing of huge sequencing data without needing local computational power. PMID:29106630

  9. Functional regression method for whole genome eQTL epistasis analysis with sequencing data.

    PubMed

    Xu, Kelin; Jin, Li; Xiong, Momiao

    2017-05-18

    Epistasis plays an essential role in understanding the regulation mechanisms and is an essential component of the genetic architecture of gene expression. However, interaction analysis of gene expression remains fundamentally unexplored due to great computational challenges and data availability. Due to variation in splicing, transcription start sites, polyadenylation sites, post-transcriptional RNA editing across the entire gene, and transcription rates of the cells, RNA-seq measurements generate large expression variability and collectively create the observed position-level read count curves. A single number for measuring gene expression, which is widely used for microarray-measured gene expression analysis, is highly unlikely to sufficiently account for large expression variation across the gene. Simultaneously analyzing epistatic architecture using the RNA-seq and whole genome sequencing (WGS) data poses enormous challenges. We develop a nonlinear functional regression model (FRGM) with functional responses, where the position-level read counts within a gene are taken as a function of genomic position, and functional predictors, where genotype profiles are viewed as a function of genomic position, for epistasis analysis with RNA-seq data. Instead of testing the interaction of all possible pairs of SNPs, the FRGM takes a gene as a basic unit for epistasis analysis, which tests for the interaction of all possible pairs of genes and uses all the information that can be accessed to collectively test interaction between all possible pairs of SNPs within two genome regions. By large-scale simulations, we demonstrate that the proposed FRGM for epistasis analysis can achieve the correct type 1 error and has higher power to detect the interactions between genes than the existing methods. The proposed methods are applied to the RNA-seq and WGS data from the 1000 Genomes Project.
    The numbers of pairs of significantly interacting genes after Bonferroni correction identified using FRGM, RPKM and DESeq were 162,361, 260 and 51, respectively, from the 350 European samples. The proposed FRGM for epistasis analysis of RNA-seq can capture isoform and position-level information and will have a broad application. Both simulations and real data analysis highlight the potential for the FRGM to be a good choice for epistasis analysis with sequencing data.
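
    The Bonferroni step mentioned above can be illustrated with a small sketch: with m tests, a pair is declared significant when its p-value falls below alpha / m. The gene names, p-values and alpha below are invented for the example, and the code assumes the dictionary holds every pair that was tested; it does not reproduce the FRGM test itself.

```python
from math import comb


def n_gene_pairs(n_genes):
    """Number of unordered gene pairs, i.e. the number of interaction tests."""
    return comb(n_genes, 2)


def bonferroni_significant(p_values, alpha=0.05):
    """Return the keys whose p-value passes the Bonferroni threshold alpha / m.

    p_values maps each tested pair to its p-value; m is taken to be the
    number of entries, so every tested pair must be present.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [pair for pair, p in p_values.items() if p < threshold]


# Hypothetical p-values for all pairs among three genes.
p_values = {
    ("G1", "G2"): 1e-6,
    ("G1", "G3"): 0.02,
    ("G2", "G3"): 0.004,
}
significant_pairs = bonferroni_significant(p_values)
```

    The number of pairs grows quadratically with the number of genes (for example, 10,000 genes give 49,995,000 pairs), so the per-test threshold becomes very small in a genome-wide scan.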

  10. RNA-Seq profiling reveals novel hepatic gene expression pattern in aflatoxin B1 treated rats.

    PubMed

    Merrick, B Alex; Phadke, Dhiral P; Auerbach, Scott S; Mav, Deepak; Stiegelmeyer, Suzy M; Shah, Ruchir R; Tice, Raymond R

    2013-01-01

    Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1's carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT's) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. 
    We find the rat transcriptome contains many previously unidentified, AFB1-responsive exons and transcripts supporting RNA-Seq's capabilities to provide new insights into AFB1-mediated gene expression leading to hepatocellular carcinoma.

  11. Comparative RNA-Seq based dissection of the regulatory networks and environmental stimuli underlying Vibrio parahaemolyticus gene expression during infection

    PubMed Central

    Livny, Jonathan; Zhou, Xiaohui; Mandlik, Anjali; Hubbard, Troy; Davis, Brigid M.; Waldor, Matthew K.

    2014-01-01

    Vibrio parahaemolyticus is the leading worldwide cause of seafood-associated gastroenteritis, yet little is known regarding its intraintestinal gene expression or physiology. To date, in vivo analyses have focused on identification and characterization of virulence factors—e.g. a crucial Type III secretion system (T3SS2)—rather than genome-wide analyses of in vivo biology. Here, we used RNA-Seq to profile V. parahaemolyticus gene expression in infected infant rabbits, which mimic human infection. Comparative transcriptomic analysis of V. parahaemolyticus isolated from rabbit intestines and from several laboratory conditions enabled identification of mRNAs and sRNAs induced during infection and of regulatory factors that likely control them. More than 12% of annotated V. parahaemolyticus genes are differentially expressed in the intestine, including the genes of T3SS2, which are likely induced by bile-mediated activation of the transcription factor VtrB. Our analyses also suggest that V. parahaemolyticus has access to glucose or other preferred carbon sources in vivo, but that iron is inconsistently available. The V. parahaemolyticus transcriptional response to in vivo growth is far more widespread than and largely distinct from that of V. cholerae, likely due to the distinct ways in which these diarrheal pathogens interact with and modulate the environment in the small intestine. PMID:25262354

  12. Comparative Transcriptome Analysis of Chinary, Assamica and Cambod tea (Camellia sinensis) Types during Development and Seasonal Variation using RNA-seq Technology

    NASA Astrophysics Data System (ADS)

    Kumar, Ajay; Chawla, Vandna; Sharma, Eshita; Mahajan, Pallavi; Shankar, Ravi; Yadav, Sudesh Kumar

    2016-11-01

    Tea quality and yield is influenced by various factors including developmental tissue, seasonal variation and cultivar type. Here, the molecular basis of these factors was investigated in three tea cultivars namely, Him Sphurti (H), TV23 (T), and UPASI-9 (U) using RNA-seq. Seasonal variation in these cultivars was studied during active (A), mid-dormant (MD), dormant (D) and mid-active (MA) stages in two developmental tissues viz. young and old leaf. Development appears to affect gene expression more than the seasonal variation and cultivar types. Further, detailed transcript and metabolite profiling has identified genes such as F3'H, F3'5'H, FLS, DFR, LAR, ANR and ANS of catechin biosynthesis, while MXMT, SAMS, TCS and XDH of caffeine biosynthesis/catabolism as key regulators during development and seasonal variation among three different tea cultivars. In addition, expression analysis of genes related to phytohormones such as ABA, GA, ethylene and auxin has suggested their role in developmental tissues during seasonal variation in tea cultivars. Moreover, differential expression of genes involved in histone and DNA modification further suggests role of epigenetic mechanism in coordinating global gene expression during developmental and seasonal variation in tea. Our findings provide insights into global transcriptional reprogramming associated with development and seasonal variation in tea.

  13. Comparative Transcriptome Analysis of Chinary, Assamica and Cambod tea (Camellia sinensis) Types during Development and Seasonal Variation using RNA-seq Technology.

    PubMed

    Kumar, Ajay; Chawla, Vandna; Sharma, Eshita; Mahajan, Pallavi; Shankar, Ravi; Yadav, Sudesh Kumar

    2016-11-17

    Tea quality and yield is influenced by various factors including developmental tissue, seasonal variation and cultivar type. Here, the molecular basis of these factors was investigated in three tea cultivars namely, Him Sphurti (H), TV23 (T), and UPASI-9 (U) using RNA-seq. Seasonal variation in these cultivars was studied during active (A), mid-dormant (MD), dormant (D) and mid-active (MA) stages in two developmental tissues viz. young and old leaf. Development appears to affect gene expression more than the seasonal variation and cultivar types. Further, detailed transcript and metabolite profiling has identified genes such as F3'H, F3'5'H, FLS, DFR, LAR, ANR and ANS of catechin biosynthesis, while MXMT, SAMS, TCS and XDH of caffeine biosynthesis/catabolism as key regulators during development and seasonal variation among three different tea cultivars. In addition, expression analysis of genes related to phytohormones such as ABA, GA, ethylene and auxin has suggested their role in developmental tissues during seasonal variation in tea cultivars. Moreover, differential expression of genes involved in histone and DNA modification further suggests role of epigenetic mechanism in coordinating global gene expression during developmental and seasonal variation in tea. Our findings provide insights into global transcriptional reprogramming associated with development and seasonal variation in tea.

  14. Global Analysis of Transcriptome Responses and Gene Expression Profiles to Cold Stress of Jatropha curcas L.

    PubMed Central

    Wang, Haibo; Zou, Zhurong; Wang, Shasha; Gong, Ming

    2013-01-01

    Background Jatropha curcas L., also called the Physic nut, is an oil-rich shrub with multiple uses, including biodiesel production, and is currently exploited as a renewable energy resource in many countries. Nevertheless, because of its origin from the tropical MidAmerican zone, J. curcas confers an inherent but undesirable characteristic (low cold resistance) that may seriously restrict its large-scale popularization. This adaptive flaw can be genetically improved by elucidating the mechanisms underlying plant tolerance to cold temperatures. The newly developed Illumina Hiseq™ 2000 RNA-seq and Digital Gene Expression (DGE) are deep high-throughput approaches for gene expression analysis at the transcriptome level, using which we carefully investigated the gene expression profiles in response to cold stress to gain insight into the molecular mechanisms of cold response in J. curcas. Results In total, 45,251 unigenes were obtained by assembly of clean data generated by RNA-seq analysis of the J. curcas transcriptome. A total of 33,363 and 912 complete or partial coding sequences (CDSs) were determined by protein database alignments and ESTScan prediction, respectively. Among these unigenes, more than 41.52% were involved in approximately 128 known metabolic or signaling pathways, and 4,185 were possibly associated with cold resistance. DGE analysis was used to assess the changes in gene expression when exposed to cold condition (12°C) for 12, 24, and 48 h. The results showed that 3,178 genes were significantly upregulated and 1,244 were downregulated under cold stress. These genes were then functionally annotated based on the transcriptome data from RNA-seq analysis. Conclusions This study provides a global view of transcriptome response and gene expression profiling of J. curcas in response to cold stress. 
The results can help improve our current understanding of the mechanisms underlying plant cold resistance and favor the screening of crucial genes for genetically enhancing cold resistance in J. curcas. PMID:24349370

  15. Global analysis of transcriptome responses and gene expression profiles to cold stress of Jatropha curcas L.

    PubMed

    Wang, Haibo; Zou, Zhurong; Wang, Shasha; Gong, Ming

    2013-01-01

    Jatropha curcas L., also called the Physic nut, is an oil-rich shrub with multiple uses, including biodiesel production, and is currently exploited as a renewable energy resource in many countries. Nevertheless, because of its origin from the tropical MidAmerican zone, J. curcas confers an inherent but undesirable characteristic (low cold resistance) that may seriously restrict its large-scale popularization. This adaptive flaw can be genetically improved by elucidating the mechanisms underlying plant tolerance to cold temperatures. The newly developed Illumina Hiseq™ 2000 RNA-seq and Digital Gene Expression (DGE) are deep high-throughput approaches for gene expression analysis at the transcriptome level, using which we carefully investigated the gene expression profiles in response to cold stress to gain insight into the molecular mechanisms of cold response in J. curcas. In total, 45,251 unigenes were obtained by assembly of clean data generated by RNA-seq analysis of the J. curcas transcriptome. A total of 33,363 and 912 complete or partial coding sequences (CDSs) were determined by protein database alignments and ESTScan prediction, respectively. Among these unigenes, more than 41.52% were involved in approximately 128 known metabolic or signaling pathways, and 4,185 were possibly associated with cold resistance. DGE analysis was used to assess the changes in gene expression when exposed to cold condition (12°C) for 12, 24, and 48 h. The results showed that 3,178 genes were significantly upregulated and 1,244 were downregulated under cold stress. These genes were then functionally annotated based on the transcriptome data from RNA-seq analysis. This study provides a global view of transcriptome response and gene expression profiling of J. curcas in response to cold stress. 
The results can help improve our current understanding of the mechanisms underlying plant cold resistance and favor the screening of crucial genes for genetically enhancing cold resistance in J. curcas.

  16. Assessing senescence in Drosophila using video tracking.

    PubMed

    Ardekani, Reza; Tavaré, Simon; Tower, John

    2013-01-01

    Senescence is associated with changes in gene expression, including the upregulation of stress response- and innate immune response-related genes. In addition, aging animals exhibit characteristic changes in movement behaviors including decreased gait speed and a deterioration in sleep/wake rhythms. Here, we describe methods for tracking Drosophila melanogaster movements in 3D with simultaneous quantification of fluorescent transgenic reporters. This approach allows for the assessment of correlations between behavior, aging, and gene expression as well as for the quantification of biomarkers of aging.

  17. Next-generation sequencing facilitates quantitative analysis of wild-type and Nrl−/− retinal transcriptomes

    PubMed Central

    Brooks, Matthew J.; Rajasimha, Harsha K.; Roger, Jerome E.

    2011-01-01

    Purpose Next-generation sequencing (NGS) has revolutionized systems-based analysis of cellular pathways. The goals of this study are to compare NGS-derived retinal transcriptome profiling (RNA-seq) to microarray and quantitative reverse transcription polymerase chain reaction (qRT–PCR) methods and to evaluate protocols for optimal high-throughput data analysis. Methods Retinal mRNA profiles of 21-day-old wild-type (WT) and neural retina leucine zipper knockout (Nrl−/−) mice were generated by deep sequencing, in triplicate, using Illumina GAIIx. The sequence reads that passed quality filters were analyzed at the transcript isoform level with two methods: Burrows–Wheeler Aligner (BWA) followed by ANOVA (ANOVA) and TopHat followed by Cufflinks. qRT–PCR validation was performed using TaqMan and SYBR Green assays. Results Using an optimized data analysis workflow, we mapped about 30 million sequence reads per sample to the mouse genome (build mm9) and identified 16,014 transcripts in the retinas of WT and Nrl−/− mice with BWA workflow and 34,115 transcripts with TopHat workflow. RNA-seq data confirmed stable expression of 25 known housekeeping genes, and 12 of these were validated with qRT–PCR. RNA-seq data had a linear relationship with qRT–PCR for more than four orders of magnitude and a goodness of fit (R²) of 0.8798. Approximately 10% of the transcripts showed differential expression between the WT and Nrl−/− retina, with a fold change ≥1.5 and p value <0.05. Altered expression of 25 genes was confirmed with qRT–PCR, demonstrating the high degree of sensitivity of the RNA-seq method. Hierarchical clustering of differentially expressed genes uncovered several as yet uncharacterized genes that may contribute to retinal function. Data analysis with BWA and TopHat workflows revealed a significant overlap yet provided complementary insights in transcriptome profiling. 
Conclusions Our study represents the first detailed analysis of retinal transcriptomes, with biologic replicates, generated by RNA-seq technology. The optimized data analysis workflows reported here should provide a framework for comparative investigations of expression profiles. Our results show that NGS offers a comprehensive and more accurate quantitative and qualitative evaluation of mRNA content within a cell or tissue. We conclude that RNA-seq based transcriptome characterization would expedite genetic network analyses and permit the dissection of complex biologic functions. PMID:22162623

  18. Toward reliable biomarker signatures in the age of liquid biopsies - how to standardize the small RNA-Seq workflow

    PubMed Central

    Buschmann, Dominik; Haberberger, Anna; Kirchner, Benedikt; Spornraft, Melanie; Riedmaier, Irmgard; Schelling, Gustav; Pfaffl, Michael W.

    2016-01-01

    Small RNA-Seq has emerged as a powerful tool in transcriptomics, gene expression profiling and biomarker discovery. Sequencing cell-free nucleic acids, particularly microRNA (miRNA), from liquid biopsies additionally provides exciting possibilities for molecular diagnostics, and might help establish disease-specific biomarker signatures. The complexity of the small RNA-Seq workflow, however, bears challenges and biases that researchers need to be aware of in order to generate high-quality data. Rigorous standardization and extensive validation are required to guarantee reliability, reproducibility and comparability of research findings. Hypotheses based on flawed experimental conditions can be inconsistent and even misleading. Comparable to the well-established MIQE guidelines for qPCR experiments, this work aims at establishing guidelines for experimental design and pre-analytical sample processing, standardization of library preparation and sequencing reactions, as well as facilitating data analysis. We highlight bottlenecks in small RNA-Seq experiments, point out the importance of stringent quality control and validation, and provide a primer for differential expression analysis and biomarker discovery. Following our recommendations will encourage better sequencing practice, increase experimental transparency and lead to more reproducible small RNA-Seq results. This will ultimately enhance the validity of biomarker signatures, and allow reliable and robust clinical predictions. PMID:27317696

  19. SERE: single-parameter quality control and sample comparison for RNA-Seq.

    PubMed

    Schulze, Stefan K; Kanwar, Rahul; Gölzenleuchter, Meike; Therneau, Terry M; Beutler, Andreas S

    2012-10-03

    Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
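
    The score described above can be illustrated with a minimal Python sketch. It assumes (the abstract does not give the formula) that SERE is the ratio of the observed standard deviation of the counts across samples to the standard deviation expected under Poisson variation alone, with the expected count of each bin obtained by splitting its total in proportion to library size. The function name `sere` and the input layout are hypothetical, chosen only for illustration.

```python
import math


def sere(counts):
    """SERE-style score for a count table (illustrative sketch, not the authors' code).

    counts: list of rows, one per bin (exon or gene); each row holds the raw
    read counts of that bin in every sample, in the same column order.
    Assumes every sample has at least one read.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    grand_total = sum(lib_sizes)

    squared_error = 0.0
    n_bins = 0
    for row in counts:
        bin_total = sum(row)
        if bin_total == 0:
            continue  # bins with no reads in any sample carry no information
        n_bins += 1
        for j in range(n_samples):
            # Count expected if the bin's reads were split by library size only.
            expected = bin_total * lib_sizes[j] / grand_total
            squared_error += (row[j] - expected) ** 2 / expected

    if n_bins == 0 or n_samples < 2:
        raise ValueError("need at least two samples and one non-empty bin")
    return math.sqrt(squared_error / (n_bins * (n_samples - 1)))


duplicated = [[10, 10], [250, 250], [3, 3]]   # same library counted twice
different = [[100, 10], [10, 100]]            # strong global difference
print(sere(duplicated), sere(different))
```

    Under this assumed definition, a duplicated library scores 0 and a pair of globally different libraries scores well above 1, which matches the interpretation given in the abstract.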

  20. SERE: Single-parameter quality control and sample comparison for RNA-Seq

    PubMed Central

    2012-01-01

    Background Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen’s simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter. PMID:23033915

  1. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.

    PubMed

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei

    2017-05-19

    Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. A normalization strategy for comparing tag count data

    PubMed Central

    2012-01-01

    Background High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data. Results We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset. Conclusion Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can widely be applied to other types of tag count data and to microarray data. PMID:22475125
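
    The key concept above (set aside potential DEGs, then compute the normalization factor from the remaining genes) can be sketched in Python for two samples. This is only an illustration under stated assumptions: the ranking by absolute log fold change after a provisional library-size scaling is a stand-in for the gene-ranking step of the R packages named in the abstract, and the function name, the `deg_fraction` parameter and the pseudocount are hypothetical.

```python
import math


def deg_elimination_factors(sample_a, sample_b, deg_fraction=0.1, pseudo=0.5):
    """Two-sample sketch of normalization after removing potential DEGs.

    sample_a, sample_b: raw counts per gene, in the same gene order.
    deg_fraction: assumed share of genes treated as potential DEGs.
    Returns (factor_a, factor_b, flagged_gene_indices).
    """
    n_genes = len(sample_a)
    total_a, total_b = sum(sample_a), sum(sample_b)

    # Step 1: provisional scaling by library size, then rank genes by the
    # absolute log2 fold change (stand-in for a package's DEG ranking).
    def abs_log_fold_change(i):
        a = (sample_a[i] + pseudo) / total_a
        b = (sample_b[i] + pseudo) / total_b
        return abs(math.log2(a / b))

    ranked = sorted(range(n_genes), key=abs_log_fold_change, reverse=True)

    # Step 2: set aside the top-ranked genes as potential DEGs.
    n_flagged = int(n_genes * deg_fraction)
    flagged = sorted(ranked[:n_flagged])
    kept = ranked[n_flagged:]

    # Step 3: recompute the scaling factors from the remaining genes only.
    kept_a = sum(sample_a[i] for i in kept)
    kept_b = sum(sample_b[i] for i in kept)
    mean_kept = (kept_a + kept_b) / 2
    return kept_a / mean_kept, kept_b / mean_kept, flagged


a = [100] * 10
b = [100] * 9 + [400]   # one strongly up-regulated gene inflates the library size
print(deg_elimination_factors(a, b))
```

    In this toy input the single up-regulated gene makes sample b look 30% deeper by total count; once it is set aside, both samples receive the same factor. A single pass with a library-size starting point can misrank genes when the DEGs dominate the library, so a more robust provisional step or repeated passes may be needed in practice.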

  3. Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq

    PubMed Central

    Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim

    2014-01-01

    The advent of both microarray and massively parallel sequencing has revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.

  4. Histone Deacetylase Inhibition Promotes Osteoblast Maturation by Altering the Histone H4 Epigenome and Reduces Akt Phosphorylation

    PubMed Central

    Dudakovic, Amel; Evans, Jared M.; Li, Ying; Middha, Sumit; McGee-Lawrence, Meghan E.; van Wijnen, Andre J.; Westendorf, Jennifer J.

    2013-01-01

    Bone has remarkable regenerative capacity, but this ability diminishes during aging. Histone deacetylase inhibitors (HDIs) promote terminal osteoblast differentiation and extracellular matrix production in culture. The epigenetic events altered by HDIs in osteoblasts may hold clues for the development of new anabolic treatments for osteoporosis and other conditions of low bone mass. To assess how HDIs affect the epigenome of committed osteoblasts, MC3T3 cells were treated with suberoylanilide hydroxamic acid (SAHA) and subjected to microarray gene expression profiling and high-throughput ChIP-Seq analysis. As expected, SAHA induced differentiation and matrix calcification of osteoblasts in vitro. ChIP-Seq analysis revealed that SAHA increased histone H4 acetylation genome-wide and in differentially regulated genes, except for the 500 bp upstream of transcriptional start sites. Pathway analysis indicated that SAHA increased the expression of insulin signaling modulators, including Slc9a3r1. SAHA decreased phosphorylation of insulin receptor β, Akt, and the Akt substrate FoxO1, resulting in FoxO1 stabilization. Thus, SAHA induces genome-wide H4 acetylation and modulates the insulin/Akt/FoxO1 signaling axis, whereas it promotes terminal osteoblast differentiation in vitro. PMID:23940046

  5. Gene expression analysis of TIL rich HPV-driven head and neck tumors reveals a distinct B-cell signature when compared to HPV independent tumors.

    PubMed

    Wood, Oliver; Woo, Jeongmin; Seumois, Gregory; Savelyeva, Natalia; McCann, Katy J; Singh, Divya; Jones, Terry; Peel, Lailah; Breen, Michael S; Ward, Matthew; Garrido Martin, Eva; Sanchez-Elsner, Tilman; Thomas, Gareth; Vijayanand, Pandurangan; Woelk, Christopher H; King, Emma; Ottensmeier, Christian

    2016-08-30

    Human papilloma virus (HPV)-associated head and neck squamous cell carcinoma (HNSCC) has a better prognosis than its HPV negative (HPV(-)) counterpart. This may be due to the higher numbers of tumor-infiltrating lymphocytes (TILs) in HPV positive (HPV(+)) tumors. RNA-Sequencing (RNA-Seq) was used to evaluate whether the differences in clinical behaviour simply reflect a numerical difference in TILs or whether there is a fundamental behavioural difference between TILs in these two settings. Thirty-nine HNSCC tumors were scored for TIL density by immunohistochemistry. After the removal of 16 TILlow tumors, RNA-Seq analysis was performed on 23 TILhigh/med tumors (HPV(+) n=10 and HPV(-) n=13). Using EdgeR, differentially expressed genes (DEG) were identified. Immune subset analysis was performed using Functional Analysis of Individual RNA-Seq/ Microarray Expression (FAIME) and immune gene RNA transcript count analysis. In total, 1,634 DEGs were identified, with a dominant immune signature observed in HPV(+) tumors. After normalizing the expression profiles to account for differences in B- and T-cell number, 437 significant DEGs remained. A B-cell associated signature distinguished HPV(+) from HPV(-) tumors, and included the DEGs CD200, GGA2, ADAM28, STAG3, SPIB, VCAM1, BCL2 and ICOSLG; the immune signal relative to T-cells was qualitatively similar between TILs of both tumor cohorts. Our findings were validated and confirmed in two independent cohorts using TCGA data and tumor-infiltrating B-cells from additional HPV(+) HNSCC patients. A B-cell associated signal segregated tumors relative to HPV status. Our data suggest that the role of B-cells in the adaptive immune response to HPV(+) HNSCC requires re-assessment.

  6. An RNA-Seq Transcriptome Analysis of Orthophosphate-Deficient White Lupin Reveals Novel Insights into Phosphorus Acclimation in Plants

    PubMed Central

    O’Rourke, Jamie A.; Yang, S. Samuel; Miller, Susan S.; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W.; Vance, Carroll P.

    2013-01-01

    Phosphorus, in its orthophosphate form (Pi), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to Pi deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in Pi-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to Pi supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to Pi deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to Pi deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the Pi status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in Pi deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to Pi deficiency. PMID:23197803
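
    The activity criterion quoted above (reads per kilobase per million reads ≥ 3) can be made concrete with a short Python sketch. It uses the standard RPKM definition; the function names and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads."""
    # Equivalent to read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6).
    return read_count * 1_000_000_000 / (length_bp * total_mapped_reads)


def is_transcriptionally_active(read_count, length_bp, total_mapped_reads,
                                threshold=3.0):
    """Apply the RPKM >= 3 cutoff mentioned in the abstract."""
    return rpkm(read_count, length_bp, total_mapped_reads) >= threshold


# Hypothetical sequence: 30 reads on a 1,000 bp sequence in a library of
# 10 million mapped reads.
print(rpkm(30, 1_000, 10_000_000))
print(is_transcriptionally_active(30, 1_000, 10_000_000))
```

    Because the value is scaled by both sequence length and library size, the same raw read count can fall on either side of the cutoff depending on how long the sequence is and how deeply the library was sequenced.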

  7. An RNA-Seq transcriptome analysis of orthophosphate-deficient white lupin reveals novel insights into phosphorus acclimation in plants.

    PubMed

    O'Rourke, Jamie A; Yang, S Samuel; Miller, Susan S; Bucciarelli, Bruna; Liu, Junqi; Rydeen, Ariel; Bozsoki, Zoltan; Uhde-Stone, Claudia; Tu, Zheng Jin; Allan, Deborah; Gronwald, John W; Vance, Carroll P

    2013-02-01

    Phosphorus, in its orthophosphate form (P(i)), is one of the most limiting macronutrients in soils for plant growth and development. However, the whole-genome molecular mechanisms contributing to plant acclimation to P(i) deficiency remain largely unknown. White lupin (Lupinus albus) has evolved unique adaptations for growth in P(i)-deficient soils, including the development of cluster roots to increase root surface area. In this study, we utilized RNA-Seq technology to assess global gene expression in white lupin cluster roots, normal roots, and leaves in response to P(i) supply. We de novo assembled 277,224,180 Illumina reads from 12 complementary DNA libraries to build what is to our knowledge the first white lupin gene index (LAGI 1.0). This index contains 125,821 unique sequences with an average length of 1,155 bp. Of these sequences, 50,734 were transcriptionally active (reads per kilobase per million reads ≥ 3), representing approximately 7.8% of the white lupin genome, using the predicted genome size of Lupinus angustifolius as a reference. We identified a total of 2,128 sequences differentially expressed in response to P(i) deficiency with a 2-fold or greater change and P ≤ 0.05. Twelve sequences were consistently differentially expressed due to P(i) deficiency stress in three species, Arabidopsis (Arabidopsis thaliana), potato (Solanum tuberosum), and white lupin, making them ideal candidates to monitor the P(i) status of plants. Additionally, classic physiological experiments were coupled with RNA-Seq data to examine the role of cytokinin and gibberellic acid in P(i) deficiency-induced cluster root development. This global gene expression analysis provides new insights into the biochemical and molecular mechanisms involved in the acclimation to P(i) deficiency.

  8. Gene expression analysis of TIL rich HPV-driven head and neck tumors reveals a distinct B-cell signature when compared to HPV independent tumors

    PubMed Central

    Savelyeva, Natalia; McCann, Katy J.; Singh, Divya; Jones, Terry; Peel, Lailah; Breen, Michael S.; Ward, Matthew; Martin, Eva Garrido

    2016-01-01

    Human papilloma virus (HPV)-associated head and neck squamous cell carcinoma (HNSCC) has a better prognosis than its HPV negative (HPV(−)) counterpart. This may be due to the higher numbers of tumor-infiltrating lymphocytes (TILs) in HPV positive (HPV(+)) tumors. RNA-Sequencing (RNA-Seq) was used to evaluate whether the differences in clinical behaviour simply reflect a numerical difference in TILs or whether there is a fundamental behavioural difference between TILs in these two settings. Thirty-nine HNSCC tumors were scored for TIL density by immunohistochemistry. After the removal of 16 TILlow tumors, RNA-Seq analysis was performed on 23 TILhigh/med tumors (HPV(+) n=10 and HPV(−) n=13). Using EdgeR, differentially expressed genes (DEG) were identified. Immune subset analysis was performed using Functional Analysis of Individual RNA-Seq/ Microarray Expression (FAIME) and immune gene RNA transcript count analysis. In total, 1,634 DEGs were identified, with a dominant immune signature observed in HPV(+) tumors. After normalizing the expression profiles to account for differences in B- and T-cell number, 437 significant DEGs remained. A B-cell associated signature distinguished HPV(+) from HPV(−) tumors, and included the DEGs CD200, GGA2, ADAM28, STAG3, SPIB, VCAM1, BCL2 and ICOSLG; the immune signal relative to T-cells was qualitatively similar between TILs of both tumor cohorts. Our findings were validated and confirmed in two independent cohorts using TCGA data and tumor-infiltrating B-cells from additional HPV(+) HNSCC patients. A B-cell associated signal segregated tumors relative to HPV status. Our data suggest that the role of B-cells in the adaptive immune response to HPV(+) HNSCC requires re-assessment. PMID:27462861

  9. Discovering Single Nucleotide Polymorphisms Regulating Human Gene Expression Using Allele Specific Expression from RNA-seq Data

    PubMed Central

    Kang, Eun Yong; Martin, Lisa J.; Mangul, Serghei; Isvilanonda, Warin; Zou, Jennifer; Ben-David, Eyal; Han, Buhm; Lusis, Aldons J.; Shifman, Sagiv; Eskin, Eleazar

    2016-01-01

    The study of the genetics of gene expression is of considerable importance to understanding the nature of common, complex diseases. The most widely applied approach to identifying relationships between genetic variation and gene expression is the expression quantitative trait loci (eQTL) approach. Here, we increased the computational power of eQTL with an alternative and complementary approach based on analyzing allele specific expression (ASE). We designed a novel analytical method to identify cis-acting regulatory variants based on genome sequencing and measurements of ASE from RNA-sequencing (RNA-seq) data. We evaluated the power and resolution of our method using simulated data. We then applied the method to map regulatory variants affecting gene expression in lymphoblastoid cell lines (LCLs) from 77 unrelated northern and western European individuals (CEU), which were part of the HapMap project. A total of 2309 SNPs were identified as being associated with ASE patterns. The SNPs associated with ASE were enriched within promoter regions and were significantly more likely to signal strong evidence for a regulatory role. Finally, among the candidate regulatory SNPs, we identified 108 SNPs that were previously associated with human immune diseases. With further improvements in quantifying ASE from RNA-seq, the application of our method to other datasets is expected to accelerate our understanding of the biological basis of common diseases. PMID:27765809

  10. Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology.

    PubMed

    Pareek, Chandra Shekhar; Błaszczyk, Paweł; Dziuba, Piotr; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Pierzchała, Mariusz; Feng, Yaping; Kadarmideen, Haja N; Kumar, Dibyendu

    2017-01-01

    RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver tissue of young bulls of the Polish Red, Polish Holstein-Friesian (HF) and Hereford breeds, and to understand the genomic variation in the three cattle breeds that may reflect differences in production traits. The RNA-seq experiment on bovine liver produced 1,071,144,072 raw paired-end reads, with an average of approximately 60 million paired-end reads per library. Breed-wise, a total of 345.06, 290.04 and 436.03 million paired-end reads were obtained from the Polish Red, Polish HF, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed that 81.35%, 82.81% and 84.21% of the mapped sequencing reads were properly paired to the Polish Red, Polish HF, and Hereford breeds, respectively. This study identified 5,641,401 SNPs and insertion and deletion (indel) positions expressed in the bovine liver with an average of 313,411 SNPs and indels per young bull. Following the removal of the indel mutations, a total of 1,953,804, 1,527,120 and 2,053,184 raw SNPs expressed in bovine liver were identified for the Polish Red, Polish HF, and Hereford breeds, respectively. Breed-wise, three highly reliable breed-specific SNP-databases (SNP-dbs) with 31,562, 24,945 and 28,194 SNP records were constructed for the Polish Red, Polish HF, and Hereford breeds, respectively. Using a combination of stringent parameters of a minimum depth of ≥10 mapping reads that support the polymorphic nucleotide base and 100% SNP ratio, 4,368, 3,780 and 3,800 SNP records were detected in the Polish Red, Polish HF, and Hereford breeds, respectively. The SNP detections using RNA-seq data were successfully validated by kompetitive allele-specific PCR (KASP™) SNP genotyping assay.
    The comprehensive QTL/CG analysis of 110 QTL/CG with RNA-seq data identified 20 monomorphic SNP hit loci (CARTPT, GAD1, GDF5, GHRH, GHRL, GRB10, IGFBPL1, IGFL1, LEP, LHX4, MC4R, MSTN, NKAIN1, PLAG1, POU1F1, SDR16C5, SH2B2, TOX, UCP3 and WNT10B) in all three cattle breeds. However, six SNP loci (CCSER1, GHR, KCNIP4, MTSS1, EGFR and NSMCE2) were identified as highly polymorphic among the cattle breeds. This study identified breed-specific SNPs with greater SNP ratio and excellent mapping coverage, as well as monomorphic and highly polymorphic putative SNP loci within QTL/CGs of bovine liver tissue. A breed-specific SNP-db constructed for bovine liver yielded nearly six million SNPs. In addition, a KASP™ SNP genotyping assay, as a reliable cost-effective method, successfully validated the breed-specific putative SNPs originating from the RNA-seq experiments.
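
    The stringent filter described in this record (at least 10 mapping reads supporting the polymorphic base, and a 100% SNP ratio) can be illustrated with a minimal sketch. The record layout, the field names, and the reading of "SNP ratio" as supporting reads divided by total depth are assumptions made for illustration; this is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    locus: str        # hypothetical locus label, e.g. "chr1:1000"
    alt_reads: int    # reads supporting the polymorphic base
    total_reads: int  # all reads covering the position


def passes_stringent_filter(call, min_alt_reads=10, min_ratio=1.0):
    """Keep a call only if enough reads support the variant base
    and the supporting fraction reaches min_ratio (assumed meaning
    of "SNP ratio")."""
    if call.total_reads == 0:
        return False
    ratio = call.alt_reads / call.total_reads
    return call.alt_reads >= min_alt_reads and ratio >= min_ratio


# Toy calls; the loci and counts are invented for the example.
calls = [
    SnpCall("chr1:1000", 12, 12),  # passes both criteria
    SnpCall("chr1:2000", 9, 9),    # fails: fewer than 10 supporting reads
    SnpCall("chr2:500", 30, 40),   # fails: ratio is 0.75, not 1.0
]
kept = [c.locus for c in calls if passes_stringent_filter(c)]
```

    Both thresholds are parameters, so the same function can express a looser filter if a lower depth or ratio is acceptable.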

  11. Long Noncoding RNAs AC009014.3 and Newly Discovered XPLAID Differentiate Aggressive and Indolent Prostate Cancers.

    PubMed

    Cesnik, Anthony J; Yang, Bing; Truong, Andrew; Etheridge, Tyler; Spiniello, Michele; Steinbrink, Maisie I; Shortreed, Michael R; Frey, Brian L; Jarrard, David F; Smith, Lloyd M

    2018-06-01

    The molecular mechanisms underlying aggressive versus indolent disease are not fully understood. Recent research has implicated a class of molecules known as long noncoding RNAs (lncRNAs) in tumorigenesis and progression of cancer. Our objective was to discover lncRNAs that differentiate aggressive and indolent prostate cancers. We analyzed paired tumor and normal tissues from six aggressive Gleason score (GS) 8-10 and six indolent GS 6 prostate cancers. Extracted RNA was split for poly(A)+ and ribosomal RNA depletion library preparations, followed by RNA sequencing (RNA-Seq) using an Illumina HiSeq 2000. We developed an RNA-Seq data analysis pipeline to discover and quantify these molecules. Candidate lncRNAs were validated using RT-qPCR on 87 tumor tissue samples: 28 (GS 6), 28 (GS 3+4), 6 (GS 4+3), and 25 (GS 8-10). Statistical correlations between lncRNAs and clinicopathologic variables were tested using ANOVA. The 43 differentially expressed (DE) lncRNAs between aggressive and indolent prostate cancers included 12 annotated and 31 novel lncRNAs. The top six DE lncRNAs were selected based on large, consistent fold-changes in the RNA-Seq results. Three of these candidates passed RT-qPCR validation, including AC009014.3 (P < .001 in tumor tissue) and a newly discovered X-linked lncRNA named XPLAID (P = .049 in tumor tissue and P = .048 in normal tissue). XPLAID and AC009014.3 show promise as prognostic biomarkers. We discovered several dozen lncRNAs that distinguish aggressive and indolent prostate cancers, of which four were validated using RT-qPCR. The investigation into their biology is ongoing. Published by Elsevier Inc.

  12. Pregnancy-induced gene expression changes in vivo among women with rheumatoid arthritis: a pilot study.

    PubMed

    Goin, Dana E; Smed, Mette Kiel; Pachter, Lior; Purdom, Elizabeth; Nelson, J Lee; Kjærgaard, Hanne; Olsen, Jørn; Hetland, Merete Lund; Zoffmann, Vibeke; Ottesen, Bent; Jawaheer, Damini

    2017-05-25

    Little is known about gene expression changes induced by pregnancy in women with rheumatoid arthritis (RA) and healthy women because the few studies previously conducted did not have pre-pregnancy samples available as baseline. We have established a cohort of women with RA and healthy women followed prospectively from a pre-pregnancy baseline. In this study, we tested the hypothesis that pregnancy-induced changes in gene expression among women with RA who improve during pregnancy (pregDAS improved) overlap substantially with changes observed among healthy women and differ from changes observed among women with RA who worsen during pregnancy (pregDAS worse). Global gene expression profiles were generated by RNA sequencing (RNA-seq) from 11 women with RA and 5 healthy women before pregnancy (T0) and at the third trimester (T3). Among the women with RA, eight showed an improvement in disease activity by T3, whereas three worsened. Differential expression analysis was used to identify genes demonstrating significant changes in expression within each of the RA and healthy groups (T3 vs T0), as well as between the groups at each time point. Gene set enrichment was assessed in terms of Gene Ontology processes and protein networks. A total of 1296 genes were differentially expressed between T3 and T0 among the 8 pregDAS improved women, with 161 genes showing at least two-fold change (FC) in expression by T3. The majority (108 of 161 genes) were also differentially expressed among healthy women (q<0.05, FC≥2). Additionally, a small cluster of genes demonstrated contrasting changes in expression between the pregDAS improved and pregDAS worse groups, all of which were inducible by type I interferon (IFN). These IFN-inducible genes were over-expressed at T3 compared to the T0 baseline among the pregDAS improved women.
    In our pilot RNA-seq dataset, increased pregnancy-induced expression of type I IFN-inducible genes was observed among women with RA who improved during pregnancy, but not among women who worsened. These findings warrant further investigation into expression of these genes in RA pregnancy and their potential role in modulation of disease activity. These results are nevertheless preliminary and should be interpreted with caution until replicated in a larger sample.
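
    The selection criterion quoted in this record (q<0.05, FC≥2) can be written as a small predicate. This is a sketch under stated assumptions: the fold change is taken to be the T3/T0 expression ratio, and "at least two-fold change" is treated as symmetric, so a halving counts as well as a doubling. The function name and parameters are hypothetical.

```python
import math


def is_differentially_expressed(q_value, fold_change, q_max=0.05, fc_min=2.0):
    """Return True when the adjusted p-value is below q_max and the
    expression ratio changed by at least fc_min in either direction.

    fold_change is assumed to be a positive ratio (T3 / T0).
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be a positive ratio")
    # Compare on the log2 scale so up- and down-regulation are symmetric.
    return q_value < q_max and abs(math.log2(fold_change)) >= math.log2(fc_min)


# Invented (q, fold change) pairs for illustration only.
examples = {
    "doubled": (0.01, 2.0),
    "halved": (0.01, 0.5),
    "not_significant": (0.20, 4.0),
    "small_change": (0.01, 1.5),
}
selected = sorted(
    name for name, (q, fc) in examples.items()
    if is_differentially_expressed(q, fc)
)
```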

  13. Identification and validation of differentially expressed transcripts by RNA-sequencing of formalin-fixed, paraffin-embedded (FFPE) lung tissue from patients with Idiopathic Pulmonary Fibrosis.

    PubMed

    Vukmirovic, Milica; Herazo-Maya, Jose D; Blackmon, John; Skodric-Trifunovic, Vesna; Jovanovic, Dragana; Pavlovic, Sonja; Stojsic, Jelena; Zeljkovic, Vesna; Yan, Xiting; Homer, Robert; Stefanovic, Branko; Kaminski, Naftali

    2017-01-12

    Idiopathic Pulmonary Fibrosis (IPF) is a lethal lung disease of unknown etiology. A major limitation in transcriptomic profiling of lung tissue in IPF has been a dependence on snap-frozen fresh tissues (FF). In this project we sought to determine whether genome scale transcript profiling using RNA Sequencing (RNA-Seq) could be applied to archived Formalin-Fixed Paraffin-Embedded (FFPE) IPF tissues. We isolated total RNA from 7 IPF and 5 control FFPE lung tissues and performed 50 base pair paired-end sequencing on Illumina HiSeq 2000. TopHat2 was used to map sequencing reads to the human genome. On average ~62 million reads (53.4% of ~116 million reads) were mapped per sample. 4,131 genes were differentially expressed between IPF and controls (1,920 increased and 2,211 decreased; FDR < 0.05). We compared our results to differentially expressed genes calculated from a previously published dataset generated from FF tissues analyzed on Agilent microarrays (GSE47460). The overlap of differentially expressed genes was very high (760 increased and 1,413 decreased, FDR < 0.05). Only 92 differentially expressed genes changed in opposite directions. Pathway enrichment analysis performed using MetaCore confirmed numerous IPF relevant genes and pathways including extracellular remodeling, TGF-beta, and WNT. Gene network analysis of MMP7, a highly differentially expressed gene in both datasets, revealed the same canonical pathways and gene network candidates in RNA-Seq and microarray data. For validation by NanoString nCounter® we selected 35 genes that had a fold change of 2 in at least one dataset (10 discordant, 10 significantly differentially expressed in one dataset only and 15 concordant genes). High concordance of fold change and FDR was observed for each type of the samples (FF vs FFPE) with both microarrays (r = 0.92) and RNA-Seq (r = 0.90) and the number of discordant genes was reduced to four.
    Our results demonstrate that RNA sequencing of RNA obtained from archived FFPE lung tissues is feasible. The results obtained from FFPE tissue are highly comparable to FF tissues. The ability to perform RNA-Seq on archived FFPE IPF tissues should greatly enhance the availability of tissue biopsies for research in IPF.
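
    Two calculations in this record lend themselves to a short sketch: the mapping rate (mapped reads as a percentage of all reads) and the split of shared differentially expressed genes into those changing in the same direction and those changing in opposite directions. The function names, the dictionary layout, and the placeholder gene labels are assumptions for illustration; the read counts are the approximate averages quoted in the abstract.

```python
def mapping_rate(mapped_reads, total_reads):
    """Percentage of reads that were mapped."""
    return 100.0 * mapped_reads / total_reads


def compare_direction(log2fc_a, log2fc_b):
    """Split genes present in both datasets by direction of change.

    Each argument maps gene name -> log2 fold change. Genes whose
    fold changes share a sign are concordant; opposite signs are
    discordant. Genes with a zero fold change fall in neither set.
    """
    shared = set(log2fc_a) & set(log2fc_b)
    concordant = {g for g in shared if log2fc_a[g] * log2fc_b[g] > 0}
    discordant = {g for g in shared if log2fc_a[g] * log2fc_b[g] < 0}
    return concordant, discordant


# Approximate per-sample averages from the abstract.
rate = round(mapping_rate(62e6, 116e6), 1)

# Placeholder genes and values, invented for the example.
ffpe = {"GENE_A": 2.1, "GENE_B": -1.4, "GENE_C": 0.9, "GENE_D": 1.2}
ff = {"GENE_A": 1.8, "GENE_B": -2.0, "GENE_C": -1.1, "GENE_E": 3.0}
concordant, discordant = compare_direction(ffpe, ff)
```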

  14. Identification of mycoparasitism-related genes against the phytopathogen Sclerotinia sclerotiorum through transcriptome and expression profile analysis in Trichoderma harzianum.

    PubMed

    Steindorff, Andrei Stecca; Ramada, Marcelo Henrique Soller; Coelho, Alexandre Siqueira Guedes; Miller, Robert Neil Gerard; Pappas, Georgios Joannis; Ulhoa, Cirano José; Noronha, Eliane Ferreira

    2014-03-18

    The species of T. harzianum are well known for their biocontrol activity against plant pathogens. However, few studies have been conducted to further our understanding of its role as a biological control agent against S. sclerotiorum, a pathogen involved in several crop diseases around the world. In this study, we have used RNA-seq and quantitative real-time PCR (RT-qPCR) techniques in order to explore changes in T. harzianum gene expression during growth on cell wall of S. sclerotiorum (SSCW) or glucose. RT-qPCR was also used to examine genes potentially involved in biocontrol, during confrontation between T. harzianum and S. sclerotiorum. Data obtained from six RNA-seq libraries were aligned onto the T. harzianum CBS 226.95 reference genome and compared after annotation using the Blast2GO suite. A total of 297 differentially expressed genes were found in mycelia grown for 12, 24 and 36 h under the two different conditions: supplemented with glucose or SSCW. Functional annotation of these genes identified diverse biological processes and molecular functions required during T. harzianum growth on SSCW or glucose. We identified various genes of biotechnological value encoding proteins with functions such as transporters, hydrolytic activity, adherence, appressorium development and pathogenesis. To validate the expression profile, RT-qPCR was performed using 20 randomly chosen genes. RT-qPCR expression profiles were in complete agreement with the RNA-Seq data for 17 of the genes evaluated. The other three showed differences at one or two growth times. During the confrontation assay, some genes were up-regulated during and after contact, as shown in the presence of SSCW which is commonly used as a model to mimic this interaction. The present study is the first initiative to use RNA-seq for identification of differentially expressed genes in T. harzianum strain TR274, in response to the phytopathogenic fungus S. sclerotiorum. 
    It provides insights into the mechanisms of gene expression involved in mycoparasitism of T. harzianum against S. sclerotiorum. The RNA-seq data presented will facilitate improvement of the annotation of gene models in the draft T. harzianum genome and provide important information regarding the transcriptome during this interaction.

  15. Single-cell gene expression analysis reveals diversity among human spermatogonia.

    PubMed

    Neuhaus, N; Yoon, J; Terwort, N; Kliesch, S; Seggewiss, J; Huge, A; Voss, R; Schlatt, S; Grindberg, R V; Schöler, H R

    2017-02-10

    Is the molecular profile of human spermatogonia homogeneous or heterogeneous when analysed at the single-cell level? Heterogeneous expression profiles may be a key characteristic of human spermatogonia, supporting the existence of a heterogeneous stem cell population. Despite the fact that many studies have sought to identify specific markers for human spermatogonia, the molecular fingerprint of these cells remains hitherto unknown. Testicular tissues from patients with spermatogonial arrest (arrest, n = 1) and with qualitatively normal spermatogenesis (normal, n = 7) were selected from a pool of 179 consecutively obtained biopsies. Gene expression analyses of cell populations and single-cells (n = 105) were performed. Two OCT4-positive individual cells were selected for global transcriptional capture using shallow RNA-seq. Finally, expression of four candidate markers was assessed by immunohistochemistry. Histological analysis and blood hormone measurements for LH, FSH and testosterone were performed prior to testicular sample selection. Following enzymatic digestion of testicular tissues, differential plating and subsequent micromanipulation of individual cells was employed to enrich and isolate human spermatogonia, respectively. Endpoint analyses were qPCR analysis of cell populations and individual cells, shallow RNA-seq and immunohistochemical analyses. Unexpectedly, single-cell expression data from the arrest patient (20 cells) showed heterogeneous expression profiles. Also, from patients with normal spermatogenesis, heterogeneous expression patterns of undifferentiated (OCT4, UTF1 and MAGE A4) and differentiated marker genes (BOLL and PRM2) were obtained within each spermatogonia cluster (13 clusters with 85 cells). Shallow RNA-seq analysis of individual human spermatogonia was validated, and a spermatogonia-specific heterogeneous protein expression of selected candidate markers (DDX5, TSPY1, EEF1A1 and NGN3) was demonstrated. 
    The heterogeneity of human spermatogonia at the RNA and protein levels is a snapshot. To further assess the functional meaning of this heterogeneity and the dynamics of stem cell populations, approaches need to be developed to facilitate the repeated analysis of individual cells. Our data suggest that heterogeneous expression profiles may be a key characteristic of human spermatogonia, supporting the model of a heterogeneous stem cell population. Future studies will assess the dynamics of spermatogonial populations in fertile and infertile patients. RNA-seq data is published in the GEO database: GSE91063. This work was supported by the Max Planck Society and the Deutsche Forschungsgemeinschaft DFG-Research Unit FOR 1041 Germ Cell Potential (grant numbers SCHO 340/7-1, SCHL394/11-2). The authors declare that there is no conflict of interest. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  16. Time-series RNA-seq analysis package (TRAP) and its application to the analysis of rice, Oryza sativa L. ssp. Japonica, upon drought stress.

    PubMed

    Jo, Kyuri; Kwon, Hawk-Bin; Kim, Sun

    2014-06-01

    Measuring expression levels of genes at the whole genome level can be useful for many purposes, especially for revealing biological pathways underlying specific phenotype conditions. When gene expression is measured over a time period, we have opportunities to understand how organisms react to stress conditions over time. Thus many biologists routinely measure whole genome level gene expressions at multiple time points. However, there are several technical difficulties for analyzing such whole genome expression data. In addition, these days gene expression data is often measured by using RNA-sequencing rather than microarray technologies, and analysis of expression data is then much more complicated since the analysis process should start with mapping short reads and produce differentially activated pathways and also possibly interactions among pathways. In addition, many useful tools for analyzing microarray gene expression data are not applicable for the RNA-seq data. Thus a comprehensive package for analyzing time series transcriptome data is much needed. In this article, we present a comprehensive package, Time-series RNA-seq Analysis Package (TRAP), integrating all necessary tasks such as mapping short reads, measuring gene expression levels, finding differentially expressed genes (DEGs), clustering and pathway analysis for time-series data in a single environment. In addition to implementing useful algorithms that are not available for RNA-seq data, we extended existing pathway analysis methods, ORA and SPIA, for time series analysis and estimated statistical values for the combined dataset using an advanced metric. TRAP also produces a visual summary of pathway interactions. Gene expression change labeling, a practical clustering method used in TRAP, enables more accurate interpretation of the data when combined with pathway analysis. We applied our methods to a real dataset for the analysis of rice (Oryza sativa L. Japonica nipponbare) upon drought stress.
    The result showed that TRAP was able to detect pathways more accurately than several existing methods. TRAP is available at http://biohealth.snu.ac.kr/software/TRAP/. Copyright © 2014 Elsevier Inc. All rights reserved.

  17. Whole-Exome Sequencing in a South American Cohort Links ALDH1A3, FOXN1 and Retinoic Acid Regulation Pathways to Autism Spectrum Disorders

    PubMed Central

    Moreno-Ramos, Oscar A.; Olivares, Ana María; Haider, Neena B.; de Autismo, Liga Colombiana; Lattig, María Claudia

    2015-01-01

    Autism spectrum disorders (ASDs) are a range of complex neurodevelopmental conditions principally characterized by dysfunctions linked to mental development. Previous studies have shown that there are more than 1000 genes likely involved in ASD, expressed mainly in brain and highly interconnected among them. We applied whole exome sequencing in Colombian (South American) trios. Two missense novel SNVs were found in the same child: ALDH1A3 (RefSeq NM_000693: c.1514T>C (p.I505T)) and FOXN1 (RefSeq NM_003593: c.146C>T (p.S49L)). Gene expression studies reveal that Aldh1a3 and Foxn1 are expressed in ~E13.5 mouse embryonic brain, as well as in adult piriform cortex (PC; ~P30). Conserved Retinoic Acid Response Elements (RAREs) upstream of human ALDH1A3 and FOXN1 and in mouse Aldh1a3 and Foxn1 genes were revealed using bioinformatic approximation. Chromatin immunoprecipitation (ChIP) assay using Retinoic Acid Receptor B (Rarb) as the immunoprecipitation target suggests RA regulation of Aldh1a3 and Foxn1 in mice. Our results frame a possible link of RA regulation in brain to ASD etiology, and a feasible non-additive effect of two apparently unrelated variants in ALDH1A3 and FOXN1, recognizing that every result given by next generation sequencing should be cautiously analyzed, as it might be an incidental finding. PMID:26352270

  18. Mapping Mammalian Cell-type-specific Transcriptional Regulatory Networks Using KD-CAGE and ChIP-seq Data in the TC-YIK Cell Line

    PubMed Central

    Lizio, Marina; Ishizu, Yuri; Itoh, Masayoshi; Lassmann, Timo; Hasegawa, Akira; Kubosaki, Atsutaka; Severin, Jessica; Kawaji, Hideya; Nakamura, Yukio; Suzuki, Harukazu; Hayashizaki, Yoshihide; Carninci, Piero; Forrest, Alistair R. R.

    2015-01-01

    Mammals are composed of hundreds of different cell types with specialized functions. Each of these cellular phenotypes is controlled by different combinations of transcription factors. Using a human non islet cell insulinoma cell line (TC-YIK) which expresses insulin and the majority of known pancreatic beta cell specific genes as an example, we describe a general approach to identify key cell-type-specific transcription factors (TFs) and their direct and indirect targets. By ranking all human TFs by their level of enriched expression in TC-YIK relative to a broad collection of samples (FANTOM5), we confirmed known key regulators of pancreatic function and development. Systematic siRNA mediated perturbation of these TFs followed by qRT-PCR revealed their interconnections with NEUROD1 at the top of the regulation hierarchy and its depletion drastically reducing insulin levels. For 15 of the TF knock-downs (KD), we then used Cap Analysis of Gene Expression (CAGE) to identify thousands of their targets genome-wide (KD-CAGE). The data confirm NEUROD1 as a key positive regulator in the transcriptional regulatory network (TRN), and ISL1 and PROX1 as antagonists. As a complementary approach we used ChIP-seq on four of these factors to identify NEUROD1, LMX1A, PAX6, and RFX6 binding sites in the human genome. Examining the overlap between genes perturbed in the KD-CAGE experiments and genes with a ChIP-seq peak within 50 kb of their promoter, we identified direct transcriptional targets of these TFs. Integration of KD-CAGE and ChIP-seq data shows that both NEUROD1 and LMX1A work as the main transcriptional activators. In the core TRN (i.e., TF-TF only), NEUROD1 directly transcriptionally activates the pancreatic TFs HSF4, INSM1, MLXIPL, MYT1, NKX6-3, ONECUT2, PAX4, PROX1, RFX6, ST18, DACH1, and SHOX2, while LMX1A directly transcriptionally activates DACH1, SHOX2, PAX6, and PDX1.
    Analysis of these complementary datasets suggests the need for caution in interpreting ChIP-seq datasets. (1) A large fraction of binding sites are at distal enhancer sites and cannot be directly associated to their targets, without chromatin conformation data. (2) Many peaks may be non-functional: even when there is a peak at a promoter, the expression of the gene may not be affected in the matching perturbation experiment. PMID:26635867
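
    The integration step described in this record, intersecting genes perturbed in a knock-down with genes that have a ChIP-seq peak within 50 kb of their promoter, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: each promoter and each peak is reduced to a single coordinate, the function names are hypothetical, and the gene labels and positions are invented.

```python
def genes_near_peaks(promoters, peaks, window=50_000):
    """Return genes whose promoter lies within `window` bp of any peak.

    promoters: dict mapping gene -> (chromosome, promoter position)
    peaks:     list of (chromosome, peak position) tuples
    """
    near = set()
    for gene, (chrom, position) in promoters.items():
        for peak_chrom, peak_position in peaks:
            if peak_chrom == chrom and abs(peak_position - position) <= window:
                near.add(gene)
                break  # one qualifying peak is enough
    return near


def candidate_direct_targets(perturbed_genes, promoters, peaks, window=50_000):
    """Genes that respond to the knock-down and also have a nearby peak."""
    return set(perturbed_genes) & genes_near_peaks(promoters, peaks, window)


# Invented coordinates for three placeholder genes.
promoters = {
    "GENE_A": ("chr1", 100_000),
    "GENE_B": ("chr1", 500_000),
    "GENE_C": ("chr2", 100_000),
}
peaks = [("chr1", 130_000), ("chr2", 400_000)]
perturbed = {"GENE_A", "GENE_B"}

targets = candidate_direct_targets(perturbed, promoters, peaks)
```

    As the abstract cautions, a nearby peak does not by itself establish regulation, which is why the sketch requires the gene to be perturbed as well.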

  19. De Novo Transcriptome Analysis Shows That SAV-3 Infection Upregulates Pattern Recognition Receptors of the Endosomal Toll-Like and RIG-I-Like Receptor Signaling Pathways in Macrophage/Dendritic Like TO-Cells.

    PubMed

    Xu, Cheng; Evensen, Øystein; Munang'andu, Hetron

    2016-04-21

    A fundamental step in cellular defense mechanisms is the recognition of "danger signals" made of conserved pathogen associated molecular patterns (PAMPs) expressed by invading pathogens, by host cell germ line coded pattern recognition receptors (PRRs). In this study, we used RNA-seq and the Kyoto encyclopedia of genes and genomes (KEGG) to identify PRRs together with the network pathway of differentially expressed genes (DEGs) that recognize salmonid alphavirus subtype 3 (SAV-3) infection in macrophage/dendritic like TO-cells derived from Atlantic salmon (Salmo salar L) head kidney leukocytes. Our findings show that recognition of SAV-3 in TO-cells was restricted to endosomal Toll-like receptors (TLRs) 3 and 8 together with RIG-I-like receptors (RLRs) and not the nucleotide-binding oligomerization domain (NOD)-like receptor (NLR) genes. Among the RLRs, upregulated genes included the retinoic acid inducible gene I (RIG-I), melanoma differentiation association 5 (MDA5) and laboratory of genetics and physiology 2 (LGP2). The study points to possible involvement of the tripartite motif containing 25 (TRIM25) and mitochondrial antiviral signaling protein (MAVS) in modulating RIG-I signaling, being the first report that links these genes to the RLR pathway in SAV-3 infection in TO-cells. Downstream signaling suggests that both the TLR and RLR pathways use interferon (IFN) regulatory factors (IRFs) 3 and 7 to produce IFN-a2. The validity of RNA-seq data generated in this study was confirmed by quantitative real-time PCR (qRT-PCR), showing that genes up- or downregulated by RNA-seq were also up- or downregulated by qRT-PCR. Overall, this study shows that de novo transcriptome assembly identifies key receptors of the TLR and RLR sensors engaged in host pathogen interaction at the cellular level. We envisage that data presented here can open a road map for future intervention strategies in SAV infection of salmon.

  20. The promise and challenge of high-throughput sequencing of the antibody repertoire

    PubMed Central

    Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R

    2014-01-01

    Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474

  1. contamDE: differential expression analysis of RNA-seq data for contaminated tumor samples.

    PubMed

    Shen, Qi; Hu, Jiyuan; Jiang, Ning; Hu, Xiaohua; Luo, Zewei; Zhang, Hong

    2016-03-01

    Accurate detection of differentially expressed genes between tumor and normal samples is a primary approach of cancer-related biomarker identification. Due to the infiltration of tumor surrounding normal cells, the expression data derived from tumor samples would always be contaminated with normal cells. Ignoring such cellular contamination would deflate the power of detecting DE genes and further confound the biological interpretation of the analysis results. For the time being, there does not exist any differential expression analysis approach for RNA-seq data in the literature that can properly account for the contamination of tumor samples. Without appealing to any extra information, we develop a new method 'contamDE' based on a novel statistical model that associates RNA-seq expression levels with cell types. It is demonstrated through simulation studies that contamDE could be much more powerful than the existing methods that ignore the contamination. In the application to two cancer studies, contamDE uniquely found several potential therapy and prognostic biomarkers of prostate cancer and non-small cell lung cancer. An R package contamDE is freely available at http://homepage.fudan.edu.cn/zhangh/softwares/ . Contact: zhanghfd@fudan.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Dopamine Signaling Leads to Loss of Polycomb Repression and Aberrant Gene Activation in Experimental Parkinsonism

    PubMed Central

    Lerdrup, Mads; Gomes, Ana-Luisa; Kryh, Hanna; Spigolon, Giada; Caboche, Jocelyne; Fisone, Gilberto; Hansen, Klaus

    2014-01-01

    Polycomb group (PcG) proteins bind to and repress genes in embryonic stem cells through lineage commitment to the terminal differentiated state. PcG repressed genes are commonly characterized by the presence of the epigenetic histone mark H3K27me3, catalyzed by the Polycomb repressive complex 2. Here, we present in vivo evidence for a previously unrecognized plasticity of PcG-repressed genes in terminally differentiated brain neurons of parkinsonian mice. We show that acute administration of the dopamine precursor, L-DOPA, induces a remarkable increase in H3K27me3S28 phosphorylation. The induction of the H3K27me3S28p histone mark specifically occurs in medium spiny neurons expressing dopamine D1 receptors and is dependent on Msk1 kinase activity and DARPP-32-mediated inhibition of protein phosphatase-1. Chromatin immunoprecipitation (ChIP) experiments showed that increased H3K27me3S28p was accompanied by reduced PcG binding to regulatory regions of genes. An analysis of the genome wide distribution of L-DOPA-induced H3K27me3S28 phosphorylation by ChIP sequencing (ChIP-seq) in combination with expression analysis by RNA-sequencing (RNA-seq) showed that the induction of H3K27me3S28p correlated with increased expression of a subset of PcG repressed genes. We found that induction of H3K27me3S28p persisted during chronic L-DOPA administration to parkinsonian mice and correlated with aberrant gene expression. We propose that dopaminergic transmission can activate PcG repressed genes in the adult brain and thereby contribute to long-term maladaptive responses including the motor complications, or dyskinesia, caused by prolonged administration of L-DOPA in Parkinson's disease. PMID:25254549

  3. RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas.

    PubMed

    Bao, Zhao-Shi; Chen, Hui-Min; Yang, Ming-Yu; Zhang, Chuan-Bao; Yu, Kai; Ye, Wan-Lu; Hu, Bo-Qiang; Yan, Wei; Zhang, Wei; Akers, Johnny; Ramakrishnan, Valya; Li, Jie; Carter, Bob; Liu, Yan-Wei; Hu, Hui-Min; Wang, Zheng; Li, Ming-Yang; Yao, Kun; Qiu, Xiao-Guang; Kang, Chun-Sheng; You, Yong-Ping; Fan, Xiao-Long; Song, Wei Sonya; Li, Rui-Qiang; Su, Xiao-Dong; Chen, Clark C; Jiang, Tao

    2014-11-01

    Studies of gene rearrangements and the consequent oncogenic fusion proteins have laid the foundation for targeted cancer therapy. To identify oncogenic fusions associated with glioma progression, we catalogued fusion transcripts by RNA-seq of 272 gliomas. Fusion transcripts were more frequently found in high-grade gliomas, in the classical subtype of gliomas, and in gliomas treated with radiation/temozolomide. Sixty-seven in-frame fusion transcripts were identified, including three recurrent fusion transcripts: FGFR3-TACC3, RNF213-SLC26A11, and PTPRZ1-MET (ZM). Interestingly, the ZM fusion was found only in grade III astrocytomas (1/13; 7.7%) or secondary GBMs (sGBMs, 3/20; 15.0%). In an independent cohort of sGBMs, the ZM fusion was found in three of 20 (15%) specimens. Genomic analysis revealed that the fusion arose from translocation events involving introns 3 or 8 of PTPRZ and intron 1 of MET. ZM fusion transcripts were found in GBMs irrespective of isocitrate dehydrogenase 1 (IDH1) mutation status. sGBMs harboring ZM fusion showed higher expression of genes required for PIK3CA signaling and lowered expression of genes that suppressed RB1 or TP53 function. Expression of the ZM fusion was mutually exclusive with EGFR overexpression in sGBMs. Exogenous expression of the ZM fusion in the U87MG glioblastoma line enhanced cell migration and invasion. Clinically, patients afflicted with ZM fusion-harboring glioblastomas survived poorly relative to those afflicted with non-ZM-harboring sGBMs (P < 0.001). Our study profiles the shifting RNA landscape of gliomas during progression and revealed ZM as a novel, recurrent fusion transcript in sGBMs. © 2014 Bao et al.; Published by Cold Spring Harbor Laboratory Press.

  4. RNA-seq of 272 gliomas revealed a novel, recurrent PTPRZ1-MET fusion transcript in secondary glioblastomas

    PubMed Central

    Bao, Zhao-Shi; Yang, Ming-Yu; Zhang, Chuan-Bao; Yu, Kai; Ye, Wan-Lu; Hu, Bo-Qiang; Yan, Wei; Zhang, Wei; Akers, Johnny; Ramakrishnan, Valya; Li, Jie; Carter, Bob; Liu, Yan-Wei; Hu, Hui-Min; Wang, Zheng; Li, Ming-Yang; Yao, Kun; Qiu, Xiao-Guang; Kang, Chun-Sheng; You, Yong-Ping; Fan, Xiao-Long; Song, Wei Sonya; Li, Rui-Qiang

    2014-01-01

    Studies of gene rearrangements and the consequent oncogenic fusion proteins have laid the foundation for targeted cancer therapy. To identify oncogenic fusions associated with glioma progression, we catalogued fusion transcripts by RNA-seq of 272 gliomas. Fusion transcripts were more frequently found in high-grade gliomas, in the classical subtype of gliomas, and in gliomas treated with radiation/temozolomide. Sixty-seven in-frame fusion transcripts were identified, including three recurrent fusion transcripts: FGFR3-TACC3, RNF213-SLC26A11, and PTPRZ1-MET (ZM). Interestingly, the ZM fusion was found only in grade III astrocytomas (1/13; 7.7%) or secondary GBMs (sGBMs, 3/20; 15.0%). In an independent cohort of sGBMs, the ZM fusion was found in three of 20 (15%) specimens. Genomic analysis revealed that the fusion arose from translocation events involving introns 3 or 8 of PTPRZ and intron 1 of MET. ZM fusion transcripts were found in GBMs irrespective of isocitrate dehydrogenase 1 (IDH1) mutation status. sGBMs harboring ZM fusion showed higher expression of genes required for PIK3CA signaling and lowered expression of genes that suppressed RB1 or TP53 function. Expression of the ZM fusion was mutually exclusive with EGFR overexpression in sGBMs. Exogenous expression of the ZM fusion in the U87MG glioblastoma line enhanced cell migration and invasion. Clinically, patients afflicted with ZM fusion-harboring glioblastomas survived poorly relative to those afflicted with non-ZM-harboring sGBMs (P < 0.001). Our study profiles the shifting RNA landscape of gliomas during progression and revealed ZM as a novel, recurrent fusion transcript in sGBMs. PMID:25135958

  5. A Comparison of RNA-Seq Results from Paired Formalin-Fixed Paraffin-Embedded and Fresh-Frozen Glioblastoma Tissue Samples

    PubMed Central

    Esteve-Codina, Anna; Arpi, Oriol; Martinez-García, Maria; Pineda, Estela; Mallo, Mar; Gut, Marta; Carrato, Cristina; Rovira, Anna; Lopez, Raquel; Tortosa, Avelina; Dabad, Marc; Del Barco, Sonia; Heath, Simon; Bagué, Silvia; Ribalta, Teresa; Alameda, Francesc; de la Iglesia, Nuria

    2017-01-01

    The molecular classification of glioblastoma (GBM) based on gene expression might better explain outcome and response to treatment than clinical factors. Whole transcriptome sequencing using next-generation sequencing platforms is rapidly becoming accepted as a tool for measuring gene expression for both research and clinical use. Fresh frozen (FF) tissue specimens of GBM are difficult to obtain since tumor tissue obtained at surgery is often scarce and necrotic and diagnosis is prioritized over freezing. After diagnosis, leftover tissue is usually stored as formalin-fixed paraffin-embedded (FFPE) tissue. However, RNA from FFPE tissues is usually degraded, which could hamper gene expression analysis. We compared RNA-Seq data obtained from matched pairs of FF and FFPE GBM specimens. Only three FFPE out of eleven FFPE-FF matched samples yielded informative results. Several quality-control measurements showed that RNA from FFPE samples was highly degraded but maintained transcriptomic similarities to RNA from FF samples. Certain issues regarding mutation analysis and subtype prediction were detected. Nevertheless, our results suggest that RNA-Seq of FFPE GBM specimens provides reliable gene expression data that can be used in molecular studies of GBM if the RNA is sufficiently preserved. PMID:28122052

  6. Purification of nanogram-range immunoprecipitated DNA in ChIP-seq application.

    PubMed

    Zhong, Jian; Ye, Zhenqing; Lenz, Samuel W; Clark, Chad R; Bharucha, Adil; Farrugia, Gianrico; Robertson, Keith D; Zhang, Zhiguo; Ordog, Tamas; Lee, Jeong-Heon

    2017-12-21

    Chromatin immunoprecipitation-sequencing (ChIP-seq) is a widely used epigenetic approach for investigating genome-wide protein-DNA interactions in cells and tissues. The approach has been relatively well established but several key steps still require further improvement. As a part of the procedure, immunoprecipitated DNA must undergo purification and library preparation for subsequent high-throughput sequencing. Current ChIP protocols typically yield nanogram quantities of immunoprecipitated DNA mainly depending on the target of interest and starting chromatin input amount. However, little information exists on the performance of reagents used for the purification of such minute amounts of immunoprecipitated DNA in ChIP elution buffer and their effects on ChIP-seq data. Here, we compared DNA recovery, library preparation efficiency, and ChIP-seq results obtained with several commercial DNA purification reagents applied to 1 ng ChIP DNA and also investigated the impact of conditions under which ChIP DNA is stored. We compared DNA recovery of ten commercial DNA purification reagents and phenol/chloroform extraction from 1 to 50 ng of immunoprecipitated DNA in ChIP elution buffer. The recovery yield was significantly different with 1 ng of DNA while similar in higher DNA amounts. We also observed that the low nanogram range of purified DNA is prone to loss during storage depending on the type of polypropylene tube used. The immunoprecipitated DNA equivalent to 1 ng of purified DNA was subject to DNA purification and library preparation to evaluate the performance of four better performing purification reagents in ChIP-seq applications. Quantification of library DNAs indicated the selected purification kits have a negligible impact on the efficiency of library preparation. The resulting ChIP-seq data were comparable with the dataset generated by ENCODE consortium and were highly correlated between the data from different purification reagents.
This study provides comparative data on commercial DNA purification reagents applied to nanogram-range immunoprecipitated ChIP DNA and evidence for the importance of storage conditions of low nanogram-range purified DNA. We verified consistent high performance of a subset of the tested reagents. These results will facilitate the improvement of ChIP-seq methodology for low-input applications.

  7. Transcriptome-wide analysis supports environmental adaptations of two Pinus pinaster populations from contrasting habitats.

    PubMed

    Cañas, Rafael A; Feito, Isabel; Fuente-Maqueda, José Francisco; Ávila, Concepción; Majada, Juan; Cánovas, Francisco M

    2015-11-06

    Maritime pine (Pinus pinaster Aiton) grows in a range of different climates in the southwestern Mediterranean region and the existence of a variety of latitudinal ecotypes or provenances is well established. In this study, we have conducted a deep analysis of the transcriptome in needles from two P. pinaster provenances, Leiria (Portugal) and Tamrabta (Morocco), which were grown in northern Spain under the same conditions. An oligonucleotide microarray (PINARRAY3) and RNA-Seq were used for whole-transcriptome analyses, and we found that 90.95% of the data were concordant between the two platforms. Furthermore, the two methods identified very similar percentages of differentially expressed genes with values of 5.5% for PINARRAY3 and 5.7% for RNA-Seq. In total, 6,023 transcripts were shared and 88 differentially expressed genes overlapped in the two platforms. Among the differentially expressed genes, all transport related genes except aquaporins were expressed at higher levels in Tamrabta than in Leiria. In contrast, genes involved in secondary metabolism were expressed at higher levels in Tamrabta, and photosynthesis-related genes were expressed more highly in Leiria. The genes involved in light sensing in plants were well represented in the differentially expressed groups of genes. In addition, increased levels of hormones such as abscisic acid, gibberellins, jasmonic and salicylic acid were observed in Leiria. Both transcriptome platforms have proven to be useful resources, showing complementary and reliable results. The results presented here highlight the different abilities of the two maritime pine populations to sense environmental conditions and reveal one type of regulation that can be ascribed to different genetic and epigenetic backgrounds.

  8. RNA-Sequencing Gene Expression Profiling of Orbital Adipose-Derived Stem Cell Population Implicate HOX Genes and WNT Signaling Dysregulation in the Pathogenesis of Thyroid-Associated Orbitopathy.

    PubMed

    Tao, Wensi; Ayala-Haedo, Juan A; Field, Matthew G; Pelaez, Daniel; Wester, Sara T

    2017-12-01

    The purpose of this study was to characterize the intrinsic cellular properties of orbital adipose-derived stem cells (OASC) from patients with thyroid-associated orbitopathy (TAO) and healthy controls. Orbital adipose tissue was collected from a total of nine patients: four controls and five patients with TAO. Isolated OASC were characterized with mesenchymal stem cell-specific markers. Orbital adipose-derived stem cells were differentiated into three lineages: chondrocytes, osteocytes, and adipocytes. Reverse transcription PCR of genes involved in the adipogenesis, chondrogenesis, and osteogenesis pathways were selected to assay the differentiation capacities. RNA sequencing analysis (RNA-seq) was performed and results were compared to assess for differences in gene expression between TAO and controls. Selected top-ranked results were confirmed by RT-PCR. Orbital adipose-derived stem cells isolated from orbital fat expressed high levels of mesenchymal stem cell markers, but low levels of the pluripotent stem cell markers. Orbital adipose-derived stem cells isolated from TAO patients exhibited an increase in adipogenesis, and a decrease in chondrogenesis and osteogenesis. RNA-seq disclosed 54 differentially expressed genes. In TAO OASC, expression of early neural crest progenitor marker (WNT signaling, ZIC genes and MSX2) was lost. Meanwhile, ectopic expression of HOXB2 and HOXB3 was found in the OASC from TAO. Our results suggest that there are intrinsic genetic and cellular differences in the OASC populations derived from TAO patients. The upregulation in adipogenesis in OASC of TAO may be consistent with the clinical phenotype. Downregulation of early neural crest markers and ectopic expression of HOXB2 and HOXB3 in TAO OASC demonstrate dysregulation of developmental and tissue patterning pathways.

  9. RNA-Sequencing Gene Expression Profiling of Orbital Adipose-Derived Stem Cell Population Implicate HOX Genes and WNT Signaling Dysregulation in the Pathogenesis of Thyroid-Associated Orbitopathy

    PubMed Central

    Tao, Wensi; Ayala-Haedo, Juan A.; Field, Matthew G.; Pelaez, Daniel; Wester, Sara T.

    2017-01-01

    Purpose The purpose of this study was to characterize the intrinsic cellular properties of orbital adipose-derived stem cells (OASC) from patients with thyroid-associated orbitopathy (TAO) and healthy controls. Methods Orbital adipose tissue was collected from a total of nine patients: four controls and five patients with TAO. Isolated OASC were characterized with mesenchymal stem cell–specific markers. Orbital adipose-derived stem cells were differentiated into three lineages: chondrocytes, osteocytes, and adipocytes. Reverse transcription PCR of genes involved in the adipogenesis, chondrogenesis, and osteogenesis pathways were selected to assay the differentiation capacities. RNA sequencing analysis (RNA-seq) was performed and results were compared to assess for differences in gene expression between TAO and controls. Selected top-ranked results were confirmed by RT-PCR. Results Orbital adipose-derived stem cells isolated from orbital fat expressed high levels of mesenchymal stem cell markers, but low levels of the pluripotent stem cell markers. Orbital adipose-derived stem cells isolated from TAO patients exhibited an increase in adipogenesis, and a decrease in chondrogenesis and osteogenesis. RNA-seq disclosed 54 differentially expressed genes. In TAO OASC, expression of early neural crest progenitor marker (WNT signaling, ZIC genes and MSX2) was lost. Meanwhile, ectopic expression of HOXB2 and HOXB3 was found in the OASC from TAO. Conclusion Our results suggest that there are intrinsic genetic and cellular differences in the OASC populations derived from TAO patients. The upregulation in adipogenesis in OASC of TAO may be consistent with the clinical phenotype. Downregulation of early neural crest markers and ectopic expression of HOXB2 and HOXB3 in TAO OASC demonstrate dysregulation of developmental and tissue patterning pathways. PMID:29214313

  10. Comparative omics and feeding manipulations in chicken indicate a shift of the endocrine role of visceral fat towards reproduction.

    PubMed

    Bornelöv, Susanne; Seroussi, Eyal; Yosefi, Sara; Benjamini, Sharon; Miyara, Shoval; Ruzal, Mark; Grabherr, Manfred; Rafati, Nima; Molin, Anna-Maja; Pendavis, Ken; Burgess, Shane C; Andersson, Leif; Friedman-Einat, Miriam

    2018-04-26

    The mammalian adipose tissue plays a central role in energy-balance control, whereas the avian visceral fat hardly expresses leptin, the key adipokine in mammals. Therefore, to assess the endocrine role of adipose tissue in birds, we compared the transcriptome and proteome between two metabolically different types of chickens, broilers and layers, bred towards efficient meat and egg production, respectively. Broilers and layer hens, grown up to sexual maturation under free-feeding conditions, differed 4.0-fold in weight and 1.6-fold in ovarian-follicle counts, yet the relative accumulation of visceral fat was comparable. RNA-seq and mass-spectrometry (MS) analyses of visceral fat revealed differentially expressed genes between broilers and layers, 1106 at the mRNA level (FDR ≤ 0.05), and 203 at the protein level (P ≤ 0.05). In broilers, Ingenuity Pathway Analysis revealed activation of the PTEN-pathway, and in layers increased response to external signals. The expression pattern of genes encoding fat-secreted proteins in broilers and layers was characterized in the RNA-seq and MS data, as well as by qPCR on visceral fat under free feeding and 24 h-feed deprivation. This characterization was expanded using available RNA-seq data of tissues from red junglefowl, and of visceral fat from broilers of different types. These comparisons revealed expression of new adipokines and secreted proteins (LCAT, LECT2, SERPINE2, SFTP1, ZP1, ZP3, APOV1, VTG1 and VTG2) at the mRNA and/or protein levels, with dynamic gene expression patterns in the selected chicken lines (except for ZP1; FDR/P ≤ 0.05) and feed deprivation (NAMPT, SFTPA1 and ZP3) (P ≤ 0.05). In contrast, some of the most prominent adipokines in mammals, leptin, TNF, IFNG, and IL6 were expressed at a low level (FPKM/RPKM < 1) and did not show differential mRNA expression either between broiler and layer lines or between fed vs. feed-deprived chickens.
Our study revealed that RNA and protein expression in visceral fat changes with selective breeding, suggesting endocrine roles of visceral fat in the selected phenotypes. In comparison to gene expression in visceral fat of mammals, our findings point to a more direct cross talk of the chicken visceral fat with the reproductive system and lower involvement in the regulation of appetite, inflammation and insulin resistance.
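
The low-expression cutoff quoted in this abstract (FPKM/RPKM < 1) can be illustrated with the standard FPKM definition. The sketch below is only an illustration: the function names and the example counts are hypothetical, and the abstract does not state how the authors computed or filtered their values.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Standard definition: fragments * 1e9 / (length in bp * total mapped fragments).
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_low_expressed(value, threshold=1.0):
    """Apply a cutoff such as the FPKM/RPKM < 1 threshold quoted in the abstract."""
    return value < threshold


# Hypothetical gene: 50 fragments on a 2,000 bp transcript in a 40 M fragment library.
example_value = fpkm(50, 2000, 40_000_000)
print(example_value, is_low_expressed(example_value))
```

Because the measure divides by both transcript length and library size, a gene with few fragments can fall below the cutoff even in a deeply sequenced sample.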

  11. A human haploid gene trap collection to study lncRNAs with unusual RNA biology.

    PubMed

    Kornienko, Aleksandra E; Vlatkovic, Irena; Neesen, Jürgen; Barlow, Denise P; Pauler, Florian M

    2016-01-01

    Many thousand long non-coding (lnc) RNAs are mapped in the human genome. Time-consuming studies using reverse genetic approaches by post-transcriptional knock-down or genetic modification of the locus demonstrated diverse biological functions for a few of these transcripts. The Human Gene Trap Mutant Collection in haploid KBM7 cells is a ready-to-use tool for studying protein-coding gene function. As lncRNAs show remarkable differences in RNA biology compared to protein-coding genes, it is unclear if this gene trap collection is useful for functional analysis of lncRNAs. Here we use the uncharacterized LOC100288798 lncRNA as a model to answer this question. Using public RNA-seq data we show that LOC100288798 is ubiquitously expressed, but inefficiently spliced. The minor spliced LOC100288798 isoforms are exported to the cytoplasm, whereas the major unspliced isoform is nuclear localized. This shows that LOC100288798 RNA biology differs markedly from typical mRNAs. De novo assembly from RNA-seq data suggests that LOC100288798 extends 289kb beyond its annotated 3' end and overlaps the downstream SLC38A4 gene. Three cell lines with independent gene trap insertions in LOC100288798 were available from the KBM7 gene trap collection. RT-qPCR and RNA-seq confirmed successful lncRNA truncation and its extended length. Expression analysis from RNA-seq data shows significant deregulation of 41 protein-coding genes upon LOC100288798 truncation. Our data show that gene trap collections in human haploid cell lines are useful tools to study lncRNAs, and identify the previously uncharacterized LOC100288798 as a potential gene regulator.

  12. RNA-seq identifies a role for the PPARβ/δ inverse agonist GSK0660 in the regulation of TNFα-induced cytokine signaling in retinal endothelial cells

    PubMed Central

    Savage, Sara R.; McCollum, Gary W.; Yang, Rong

    2015-01-01

    Purpose The peroxisome proliferator-activated receptor beta/delta (PPARβ/δ) is a transcription factor with roles in metabolism, angiogenesis, and inflammation. It has yet undefined roles in retinal inflammation and diabetic retinopathy (DR). We used RNA-seq to better understand the role of the antagonist and inverse agonist of PPARβ/δ, GSK0660, in TNFα-induced inflammation. Understanding the underlying mechanisms of vascular inflammation could lead to new treatments for DR. Methods RNA was isolated from human retinal microvascular endothelial cells treated with a vehicle, TNFα, or TNFα plus GSK0660. RNA-seq was performed with a 50 bp single read protocol. The differential expression was determined using edgeR and gene ontology, and a pathway analysis was performed using DAVID. RNA-seq validation was performed using qRT-PCR using the primers for ANGPTL4, CCL8, NOV, CXCL10, and PDPK1. Results TNFα differentially regulated 1,830 transcripts, many of which are involved in the cytokine–cytokine receptor interaction, chemokine signaling, and inflammatory response. Additionally, TNFα highly upregulated genes involved in leukocyte recruitment, including CCL5, CX3CL1, and CXCL10. GSK0660 differentially regulated 273 transcripts in TNFα-treated cells compared to TNFα alone. A pathway analysis revealed the enrichment of cytokine–cytokine receptor signaling. In particular, GSK0660 blocks the TNFα-induced upregulation of CCL8, a chemokine involved in leukocyte recruitment. Conclusions TNFα regulates several genes related to retinal leukostasis in retinal endothelial cells. GSK0660 blocks the effect of TNFα on the expressions of cytokines involved in leukocyte recruitment, including CCL8, CCL17, and CXCL10 and it may therefore block TNFα-induced retinal leukostasis. PMID:26015769

  13. Predicting survival times for neuroblastoma patients using RNA-seq expression profiles.

    PubMed

    Grimes, Tyler; Walker, Alejandro R; Datta, Susmita; Datta, Somnath

    2018-05-30

    Neuroblastoma is the most common tumor of early childhood and is notorious for its high variability in clinical presentation. Accurate prognosis has remained a challenge for many patients. In this study, expression profiles from RNA-sequencing are used to predict survival times directly. Several models are investigated using various annotation levels of expression profiles (genes, transcripts, and introns), and an ensemble predictor is proposed as a heuristic for combining these different profiles. The use of RNA-seq data is shown to improve accuracy in comparison to using clinical data alone for predicting overall survival times. Furthermore, clinically high-risk patients can be subclassified based on their predicted overall survival times. In this effort, the best performing model was the elastic net using both transcripts and introns together. This model separated patients into two groups with 2-year overall survival rates of 0.40±0.11 (n=22) versus 0.80±0.05 (n=68). The ensemble approach gave similar results, with groups 0.42±0.10 (n=25) versus 0.82±0.05 (n=65). This suggests that the ensemble is able to effectively combine the individual RNA-seq datasets. Using predicted survival times based on RNA-seq data can provide improved prognosis by subclassifying clinically high-risk neuroblastoma patients. This article was reviewed by Subharup Guha and Isabel Nepomuceno.

  14. Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

    PubMed Central

    Rahmatallah, Yasir; Emmert-Streib, Frank

    2016-01-01

    Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in the form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarray practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to sample heterogeneity than self-contained methods, leading to poor reproducibility of results. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128
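
To make the competitive/self-contained distinction concrete: a competitive test asks whether genes inside a set are more often differentially expressed than genes outside it. A minimal sketch of the simplest such test, hypergeometric over-representation, follows. It is an illustration only; the abstract does not say which specific methods were benchmarked, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: genes in the gene set,
    n: differentially expressed (DE) genes, k: DE genes that fall in the set.
    """
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical numbers: 10 background genes, a 4-gene set, 3 DE genes, 2 of them in the set.
p_value = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
print(p_value)
```

A self-contained test would instead permute sample labels and ask whether the set as a whole is associated with the phenotype, without reference to genes outside the set.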

  15. Application of D-Crustacean Hyperglycemic Hormone Induces Peptidases Transcription and Suppresses Glycolysis-Related Transcripts in the Hepatopancreas of the Crayfish Pontastacus leptodactylus — Results of a Transcriptomic Study

    PubMed Central

    De Moro, Gianluca; Gerdol, Marco; Guarnaccia, Corrado; Mosco, Alessandro; Pallavicini, Alberto; Giulianini, Piero Giulio

    2013-01-01

    The crustacean Hyperglycemic Hormone (cHH) is a neuropeptide present in many decapods. Two different chiral isomers are simultaneously present in Astacid crayfish and their specific biological functions are still poorly understood. The present study is aimed at better understanding the potentially different effect of each of the isomers on the hepatopancreatic gene expression profile in the crayfish Pontastacus leptodactylus, in the context of short term hyperglycemia. Hence, two different chemically synthesized cHH enantiomers, containing either L- or D-Phe3, were injected to the circulation of intermolt females following removal of their X organ-Sinus gland complex. The effects triggered by the injection of the two alternate isomers were detected after one hour through measurement of circulating glucose levels. Triggered changes of the transcriptome expression profile in the hepatopancreas were analyzed by RNA-seq. A whole transcriptome shotgun sequence assembly provided the assumedly complete transcriptome of P. leptodactylus hepatopancreas, followed by RNA-seq analysis of changes in the expression level of many genes caused by the application of each of the hormone isomers. Circulating glucose levels were much higher in response to the D-isoform than to the L-isoform injection, one hour from injection. Similarly, the RNA-seq analysis confirmed a stronger effect on gene expression following the administration of D-cHH, while just limited alterations were caused by the L-isomer. These findings demonstrated a more prominent short term effect of the D-cHH on the transcription profile and shed light on the effect of the D-isomer on specific functional gene groups. 
Another contribution of the study is the construction of a de novo assembly of the hepatopancreas transcriptome, consisting of 39,935 contigs, that dramatically increases the molecular information available for this species and for crustaceans in general, providing an efficient tool for studying gene expression patterns in this organ. PMID:23840318

  16. Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis.

    PubMed

    Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L

    2017-02-06

    Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
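
The modeling direction described here, disease status as a function of RNA-Seq reads, can be sketched with a classical logistic regression fitted by Newton-Raphson. This is only an illustrative sketch under stated assumptions: the counts are made up, the log2(count + 1) transform is an assumption, and neither Firth's penalty nor the data adaptive method evaluated in the paper is implemented.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Classical (unpenalized) logistic regression, logit P(y=1) = b0 + b1 * x,
    fitted by Newton-Raphson on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Fisher information matrix entries.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system information * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical read counts for one gene (not from the paper); groups overlap on purpose.
control_counts = [12, 30, 18, 45, 25, 60]
case_counts = [40, 22, 80, 55, 95, 35]
x = [math.log2(c + 1) for c in control_counts + case_counts]
y = [0] * len(control_counts) + [1] * len(case_counts)

b0, b1 = fit_logistic(x, y)
print(b0, b1)
```

With small samples or perfectly separated groups the unpenalized fit can diverge, which is the kind of situation in which a penalized variant such as Firth's logistic regression is typically considered.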

  17. TranslatomeDB: a comprehensive database and cloud-based analysis platform for translatome sequencing data.

    PubMed

    Liu, Wanting; Xiang, Lunping; Zheng, Tingkai; Jin, Jingjie; Zhang, Gong

    2018-01-04

    Translation is a key regulatory step, linking transcriptome and proteome. Two major methods of translatome investigations are RNC-seq (sequencing of translating mRNA) and Ribo-seq (ribosome profiling). To facilitate the investigation of translation, we built a comprehensive database TranslatomeDB (http://www.translatomedb.net/) which provides collection and integrated analysis of published and user-generated translatome sequencing data. The current version includes 2453 Ribo-seq, 10 RNC-seq and their 1394 corresponding mRNA-seq datasets in 13 species. The database emphasizes the analysis functions in addition to the dataset collections. Differential gene expression (DGE) analysis can be performed between any two datasets of same species and type, both on transcriptome and translatome levels. The translation indices (translation ratios, elongation velocity index and translational efficiency) can be calculated to quantitatively evaluate translational initiation efficiency and elongation velocity, respectively. All datasets were analyzed using a unified, robust, accurate and experimentally-verifiable pipeline based on the FANSe3 mapping algorithm and edgeR for DGE analyses. TranslatomeDB also allows users to upload their own datasets and utilize the identical unified pipeline to analyze their data. We believe that our TranslatomeDB is a comprehensive platform and knowledgebase on translatome and proteome research, releasing the biologists from complex searching, analyzing and comparing huge sequencing data without needing local computational power. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    PubMed

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked as a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
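
    The ranking stage described above (genes sorted in descending order of the difference between their two metagene weights) can be sketched as follows. This covers only the ranking step, not the DNMF factorization itself; the weights are hypothetical stand-ins for a learned factor, and the function name is an assumption.

```python
def rank_genes(weights):
    """Rank genes by descending (up-metagene weight - down-metagene weight).

    weights maps gene name -> (weight_in_up_metagene, weight_in_down_metagene).
    """
    return sorted(weights, key=lambda g: weights[g][0] - weights[g][1], reverse=True)

# Hypothetical metagene weights, standing in for a learned DNMF factor
weights = {"geneA": (0.9, 0.1), "geneB": (0.2, 0.7), "geneC": (0.5, 0.4)}
ranking = rank_genes(weights)
```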

  19. NetMiner-an ensemble pipeline for building genome-wide and high-quality gene co-expression network using massive-scale RNA-seq samples.

    PubMed

    Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei

    2018-01-01

    Accurately reconstructing gene co-expression networks is of great importance for uncovering the genetic architecture underlying complex and various phenotypes. The recent availability of high-throughput RNA-seq sequencing has made genome-wide detection and quantification of novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression networks have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building genome-scale and high-quality Gene Co-expression Network (GCN) by integrating three frequently used inference algorithms. We constructed an RNA-seq-based GCN in one species of monocot rice. The quality of the network obtained by our method was verified and evaluated using curated gene functional association data sets, and it clearly outperformed each single method. In addition, the powerful capability of the network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provided a valuable and highly reliable data source to select key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.

  20. RNA-seq Transcriptional Profiling of an Arbuscular Mycorrhiza Provides Insights into Regulated and Coordinated Gene Expression in Lotus japonicus and Rhizophagus irregularis.

    PubMed

    Handa, Yoshihiro; Nishide, Hiroyo; Takeda, Naoya; Suzuki, Yutaka; Kawaguchi, Masayoshi; Saito, Katsuharu

    2015-08-01

    Gene expression during arbuscular mycorrhizal development is highly orchestrated in both plants and arbuscular mycorrhizal fungi. To elucidate the gene expression profiles of the symbiotic association, we performed a digital gene expression analysis of Lotus japonicus and Rhizophagus irregularis using a HiSeq 2000 next-generation sequencer with a Cufflinks assembly and de novo transcriptome assembly. There were 3,641 genes differentially expressed during arbuscular mycorrhizal development in L. japonicus, approximately 80% of which were up-regulated. The up-regulated genes included secreted proteins, transporters, proteins involved in lipid and amino acid metabolism, ribosomes and histones. We also detected many genes that were differentially expressed in small-secreted peptides and transcription factors, which may be involved in signal transduction or transcription regulation during symbiosis. Co-regulated genes between arbuscular mycorrhizal and root nodule symbiosis were not particularly abundant, but transcripts encoding for membrane traffic-related proteins, transporters and iron transport-related proteins were found to be highly co-up-regulated. In transcripts of arbuscular mycorrhizal fungi, expansion of cytochrome P450 was observed, which may contribute to various metabolic pathways required to accommodate roots and soil. The comprehensive gene expression data of both plants and arbuscular mycorrhizal fungi provide a powerful platform for investigating the functional and molecular mechanisms underlying arbuscular mycorrhizal symbiosis. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  1. Determination of differential gene expression profiles in superficial and deeper zones of mature rat articular cartilage using RNA sequencing of laser microdissected tissue specimens.

    PubMed

    Mori, Yoshifumi; Chung, Ung-Il; Tanaka, Sakae; Saito, Taku

    2014-01-01

    Superficial zone (SFZ) cells, which are morphologically and functionally distinct from chondrocytes in deeper zones, play important roles in the maintenance of articular cartilage. Here, we established an easy and reliable method for performance of laser microdissection (LMD) on cryosections of mature rat articular cartilage using an adhesive membrane. We further examined gene expression profiles in the SFZ and the deeper zones of articular cartilage by performing RNA sequencing (RNA-seq). We validated sample collection methods, RNA amplification and the RNA-seq data using real-time RT-PCR. The combined data provide comprehensive information regarding genes specifically expressed in the SFZ or deeper zones, as well as a useful protocol for expression analysis of microsamples of hard tissues.

  2. Comparison of hepatic NRF2 and AHR binding in 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) treated mice demonstrates NRF2-independent PKM2 induction.

    PubMed

    Nault, Rance; Doskey, Claire M; Fader, Kelly A; Rockwell, Cheryl E; Zacharewski, Timothy R

    2018-05-11

    2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) induces hepatic oxidative stress following activation of the aryl hydrocarbon receptor (AhR). Our recent studies showed TCDD induced pyruvate kinase muscle isoform 2 (Pkm2) as a novel antioxidant response in normal differentiated hepatocytes. To investigate cooperative regulation between nuclear factor, erythroid derived 2, like 2 (Nrf2) and the AhR in the induction of Pkm2, hepatic ChIP-seq analyses were integrated with RNA-seq time course data from mice treated with TCDD for 2-168 h. ChIP-seq analysis 2 h after TCDD treatment identified genome-wide NRF2 enrichment. Approximately 842 NRF2 enriched regions were located in the regulatory region of differentially expressed genes (DEGs) while 579 DEGs showed both NRF2 and AhR enrichment. Sequence analysis of regions with overlapping NRF2 and AhR enrichment showed over-representation of either antioxidant or dioxin response elements (ARE and DRE, respectively), although 18 possessed both motifs. NRF2 exhibited negligible enrichment within a closed Pkm chromatin region while the AhR was enriched 29-fold. Furthermore, TCDD induced Pkm2 in primary hepatocytes from wild-type and Nrf2 null mice, indicating NRF2 is not required. Although NRF2 and AhR cooperate to regulate numerous antioxidant gene expression responses, the induction of Pkm2 by TCDD is independent of ROS-mediated NRF2 activation. The American Society for Pharmacology and Experimental Therapeutics.

  3. Authentic Research Experience and "Big Data" Analysis in the Classroom: Maize Response to Abiotic Stress.

    PubMed

    Makarevitch, Irina; Frechette, Cameo; Wiatros, Natalia

    2015-01-01

    Integration of inquiry-based approaches into curriculum is transforming the way science is taught and studied in undergraduate classrooms. Incorporating quantitative reasoning and mathematical skills into authentic biology undergraduate research projects has been shown to benefit students in developing various skills necessary for future scientists and to attract students to science, technology, engineering, and mathematics disciplines. While large-scale data analysis became an essential part of modern biological research, students have few opportunities to engage in analysis of large biological data sets. RNA-seq analysis, a tool that allows precise measurement of the level of gene expression for all genes in a genome, revolutionized molecular biology and provides ample opportunities for engaging students in authentic research. We developed, implemented, and assessed a series of authentic research laboratory exercises incorporating a large data RNA-seq analysis into an introductory undergraduate classroom. Our laboratory series is focused on analyzing gene expression changes in response to abiotic stress in maize seedlings; however, it could be easily adapted to the analysis of any other biological system with available RNA-seq data. Objective and subjective assessment of student learning demonstrated gains in understanding important biological concepts and in skills related to the process of science. © 2015 I. Makarevitch et al. CBE—Life Sciences Education © 2015 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

  4. C. elegans ADARs antagonize silencing of cellular dsRNAs by the antiviral RNAi pathway.

    PubMed

    Reich, Daniel P; Tyc, Katarzyna M; Bass, Brenda L

    2018-02-01

    Cellular dsRNAs are edited by adenosine deaminases that act on RNA (ADARs). While editing can alter mRNA-coding potential, most editing occurs in noncoding sequences, the function of which is poorly understood. Using dsRNA immunoprecipitation (dsRIP) and RNA sequencing (RNA-seq), we identified 1523 regions of clustered A-to-I editing, termed editing-enriched regions (EERs), in four stages of Caenorhabditis elegans development, often with highest expression in embryos. Analyses of small RNA-seq data revealed 22- to 23-nucleotide (nt) siRNAs, reminiscent of viral siRNAs, that mapped to EERs and were abundant in adr-1;adr-2 mutant animals. Consistent with roles for these siRNAs in silencing, EER-associated genes (EAGs) were down-regulated in adr-1;adr-2 embryos, and this was dependent on associated EERs and the RNAi factor RDE-4. We observed that ADARs genetically interact with the 26G endogenous siRNA (endo-siRNA) pathway, which likely competes for RNAi components; deletion of factors required for this pathway (rrf-3 or ergo-1) in adr-1;adr-2 mutant strains caused a synthetic phenotype that was rescued by deleting antiviral RNAi factors. Poly(A)+ RNA-seq revealed EAG down-regulation and antiviral gene induction in adr-1;adr-2;rrf-3 embryos, and these expression changes were dependent on rde-1 and rde-4. Our data suggest that ADARs restrict antiviral silencing of cellular dsRNAs. © 2018 Reich et al.; Published by Cold Spring Harbor Laboratory Press.

  5. RNA sequencing of chorionic villi from recurrent pregnancy loss patients reveals impaired function of basic nuclear and cellular machinery

    PubMed Central

    Sõber, Siim; Rull, Kristiina; Reiman, Mario; Ilisson, Piret; Mattila, Pirkko; Laan, Maris

    2016-01-01

    Recurrent pregnancy loss (RPL) concerns ~3% of couples aiming at childbirth. In the current study, transcriptomes and miRNomes of 1st trimester placental chorionic villi were analysed for 2 RPL cases (≥6 miscarriages) and normal, but electively terminated pregnancies (ETP; n = 8). Sequencing was performed on Illumina HiSeq 2000 platform. Differential expression analyses detected 51 (27%) transcripts with increased and 138 (73%) with decreased expression in RPL compared to ETP (DESeq: FDR P < 0.1 and DESeq2: <0.05). RPL samples had substantially decreased transcript levels of histones, regulatory RNAs and genes involved in telomere, spliceosome, ribosomal, mitochondrial and intra-cellular signalling functions. Downregulated expression of HIST1H1B and HIST1H4A (Wilcoxon test, fc≤0.372, P≤9.37 × 10^-4) was validated in an extended sample by quantitative PCR (RPL, n = 14; ETP, n = 24). Several upregulated genes are linked to placental function and pregnancy complications: ATF4, C3, PHLDA2, GPX4, ICAM1, SLC16A2. Analysis of the miRNA-Seq dataset identified no large disturbances in RPL samples. Notably, nearly 2/3 of differentially expressed genes have binding sites for E2F transcription factors, coordinating mammalian endocycle and placental development. For a conceptus destined to miscarriage, the E2F TF-family represents a potential key coordinator in reprogramming the placental genome towards gradually stopping the maintenance of basic nuclear and cellular functions. PMID:27929073

  6. RNA-Seq Mouse Brain Regions Expression Data Analysis: Focus on ApoE Functional Network

    PubMed

    Babenko, Vladimir N; Smagin, Dmitry A; Kudryavtseva, Natalia N

    2017-09-13

    ApoE expression status was proved to be a highly specific marker of energy metabolism rate in the brain. Along with its neighbor, Translocase of Outer Mitochondrial Membrane 40 kDa (TOMM40) which is involved in mitochondrial metabolism, the corresponding genomic region constitutes the neuroenergetic hotspot. Using RNA-Seq data from a murine model of chronic stress a significant positive expression coordination of seven neighboring genes in ApoE locus in five brain regions was observed. ApoE maintains one of the highest absolute expression values genome-wide, implying that ApoE can be the driver of the neighboring gene expression alteration observed under stressful loads. Notably, we revealed the highly statistically significant increase of ApoE expression in the hypothalamus of chronically aggressive (FDR < 0.007) and defeated (FDR < 0.001) mice compared to the control. Correlation analysis revealed a close association of ApoE and proopiomelanocortin (Pomc) gene expression profiles implying the putative neuroendocrine stress response background of ApoE expression elevation therein.

  7. DBATE: database of alternative transcripts expression.

    PubMed

    Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

    2013-01-01

    The use of high-throughput RNA sequencing technology (RNA-seq) allows whole transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing variant-specific expression with its functional inference is still an open and difficult issue for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotation of alternative splicing variants. We processed 13 large RNA-seq panels from human healthy tissues and in disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. The possibility to perform complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values that can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, the transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.

  8. Hierarchical interactions between Fnr orthologs allows fine-tuning of transcription in response to oxygen in Herbaspirillum seropedicae

    PubMed Central

    Batista, Marcelo Bueno; Chandra, Govind; Monteiro, Rose Adele; de Souza, Emanuel Maltempi; Dixon, Ray

    2018-01-01

    Bacteria adjust the composition of their electron transport chain (ETC) to efficiently adapt to oxygen gradients. This involves differential expression of various ETC components to optimize energy generation. In Herbaspirillum seropedicae, reprogramming of gene expression in response to oxygen availability is controlled at the transcriptional level by three Fnr orthologs. Here, we characterised Fnr regulons using a combination of RNA-Seq and ChIP-Seq analysis. We found that Fnr1 and Fnr3 directly regulate discrete groups of promoters (Groups I and II, respectively), and that a third group (Group III) is co-regulated by both transcription factors. Comparison of DNA binding motifs between the three promoter groups suggests Group III promoters are potentially co-activated by Fnr3–Fnr1 heterodimers. Specific interaction between Fnr1 and Fnr3, detected in two-hybrid assays, was dependent on conserved residues in their dimerization interfaces, indicative of heterodimer formation in vivo. The requirements for co-activation of the fnr1 promoter, belonging to Group III, suggest either sequential activation by Fnr3 and Fnr1 homodimers or the involvement of Fnr3–Fnr1 heterodimers. Analysis of Fnr proteins with swapped activation domains provides evidence that co-activation by Fnr1 and Fnr3 at Group III promoters optimises interactions with RNA polymerase to fine-tune transcription in response to prevailing oxygen concentrations. PMID:29529262

  9. Hierarchical interactions between Fnr orthologs allows fine-tuning of transcription in response to oxygen in Herbaspirillum seropedicae.

    PubMed

    Batista, Marcelo Bueno; Chandra, Govind; Monteiro, Rose Adele; de Souza, Emanuel Maltempi; Dixon, Ray

    2018-05-04

    Bacteria adjust the composition of their electron transport chain (ETC) to efficiently adapt to oxygen gradients. This involves differential expression of various ETC components to optimize energy generation. In Herbaspirillum seropedicae, reprogramming of gene expression in response to oxygen availability is controlled at the transcriptional level by three Fnr orthologs. Here, we characterised Fnr regulons using a combination of RNA-Seq and ChIP-Seq analysis. We found that Fnr1 and Fnr3 directly regulate discrete groups of promoters (Groups I and II, respectively), and that a third group (Group III) is co-regulated by both transcription factors. Comparison of DNA binding motifs between the three promoter groups suggests Group III promoters are potentially co-activated by Fnr3-Fnr1 heterodimers. Specific interaction between Fnr1 and Fnr3, detected in two-hybrid assays, was dependent on conserved residues in their dimerization interfaces, indicative of heterodimer formation in vivo. The requirements for co-activation of the fnr1 promoter, belonging to Group III, suggest either sequential activation by Fnr3 and Fnr1 homodimers or the involvement of Fnr3-Fnr1 heterodimers. Analysis of Fnr proteins with swapped activation domains provides evidence that co-activation by Fnr1 and Fnr3 at Group III promoters optimises interactions with RNA polymerase to fine-tune transcription in response to prevailing oxygen concentrations.

  10. RNA SEQ Analysis Indicates that the AE3 Cl-/HCO3- Exchanger Contributes to Active Transport-Mediated CO2 Disposal in Heart.

    PubMed

    Vairamani, Kanimozhi; Wang, Hong-Sheng; Medvedovic, Mario; Lorenz, John N; Shull, Gary E

    2017-08-04

    Loss of the AE3 Cl-/HCO3- exchanger (Slc4a3) in mice causes an impaired cardiac force-frequency response and heart failure under some conditions but the mechanisms are not known. To better understand the functions of AE3, we performed RNA Seq analysis of AE3-null and wild-type mouse hearts and evaluated the data with respect to three hypotheses (CO2 disposal, facilitation of Na+-loading, and recovery from an alkaline load) that have been proposed for its physiological functions. Gene Ontology and PubMatrix analyses of differentially expressed genes revealed a hypoxia response and changes in vasodilation and angiogenesis genes that strongly support the CO2 disposal hypothesis. Differential expression of energy metabolism genes, which indicated increased glucose utilization and decreased fatty acid utilization, was consistent with adaptive responses to perturbations of O2/CO2 balance in AE3-null myocytes. Given that the myocardium is an obligate aerobic tissue and consumes large amounts of O2, the data suggest that loss of AE3, which has the potential to extrude CO2 in the form of HCO3-, impairs O2/CO2 balance in cardiac myocytes. These results support a model in which the AE3 Cl-/HCO3- exchanger, coupled with parallel Cl- and H+-extrusion mechanisms and extracellular carbonic anhydrase, is responsible for active transport-mediated disposal of CO2.

  11. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets

    PubMed Central

    Macosko, Evan Z.; Basu, Anindita; Satija, Rahul; Nemesh, James; Shekhar, Karthik; Goldman, Melissa; Tirosh, Itay; Bialas, Allison R.; Kamitaki, Nolan; Martersteck, Emily M.; Trombetta, John J.; Weitz, David A.; Sanes, Joshua R.; Shalek, Alex K.; Regev, Aviv; McCarroll, Steven A.

    2015-01-01

    Cells, the basic units of biological structure and function, vary broadly in type and state. Single-cell genomics can characterize cell identity and function, but limitations of ease and scale have prevented its broad application. Here we describe Drop-Seq, a strategy for quickly profiling thousands of individual cells by separating them into nanoliter-sized aqueous droplets, associating a different barcode with each cell’s RNAs, and sequencing them all together. Drop-Seq analyzes mRNA transcripts from thousands of individual cells simultaneously while remembering transcripts’ cell of origin. We analyzed transcriptomes from 44,808 mouse retinal cells and identified 39 transcriptionally distinct cell populations, creating a molecular atlas of gene expression for known retinal cell classes and novel candidate cell subtypes. Drop-Seq will accelerate biological discovery by enabling routine transcriptional profiling at single-cell resolution. PMID:26000488
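
    The idea of remembering each transcript's cell of origin through a per-cell barcode can be sketched as a simple grouping step. This is an illustration only: it assumes reads have already been assigned a cell barcode and a gene, omits steps such as alignment and duplicate handling, and uses hypothetical barcodes and gene names.

```python
from collections import Counter, defaultdict

def count_by_cell(reads):
    """Build a per-cell gene count table from (cell_barcode, gene) pairs."""
    table = defaultdict(Counter)
    for barcode, gene in reads:
        table[barcode][gene] += 1
    return table

# Hypothetical pooled reads from two cells, already tagged with barcode and gene
reads = [
    ("AAAC", "Rho"),
    ("AAAC", "Rho"),
    ("GGTT", "Opn1sw"),
    ("AAAC", "Pde6b"),
    ("GGTT", "Opn1sw"),
]
table = count_by_cell(reads)
```

    Because all reads are sequenced together, the barcode is the only thing that lets the pooled data be split back into one expression profile per cell.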

  12. Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.

    PubMed

    Davidson, Nadia M; Oshlack, Alicia

    2018-05-01

    RNA sequencing (RNA-seq) analyses can benefit from performing a genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating an assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Necklace allows a comprehensive transcriptome to be built from a combination of assembled and annotated transcripts, which results in a more comprehensive transcriptome for the majority of organisms. In addition RNA-seq data are mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods.

  13. IsoSCM: improved and alternative 3′ UTR annotation using multiple change-point inference

    PubMed Central

    Shenker, Sol; Miura, Pedro; Sanfilippo, Piero

    2015-01-01

    Major applications of RNA-seq data include studies of how the transcriptome is modulated at the levels of gene expression and RNA processing, and how these events are related to cellular identity, environmental condition, and/or disease status. While many excellent tools have been developed to analyze RNA-seq data, these generally have limited efficacy for annotating 3′ UTRs. Existing assembly strategies often fragment long 3′ UTRs, and importantly, none of the algorithms in popular use can apportion data into tandem 3′ UTR isoforms, which are frequently generated by alternative cleavage and polyadenylation (APA). Consequently, it is often not possible to identify patterns of differential APA using existing assembly tools. To address these limitations, we present a new method for transcript assembly, Isoform Structural Change Model (IsoSCM) that incorporates change-point analysis to improve the 3′ UTR annotation process. Through evaluation on simulated and genuine data sets, we demonstrate that IsoSCM annotates 3′ termini with higher sensitivity and specificity than can be achieved with existing methods. We highlight the utility of IsoSCM by demonstrating its ability to recover known patterns of tissue-regulated APA. IsoSCM will facilitate future efforts for 3′ UTR annotation and genome-wide studies of the breadth, regulation, and roles of APA leveraging RNA-seq data. The IsoSCM software and source code are available from our website https://github.com/shenkers/isoscm. PMID:25406361
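
    To give a feel for change-point analysis on read coverage, the sketch below finds the single split that best separates a coverage profile into two levels. It is a deliberately simplified least-squares stand-in, not IsoSCM's multiple change-point inference; the function names and coverage values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_change_point(coverage):
    """Return index k (1 <= k < n) minimizing the squared error of a two-segment fit.

    Positions [0, k) form the first segment and [k, n) the second.
    """
    if len(coverage) < 2:
        raise ValueError("need at least two positions")
    return min(
        range(1, len(coverage)),
        key=lambda k: sse(coverage[:k]) + sse(coverage[k:]),
    )

# Hypothetical per-base coverage along a 3' UTR with a drop after the fifth position
coverage = [50, 52, 49, 51, 50, 10, 11, 9, 10]
k = best_change_point(coverage)
```

    A drop in coverage of this kind is what would be expected downstream of a proximal cleavage site when tandem 3′ UTR isoforms are both expressed.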

  14. A Poisson Log-Normal Model for Constructing Gene Covariation Network Using RNA-seq Data.

    PubMed

    Choi, Yoonha; Coram, Marc; Peng, Jie; Tang, Hua

    2017-07-01

    Constructing expression networks using transcriptomic data is an effective approach for studying gene regulation. A popular approach for constructing such a network is based on the Gaussian graphical model (GGM), in which an edge between a pair of genes indicates that the expression levels of these two genes are conditionally dependent, given the expression levels of all other genes. However, GGMs are not appropriate for non-Gaussian data, such as those generated in RNA-seq experiments. We propose a novel statistical framework that maximizes a penalized likelihood, in which the observed count data follow a Poisson log-normal distribution. To overcome the computational challenges, we use Laplace's method to approximate the likelihood and its gradients, and apply the alternating directions method of multipliers to find the penalized maximum likelihood estimates. The proposed method is evaluated and compared with GGMs using both simulated and real RNA-seq data. The proposed method shows improved performance in detecting edges that represent covarying pairs of genes, particularly for edges connecting low-abundant genes and edges around regulatory hubs.
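
    For intuition about the count model named above: under a Poisson log-normal distribution, a count is Poisson-distributed with a rate whose logarithm is normally distributed. The sketch below simulates a single gene this way and shows the resulting overdispersion. It does not implement the penalized-likelihood network estimation; the parameter values and function names are assumptions.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def sample_poisson_lognormal(mu, sigma, n, seed=0):
    """Draw n counts where log(rate) ~ Normal(mu, sigma) and count ~ Poisson(rate)."""
    rng = random.Random(seed)
    return [sample_poisson(math.exp(rng.gauss(mu, sigma)), rng) for _ in range(n)]

# Hypothetical parameters for one gene
counts = sample_poisson_lognormal(mu=1.0, sigma=0.8, n=5000)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
```

    The sample variance exceeds the sample mean, unlike a plain Poisson model where the two are equal. In theory the mean is exp(mu + sigma^2/2) and the variance is the mean plus mean^2 * (exp(sigma^2) - 1).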

  15. Gene expression analysis of whole blood RNA from pigs infected with low and high pathogenic African swine fever viruses

    DOE PAGES

    Jaing, Crystal; Rowland, Raymond R. R.; Allen, Jonathan E.; ...

    2017-08-31

    African swine fever virus (ASFV) is a macrophage-tropic virus responsible for ASF, a transboundary disease that threatens swine production world-wide. Since there are no vaccines available to control ASF after an outbreak, obtaining an understanding of the virus-host interaction is important for developing new intervention strategies. In this study, a whole transcriptomic RNA-Seq method was used to characterize differentially expressed genes in pigs infected with a low pathogenic ASFV isolate, OUR T88/3 (OURT), or the highly pathogenic Georgia 2007/1 (GRG). After infection, pigs infected with OURT showed no or few clinical signs; whereas, GRG produced clinical signs consistent with acute ASF. RNA-Seq detected the expression of ASFV genes from the whole blood of the GRG, but not the OURT pigs, consistent with the pathotypes of these strains and the replication of GRG in circulating monocytes. Even though GRG and OURT possess different pathogenic properties, there was significant overlap in the most upregulated host genes. A small number of differentially expressed microRNAs were also detected in GRG and OURT pigs. These data confirm previous studies describing the response of macrophages and lymphocytes to ASFV infection, as well as reveal unique gene pathways upregulated in response to infection with GRG.

  16. Transcriptome analysis and identification of P450 genes relevant to imidacloprid detoxification in Bradysia odoriphaga.

    PubMed

    Chen, Chengyu; Wang, Cuicui; Liu, Ying; Shi, Xueyan; Gao, Xiwu

    2018-02-07

    Pesticide tolerance poses many challenges for pest control, particularly for destructive pests such as Bradysia odoriphaga. Imidacloprid has been used to control B. odoriphaga since 2013; however, imidacloprid resistance in B. odoriphaga has developed in recent years. Identifying actual and potential genes involved in detoxification metabolism of imidacloprid could offer solutions for controlling this insect. In this study, RNA-seq was used to explore differentially expressed genes in B. odoriphaga that respond to imidacloprid treatment. Differential expression data between imidacloprid treatment and the control revealed 281 transcripts (176 with annotations) showing upregulation and 394 transcripts (235 with annotations) showing downregulation. Among them, differential expression levels of seven P450 unigenes were associated with the imidacloprid detoxification mechanism, with 4 unigenes that were upregulated and 3 unigenes that were downregulated. The qRT-PCR results of the seven differentially expressed P450 unigenes after imidacloprid treatment were consistent with RNA-Seq data. Furthermore, oral delivery mediated RNA interference of these four upregulated P450 unigenes followed by an insecticide bioassay significantly increased the mortality of imidacloprid-treated B. odoriphaga. This result indicated that the four upregulated P450s are involved in detoxification of imidacloprid. This study provides a genetic basis for further exploring P450 genes for imidacloprid detoxification in B. odoriphaga.

  17. Gene expression analysis of whole blood RNA from pigs infected with low and high pathogenic African swine fever viruses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jaing, Crystal; Rowland, Raymond R. R.; Allen, Jonathan E.

    African swine fever virus (ASFV) is a macrophage-tropic virus responsible for ASF, a transboundary disease that threatens swine production world-wide. Since there are no vaccines available to control ASF after an outbreak, obtaining an understanding of the virus-host interaction is important for developing new intervention strategies. In this study, a whole transcriptomic RNA-Seq method was used to characterize differentially expressed genes in pigs infected with a low pathogenic ASFV isolate, OUR T88/3 (OURT), or the highly pathogenic Georgia 2007/1 (GRG). After infection, pigs infected with OURT showed no or few clinical signs; whereas, GRG produced clinical signs consistent with acute ASF. RNA-Seq detected the expression of ASFV genes from the whole blood of the GRG, but not the OURT pigs, consistent with the pathotypes of these strains and the replication of GRG in circulating monocytes. Even though GRG and OURT possess different pathogenic properties, there was significant overlap in the most upregulated host genes. A small number of differentially expressed microRNAs were also detected in GRG and OURT pigs. These data confirm previous studies describing the response of macrophages and lymphocytes to ASFV infection, as well as reveal unique gene pathways upregulated in response to infection with GRG.

  18. Cell Type-Specific Chromatin Signatures Underline Regulatory DNA Elements in Human Induced Pluripotent Stem Cells and Somatic Cells.

    PubMed

    Zhao, Ming-Tao; Shao, Ning-Yi; Hu, Shijun; Ma, Ning; Srinivasan, Rajini; Jahanbani, Fereshteh; Lee, Jaecheol; Zhang, Sophia L; Snyder, Michael P; Wu, Joseph C

    2017-11-10

    Regulatory DNA elements in the human genome play important roles in determining the transcriptional abundance and spatiotemporal gene expression during embryonic heart development and somatic cell reprogramming. It is not well known how chromatin marks in regulatory DNA elements are modulated to establish cell type-specific gene expression in the human heart. We aimed to decipher the cell type-specific epigenetic signatures in regulatory DNA elements and how they modulate heart-specific gene expression. We profiled genome-wide transcriptional activity and a variety of epigenetic marks in the regulatory DNA elements using massive RNA-seq (n=12) and ChIP-seq (chromatin immunoprecipitation combined with high-throughput sequencing; n=84) in human endothelial cells (CD31+CD144+), cardiac progenitor cells (Sca-1+), fibroblasts (DDR2+), and their respective induced pluripotent stem cells. We uncovered 2 classes of regulatory DNA elements: class I was identified with ubiquitous enhancer (H3K4me1) and promoter (H3K4me3) marks in all cell types, whereas class II was enriched with H3K4me1 and H3K4me3 in a cell type-specific manner. Both class I and class II regulatory elements exhibited stimulatory roles in nearby gene expression in a given cell type. However, class I promoters displayed more dominant regulatory effects on transcriptional abundance regardless of distal enhancers. Transcription factor network analysis indicated that human induced pluripotent stem cells and somatic cells from the heart selected their preferential regulatory elements to maintain cell type-specific gene expression. In addition, we validated the function of these enhancer elements in transgenic mouse embryos and human cells and identified a few enhancers that could possibly regulate the cardiac-specific gene expression. Given that a large number of genetic variants associated with human diseases are located in regulatory DNA elements, our study provides valuable resources for deciphering the epigenetic modulation of regulatory DNA elements that fine-tune spatiotemporal gene expression in human cardiac development and diseases. © 2017 American Heart Association, Inc.

  19. Comparative RNA-Seq based dissection of the regulatory networks and environmental stimuli underlying Vibrio parahaemolyticus gene expression during infection.

    PubMed

    Livny, Jonathan; Zhou, Xiaohui; Mandlik, Anjali; Hubbard, Troy; Davis, Brigid M; Waldor, Matthew K

    2014-10-29

    Vibrio parahaemolyticus is the leading worldwide cause of seafood-associated gastroenteritis, yet little is known regarding its intraintestinal gene expression or physiology. To date, in vivo analyses have focused on identification and characterization of virulence factors--e.g. a crucial Type III secretion system (T3SS2)--rather than genome-wide analyses of in vivo biology. Here, we used RNA-Seq to profile V. parahaemolyticus gene expression in infected infant rabbits, which mimic human infection. Comparative transcriptomic analysis of V. parahaemolyticus isolated from rabbit intestines and from several laboratory conditions enabled identification of mRNAs and sRNAs induced during infection and of regulatory factors that likely control them. More than 12% of annotated V. parahaemolyticus genes are differentially expressed in the intestine, including the genes of T3SS2, which are likely induced by bile-mediated activation of the transcription factor VtrB. Our analyses also suggest that V. parahaemolyticus has access to glucose or other preferred carbon sources in vivo, but that iron is inconsistently available. The V. parahaemolyticus transcriptional response to in vivo growth is far more widespread than and largely distinct from that of V. cholerae, likely due to the distinct ways in which these diarrheal pathogens interact with and modulate the environment in the small intestine. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  20. Unstable Expression of Commonly Used Reference Genes in Rat Pancreatic Islets Early after Isolation Affects Results of Gene Expression Studies.

    PubMed

    Kosinová, Lucie; Cahová, Monika; Fábryová, Eva; Týcová, Irena; Koblas, Tomáš; Leontovyč, Ivan; Saudek, František; Kříž, Jan

    2016-01-01

    The use of RT-qPCR provides a powerful tool for gene expression studies; however, the proper interpretation of the obtained data is crucially dependent on accurate normalization based on stable reference genes. Recently, strong evidence has been shown indicating that the expression of many commonly used reference genes may vary significantly due to diverse experimental conditions. The isolation of pancreatic islets is a complicated procedure which creates severe mechanical and metabolic stress, possibly leading to cellular damage and alteration of gene expression. Despite this, freshly isolated islets frequently serve as a control in various gene expression and intervention studies. The aim of our study was to determine expression of 16 candidate reference genes and one gene of interest (F3) in isolated rat pancreatic islets during short-term cultivation in order to find a suitable endogenous control for gene expression studies. We compared the expression stability of the most commonly used reference genes and evaluated the reliability of relative and absolute quantification using RT-qPCR during 0-120 hrs after isolation. In freshly isolated islets, the expression of all tested genes was markedly depressed and it increased several times throughout the first 48 hrs of cultivation. We observed significant variability among samples at 0 and 24 hrs but substantial stabilization from 48 hrs onwards. During the first 48 hrs, relative quantification failed to reflect the real changes in respective mRNA concentrations while in the interval 48-120 hrs, the relative expression generally paralleled the results determined by absolute quantification. Thus, our data call into question the suitability of relative quantification for gene expression analysis in pancreatic islets during the first 48 hrs of cultivation, as the results may be significantly affected by unstable expression of reference genes. However, this method could provide reliable information from 48 hrs onwards.
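
    The dependence of relative quantification on a stable reference gene can be illustrated with a minimal sketch. The code below is not taken from the study; it assumes the widely used 2^-ΔΔCt calculation with roughly 100% PCR efficiency, and the function name and all Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    # Normalize the target gene to the reference gene within each condition.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between conditions.
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: the target gene is unchanged (Ct 25 in both groups).
stable = fold_change_ddct(25, 20, 25, 20)
# Same target, but the reference gene itself drops fourfold (Ct 20 -> 22).
unstable = fold_change_ddct(25, 22, 25, 20)
print(stable, unstable)
```

    With these made-up numbers the unchanged target appears fourfold up-regulated once the reference gene drifts, which is the kind of distortion the abstract attributes to unstable reference genes.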

  1. Molecular cloning and functional analysis of ESGP, an embryonic stem cell and germ cell specific protein.

    PubMed

    Chen, Yan-Mei; Du, Zhong-Wei; Yao, Zhen

    2005-12-01

    Several putative Oct-4 downstream genes from mouse embryonic stem (ES) cells have been identified using the suppression-subtractive hybridization method. In this study, one of the novel genes encoding an ES cell and germ cell specific protein (ESGP) was cloned by rapid amplification of cDNA ends. ESGP contains 801 bp encoding an 84 amino acid small protein and has no significant homology to any known genes. There is a signal peptide at the N-terminus of the ESGP protein as predicted by SeqWeb (GCG) (SeqWeb version 2.0.2, http://gcg.biosino.org:8080/). The result of immunofluorescence assay suggested that ESGP might encode a secretory protein. The expression pattern of ESGP is consistent with the expression of Oct-4 during embryonic development. ESGP protein was detected in fertilized oocyte, from 3.5 day postcoital (dpc) blastocyst to 17.5 dpc embryo, and was only detected in testis and ovary tissues in adult. In vitro, ESGP was only expressed in pluripotent cell lines, such as embryonic stem cells, embryonic carcinoma cells and embryonic germ cells, but not in their differentiated progenies. Despite its specific expression, forced expression of ESGP is not indispensable for the effect of Oct-4 on ES cell self-renewal, and does not affect the differentiation to three germ layers.

  2. Role of miRNAs and alternative mRNA 3'-end cleavage and polyadenylation of their mRNA targets in cardiomyocyte hypertrophy.

    PubMed

    Soetanto, R; Hynes, C J; Patel, H R; Humphreys, D T; Evers, M; Duan, G; Parker, B J; Archer, S K; Clancy, J L; Graham, R M; Beilharz, T H; Smith, N J; Preiss, T

    2016-05-01

    miRNAs play critical roles in heart disease. In addition to differential miRNA expression, miRNA-mediated control is also affected by variable miRNA processing or alternative 3'-end cleavage and polyadenylation (APA) of their mRNA targets. To what extent these phenomena play a role in the heart remains unclear. We sought to explore miRNA processing and mRNA APA in cardiomyocytes, and whether these change during cardiac hypertrophy. Thoracic aortic constriction (TAC) was performed to induce hypertrophy in C57BL/6J mice. RNA extracted from cardiomyocytes of sham-treated, pre-hypertrophic (2 days post-TAC), and hypertrophic (7 days post-TAC) mice was subjected to small RNA- and poly(A)-test sequencing (PAT-Seq). Differential expression analysis matched expectations; nevertheless we identified ~400 mRNAs and hundreds of noncoding RNA loci as altered with hypertrophy for the first time. Although multiple processing variants were observed for many miRNAs, there was little change in their relative proportions during hypertrophy. PAT-Seq mapped ~48,000 mRNA 3'-ends, identifying novel 3' untranslated regions (3'UTRs) for over 7000 genes. Importantly, hypertrophy was associated with marked changes in APA with a net shift from distal to more proximal mRNA 3'-ends, which is predicted to decrease overall miRNA repression strength. We independently validated several examples of 3'UTR proportion change and showed that alternative 3'UTRs associate with differences in mRNA translation. Our work suggests that APA contributes to altered gene expression with the development of cardiomyocyte hypertrophy and provides a rich resource for a systems-level understanding of miRNA-mediated regulation in physiological and pathological states of the heart. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Proteomic and transcriptomic analysis of lung tissue in OVA-challenged mice.

    PubMed

    Lee, Yongjin; Hwang, Yun-Ho; Kim, Kwang-Jin; Park, Ae-Kyung; Paik, Man-Jeong; Kim, Seong Hwan; Lee, Su Ui; Yee, Sung-Tae; Son, Young-Jin

    2018-01-01

    Asthma is a long-term inflammatory disease of the airways of the lungs characterized by variable airflow obstruction and bronchospasm. Asthma is caused by a complex combination of environmental and genetic interactions. In this study, we conducted proteomic analysis of samples derived from control and OVA-challenged mice for environmental respiratory disease by using 2-D gel electrophoresis. In addition, we explored the genes associated with the environmental substances that cause respiratory disease and conducted RNA-seq by next-generation sequencing. Proteomic analysis revealed 7 up-regulated (keratin KB40, CRP, HSP27, chaperonin containing TCP-1, TCP-10, keratin, and albumin) and 3 down-regulated proteins (PLC-α, PLA2, and precursor ApoA-1). The expression diversity of many genes was found in the lung tissue of OVA-challenged mouse by RNA-seq. 146 genes were identified as significantly differentially expressed by OVA treatment; of these, 118 genes were up-regulated and 28 genes were down-regulated. These genes were related to inflammation, mucin production, and airway remodeling. The results presented herein enable diagnosis and the identification of quantitative markers to monitor the progression of environmental respiratory disease using proteomics and genomic approaches.

  4. RNA-Seq transcriptome profiling of mouse oocytes after in vitro maturation and/or vitrification.

    PubMed

    Gao, Lei; Jia, Gongxue; Li, Ai; Ma, Haojia; Huang, Zhengyuan; Zhu, Shien; Hou, Yunpeng; Fu, Xiangwei

    2017-10-16

    In vitro maturation (IVM) and vitrification have been widely used to prepare oocytes before fertilization; however, potential effects of these procedures, such as expression profile changes, are poorly understood. In this study, mouse oocytes were divided into four groups and subjected to combinations of in vitro maturation and/or vitrification treatments. RNA-seq and in silico pathway analysis were used to identify differentially expressed genes (DEGs) that may be involved in oocyte viability after in vitro maturation and/or vitrification. Our results showed that 1) 69 genes were differentially expressed after IVM, 66 of which were up-regulated. Atp5e and Atp5o were enriched in the most significant gene ontology term "mitochondrial membrane part"; thus, these genes may be promising candidate biomarkers for oocyte viability after IVM. 2) The influence of vitrification on the transcriptome of oocytes was negligible, as no DEGs were found between vitrified and fresh oocytes. 3) The MII stage is more suitable for oocyte vitrification with respect to the transcriptome. This study provides a valuable new theoretical basis to further improve the efficiency of in vitro maturation and/or oocyte vitrification.

  5. RNA-Seq reveals 10 novel promising candidate genes affecting milk protein concentration in the Chinese Holstein population.

    PubMed

    Li, Cong; Cai, Wentao; Zhou, Chenghao; Yin, Hongwei; Zhang, Ziqi; Loor, Juan J; Sun, Dongxiao; Zhang, Qin; Liu, Jianfeng; Zhang, Shengli

    2016-06-02

    Paired-end RNA sequencing (RNA-Seq) was used to explore the bovine transcriptome from the mammary tissue of 12 Chinese Holstein cows with 6 extremely high and 6 low phenotypic values for milk protein percentage. We defined the differentially expressed transcripts between the two comparison groups, extremely high and low milk protein percentage during the peak lactation (HP vs LP) and during the non-lactating period (HD vs LD), respectively. Within the differentially expressed genes (DEGs), we detected 157 at peak lactation and 497 in the non-lactating period with a highly significant correlation with milk protein concentration. Integrated interpretation of differential gene expression indicated that SERPINA1, CLU, CNTFR, ERBB2, NEDD4L, ANG, GALE, HSPA8, LPAR6 and CD14 are the most promising candidate genes affecting milk protein concentration. Similarly, LTF, FCGR3A, MEGF10, RRM2 and UBE2C are the most promising candidates that in the non-lactating period could help the mammary tissue prevent issues with inflammation and udder disorders. Putative genes will be valuable resources for designing better breeding strategies to optimize the content of milk protein and also to provide new insights into regulation of lactogenesis.

  6. RNA-seq transcriptional profiling of Herbaspirillum seropedicae colonizing wheat (Triticum aestivum) roots.

    PubMed

    Pankievicz, V C S; Camilios-Neto, D; Bonato, P; Balsanelli, E; Tadra-Sfeir, M Z; Faoro, H; Chubatsu, L S; Donatti, L; Wajnberg, G; Passetti, F; Monteiro, R A; Pedrosa, F O; Souza, E M

    2016-04-01

    Herbaspirillum seropedicae is a diazotrophic and endophytic bacterium that associates with economically important grasses promoting plant growth and increasing productivity. To identify genes related to bacterial ability to colonize plants, wheat seedlings growing hydroponically in Hoagland's medium were inoculated with H. seropedicae and incubated for 3 days. Total mRNA from the bacteria present in the root surface and in the plant medium were purified, depleted from rRNA and used for RNA-seq profiling. RT-qPCR analyses were conducted to confirm regulation of selected genes. Comparison of RNA profile of root attached and planktonic bacteria revealed extensive metabolic adaptations to the epiphytic life style. These adaptations include expression of specific adhesins and cell wall re-modeling to attach to the root. Additionally, the metabolism was adapted to the microxic environment and nitrogen-fixation genes were expressed. Polyhydroxybutyrate (PHB) synthesis was activated, and PHB granules were stored as observed by microscopy. Genes related to plant growth promotion, such as auxin production were expressed. Many ABC transporter genes were regulated in the bacteria attached to the roots. The results provide new insights into the adaptation of H. seropedicae to the interaction with the plant.

  7. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq

    PubMed Central

    Ramsköld, Daniel; Deng, Qiaolin; Johnsson, Per; Michaëlsson, Jakob; Frisén, Jonas; Sandberg, Rickard

    2016-01-01

    Cellular heterogeneity can emerge from the expression of only one parental allele. However, it has remained controversial whether, or to what degree, random monoallelic expression of autosomal genes (aRME) is mitotically inherited (clonal) or stochastic (dynamic) in somatic cells, particularly in vivo. Here, we used allele-sensitive single-cell RNA-seq on clonal primary mouse fibroblasts and in vivo human CD8+ T-cells to dissect clonal and dynamic monoallelic expression patterns. Dynamic aRME affected a considerable portion of the cells’ transcriptomes, with levels dependent on the cells’ transcriptional activity. Importantly, clonal aRME was detected but was surprisingly scarce (<1% of genes) and affected mainly the most low-expressed genes. Consequently, the overwhelming portion of aRME occurs transiently within individual cells and patterns of aRME are thus primarily scattered throughout somatic cell populations rather than, as previously hypothesized, confined to patches of clonally related cells. PMID:27668657

  8. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq.

    PubMed

    Reinius, Björn; Mold, Jeff E; Ramsköld, Daniel; Deng, Qiaolin; Johnsson, Per; Michaëlsson, Jakob; Frisén, Jonas; Sandberg, Rickard

    2016-11-01

    Cellular heterogeneity can emerge from the expression of only one parental allele. However, it has remained controversial whether, or to what degree, random monoallelic expression of autosomal genes (aRME) is mitotically inherited (clonal) or stochastic (dynamic) in somatic cells, particularly in vivo. Here we used allele-sensitive single-cell RNA-seq on clonal primary mouse fibroblasts and freshly isolated human CD8+ T cells to dissect clonal and dynamic monoallelic expression patterns. Dynamic aRME affected a considerable portion of the cells' transcriptomes, with levels dependent on the cells' transcriptional activity. Notably, clonal aRME was detected, but it was surprisingly scarce (<1% of genes) and mainly affected the most weakly expressed genes. Consequently, the overwhelming majority of aRME occurs transiently within individual cells, and patterns of aRME are thus primarily scattered throughout somatic cell populations rather than, as previously hypothesized, confined to patches of clonally related cells.

  9. Linnorm: improved statistical analysis for single cell RNA-seq expression data

    PubMed Central

    Yip, Shun H.; Wang, Panwen; Kocher, Jean-Pierre A.; Sham, Pak Chung

    2017-01-01

    Abstract Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noises and simultaneously preserve biological variations in scRNA-seq data, such that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy. PMID:28981748

  10. Combining micro-RNA and protein sequencing to detect robust biomarkers for Graves' disease and orbitopathy.

    PubMed

    Zhang, Lei; Masetti, Giulia; Colucci, Giuseppe; Salvi, Mario; Covelli, Danila; Eckstein, Anja; Kaiser, Ulrike; Draman, Mohd Shazli; Muller, Ilaria; Ludgate, Marian; Lucini, Luigi; Biscarini, Filippo

    2018-05-30

    Graves' Disease (GD) is an autoimmune condition in which thyroid-stimulating antibodies (TRAB) mimic thyroid-stimulating hormone function causing hyperthyroidism. 5% of GD patients develop inflammatory Graves' orbitopathy (GO) characterized by proptosis and attendant sight problems. A major challenge is to identify which GD patients are most likely to develop GO, which has so far relied on TRAB measurement. We screened sera/plasma from 14 GD, 19 GO and 13 healthy controls using high-throughput proteomics and miRNA sequencing (Illumina's HiSeq2000 and Agilent-6550 Funnel quadrupole-time-of-flight mass spectrometry) to identify potential biomarkers for diagnosis or prognosis evaluation. Euclidean distances and differential expression (DE) based on miRNA and protein quantification were analysed by multidimensional scaling (MDS) and multinomial regression respectively. We detected 3025 miRNAs and 1886 proteins and MDS revealed good separation of the 3 groups. Biomarkers were identified by combined DE and Lasso-penalized predictive models; accuracy of predictions was 0.86 (±0.18), and 5 miRNA and 20 proteins were found including Zonulin, Alpha-2 macroglobulin, Beta-2 glycoprotein 1 and Fibronectin. Functional analysis identified relevant metabolic pathways, including hippo signaling, bacterial invasion of epithelial cells and mRNA surveillance. Proteomic and miRNA analyses, combined with robust bioinformatics, identified circulating biomarkers applicable to diagnose GD, predict GO disease status and optimize patient management.

  11. Single nucleotide polymorphism discovery in bovine liver using RNA-seq technology

    PubMed Central

    Pareek, Chandra Shekhar; Błaszczyk, Paweł; Dziuba, Piotr; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Pierzchała, Mariusz; Feng, Yaping; Kadarmideen, Haja N.; Kumar, Dibyendu

    2017-01-01

    Background: RNA-seq is a useful next-generation sequencing (NGS) technology that has been widely used to understand mammalian transcriptome architecture and function. In this study, a breed-specific RNA-seq experiment was utilized to detect putative single nucleotide polymorphisms (SNPs) in liver tissue of young bulls of the Polish Red, Polish Holstein-Friesian (HF) and Hereford breeds, and to understand the genomic variation in the three cattle breeds that may reflect differences in production traits. Results: The RNA-seq experiment on bovine liver produced 1,071,144,072 raw paired-end reads, with an average of approximately 60 million paired-end reads per library. Breed-wise, a total of 345.06, 290.04 and 436.03 million paired-end reads were obtained from the Polish Red, Polish HF, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed that 81.35%, 82.81% and 84.21% of the mapped sequencing reads were properly paired to the Polish Red, Polish HF, and Hereford breeds, respectively. This study identified 5,641,401 SNPs and insertion and deletion (indel) positions expressed in the bovine liver with an average of 313,411 SNPs and indels per young bull. Following the removal of the indel mutations, a total of 1,953,804, 1,527,120 and 2,053,184 raw SNPs expressed in bovine liver were identified for the Polish Red, Polish HF, and Hereford breeds, respectively. Breed-wise, three highly reliable breed-specific SNP-databases (SNP-dbs) with 31,562, 24,945 and 28,194 SNP records were constructed for the Polish Red, Polish HF, and Hereford breeds, respectively. Using a combination of stringent parameters of a minimum depth of ≥10 mapping reads that support the polymorphic nucleotide base and 100% SNP ratio, 4,368, 3,780 and 3,800 SNP records were detected in the Polish Red, Polish HF, and Hereford breeds, respectively. The SNP detections using RNA-seq data were successfully validated by kompetitive allele-specific PCR (KASP™) SNP genotyping assay. The comprehensive QTL/CG analysis of 110 QTL/CG with RNA-seq data identified 20 monomorphic SNP hit loci (CARTPT, GAD1, GDF5, GHRH, GHRL, GRB10, IGFBPL1, IGFL1, LEP, LHX4, MC4R, MSTN, NKAIN1, PLAG1, POU1F1, SDR16C5, SH2B2, TOX, UCP3 and WNT10B) in all three cattle breeds. However, six SNP loci (CCSER1, GHR, KCNIP4, MTSS1, EGFR and NSMCE2) were identified as highly polymorphic among the cattle breeds. Conclusions: This study identified breed-specific SNPs with greater SNP ratio and excellent mapping coverage, as well as monomorphic and highly polymorphic putative SNP loci within QTL/CGs of bovine liver tissue. A breed-specific SNP-db constructed for bovine liver yielded nearly six million SNPs. In addition, a KASP™ SNP genotyping assay, as a reliable cost-effective method, successfully validated the breed-specific putative SNPs originating from the RNA-seq experiments. PMID:28234981
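
    The stringent filter described above (at least 10 reads supporting the polymorphic base and a 100% SNP ratio) might be sketched as follows. This is an illustration only, not the authors' pipeline: it assumes that "SNP ratio" means the fraction of reads at a position that carry the variant base, and the record type, field names and example calls are hypothetical.

```python
from typing import NamedTuple


class SnpCall(NamedTuple):
    locus: str        # hypothetical identifier, e.g. "chr5:1200"
    alt_reads: int    # reads supporting the polymorphic base
    total_reads: int  # all reads covering the position


def passes_filter(call, min_alt_reads=10, min_ratio=1.0):
    """Keep a call only if enough reads support the variant and the
    supporting fraction (assumed meaning of 'SNP ratio') is high enough."""
    if call.total_reads == 0:
        return False
    ratio = call.alt_reads / call.total_reads
    return call.alt_reads >= min_alt_reads and ratio >= min_ratio


calls = [
    SnpCall("locus_a", 12, 12),  # 12 supporting reads, ratio 1.0
    SnpCall("locus_b", 12, 15),  # enough reads, but ratio 0.8
    SnpCall("locus_c", 8, 8),    # ratio 1.0, but too few reads
]
kept = [c.locus for c in calls if passes_filter(c)]
print(kept)
```

    Only the first hypothetical call survives both thresholds, which shows how such a filter can reduce tens of thousands of records to a few thousand.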

  12. Transcriptome profile of lung dendritic cells after in vitro porcine reproductive and respiratory syndrome virus (PRRSV) infection

    PubMed Central

    Pröll, Maren Julia; Neuhoff, Christiane; Schellander, Karl; Uddin, Muhammad Jasim; Cinar, Mehmet Ulas; Sahadevan, Sudeep; Qu, Xueqi; Islam, Md. Aminul; Poirier, Mikhael; Müller, Marcel A.; Drosten, Christian; Tesfaye, Dawit; Tholen, Ernst; Große-Brinkhaus, Christine

    2017-01-01

    The porcine reproductive and respiratory syndrome (PRRS) is an infectious disease that leads to high financial and production losses in the global swine industry. The pathogenesis of this disease is dependent on a multitude of factors, and its control remains problematic. The immune system generally defends against infectious diseases, especially dendritic cells (DCs), which play a crucial role in the activation of the immune response after viral infections. However, the understanding of the immune response and the genetic impact on the immune response to PRRS virus (PRRSV) remains incomplete. In light of this, we investigated the regulation of the host immune response to PRRSV in porcine lung DCs using RNA-sequencing (RNA-Seq). Lung DCs from two different pig breeds (Pietrain and Duroc) were collected before (0 hours) and during various periods of infection (3, 6, 9, 12, and 24 hours post infection (hpi)). RNA-Seq analysis revealed a total of 20,396 predicted porcine genes, which included breed-specific differentially expressed immune genes. Pietrain and Duroc infected lung DCs showed opposite gene expression courses during the first time points post infection. Duroc lung DCs reacted more strongly and distinctly than Pietrain lung DCs during these periods (3, 6, 9, 12 hpi). Additionally, cluster analysis revealed time-dependent co-expressed groups of genes that were involved in immune-relevant pathways. Key clusters and pathways were identified, which help to explain the biological and functional background of lung DCs post PRRSV infection and suggest IL-1β1 as an important candidate gene. RNA-Seq was also used to characterize the viral replication of PRRSV for each breed. PRRSV was able to infect and to replicate differently in lung DCs between the two mentioned breeds. These results could be useful in investigations on immunity traits in pig breeding and enhancing the health of pigs. PMID:29140992

  13. Using RNA-seq data to select reference genes for normalizing gene expression in apple roots.

    PubMed

    Zhou, Zhe; Cong, Peihua; Tian, Yi; Zhu, Yanmin

    2017-01-01

    Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, the suitability of a set of 15 apple genes was evaluated for their potential use as reliable reference genes. These genes were selected based on their low variance of gene expression in apple root tissues from a recent RNA-seq data set, together with a few previously reported apple reference genes for other tissue types. Four methods, Delta Ct, geNorm, NormFinder and BestKeeper, were used to evaluate their stability in apple root tissues of various genotypes and under different experimental conditions. A small panel of stably expressed genes, MDP0000095375, MDP0000147424, MDP0000233640, MDP0000326399 and MDP0000173025, was recommended for normalizing quantitative gene expression data in apple roots under various abiotic or biotic stresses. When the most stable and least stable reference genes were used for data normalization, significant differences were observed in the expression patterns of two target genes, MdLecRLK5 (MDP0000228426, a gene encoding a lectin receptor like kinase) and MdMAPK3 (MDP0000187103, a gene encoding a mitogen-activated protein kinase). Our data also indicated that for those carefully validated reference genes, a single reference gene is sufficient for reliable normalization of the quantitative gene expression. Depending on the experimental conditions, the most suitable reference genes can be specific to the sample of interest for more reliable RT-qPCR data normalization.
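
    Selecting candidates by low expression variance in an RNA-seq data set can be sketched as a simple ranking. The example below is an assumption-laden illustration, not the study's procedure: it uses the coefficient of variation as a stand-in for "low variance", and the gene names and expression values are hypothetical.

```python
from statistics import mean, pstdev


def rank_by_cv(expression):
    """Rank genes from most to least stable by coefficient of variation.

    `expression` maps a gene name to its normalized expression values
    across samples. Genes with zero mean are skipped.
    """
    cvs = {}
    for gene, values in expression.items():
        m = mean(values)
        if m > 0:
            cvs[gene] = pstdev(values) / m
    return sorted(cvs, key=cvs.get)


# Hypothetical normalized expression values in three root samples.
expression = {
    "geneA": [10, 10, 10],
    "geneB": [5, 10, 15],
    "geneC": [9, 10, 11],
}
print(rank_by_cv(expression))
```

    Genes at the top of such a ranking are only candidates; the abstract reports that stability was then assessed by RT-qPCR with Delta Ct, geNorm, NormFinder and BestKeeper.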

  14. Using RNA-seq data to select reference genes for normalizing gene expression in apple roots

    PubMed Central

    Zhou, Zhe; Cong, Peihua; Tian, Yi

    2017-01-01

    Gene expression in apple roots in response to various stress conditions is a less-explored research subject. Reliable reference genes for normalizing quantitative gene expression data have not been carefully investigated. In this study, the suitability of a set of 15 apple genes was evaluated for their potential use as reliable reference genes. These genes were selected based on their low variance of gene expression in apple root tissues from a recent RNA-seq data set, together with a few previously reported apple reference genes for other tissue types. Four methods, Delta Ct, geNorm, NormFinder and BestKeeper, were used to evaluate their stability in apple root tissues of various genotypes and under different experimental conditions. A small panel of stably expressed genes, MDP0000095375, MDP0000147424, MDP0000233640, MDP0000326399 and MDP0000173025, was recommended for normalizing quantitative gene expression data in apple roots under various abiotic or biotic stresses. When the most stable and least stable reference genes were used for data normalization, significant differences were observed in the expression patterns of two target genes, MdLecRLK5 (MDP0000228426, a gene encoding a lectin receptor like kinase) and MdMAPK3 (MDP0000187103, a gene encoding a mitogen-activated protein kinase). Our data also indicated that for those carefully validated reference genes, a single reference gene is sufficient for reliable normalization of the quantitative gene expression. Depending on the experimental conditions, the most suitable reference genes can be specific to the sample of interest for more reliable RT-qPCR data normalization. PMID:28934340

  15. mRNA-Seq Reveals Novel Molecular Mechanisms and a Robust Fingerprint in Graves' Disease

    PubMed Central

    Sachidanandam, Ravi; Morshed, Syed; Latif, Rauf; Shi, Ruijin; Davies, Terry F.

    2014-01-01

    Context: The immune response in autoimmune thyroid disease has been shown to occur primarily within the thyroid gland in which the most abundant antigens can be found. A variety of capture molecules are known to be expressed by thyroid epithelial cells and serve to attract and help retain an intrathyroidal immune infiltrate. Objective: To explore the entire repertoire of expressed genes in human thyroid tissue, we have deep sequenced the transcriptome (referred to as mRNA-Seq). Design and Patients: We applied mRNA-Seq to thyroid tissue from nine patients with Graves' disease subjected to total thyroidectomy and compared the data with 12 samples of normal thyroid tissue obtained from patients having a thyroid nodule removed. The expression for each gene was calculated from the sequencing data by taking the median of the coverage across the length of the gene. The expression levels were quantile normalized and a gene signature was derived from these. Results: On comparison of expression levels in tissues derived from Graves' patients and controls, there was clear evidence for overexpression of the antigen presentation pathway consisting of HLA and associated genes. We also found a robust disease signature and discovered active innate and adaptive immune signaling networks. Conclusions: These data reveal an active immune defense system in Graves' disease, which involves novel molecular mechanisms in its pathogenesis and development. PMID:24971664
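
    The two quantification steps named in the abstract, median coverage per gene followed by quantile normalization, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' code: the function names and toy values are hypothetical, and ties are broken by position instead of being averaged.

```python
from statistics import mean, median


def gene_expression(coverage):
    """Expression of one gene as the median per-base coverage along its length."""
    return median(coverage)


def quantile_normalize(samples):
    """Quantile-normalize equal-length expression vectors (one list per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so all samples end up with the same distribution.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(s[i] for s in sorted_samples) for i in range(n_genes)]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy per-base coverage for one gene; the outlier does not move the median.
print(gene_expression([0, 4, 6, 8, 100]))
# Two toy samples with three genes each, genes in the same order.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

    Using the median makes the per-gene estimate robust to coverage spikes, and quantile normalization removes sample-wide differences in sequencing depth before a signature is derived.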

  16. TopHat: discovering splice junctions with RNA-Seq

    PubMed Central

    Trapnell, Cole; Pachter, Lior; Salzberg, Steven L.

    2009-01-01

    Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19289445

  17. Real-Time Reverse-Transcription Quantitative Polymerase Chain Reaction Assay Is a Feasible Method for the Relative Quantification of Heregulin Expression in Non-Small Cell Lung Cancer Tissue.

    PubMed

    Kristof, Jessica; Sakrison, Kellen; Jin, Xiaoping; Nakamaru, Kenji; Schneider, Matthias; Beckman, Robert A; Freeman, Daniel; Spittle, Cindy; Feng, Wenqin

    2017-01-01

    In preclinical studies, heregulin (HRG) expression was shown to be the most relevant predictive biomarker for response to patritumab, a fully human anti-epidermal growth factor receptor 3 monoclonal antibody. In support of a phase 2 study of erlotinib ± patritumab in non-small cell lung cancer (NSCLC), a reverse-transcription quantitative polymerase chain reaction (RT-qPCR) assay for relative quantification of HRG expression from formalin-fixed paraffin-embedded (FFPE) NSCLC tissue samples was developed and validated, and is described herein. Test specimens included matched FFPE normal lung and NSCLC and frozen NSCLC tissue, and HRG-positive and HRG-negative cell lines. Formalin-fixed paraffin-embedded tissue was examined for functional performance. Heregulin distribution was also analyzed across 200 NSCLC commercial samples. Applied Biosystems TaqMan Gene Expression Assays were run on the Bio-Rad CFX96 real-time PCR platform. Heregulin RT-qPCR assay specificity, PCR efficiency, PCR linearity, and reproducibility were demonstrated. The final assay parameters included the Qiagen FFPE RNA Extraction Kit for RNA extraction from FFPE NSCLC tissue, 50 ng of RNA input, and 3 reference (housekeeping) genes (HMBS, IPO8, and EIF2B1), which had expression levels similar to HRG expression levels and were stable among FFPE NSCLC samples. Using the validated assay, unimodal HRG distribution was confirmed across 185 evaluable FFPE NSCLC commercial samples. Feasibility of an RT-qPCR assay for the quantification of HRG expression in FFPE NSCLC specimens was demonstrated.

  18. Gene expression profiling analysis of the effects of low-intensity pulsed ultrasound on induced pluripotent stem cell-derived neural crest stem cells.

    PubMed

    Xia, Bin; Zou, Yang; Xu, Zhiling; Lv, Yonggang

    2017-11-01

    Low-intensity pulsed ultrasound (LIPUS) is a noninvasive technique that has been shown to affect cell proliferation, migration, and differentiation and promote the regeneration of damaged peripheral nerve. Our previous studies had proved that LIPUS can significantly promote the neural differentiation of induced pluripotent stem cell-derived neural crest stem cells (iPSCs-NCSCs) and enhance the repair of rat-transected sciatic nerve. To further explore the underlying mechanisms of LIPUS treatment of iPSCs-NCSCs, this study reported the gene expression profiling analysis of iPSCs-NCSCs before and after LIPUS treatment using the RNA-sequencing (RNA-Seq) method. It was found that expression of 76 genes of iPSCs-NCSCs cultured in a serum-free neural induction medium and expression of 21 genes of iPSCs-NCSCs cultured in a neuronal differentiation medium were significantly changed by LIPUS treatment. The differentially expressed genes are related to angiogenesis, nervous system activity and functions, cell activities, and so on. The RNA-seq results were further verified by a quantitative real-time reverse transcriptase polymerase chain reaction (qRT-PCR). High correlation was observed between the results obtained from qRT-PCR and RNA-Seq. This study presented new information on the global gene expression patterns of iPSCs-NCSCs after LIPUS treatment and may expand the understanding of the complex molecular mechanism of LIPUS treatment of iPSCs-NCSCs. © 2017 International Union of Biochemistry and Molecular Biology, Inc.

  19. DNA methylation patterns and gene expression associated with litter size in Berkshire pig placenta

    PubMed Central

    Kwon, Seulgi; Park, Da Hye; Kim, Tae Wan; Kang, Deok Gyeong; Yu, Go Eun; Kim, Il-Suk; Park, Hwa Chun; Ha, Jeongim; Kim, Chul Wook

    2017-01-01

    Increasing litter size is of great interest to the pig industry. DNA methylation is an important epigenetic modification that regulates gene expression, resulting in livestock phenotypes such as disease resistance, milk production, and reproduction. We classified Berkshire pigs into two groups according to litter size and estimated breeding value: smaller (SLG) and larger (LLG) litter size groups. Genome-wide DNA methylation and gene expression were analyzed using placenta genomic DNA and RNA to identify differentially methylated regions (DMRs) and differentially expressed genes (DEGs) associated with litter size. The methylation levels of CpG dinucleotides in different genomic regions were noticeably different between the groups, while the global methylation pattern was similar and, excluding intergenic regions, they were found most frequently in gene body regions. Next, we analyzed RNA-Seq data to identify DEGs between the SLG and LLG groups. A total of 1591 DEGs were identified: 567 were downregulated and 1024 were upregulated in LLG compared to SLG. To identify genes that simultaneously exhibited changes in DNA methylation and mRNA expression, we integrated and analyzed the data from bisulfite-Seq and RNA-Seq. Nine DEGs positioned in DMRs were found. The expression of only three of these genes (PRKG2, CLCA4, and PCK1) was verified by RT-qPCR. Furthermore, we observed the same methylation patterns in blood samples as in the placental tissues by PCR-based methylation analysis. Together, these results provide useful data regarding potential epigenetic markers for selecting hyperprolific sows. PMID:28880934
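
    Finding DEGs positioned in DMRs, as described above, comes down to an interval-overlap test between gene coordinates and DMR coordinates. The sketch below shows one simple way to do that; the gene names, chromosomes, and coordinates are hypothetical, and the quadratic scan is only meant for small inputs.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def degs_in_dmrs(degs, dmrs):
    """List the DEGs whose coordinates overlap any DMR on the same chromosome.

    degs maps a gene name to (chromosome, start, end);
    dmrs is a list of (chromosome, start, end) tuples.
    """
    hits = []
    for name, (chrom, start, end) in degs.items():
        if any(
            c == chrom and overlaps(start, end, s, e) for c, s, e in dmrs
        ):
            hits.append(name)
    return sorted(hits)


# Hypothetical coordinates, for illustration only.
example_degs = {
    "geneA": ("chr1", 1000, 5000),
    "geneB": ("chr1", 9000, 12000),
    "geneC": ("chr2", 1000, 5000),
}
example_dmrs = [("chr1", 4500, 4700), ("chr2", 6000, 6500)]
shared = degs_in_dmrs(example_degs, example_dmrs)
```

    For genome-scale inputs, sorting both lists or using an interval tree would avoid comparing every gene with every DMR.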

  20. Obese rats supplemented with bitter melon display marked shifts in the expression of genes controlling inflammatory response and lipid metabolism by RNA-Seq analysis of colonic mucosa.

    PubMed

    Bai, Juan; Zhu, Ying; Dong, Ying

    2018-06-01

    Obesity is known to induce pathological changes in the gut, and diets rich in complex carbohydrates that resist digestion in the small bowel can alter large bowel ecology. The purpose of this study was to identify the effects of bitter melon powder (BMP) on the global gene expression pattern in the colon mucosa of obese rats. Obese rats were fed a high-fat diet and treated without or with BMP for 8 weeks. Genome-wide expression profiles of the colon mucosa were determined by RNA sequencing (RNA-Seq) analysis at the end of the experiment. A total of 87 genes were identified as differentially expressed (DE) between these two groups (fold change > 1.2). These results were further validated by quantitative RT-PCR, confirming the high reliability of the RNA-Seq. Interestingly, DE genes implicated in inflammation and lipid metabolism were found to be downregulated by BMP in the colon. A network of genes and the top 15 KEGG pathways showed that PRKCβ (protein kinase C beta) and Pla2g2a (phospholipase A2 group IIA) strongly interacted with surrounding pathways and genes. Results revealed that BMP supplementation could remodel key colon functions by altering the transcriptomic profile in obese rats.
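
    A fold-change cutoff such as the one quoted above can be applied symmetrically on the log2 scale, so that increases and decreases of the same magnitude are treated alike. The sketch below assumes that symmetric reading of "fold change > 1.2", which the abstract does not spell out; the expression values are hypothetical, and a real analysis would normally combine the cutoff with a statistical test.

```python
import math


def log2_fold_change(treated, control):
    """Log2 ratio of treated to control expression (both must be positive)."""
    return math.log2(treated / control)


def passes_fold_change(treated, control, threshold=1.2):
    """Return True if expression changed by more than `threshold`-fold.

    The test is symmetric: a ratio above 1.2 or below 1/1.2 both pass.
    """
    return abs(log2_fold_change(treated, control)) > math.log2(threshold)


# Hypothetical mean expression values (treated, control) per gene.
example = {
    "geneA": (13.0, 10.0),  # up by 1.3-fold
    "geneB": (11.0, 10.0),  # up by 1.1-fold
    "geneC": (10.0, 13.0),  # down by 1.3-fold
}
selected = sorted(g for g, (t, c) in example.items() if passes_fold_change(t, c))
```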

  1. Genome-wide Discovery of Circular RNAs in the Leaf and Seedling Tissues of Arabidopsis Thaliana

    PubMed Central

    Dou, Yongchao; Li, Shengjun; Yang, Weilong; Liu, Kan; Du, Qian; Ren, Guodong; Yu, Bin; Zhang, Chi

    2017-01-01

    Background: Recently, identification and functional studies of circular RNAs, a type of non-coding RNA arising from the ligation of the 3’ and 5’ ends of a linear RNA molecule, were conducted in mammalian cells with the development of RNA-seq technology. Method: Because circular RNAs have been studied less thoroughly in plants than in animals, a genome-wide identification of circular RNA candidates in Arabidopsis was conducted by applying our own bioinformatics tool to several existing RNA-seq datasets specific to non-coding RNAs. Results: A total of 164 circular RNA candidates were identified from RNA-seq data, and 4 circular RNA transcripts, including both exonic and intronic circular RNAs, were experimentally validated. Interestingly, our results show that circular RNA transcripts are enriched in the photosynthesis system for the leaf tissue and correlated to the higher expression levels of their parent genes. Sixteen out of all 40 genes that have circular RNA candidates are related to the photosynthesis system, and out of the total 146 exonic circular RNA candidates, 63 are found in chloroplast. PMID:29081691

  2. Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome

    PubMed Central

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José

    2016-01-01

    RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude. PMID:27377755

  3. Comprehensive RNA-Seq profiling to evaluate lactating sheep mammary gland transcriptome.

    PubMed

    Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan-José

    2016-07-05

    RNA-Seq enables the generation of extensive transcriptome information providing the capability to characterize transcripts (including alternative isoforms and polymorphism), to quantify expression and to identify differential regulation in a single experiment. Our aim in this study was to take advantage of using RNA-Seq high-throughput technology to provide a comprehensive transcriptome profiling of the sheep lactating mammary gland. Eight ewes of two dairy sheep breeds with differences in milk production traits were used in this experiment (four Churra and four Assaf ewes). Milk samples from these animals were collected on days 10, 50, 120 and 150 after lambing to cover the various physiological stages of the mammary gland across the complete lactation. RNA samples were extracted from milk somatic cells. The RNA-Seq dataset was generated using an Illumina HiSeq 2000 sequencer. The information reported here will be useful to understand the biology of lactation in sheep, providing also an opportunity to characterize their different patterns on milk production aptitude.

  4. Predicting conformational ensembles and genome-wide transcription factor binding sites from DNA sequences.

    PubMed

    Andrabi, Munazah; Hutchins, Andrew Paul; Miranda-Saavedra, Diego; Kono, Hidetoshi; Nussinov, Ruth; Mizuguchi, Kenji; Ahmad, Shandar

    2017-06-22

    DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large-scale DNA shape estimates, DNAshape, was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features: Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features, e.g. Shift, Slide and Opening, cannot be evaluated using DNAshape. Here, we report a novel method, DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS) followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.

  5. Highly sensitive and unbiased approach for elucidating antibody repertoires

    PubMed Central

    Lin, Sherry G.; Ba, Zhaoqing; Du, Zhou; Zhang, Yu; Hu, Jiazhi; Alt, Frederick W.

    2016-01-01

    Developing B lymphocytes undergo V(D)J recombination to assemble germ-line V, D, and J gene segments into exons that encode the antigen-binding variable region of Ig heavy (H) and light (L) chains. IgH and IgL chains associate to form the B-cell receptor (BCR), which, upon antigen binding, activates B cells to secrete BCR as an antibody. Each of the huge number of clonally independent B cells expresses a unique set of IgH and IgL variable regions. The ability of V(D)J recombination to generate vast primary B-cell repertoires results from a combinatorial assortment of large numbers of different V, D, and J segments, coupled with diversification of the junctions between them to generate the complementary determining region 3 (CDR3) for antigen contact. Approaches to evaluate in depth the content of primary antibody repertoires and, ultimately, to study how they are further molded by secondary mutation and affinity maturation processes are of great importance to the B-cell development, vaccine, and antibody fields. We now describe an unbiased, sensitive, and readily accessible assay, referred to as high-throughput genome-wide translocation sequencing-adapted repertoire sequencing (HTGTS-Rep-seq), to quantify antibody repertoires. HTGTS-Rep-seq quantitatively identifies the vast majority of IgH and IgL V(D)J exons, including their unique CDR3 sequences, from progenitor and mature mouse B lineage cells via the use of specific J primers. HTGTS-Rep-seq also accurately quantifies DJH intermediates and V(D)J exons in either productive or nonproductive configurations. HTGTS-Rep-seq should be useful for studies of human samples, including clonal B-cell expansions, and also for following antibody affinity maturation processes. PMID:27354528

  6. Distinct tissue-specific transcriptional regulation revealed by gene regulatory networks in maize.

    PubMed

    Huang, Ji; Zheng, Juefei; Yuan, Hui; McGinnis, Karen

    2018-06-07

    Transcription factors (TFs) are proteins that can bind to DNA sequences and regulate gene expression. Many TFs are master regulators in cells that contribute to tissue-specific and cell-type-specific gene expression patterns in eukaryotes. Maize has been a model organism for over one hundred years, but little is known about its tissue-specific gene regulation through TFs. In this study, we used a network approach to elucidate gene regulatory networks (GRNs) in four tissues (leaf, root, SAM and seed) in maize. We utilized GENIE3, a machine-learning algorithm, combined with a large quantity of RNA-Seq expression data to construct four tissue-specific GRNs. Unlike some other techniques, this approach is not limited by the availability of high-quality Position Weight Matrices (PWMs), and can therefore predict GRNs for over 2000 TFs in maize. Although many TFs were expressed across multiple tissues, a multi-tiered analysis predicted tissue-specific regulatory functions for many transcription factors. Some well-studied TFs emerged within the four tissue-specific GRNs, and the GRN predictions matched expectations based upon published results for many of these examples. Our GRNs were also validated by ChIP-Seq datasets (KN1, FEA4 and O2). Key TFs were identified for each tissue and matched expectations for key regulators in each tissue, including GO enrichment and identity with known regulatory factors for that tissue. We also found functional modules in each network by clustering analysis with the MCL algorithm. By combining publicly available genome-wide expression data and network analysis, we can uncover GRNs at tissue-level resolution in maize. Since ChIP-Seq and PWMs are still limited in several model organisms, our study provides a uniform platform that can be adapted to any species with genome-wide expression data to construct GRNs. We also present a publicly available database, maize tissue-specific GRN (mGRN, https://www.bio.fsu.edu/mcginnislab/mgrn/ ), for easy querying. All source code and data are available at GitHub ( https://github.com/timedreamer/maize_tissue-specific_GRN ).

  7. Evaluating whole transcriptome amplification for gene profiling experiments using RNA-Seq.

    PubMed

    Faherty, Sheena L; Campbell, C Ryan; Larsen, Peter A; Yoder, Anne D

    2015-07-30

    RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. This methodology has broad applications, from clinical and diagnostic, to field-based studies when sample acquisition, or sample preservation, methods prove challenging.

  8. Salmonella DIVA vaccine reduces disease, colonization and shedding due to virulent S. Typhimurium infection in swine

    PubMed Central

    Bearson, Shawn M. D; Brunelle, Brian W; Bayles, Darrell O; Lee, In Soo; Kich, Jalusa D

    2017-01-01

    Purpose: Non-host-adapted Salmonella serovars, including the common human food-borne pathogen Salmonella enterica serovar Typhimurium (S. Typhimurium), are opportunistic pathogens that can colonize food-producing animals without causing overt disease. Interventions against Salmonella are needed to enhance food safety, protect animal health and allow the differentiation of infected from vaccinated animals (DIVA). Methodology: An attenuated S. Typhimurium DIVA vaccine (BBS 866) was characterized for the protection of pigs following challenge with virulent S. Typhimurium. The porcine transcriptional response to BBS 866 vaccination was evaluated. RNA-Seq analysis was used to compare gene expression between BBS 866 and its parent; phenotypic assays were performed to confirm transcriptional differences observed between the strains. Results: Vaccination significantly reduced fever and interferon-gamma (IFNγ) levels in swine challenged with virulent S. Typhimurium compared to mock-vaccinated pigs. Salmonella faecal shedding and gastrointestinal tissue colonization were significantly lower in vaccinated swine. RNA-Seq analysis comparing BBS 866 to its parental S. Typhimurium strain demonstrated reduced expression of the genes involved in cellular invasion and bacterial motility; decreased invasion of porcine-derived IPEC-J2 cells and swimming motility for the vaccine strain was consistent with the RNA-Seq analysis. Numerous membrane proteins were differentially expressed, which was an anticipated gene expression pattern due to the targeted deletion of several regulatory genes in the vaccine strain. RNA-Seq analysis indicated that genes involved in the porcine immune and inflammatory response were differentially regulated at 2 days post-vaccination compared to pre-vaccination. Conclusion: Evaluation of the S. Typhimurium DIVA vaccine indicates that vaccination will provide both swine health and food safety benefits. PMID:28516860

  9. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases

    PubMed Central

    Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. PMID:26554030

  10. Global Transcriptomic Effects of Environmentally Relevant Concentrations of the Neonicotinoids Clothianidin, Imidacloprid, and Thiamethoxam in the Brain of Honey Bees (Apis mellifera).

    PubMed

    Christen, Verena; Schirrmann, Melanie; Frey, Juerg E; Fent, Karl

    2018-06-14

    Neonicotinoids are implicated in the decline of honey bees, but the molecular basis underlying adverse effects is poorly known. Here we describe global transcriptomic profiles in the brain of honey bee workers exposed for 48 h at one environmentally realistic and one sublethal concentration of 0.3 and 3.0 ng/bee clothianidin and imidacloprid, respectively, and 0.1 and 1.0 ng/bee thiamethoxam (1-30 ng/mL sucrose solution) by high-throughput RNA-sequencing (RNA-seq). All neonicotinoids led to significant alteration (mainly down-regulation) of gene expression, generally with a concentration-dependent effect. Among many others, genes related to metabolism and detoxification were differently expressed. Gene ontology (GO) enrichment analysis of biological processes revealed catabolic carbohydrate metabolism (regulation of enzyme activities such as amylase), lipid metabolism, and transport mechanisms as shared terms between all neonicotinoids at high concentrations. KEGG pathway analysis indicated that at least two neonicotinoids induced changes in expression of various metabolic pathways: pentose phosphate pathways, starch and sucrose metabolism, and sulfur metabolism, in which glucose 1-dehydrogenase and alpha-amylase were down-regulated and 3'(2'), 5'-bisphosphate nucleotidase was up-regulated. RT-qPCR analysis confirmed the down-regulation of major royal jelly proteins, hbg3, and cyp9e2 found by RNA-seq. Our study highlights the comparative molecular effects of neonicotinoid exposure to bees. Further studies should link these effects with physiological outcomes for a better understanding of effects of neonicotinoids.

  11. bigSCale: an analytical framework for big-scale single-cell data.

    PubMed

    Iacono, Giovanni; Mereu, Elisabetta; Guillaumet-Adkins, Amy; Corominas, Roser; Cuscó, Ivon; Rodríguez-Esteban, Gustavo; Gut, Marta; Pérez-Jurado, Luis Alberto; Gut, Ivo; Heyn, Holger

    2018-06-01

    Single-cell RNA sequencing (scRNA-seq) has significantly deepened our insights into complex tissues, with the latest techniques capable of processing tens of thousands of cells simultaneously. Analyzing increasing numbers of cells, however, generates extremely large data sets, extending processing time and challenging computing resources. Current scRNA-seq analysis tools are not designed to interrogate large data sets and often lack sensitivity to identify marker genes. With bigSCale, we provide a scalable analytical framework to analyze millions of cells, which addresses the challenges associated with large data sets. To handle the noise and sparsity of scRNA-seq data, bigSCale uses large sample sizes to estimate an accurate numerical model of noise. The framework further includes modules for differential expression analysis, cell clustering, and marker identification. A directed convolution strategy allows processing of extremely large data sets, while preserving transcript information from individual cells. We evaluated the performance of bigSCale using both a biological model of aberrant gene expression in patient-derived neuronal progenitor cells and simulated data sets, which underlines the speed and accuracy in differential expression analysis. To test its applicability for large data sets, we applied bigSCale to assess 1.3 million cells from the mouse developing forebrain. Its directed down-sampling strategy accumulates information from single cells into index cell transcriptomes, thereby defining cellular clusters with improved resolution. Accordingly, index cell clusters identified rare populations, such as reelin ( Reln )-positive Cajal-Retzius neurons, for which we report previously unrecognized heterogeneity associated with distinct differentiation stages, spatial organization, and cellular function. Together, bigSCale presents a solution to address future challenges of large single-cell data sets. 
    © 2018 Iacono et al.; Published by Cold Spring Harbor Laboratory Press.

  12. High-throughput detection of RNA processing in bacteria.

    PubMed

    Gill, Erin E; Chan, Luisa S; Winsor, Geoffrey L; Dobson, Neil; Lo, Raymond; Ho Sui, Shannan J; Dhillon, Bhavjinder K; Taylor, Patrick K; Shrestha, Raunak; Spencer, Cory; Hancock, Robert E W; Unrau, Peter J; Brinkman, Fiona S L

    2018-03-27

    Understanding the RNA processing of an organism's transcriptome is an essential but challenging step in understanding its biology. Here we investigate with unprecedented detail the transcriptome of Pseudomonas aeruginosa PAO1, a medically important and innately multi-drug resistant bacterium. We systematically mapped RNA cleavage and dephosphorylation sites that result in 5'-monophosphate terminated RNA (pRNA) using monophosphate RNA-Seq (pRNA-Seq). Transcriptional start sites (TSS) were also mapped using differential RNA-Seq (dRNA-Seq) and both datasets were compared to conventional RNA-Seq performed in a variety of growth conditions. The pRNA-Seq library revealed known tRNA, rRNA and transfer-messenger RNA (tmRNA) processing sites, together with previously uncharacterized RNA cleavage events that were found disproportionately near the 5' ends of transcripts associated with basic bacterial functions such as oxidative phosphorylation and purine metabolism. The majority (97%) of the processed mRNAs were cleaved at precise codon positions within defined sequence motifs indicative of distinct endonucleolytic activities. The most abundant of these motifs corresponded closely to an E. coli RNase E site previously established in vitro. Using the dRNA-Seq library, we performed an operon analysis and predicted 3159 potential TSS. A correlation analysis uncovered 105 antiparallel pairs of TSS that were separated by 18 bp from each other and were centered on single palindromic TAT(A/T)ATA motifs (likely -10 promoter elements), suggesting that, consistent with previous in vitro experimentation, these sites can initiate transcription bi-directionally and may thus provide a novel form of transcriptional regulation. TSS and RNA-Seq analysis allowed us to confirm expression of small non-coding RNAs (ncRNAs), many of which are differentially expressed in swarming and biofilm formation conditions. This study uses pRNA-Seq, a method that provides a genome-wide survey of RNA processing, to study the bacterium Pseudomonas aeruginosa and discover extensive transcript processing not previously appreciated. We have also gained novel insight into RNA maturation and turnover as well as a potential novel form of transcription regulation. NOTE: All sequence data has been submitted to the NCBI sequence read archive. Accession numbers are as follows: [NCBI sequence read archive: SRX156386, SRX157659, SRX157660, SRX157661, SRX157683 and SRX158075]. The sequence data is viewable using JBrowse on www.pseudomonas.com .
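
    The antiparallel TSS pairs and the TAT(A/T)ATA motif mentioned above lend themselves to a small worked example. The sketch below scans a sequence for the motif and pairs TSS on opposite strands that lie a fixed distance apart. It is an illustration only, not the authors' correlation analysis; the function names, positions, and sequence are hypothetical, and the pairing ignores which strand lies upstream.

```python
import re

# Lookahead so that overlapping occurrences of TAT(A/T)ATA are all reported.
MOTIF = re.compile(r"(?=(TAT[AT]ATA))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_sites(seq):
    """Return (0-based start, matched text) for each TAT(A/T)ATA occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]


def antiparallel_pairs(tss, spacing=18):
    """Pair plus-strand and minus-strand TSS that lie `spacing` bp apart.

    tss is a list of (position, strand) tuples with strand "+" or "-".
    Returns (plus_position, minus_position) tuples.
    """
    plus = sorted(p for p, strand in tss if strand == "+")
    minus = {p for p, strand in tss if strand == "-"}
    pairs = []
    for p in plus:
        for q in (p - spacing, p + spacing):
            if q in minus:
                pairs.append((p, q))
    return pairs


# Hypothetical sequence and TSS coordinates.
sites = motif_sites("GGTATAATACC")
example_tss = [(100, "+"), (82, "-"), (200, "+"), (300, "-"), (318, "+")]
pairs = antiparallel_pairs(example_tss)
```

    Because the reverse complement of TATAATA is TATTATA, and vice versa, the same pattern matches on both strands, which is consistent with describing the motif as palindromic.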

  13. Oligodendrocyte gene expression is reduced by and influences effects of chronic social stress in mice.

    PubMed

    Cathomas, F; Azzinnari, D; Bergamini, G; Sigrist, H; Buerge, M; Hoop, V; Wicki, B; Goetze, L; Soares, S; Kukelova, D; Seifritz, E; Goebbels, S; Nave, K-A; Ghandour, M S; Seoighe, C; Hildebrandt, T; Leparc, G; Klein, H; Stupka, E; Hengerer, B; Pryce, C R

    2018-03-22

    Oligodendrocyte gene expression is downregulated in stress-related neuropsychiatric disorders, including depression. In mice, chronic social stress (CSS) leads to depression-relevant changes in brain and emotional behavior, and the present study shows the involvement of oligodendrocytes in this model. In C57BL/6 (BL/6) mice, RNA-sequencing (RNA-Seq) was conducted with prefrontal cortex, amygdala and hippocampus from CSS and controls; a gene enrichment database for neurons, astrocytes and oligodendrocytes was used to identify cell origin of deregulated genes, and cell deconvolution was applied. To assess the potential causal contribution of reduced oligodendrocyte gene expression to CSS effects, mice heterozygous for the oligodendrocyte gene cyclic nucleotide phosphodiesterase (Cnp1) on a BL/6 background were studied; a 2 genotype (wildtype, Cnp1 +/-) × 2 environment (control, CSS) design was used to investigate effects on emotional behavior and amygdala microglia. In BL/6 mice, in prefrontal cortex and amygdala tissue comprising gray and white matter, CSS downregulated expression of multiple oligodendrocyte genes encoding myelin and myelin-axon-integrity proteins, and cell deconvolution identified a lower proportion of oligodendrocytes in amygdala. Quantification of oligodendrocyte proteins in amygdala gray matter did not yield evidence for reduced translation, suggesting that CSS impacts primarily on white matter oligodendrocytes or the myelin transcriptome. In Cnp1 mice, social interaction was reduced by CSS in Cnp1 +/- mice specifically; using ionized calcium-binding adaptor molecule 1 (IBA1) expression, microglia activity was increased additively by Cnp1 +/- and CSS in amygdala gray and white matter. This study provides back-translational evidence that oligodendrocyte changes are relevant to the pathophysiology and potentially the treatment of stress-related neuropsychiatric disorders.
© 2018 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.

  14. Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq.

    PubMed

    Macaulay, Iain C; Teng, Mabel J; Haerty, Wilfried; Kumar, Parveen; Ponting, Chris P; Voet, Thierry

    2016-11-01

    Parallel sequencing of a single cell's genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ∼3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.

  15. Measuring the diversity of the human microbiota with targeted next-generation sequencing.

    PubMed

    Finotello, Francesca; Mastrorilli, Eleonora; Di Camillo, Barbara

    2016-12-26

    The human microbiota is a complex ecological community of commensal, symbiotic and pathogenic microorganisms harboured by the human body. Next-generation sequencing (NGS) technologies, in particular targeted amplicon sequencing of the 16S ribosomal RNA gene (16S-seq), are enabling the identification and quantification of human-resident microorganisms at unprecedented resolution, providing novel insights into the role of the microbiota in health and disease. Once microbial abundances are quantified through NGS data analysis, diversity indices provide valuable mathematical tools to describe the ecological complexity of a single sample or to detect species differences between samples. However, diversity is not a determined physical quantity for which a consensus definition and unit of measure have been established, and several diversity indices are currently available. Furthermore, they were originally developed for macroecology and their robustness to the possible bias introduced by sequencing has not been characterized so far. To assist the reader with the selection and interpretation of diversity measures, we review a panel of broadly used indices, describing their mathematical formulations, purposes and properties, and characterize their behaviour and criticalities in dependence of the data features using simulated data as ground truth. In addition, we make available an R package, DiversitySeq, which implements in a unified framework the full panel of diversity indices and a simulator of 16S-seq data, and thus represents a valuable resource for the analysis of diversity from NGS count data and for the benchmarking of computational methods for 16S-seq. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
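
    As an illustration of what a diversity index computes from per-taxon counts, here is a minimal sketch of three widely used measures (richness, Shannon and Gini-Simpson). The abstract does not say which indices DiversitySeq implements, so these choices and all function names are assumptions made for illustration; this is not the package's API.

```python
import math
from collections.abc import Sequence


def richness(counts: Sequence[int]) -> int:
    """Number of taxa observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson_index(counts: Sequence[int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2): chance that two random reads differ in taxon."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    even_sample = [10, 10, 10, 10]   # hypothetical read counts per taxon
    skewed_sample = [37, 1, 1, 1]
    for name, sample in (("even", even_sample), ("skewed", skewed_sample)):
        print(name, richness(sample), shannon_index(sample), gini_simpson_index(sample))
```

    Both samples have the same richness, yet the even one scores higher on the other two indices, which is one reason a single number rarely captures diversity.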

  16. Linnorm: improved statistical analysis for single cell RNA-seq expression data.

    PubMed

    Yip, Shun H; Wang, Panwen; Kocher, Jean-Pierre A; Sham, Pak Chung; Wang, Junwen

    2017-12-15

    Linnorm is a novel normalization and transformation method for the analysis of single cell RNA sequencing (scRNA-seq) data. Linnorm is developed to remove technical noises and simultaneously preserve biological variations in scRNA-seq data, such that existing statistical methods can be improved. Using real scRNA-seq data, we compared Linnorm with existing normalization methods, including NODES, SAMstrt, SCnorm, scran, DESeq and TMM. Linnorm shows advantages in speed, technical noise removal and preservation of cell heterogeneity, which can improve existing methods in the discovery of novel subtypes, pseudo-temporal ordering of cells, clustering analysis, etc. Linnorm also performs better than existing DEG analysis methods, including BASiCS, NODES, SAMstrt, Seurat and DESeq2, in false positive rate control and accuracy. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa

    PubMed Central

    Petegrosso, Raphael; Tolar, Jakub

    2018-01-01

    Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Since scRNA-seq experiments have lower read coverage/tag counts and introduce more technical biases compared to bulk RNA-seq experiments, the limited number of sampled cells combined with the experimental biases and other dataset specific variations presents a challenge to cross-dataset analysis and discovery of relevant biological variations across multiple cell populations. In this paper, we introduce a method of variance-driven multitask clustering of single-cell RNA-seq data (scVDMC) that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells in multiple scRNA-seq experiments of similar cell types and markers but varying expression patterns such that the scRNA-seq data are better integrated than typical pooled analyses which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment with shared cell-type markers but varying cluster centers among all the experiments. Applied to two real scRNA-seq datasets with several replicates and one large-scale droplet-based dataset on three patient samples, scVDMC more accurately detected cell populations and known cell markers than pooled clustering and other recently proposed scRNA-seq clustering methods. In the case study applied to in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and unknown markers validated by flow cytometry. MATLAB/Octave code available at https://github.com/kuanglab/scVDMC. PMID:29630593

  18. Deciphering the Developmental Dynamics of the Mouse Liver Transcriptome

    PubMed Central

    Gunewardena, Sumedha S.; Yoo, Byunggil; Peng, Lai; Lu, Hong; Zhong, Xiaobo; Klaassen, Curtis D.; Cui, Julia Yue

    2015-01-01

    During development, liver undergoes a rapid transition from a hematopoietic organ to a major organ for drug metabolism and nutrient homeostasis. However, little is known on a transcriptome level of the genes and RNA-splicing variants that are differentially regulated with age, and which up-stream regulators orchestrate age-specific biological functions in liver. We used RNA-Seq to interrogate the developmental dynamics of the liver transcriptome in mice at 12 ages from late embryonic stage (2-days before birth) to maturity (60-days after birth). Among 21,889 unique NCBI RefSeq-annotated genes, 9,641 were significantly expressed in at least one age, 7,289 were differently regulated with age, and 859 had multiple (≥2) RNA splicing-variants. Factor analysis showed that the dynamics of hepatic genes fall into six distinct groups based on their temporal expression. The average expression of cytokines, ion channels, kinases, phosphatases, transcription regulators and translation regulators decreased with age, whereas the average expression of peptidases, enzymes and transmembrane receptors increased with age. The average expression of growth factors peaked between Day-3 and Day-10, and decreased thereafter. We identified critical biological functions, upstream regulators, and putative transcription modules that seem to govern age-specific gene expression. We also observed differential ontogenic expression of known splicing variants of certain genes, and 1,455 novel splicing isoform candidates. In conclusion, the ontogeny of the hepatic transcriptome has unveiled critical networks and up-stream regulators that orchestrate age-specific biological functions in liver, and suggests that age contributes to the complexity of the alternative splicing landscape of the hepatic transcriptome. PMID:26496202

  19. Deciphering the Developmental Dynamics of the Mouse Liver Transcriptome.

    PubMed

    Gunewardena, Sumedha S; Yoo, Byunggil; Peng, Lai; Lu, Hong; Zhong, Xiaobo; Klaassen, Curtis D; Cui, Julia Yue

    2015-01-01

    During development, liver undergoes a rapid transition from a hematopoietic organ to a major organ for drug metabolism and nutrient homeostasis. However, little is known on a transcriptome level of the genes and RNA-splicing variants that are differentially regulated with age, and which up-stream regulators orchestrate age-specific biological functions in liver. We used RNA-Seq to interrogate the developmental dynamics of the liver transcriptome in mice at 12 ages from late embryonic stage (2-days before birth) to maturity (60-days after birth). Among 21,889 unique NCBI RefSeq-annotated genes, 9,641 were significantly expressed in at least one age, 7,289 were differently regulated with age, and 859 had multiple (≥2) RNA splicing-variants. Factor analysis showed that the dynamics of hepatic genes fall into six distinct groups based on their temporal expression. The average expression of cytokines, ion channels, kinases, phosphatases, transcription regulators and translation regulators decreased with age, whereas the average expression of peptidases, enzymes and transmembrane receptors increased with age. The average expression of growth factors peaked between Day-3 and Day-10, and decreased thereafter. We identified critical biological functions, upstream regulators, and putative transcription modules that seem to govern age-specific gene expression. We also observed differential ontogenic expression of known splicing variants of certain genes, and 1,455 novel splicing isoform candidates. In conclusion, the ontogeny of the hepatic transcriptome has unveiled critical networks and up-stream regulators that orchestrate age-specific biological functions in liver, and suggests that age contributes to the complexity of the alternative splicing landscape of the hepatic transcriptome.

  20. Transcript profiling reveals expression differences in wild-type and glabrous soybean lines

    PubMed Central

    2011-01-01

    Background: Trichome hairs affect diverse agronomic characters such as seed weight and yield, prevent insect damage and reduce loss of water, but their molecular control has not been extensively studied in soybean. Several detailed models for trichome development have been proposed for Arabidopsis thaliana, but their applicability to important crops such as cotton and soybean is not fully known. Results: Two high throughput transcript sequencing methods, Digital Gene Expression (DGE) Tag Profiling and RNA-Seq, were used to compare the transcriptional profiles in wild-type (cv. Clark standard, CS) and a mutant (cv. Clark glabrous, i.e., trichomeless or hairless, CG) soybean isoline that carries the dominant P1 allele. DGE data and RNA-Seq data were mapped to the cDNAs (Glyma models) predicted from the reference soybean genome, Williams 82. Extending the model length by 250 bp at both ends resulted in significantly more matches of authentic DGE tags, indicating that many of the predicted gene models are prematurely truncated at the 5' and 3' UTRs. The genome-wide comparative study of the transcript profiles of the wild-type versus mutant line revealed a number of differentially expressed genes. One gene highly expressed in wild-type soybean, Glyma04g35130, was of interest as it has high homology to the cotton gene GhRDL1, which has been identified as being involved in cotton fiber initiation and is a member of the BURP protein family. Sequence comparison of Glyma04g35130 between Williams 82 and our sequences derived from the CS and CG isolines revealed various SNPs and indels, including addition of one nucleotide C in CG and insertion of ~60 bp in the third exon of CS, that cause a frameshift mutation and premature truncation of peptides in both lines as compared to Williams 82.
Conclusion: Although not a candidate for the P1 locus, a BURP family member (Glyma04g35130) from soybean has been shown to be abundantly expressed in the CS line and very weakly expressed in the glabrous CG line. RNA-Seq and DGE data are compared and provide experimental data on the expression of predicted soybean gene models as well as an overview of the genes expressed in young shoot tips of two closely related isolines. PMID:22029708

  1. Transcriptomic Analysis of Paeonia delavayi Wild Population Flowers to Identify Differentially Expressed Genes Involved in Purple-Red and Yellow Petal Pigmentation

    PubMed Central

    Wang, Yan; Li, Kui; Zheng, Baoqiang; Miao, Kun

    2015-01-01

    Tree peony (Paeonia suffruticosa Andrews) is a very famous traditional ornamental plant in China. P. delavayi is a species endemic to Southwest China that has aroused great interest from researchers as a precious genetic resource for flower color breeding. However, the current understanding of the molecular mechanisms of flower pigmentation in this plant is limited, hindering the genetic engineering of novel flower color in tree peonies. In this study, we conducted a large-scale transcriptome analysis based on Illumina HiSeq sequencing of cDNA libraries generated from yellow and purple-red P. delavayi petals. A total of 90,202 unigenes were obtained by de novo assembly, with an average length of 721 nt. Using Blastx, 44,811 unigenes (49.68%) were found to have significant similarity to accessions in the NR, NT, and Swiss-Prot databases. We also examined COG, GO and KEGG annotations to better understand the functions of these unigenes. Further analysis of the two digital transcriptomes revealed that 6,855 unigenes were differentially expressed between yellow and purple-red flower petals, with 3,430 up-regulated and 3,425 down-regulated. According to the RNA-Seq data and qRT-PCR analysis, we proposed that four up-regulated key structural genes, including F3H, DFR, ANS and 3GT, might play an important role in purple-red petal pigmentation, while high co-expression of THC2'GT, CHI and FNS II ensures the accumulation of pigments contributing to the yellow color. We also found 50 differentially expressed transcription factors that might be involved in flavonoid biosynthesis. This study is the first to report genetic information for P. delavayi. The large number of gene sequences produced by transcriptome sequencing and the candidate genes identified using pathway mapping and expression profiles will provide a valuable resource for future association studies aimed at better understanding the molecular mechanisms underlying flower pigmentation in tree peonies. 
PMID:26267644

  2. RNA sequencing analysis reveals new findings of hyperbaric oxygen treatment on rats with acute carbon monoxide poisoning.

    PubMed

    Wang, Wenlan; Xue, Li; Li, Ya; Li, Rong; Xie, Xiaoping; Bao, Junxiang; Hai, Chunxu; Li, Jinsheng

    2016-01-01

    This study aimed to elucidate the altered gene network in the brains of carbon monoxide (CO) poisoned rats after treatment with hyperbaric oxygen (HBO₂). RNA sequencing (RNA-seq) analysis was performed to examine differentially expressed genes (DEGs) in brain tissue samples from nine male rats: a normal control group; a CO poisoning group; and an HBO₂ treatment group (three rats/group). Reverse transcription polymerase chain reaction (RT-PCR) and real-time quantitative PCR were used for validation of the DEGs in another 18 male rats (six rats/group). RNA-seq revealed that two genes were upregulated (4.18 and 8.76 log2 fold change) (p<0.05) in the CO-poisoned rats relative to the control rats; two genes were upregulated (3.88 and 7.69 log2 fold change) and 23 genes were downregulated (3.49-15.12 log2 fold change) (p<0.05) in the brains of the HBO₂-treated rats relative to the CO-poisoned rats. Target prediction of DEGs by gene network analysis and analysis of pathways affected suggested that regulation of gene expression in dopamine metabolism and nitric oxide (NO) synthesis was significantly affected by CO poisoning and HBO₂ treatment. Results of RT-PCR and real-time quantitative PCR indicated that four genes (Pomc, GH-1, Pr1 and Fshβ) associated with hormone secretion in the hypothalamic-pituitary system have potential as markers for prognosis of CO poisoning. This study is the first RNA-seq analysis profile of HBO₂ treatment on rats with acute CO poisoning. It concludes that changes in hormone secretion in the hypothalamic-pituitary system, dopamine metabolism and NO synthesis are involved in brain damage and behavior abnormalities after CO poisoning, and that HBO₂ therapy may regulate these changes.
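
    The fold changes in this abstract are reported as base-2 logarithms, so it can help to convert them back to linear ratios. The sketch below shows that conversion; the function names are hypothetical and only the numeric values are taken from the record.

```python
import math


def log2fc_to_fold(log2_fold_change: float) -> float:
    """Convert a log2 fold change to a linear expression ratio."""
    return 2.0 ** log2_fold_change


def fold_to_log2fc(fold: float) -> float:
    """Convert a positive linear expression ratio to a log2 fold change."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold)


if __name__ == "__main__":
    # Values quoted in the abstract for genes upregulated after CO poisoning.
    for value in (4.18, 8.76):
        print(f"log2FC {value} corresponds to roughly {log2fc_to_fold(value):.0f}-fold")
```

    A log2 fold change of 1 is a doubling, so the reported values correspond to changes of well over an order of magnitude.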

  3. Differential RNA-seq analysis comparing APC-defective and APC-restored SW480 colorectal cancer cells.

    PubMed

    King, Lauren E; Love, Christopher G; Sieber, Oliver M; Faux, Maree C; Burgess, Antony W

    2016-03-01

    The adenomatous polyposis coli (APC) tumour suppressor gene is mutated in about 80% of colorectal cancers (CRC) Brannon et al. (2014) [1]. APC is a large multifunctional protein that regulates many biological functions including Wnt signalling (through the regulation of beta-catenin stability) Reya and Clevers (2005) [2], cell migration Kroboth et al. (2007), Sansom et al. (2004) [3], [4], mitosis Kaplan et al. (2001) [5], cell adhesion Faux et al. (2004), Carothers et al. (2001) [6], [7] and differentiation Sansom et al. (2004) [4]. Although the role of APC in CRC is often described as the deregulation of Wnt signalling, its other biological functions suggest that there are other factors at play that contribute to the onset of adenomas and the progression of CRC upon the truncation of APC. To identify genes and pathways that are dysregulated as a consequence of loss of function of APC, we compared the gene expression profiles of the APC mutated human CRC cell line SW480 following reintroduction of wild-type APC (SW480 + APC) or empty control vector (SW480 + vector control) Faux et al. (2004). Here we describe the RNA-seq data derived for three biological replicates of parental SW480, SW480 + vector control and SW480 + APC cells, and present the bioinformatics pipeline used to test for differential gene expression and pathway enrichment analysis. A total of 1735 genes showed significant differential expression when APC was restored and were enriched for genes associated with cell polarity, Wnt signalling and the epithelial to mesenchymal transition. There was additional enrichment for genes involved in cell-cell adhesion, cell-matrix junctions, angiogenesis, axon morphogenesis and cell movement. The raw and analysed RNA-seq data have been deposited in the Gene Expression Omnibus (GEO) database under accession number GSE76307. This dataset is useful for further investigations of the impact of APC mutation on the properties of colorectal cancer cells.

  4. Comprehensive gene expression analysis of canine invasive urothelial bladder carcinoma by RNA-Seq.

    PubMed

    Maeda, Shingo; Tomiyasu, Hirotaka; Tsuboi, Masaya; Inoue, Akiko; Ishihara, Genki; Uchikai, Takao; Chambers, James K; Uchida, Kazuyuki; Yonezawa, Tomohiro; Matsuki, Naoaki

    2018-04-27

    Invasive urothelial carcinoma (iUC) is a major cause of death in humans, and approximately 165,000 individuals succumb to this cancer annually worldwide. Comparative oncology using relevant animal models is necessary to improve our understanding of progression, diagnosis, and treatment of iUC. Companion canines are a preferred animal model of iUC due to spontaneous tumor development and similarity to human disease in terms of histopathology, metastatic behavior, and treatment response. However, the comprehensive molecular characterization of canine iUC is not well documented. In this study, we performed transcriptome analysis of tissue samples from canine iUC and normal bladders using an RNA sequencing (RNA-Seq) approach to identify key molecular pathways in canine iUC. Total RNA was extracted from bladder tissues of 11 dogs with iUC and five healthy dogs, and RNA-Seq was conducted. Ingenuity Pathway Analysis (IPA) was used to assign differentially expressed genes to known upstream regulators and functional networks. Differential gene expression analysis of the RNA-Seq data revealed 2531 differentially expressed genes, comprising 1007 upregulated and 1524 downregulated genes, in canine iUC. IPA revealed that the most activated upstream regulator was PTGER2 (encoding the prostaglandin E2 receptor EP2), which is consistent with the therapeutic efficiency of cyclooxygenase inhibitors in canine iUC. Similar to human iUC, canine iUC exhibited upregulated ERBB2 and downregulated TP53 pathways. Biological functions associated with cancer, cell proliferation, and leukocyte migration were predicted to be activated, while muscle functions were predicted to be inhibited, indicating muscle-invasive tumor property.
Our data confirmed similarities in gene expression patterns between canine and human iUC and identified potential therapeutic targets (PTGER2, ERBB2, CCND1, Vegf, and EGFR), suggesting the value of naturally occurring canine iUC as a relevant animal model for human iUC.

  5. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

    PubMed Central

    2012-01-01

    Background: RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results: Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions: This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019

  6. Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing.

    PubMed

    Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M

    2012-09-17

    RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available; these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.
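
    To make the simulation idea concrete, the following sketch draws overdispersed read counts for one gene from a negative binomial distribution, built as a gamma-Poisson mixture. It covers only the negative binomial part of what the abstract describes; the mean, dispersion, replicate number and function names are hypothetical and are not the authors' settings or code.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def sample_negative_binomial(rng: random.Random, mu: float, dispersion: float) -> int:
    """Draw a count with mean mu and variance mu + dispersion * mu**2."""
    # Gamma-distributed rate with mean mu, then a Poisson count given that rate.
    rate = rng.gammavariate(1.0 / dispersion, mu * dispersion)
    return sample_poisson(rng, rate)


def simulate_gene(rng: random.Random, mu: float, dispersion: float, n: int) -> list[int]:
    """Simulate read counts for one gene across n replicates."""
    return [sample_negative_binomial(rng, mu, dispersion) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    counts = simulate_gene(rng, mu=50.0, dispersion=0.2, n=2000)
    print("mean:", mean(counts), "variance:", pvariance(counts))
```

    With a non-zero dispersion the variance clearly exceeds the mean, which is the overdispersion that negative binomial models are meant to capture and that a plain Poisson model would miss.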

  7. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.

    PubMed

    Song, Li; Florea, Liliana

    2015-01-01

    Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
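
    The notion of "trusted" k-mers can be illustrated with a small counting sketch. Note that this deliberately uses a single fixed threshold, which is the simpler scheme the abstract attributes to WGS correctors; it is not Rcorrector's local-threshold method, and the function names, reads and threshold value are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the given reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_positions(read: str, counts: Counter, k: int, threshold: int) -> list[int]:
    """Start positions in `read` whose k-mer was seen fewer than `threshold` times."""
    return [
        i
        for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < threshold
    ]


if __name__ == "__main__":
    # Three copies of a read plus one copy with a single-base difference.
    reads = ["ACGTAC"] * 3 + ["ACGTTC"]
    kmer_counts = count_kmers(reads, k=4)
    print(untrusted_positions("ACGTTC", kmer_counts, k=4, threshold=2))
```

    In RNA-seq data, coverage varies widely between transcripts, so a fixed cutoff like this would flag genuine k-mers from lowly expressed genes; that limitation is the motivation the abstract gives for a position-specific threshold.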

  8. sequoia controls the type I>0 daughter proliferation switch in the developing Drosophila nervous system.

    PubMed

    Gunnar, Erika; Bivik, Caroline; Starkenberg, Annika; Thor, Stefan

    2016-10-15

    Neural progenitors typically divide asymmetrically to renew themselves, while producing daughters with more limited potential. In the Drosophila embryonic ventral nerve cord, neuroblasts initially produce daughters that divide once to generate two neurons/glia (type I proliferation mode). Subsequently, many neuroblasts switch to generating daughters that differentiate directly (type 0). This programmed type I>0 switch is controlled by Notch signaling, triggered at a distinct point of lineage progression in each neuroblast. However, how Notch signaling onset is gated was unclear. We recently identified Sequoia (Seq), a C2H2 zinc-finger transcription factor with homology to Drosophila Tramtrack (Ttk) and the positive regulatory domain (PRDM) family, as important for lineage progression. Here, we find that seq mutants fail to execute the type I>0 daughter proliferation switch and also display increased neuroblast proliferation. Genetic interaction studies reveal that seq interacts with the Notch pathway, and seq furthermore affects expression of a Notch pathway reporter. These findings suggest that seq may act as a context-dependent regulator of Notch signaling, and underscore the growing connection between Seq, Ttk, the PRDM family and Notch signaling. © 2016. Published by The Company of Biologists Ltd.

  9. Tissue-aware RNA-Seq processing and normalization for heterogeneous and sparse data.

    PubMed

    Paulson, Joseph N; Chen, Cho-Yi; Lopes-Ramos, Camila M; Kuijjer, Marieke L; Platig, John; Sonawane, Abhijeet R; Fagny, Maud; Glass, Kimberly; Quackenbush, John

    2017-10-03

    Although ultrahigh-throughput RNA-Sequencing has become the dominant technology for genome-wide transcriptional profiling, the vast majority of RNA-Seq studies typically profile only tens of samples, and most analytical pipelines are optimized for these smaller studies. However, projects are generating ever-larger data sets comprising RNA-Seq data from hundreds or thousands of samples, often collected at multiple centers and from diverse tissues. These complex data sets present significant analytical challenges due to batch and tissue effects, but provide the opportunity to revisit the assumptions and methods that we use to preprocess, normalize, and filter RNA-Seq data - critical first steps for any subsequent analysis. We find that analysis of large RNA-Seq data sets requires both careful quality control and the need to account for sparsity due to the heterogeneity intrinsic in multi-group studies. We developed Yet Another RNA Normalization software pipeline (YARN), that includes quality control and preprocessing, gene filtering, and normalization steps designed to facilitate downstream analysis of large, heterogeneous RNA-Seq data sets and we demonstrate its use with data from the Genotype-Tissue Expression (GTEx) project. An R package instantiating YARN is available at http://bioconductor.org/packages/yarn .

  10. An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq.

    PubMed

    Xu, Maoqi; Chen, Liang

    2018-01-01

    The individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice the analysis power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex diseases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  11. Identification of extracellular miRNA in archived serum samples by next-generation sequencing from RNA extracted using multiple methods.

    PubMed

    Gautam, Aarti; Kumar, Raina; Dimitrov, George; Hoke, Allison; Hammamieh, Rasha; Jett, Marti

    2016-10-01

    miRNAs act as important regulators of gene expression by promoting mRNA degradation or by attenuating protein translation. Since miRNAs are stably expressed in bodily fluids, there is growing interest in profiling these miRNAs, as it is minimally invasive and cost-effective as a diagnostic matrix. A technical hurdle in studying miRNA dynamics is reliably extracting miRNA, as small sample volumes and low RNA abundance create challenges for extraction and downstream applications. The purpose of this study was to develop a pipeline for the recovery of miRNA using small volumes of archived serum samples. The RNA was extracted employing several widely utilized RNA isolation kits/methods with and without addition of a carrier. The small RNA library preparation was carried out using the Illumina TruSeq small RNA kit, and sequencing was carried out on an Illumina platform. A fraction of five microliters of total RNA was used for library preparation, as quantification was below the detection limit. We were able to profile miRNA levels in serum from all the methods tested. We found that addition of nucleic acid-based carrier molecules yielded higher numbers of processed reads but did not enhance the mapping of any miRBase-annotated sequences. However, some of the extraction procedures offer certain advantages: RNA extracted by TRIzol seemed to align to the miRBase best; extractions using TRIzol with carrier yielded higher miRNA-to-small RNA ratios. Nuclease-free glycogen can be the carrier of choice for miRNA sequencing. Our findings illustrate that miRNA extraction and quantification are influenced by the choice of methodologies. Addition of nucleic acid-based carrier molecules during the extraction procedure is not a good choice when assaying miRNA using sequencing. The careful selection of an extraction method permits the archived serum samples to become valuable resources for high-throughput applications.

  12. Nebula--a web-server for advanced ChIP-seq data analysis.

    PubMed

    Boeva, Valentina; Lermine, Alban; Barette, Camille; Guillouf, Christel; Barillot, Emmanuel

    2012-10-01

    ChIP-seq consists of chromatin immunoprecipitation and deep sequencing of the extracted DNA fragments. It is the technique of choice for accurate characterization of the binding sites of transcription factors and other DNA-associated proteins. We present a web service, Nebula, which allows inexperienced users to perform a complete bioinformatics analysis of ChIP-seq data. Nebula was designed for both bioinformaticians and biologists. It is based on the Galaxy open source framework. Galaxy already includes a large number of functionalities for mapping reads and peak calling. We added the following to Galaxy: (i) peak calling with FindPeaks and a module for immunoprecipitation quality control, (ii) de novo motif discovery with ChIPMunk, (iii) calculation of the density and the cumulative distribution of peak locations relative to gene transcription start sites, (iv) annotation of peaks with genomic features and (v) annotation of genes with peak information. Nebula generates the graphs and the enrichment statistics at each step of the process. During Steps 3-5, Nebula optionally repeats the analysis on a control dataset and compares these results with those from the main dataset. Nebula can also incorporate gene expression (or gene modulation) data during these steps. In summary, Nebula is an innovative web service that provides an advanced ChIP-seq analysis pipeline providing ready-to-publish results. Nebula is available at http://nebula.curie.fr/ . Supplementary data are available at Bioinformatics online.

  13. RNA-Seq Analysis of the Expression of Genes Encoding Cell Wall Degrading Enzymes during Infection of Lupin (Lupinus angustifolius) by Phytophthora parasitica

    PubMed Central

    Blackman, Leila M.; Cullerne, Darren P.; Torreña, Pernelyn; Taylor, Jen; Hardham, Adrienne R.

    2015-01-01

    RNA-Seq analysis has shown that over 60% (12,962) of the predicted transcripts in the Phytophthora parasitica genome are expressed during the first 60 h of lupin root infection. The infection transcriptomes included 278 of the 431 genes encoding P. parasitica cell wall degrading enzymes. The transcriptome data provide strong evidence of global transcriptional cascades of genes whose encoded proteins target the main categories of plant cell wall components. A major cohort of pectinases is predominantly expressed early but as infection progresses, the transcriptome becomes increasingly dominated by transcripts encoding cellulases, hemicellulases, β-1,3-glucanases and glycoproteins. The most highly expressed P. parasitica carbohydrate active enzyme gene contains two CBM1 cellulose binding modules and no catalytic domains. The top 200 differentially expressed genes include β-1,4-glucosidases, β-1,4-glucanases, β-1,4-galactanases, a β-1,3-glucanase, an α-1,4-polygalacturonase, a pectin deacetylase and a pectin methylesterase. Detailed analysis of gene expression profiles provides clues as to the order in which linkages within the complex carbohydrates may come under attack. The gene expression profiles suggest that (i) demethylation of pectic homogalacturonan occurs before its deacetylation; (ii) cleavage of the backbone of pectic rhamnogalacturonan I precedes digestion of its side chains; (iii) early attack on cellulose microfibrils by non-catalytic cellulose-binding proteins and enzymes with auxiliary activities may facilitate subsequent attack by glycosyl hydrolases and enzymes containing CBM1 cellulose-binding modules; (iv) terminal hemicellulose backbone residues are targeted after extensive internal backbone cleavage has occurred; and (v) the carbohydrate chains on glycoproteins are degraded late in infection. A notable feature of the P. parasitica infection transcriptome is the high level of transcription of genes encoding enzymes that degrade β-1,3-glucanases during middle and late stages of infection. The results suggest that high levels of β-1,3-glucanases may effectively degrade callose as it is produced by the plant during the defence response. PMID:26332397

  14. RNA-Seq Analysis of the Expression of Genes Encoding Cell Wall Degrading Enzymes during Infection of Lupin (Lupinus angustifolius) by Phytophthora parasitica.

    PubMed

    Blackman, Leila M; Cullerne, Darren P; Torreña, Pernelyn; Taylor, Jen; Hardham, Adrienne R

    2015-01-01

    RNA-Seq analysis has shown that over 60% (12,962) of the predicted transcripts in the Phytophthora parasitica genome are expressed during the first 60 h of lupin root infection. The infection transcriptomes included 278 of the 431 genes encoding P. parasitica cell wall degrading enzymes. The transcriptome data provide strong evidence of global transcriptional cascades of genes whose encoded proteins target the main categories of plant cell wall components. A major cohort of pectinases is predominantly expressed early but as infection progresses, the transcriptome becomes increasingly dominated by transcripts encoding cellulases, hemicellulases, β-1,3-glucanases and glycoproteins. The most highly expressed P. parasitica carbohydrate active enzyme gene contains two CBM1 cellulose binding modules and no catalytic domains. The top 200 differentially expressed genes include β-1,4-glucosidases, β-1,4-glucanases, β-1,4-galactanases, a β-1,3-glucanase, an α-1,4-polygalacturonase, a pectin deacetylase and a pectin methylesterase. Detailed analysis of gene expression profiles provides clues as to the order in which linkages within the complex carbohydrates may come under attack. The gene expression profiles suggest that (i) demethylation of pectic homogalacturonan occurs before its deacetylation; (ii) cleavage of the backbone of pectic rhamnogalacturonan I precedes digestion of its side chains; (iii) early attack on cellulose microfibrils by non-catalytic cellulose-binding proteins and enzymes with auxiliary activities may facilitate subsequent attack by glycosyl hydrolases and enzymes containing CBM1 cellulose-binding modules; (iv) terminal hemicellulose backbone residues are targeted after extensive internal backbone cleavage has occurred; and (v) the carbohydrate chains on glycoproteins are degraded late in infection. A notable feature of the P. parasitica infection transcriptome is the high level of transcription of genes encoding enzymes that degrade β-1,3-glucanases during middle and late stages of infection. The results suggest that high levels of β-1,3-glucanases may effectively degrade callose as it is produced by the plant during the defence response.

  15. Global DNA methylation analysis reveals miR-214-3p contributes to cisplatin resistance in pediatric intracranial nongerminomatous malignant germ cell tumors.

    PubMed

    Hsieh, Tsung-Han; Liu, Yun-Ru; Chang, Ting-Yu; Liang, Muh-Lii; Chen, Hsin-Hung; Wang, Hsei-Wei; Yen, Yun; Wong, Tai-Tong

    2018-03-27

    Pediatric central nervous system germ cell tumors (CNSGCTs) are rare and heterogeneous neoplasms, which can be divided into germinomas and nongerminomatous germ cell tumors (NGGCTs). NGGCTs are further subdivided into mature teratomas and nongerminomatous malignant GCTs (NGMGCTs). Clinical outcomes suggest that NGMGCTs have poor prognosis and survival and that they require more extensive radiotherapy and adjuvant chemotherapy. However, the mechanisms underlying this difference are still unclear. DNA methylation alteration is generally acknowledged to cause therapeutic resistance in cancers. We hypothesized that the pediatric NGMGCTs exhibit a different genome-wide DNA methylation pattern, which is involved in the mechanism of its therapeutic resistance. We performed methylation and hydroxymethylation DNA immunoprecipitation sequencing, mRNA expression microarray, and small RNA sequencing (smRNA-seq) to determine methylation-regulated genes, including microRNAs (miRNAs). The expression levels of 97 genes and 8 miRNAs were correlated with promoter DNA methylation and hydroxymethylation status, such as the miR-199/-214 cluster, and treatment with DNA demethylating agent 5-aza-2'-deoxycytidine elevated its expression level. Furthermore, smRNA-seq analysis showed 27 novel miRNA candidates with differential expression between germinomas and NGMGCTs. Overexpression of miR-214-3p in NCCIT cells leads to reduced expression of the pro-apoptotic protein BCL2-like 11 and induces cisplatin resistance. We interrogated the differential DNA methylation patterns between germinomas and NGMGCTs and proposed a mechanism for chemoresistance in NGMGCTs. In addition, our sequencing data provide a roadmap for further pediatric CNSGCT research and potential targets for the development of new therapeutic strategies.

  16. Characteristics of allelic gene expression in human brain cells from single-cell RNA-seq data analysis.

    PubMed

    Zhao, Dejian; Lin, Mingyan; Pedrosa, Erika; Lachman, Herbert M; Zheng, Deyou

    2017-11-10

    Monoallelic expression of autosomal genes has been implicated in human psychiatric disorders. However, there is a paucity of allelic expression studies in human brain cells at the single cell and genome wide levels. In this report, we reanalyzed a previously published single-cell RNA-seq dataset from several postmortem human brains and observed pervasive monoallelic expression in individual cells, largely in a random manner. Examining single nucleotide variants with a predicted functional disruption, we found that the "damaged" alleles were overall expressed in fewer brain cells than their counterparts, and at a lower level in cells where their expression was detected. We also identified many brain cell type-specific monoallelically expressed genes. Interestingly, many of these cell type-specific monoallelically expressed genes were enriched for functions important for those brain cell types. In addition, function analysis showed that genes displaying monoallelic expression and correlated expression across neuronal cells from different individual brains were implicated in the regulation of synaptic function. Our findings suggest that monoallelic gene expression is prevalent in human brain cells, which may play a role in generating cellular identity and neuronal diversity and thus increasing the complexity and diversity of brain cell functions.

  17. Defining the location of promoter-associated R-loops at near-nucleotide resolution using bisDRIP-seq

    PubMed Central

    Dumelie, Jason G

    2017-01-01

    R-loops are features of chromatin consisting of a strand of DNA hybridized to RNA, as well as the expelled complementary DNA strand. R-loops are enriched at promoters where they have recently been shown to have important roles in modifying gene expression. However, the location of promoter-associated R-loops and the genomic domains they perturb to modify gene expression remain unclear. To resolve this issue, we developed a bisulfite-based approach, bisDRIP-seq, to map R-loops across the genome at near-nucleotide resolution in MCF-7 cells. We found the location of promoter-associated R-loops is dependent on the presence of introns. In intron-containing genes, R-loops are bounded between the transcription start site and the first exon-intron junction. In intronless genes, the 3' boundary displays gene-specific heterogeneity. Moreover, intronless genes are often associated with promoter-associated R-loop formation. Together, these studies provide a high-resolution map of R-loops and identify gene structure as a critical determinant of R-loop formation. PMID:29072160

  18. Nascent-Seq reveals novel features of mouse circadian transcriptional regulation

    PubMed Central

    Menet, Jerome S; Rodriguez, Joseph; Abruzzi, Katharine C; Rosbash, Michael

    2012-01-01

    A substantial fraction of the metazoan transcriptome undergoes circadian oscillations in many cells and tissues. Based on the transcription feedback loops important for circadian timekeeping, it is commonly assumed that this mRNA cycling reflects widespread transcriptional regulation. To address this issue, we directly measured the circadian dynamics of mouse liver transcription using Nascent-Seq (genome-wide sequencing of nascent RNA). Although many genes are rhythmically transcribed, many rhythmic mRNAs manifest poor transcriptional rhythms, indicating a prominent contribution of post-transcriptional regulation to circadian mRNA expression. This analysis of rhythmic transcription also showed that the rhythmic DNA binding profile of the transcription factors CLOCK and BMAL1 does not determine the transcriptional phase of most target genes. This likely reflects gene-specific collaborations of CLK:BMAL1 with other transcription factors. These insights from Nascent-Seq indicate that it should have broad applicability to many other gene expression regulatory issues. DOI: http://dx.doi.org/10.7554/eLife.00011.001 PMID:23150795

  19. Spatial reconstruction of single-cell gene expression data.

    PubMed

    Satija, Rahul; Farrell, Jeffrey A; Gennert, David; Schier, Alexander F; Regev, Aviv

    2015-05-01

    Spatial localization is a key determinant of cellular fate and behavior, but methods for spatially resolved, transcriptome-wide gene expression profiling across complex tissues are lacking. RNA staining methods assay only a small number of transcripts, whereas single-cell RNA-seq, which measures global gene expression, separates cells from their native spatial context. Here we present Seurat, a computational strategy to infer cellular localization by integrating single-cell RNA-seq data with in situ RNA patterns. We applied Seurat to spatially map 851 single cells from dissociated zebrafish (Danio rerio) embryos and generated a transcriptome-wide map of spatial patterning. We confirmed Seurat's accuracy using several experimental approaches, then used the strategy to identify a set of archetypal expression patterns and spatial markers. Seurat correctly localizes rare subpopulations, accurately mapping both spatially restricted and scattered groups. Seurat will be applicable to mapping cellular localization within complex patterned tissues in diverse systems.

  20. cChIP-seq: a robust small-scale method for investigation of histone modifications.

    PubMed

    Valensisi, Cristina; Liao, Jo Ling; Andrus, Colin; Battle, Stephanie L; Hawkins, R David

    2015-12-21

    ChIP-seq is highly utilized for mapping histone modifications that are informative about gene regulation and genome annotations. For example, applying ChIP-seq to histone modifications such as H3K4me1 has facilitated generating epigenomic maps of putative enhancers. This powerful technology, however, is limited in its application by the large number of cells required. ChIP-seq involves extensive manipulation of sample material and multiple reactions with limited quality control at each step; therefore, scaling down the number of cells required has proven challenging. Recently, several methods have been proposed to overcome this limit but most of these methods require extensive optimization to tailor the protocol to the specific antibody used or number of cells being profiled. Here we describe a robust, yet facile method, which we named carrier ChIP-seq (cChIP-seq), for use on limited cell amounts. cChIP-seq employs a DNA-free histone carrier in order to maintain the working ChIP reaction scale, removing the need to tailor reactions to specific amounts of cells or histone modifications to be assayed. We have applied our method to three different histone modifications, H3K4me3, H3K4me1 and H3K27me3 in the K562 cell line, and H3K4me1 in H1 hESCs. We successfully obtained epigenomic maps for these histone modifications starting with as few as 10,000 cells. We compared cChIP-seq data to data generated as part of the ENCODE project. ENCODE data are the reference standard in the field and have been generated starting from tens of millions of cells. Our results show that cChIP-seq successfully recapitulates bulk data. Furthermore, we showed that the differences observed between small-scale ChIP-seq data and ENCODE data are largely due to lab-to-lab variability rather than operating on a reduced scale. Data generated using cChIP-seq are equivalent to reference epigenomic maps from three orders of magnitude more cells. Our method offers a robust and straightforward approach to scale down ChIP-seq to as low as 10,000 cells. The underlying principle of our strategy makes it suitable for being applied to a vast range of chromatin modifications without requiring expensive optimization. Furthermore, our strategy of a DNA-free carrier can be adapted to most ChIP-seq protocols.

  1. RnaSeqSampleSize: real data based sample size estimation for RNA sequencing.

    PubMed

    Zhao, Shilin; Li, Chung-I; Guo, Yan; Sheng, Quanhu; Shyr, Yu

    2018-05-30

    One of the most important and often neglected components of a successful RNA sequencing (RNA-Seq) experiment is sample size estimation. A few negative binomial model-based methods have been developed to estimate sample size based on the parameters of a single gene. However, thousands of genes are quantified and tested for differential expression simultaneously in RNA-Seq experiments. Thus, additional issues should be carefully addressed, including the false discovery rate for multiple statistical tests, widely distributed read counts and dispersions for different genes. To solve these issues, we developed a sample size and power estimation method named RnaSeqSampleSize, based on the distributions of gene average read counts and dispersions estimated from real RNA-seq data. Datasets from previous, similar experiments such as the Cancer Genome Atlas (TCGA) can be used as a point of reference. Read counts and their dispersions were estimated from the reference's distribution; using that information, we estimated and summarized the power and sample size. RnaSeqSampleSize is implemented in the R language and can be installed from the Bioconductor website. A user-friendly web graphical interface is provided at http://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/ . RnaSeqSampleSize provides a convenient and powerful way for power and sample size estimation for an RNA-seq experiment. It is also equipped with several unique features, including estimation for genes or pathways of interest, power curve visualization, and parameter optimization.
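
    The idea of averaging power over a distribution of per-gene read counts and dispersions, rather than relying on a single gene, can be sketched as follows. This is not the RnaSeqSampleSize algorithm: it swaps in a simple normal approximation to the log fold change under a negative binomial variance (mean + dispersion × mean²), ignores the false discovery rate adjustment the abstract mentions, and uses hypothetical function names and example values.

```python
from math import log, sqrt
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()


def gene_power(n, mu, dispersion, fold_change, alpha=0.05):
    """Approximate power to detect a fold change for one gene with n samples per group.

    Delta-method variance of the log fold change under a negative binomial model:
    2 * (1/mu + dispersion) / n. This approximation is an assumption of the sketch.
    """
    se = sqrt(2.0 * (1.0 / mu + dispersion) / n)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(log(fold_change)) / se - z_crit)


def average_power(n, genes, fold_change, alpha=0.05):
    """Mean power over (mean count, dispersion) pairs taken from reference data."""
    return mean(gene_power(n, mu, d, fold_change, alpha) for mu, d in genes)


def smallest_n(genes, fold_change, target=0.8, alpha=0.05, n_max=1000):
    """Smallest per-group sample size whose average power reaches the target."""
    for n in range(2, n_max + 1):
        if average_power(n, genes, fold_change, alpha) >= target:
            return n
    return None


# Hypothetical (mean count, dispersion) pairs standing in for a reference dataset.
reference_genes = [(50, 0.2), (500, 0.05), (5, 0.5)]
print(smallest_n(reference_genes, fold_change=2.0))
```

    Low-count, high-dispersion genes pull the average power down, which is why a sample size derived from a single well-expressed gene can be too optimistic.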

  2. A Survey for Novel Imprinted Genes in the Mouse Placenta by mRNA-seq

    PubMed Central

    Wang, Xu; Soloway, Paul D.; Clark, Andrew G.

    2011-01-01

    Many questions about the regulation, functional specialization, computational prediction, and evolution of genomic imprinting would be better addressed by having an exhaustive genome-wide catalog of genes that display parent-of-origin differential expression. As a first-pass scan for novel imprinted genes, we performed mRNA-seq experiments on embryonic day 17.5 (E17.5) mouse placenta cDNA samples from reciprocal cross F1 progeny of AKR and PWD mouse strains and quantified the allele-specific expression and the degree of parent-of-origin allelic imbalance. We confirmed the imprinting status of 23 known imprinted genes in the placenta and found that 12 genes reported previously to be imprinted in other tissues are also imprinted in mouse placenta. Through a well-replicated design using an orthogonal allelic-expression technology, we verified 5 novel imprinted genes that were not previously known to be imprinted in mouse (Pde10, Phf17, Phactr2, Zfp64, and Htra3). Our data suggest that most of the strongly imprinted genes have already been identified, at least in the placenta, and that evidence supports perhaps 100 additional weakly imprinted genes. Despite previous appearance that the placenta tends to display an excess of maternally expressed imprinted genes, with the addition of our validated set of placenta-imprinted genes, this maternal bias has disappeared. PMID:21705755
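
    The reciprocal-cross logic in this abstract can be illustrated with a small sketch: a parent-of-origin effect shows the same parental bias in both crosses, whereas a strain (cis-regulatory) effect follows the strain allele regardless of which parent carried it. The read counts, the 0.7 threshold, and the function names are hypothetical, and the abstract does not describe the statistical test the authors actually used.

```python
def maternal_fractions(cross1, cross2):
    """Fraction of reads from the maternal allele in each reciprocal cross.

    cross1 -- (akr_reads, pwd_reads) where the mother is AKR
    cross2 -- (akr_reads, pwd_reads) where the mother is PWD
    """
    akr1, pwd1 = cross1
    akr2, pwd2 = cross2
    p1 = akr1 / (akr1 + pwd1)  # maternal allele is AKR in cross 1
    p2 = pwd2 / (akr2 + pwd2)  # maternal allele is PWD in cross 2
    return p1, p2


def classify(p1, p2, threshold=0.7):
    """Label a gene by a simple, hypothetical cutoff on both maternal fractions."""
    if p1 >= threshold and p2 >= threshold:
        return "maternally expressed"
    if p1 <= 1 - threshold and p2 <= 1 - threshold:
        return "paternally expressed"
    return "no consistent parent-of-origin bias"


# Bias follows the mother in both crosses: consistent with imprinting.
print(classify(*maternal_fractions((90, 10), (20, 80))))
# Bias follows the AKR allele in both crosses: a strain effect, not imprinting.
print(classify(*maternal_fractions((90, 10), (85, 15))))
```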

  3. A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

    PubMed Central

    Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

    2016-01-01

    Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591

  4. RNA-seq Analysis of Clinical-Grade Osteochondral Allografts Reveals Activation of Early Response Genes

    PubMed Central

    Lin, Yang; Lewallen, Eric A.; Camilleri, Emily T.; Bonin, Carolina A.; Jones, Dakota L.; Dudakovic, Amel; Galeano-Garces, Catalina; Wang, Wei; Karperien, Marcel J.; Larson, Annalise N.; Dahm, Diane L.; Stuart, Michael J.; Levy, Bruce A.; Smith, Jay; Ryssman, Daniel B.; Westendorf, Jennifer J.; Im, Hee-Jeong; van Wijnen, Andre J.; Riester, Scott M.; Krych, Aaron J.

    2016-01-01

    Preservation of osteochondral allografts used for transplantation is critical to ensure favorable outcomes for patients after surgical treatment of cartilage defects. To study the biological effects of protocols currently used for cartilage storage, we investigated differences in gene expression between stored allograft cartilage and fresh cartilage from living donors using high throughput molecular screening strategies. We applied next generation RNA sequencing (RNA-seq) and real-time reverse transcription quantitative polymerase chain reaction (RT-qPCR) to assess genome-wide differences in mRNA expression between stored allograft cartilage and fresh cartilage tissue from living donors. Gene ontology analysis was used to characterize biological pathways associated with differentially expressed genes. Our studies establish reduced levels of mRNAs encoding cartilage related extracellular matrix (ECM) proteins (i.e., COL1A1, COL2A1, COL10A1, ACAN, DCN, HAPLN1, TNC, and COMP) in stored cartilage. These changes occur concomitantly with increased expression of “early response genes” that encode transcription factors mediating stress/cytoprotective responses (i.e., EGR1, EGR2, EGR3, MYC, FOS, FOSB, FOSL1, FOSL2, JUN, JUNB, and JUND). The elevated expression of “early response genes” and reduced levels of ECM-related mRNAs in stored cartilage allografts suggests that tissue viability may be maintained by a cytoprotective program that reduces cell metabolic activity. These findings have potential implications for future studies focused on quality assessment and clinical optimization of osteochondral allografts used for cartilage transplantation. PMID:26909883

  5. Radiation-induced alternative transcripts as detected in total and polysome-bound mRNA.

    PubMed

    Wahba, Amy; Ryan, Michael C; Shankavaram, Uma T; Camphausen, Kevin; Tofilon, Philip J

    2018-01-02

    Alternative splicing is a critical event in the posttranscriptional regulation of gene expression. To investigate whether this process influences radiation-induced gene expression we defined the effects of ionizing radiation on the generation of alternative transcripts in total cellular mRNA (the transcriptome) and polysome-bound mRNA (the translatome) of the human glioblastoma stem-like cell line NSC11. For these studies, RNA-Seq profiles from control and irradiated cells were compared using the program SpliceSeq to identify transcripts and splice variations induced by radiation. As compared to the transcriptome (total RNA) of untreated cells, the radiation-induced transcriptome contained 92 splice events suggesting that radiation induced alternative splicing. As compared to the translatome (polysome-bound RNA) of untreated cells, the radiation-induced translatome contained 280 splice events of which only 24 were overlapping with the radiation-induced transcriptome. These results suggest that radiation not only modifies alternative splicing of precursor mRNA, but also results in the selective association of existing mRNA isoforms with polysomes. Comparison of radiation-induced alternative transcripts to radiation-induced gene expression in total RNA revealed little overlap (about 3%). In contrast, in the radiation-induced translatome, about 38% of the induced alternative transcripts corresponded to genes whose expression level was affected in the translatome. This study suggests that whereas radiation induces alternate splicing, the alternative transcripts present at the time of irradiation may play a role in the radiation-induced translational control of gene expression and thus cellular radioresponse.

  6. STAT5A and STAT5B have opposite correlations with drug response gene expression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lamba, V., E-mail: vlamba@ufl.edu; Jia, B.; Liang, F.

    Introduction: STAT5A and STAT5B are important transcription factors that play a key role in regulation of several important physiological processes including proliferation, survival, mediation of responses to cytokines and in regulating gender differences in drug response genes such as the hepatic cytochrome P450s (CYPs) that are responsible for a large majority of drug metabolism reactions in the human body. STAT5A and STAT5B have a high degree of sequence homology and have been reported to have largely similar functions. Recent studies have, however, indicated that they can also often have distinct and unique roles in regulating gene expression. Objective: In this study, we evaluated the association of STAT5A and STAT5B mRNA expression levels with those of several key hepatic cytochrome P450s (CYPs) and hepatic transcription factors (TFs) and evaluated the potential roles of STAT5A and STAT5B in mediating gender differences in these CYPs and TFs. Methods: Expression profiling for major hepatic CYP isoforms and transcription factors was performed using RNA sequencing (RNA-seq) in 102 human liver samples (57 female, 45 male). Real time PCR gene expression data for selected CYPs and TFs was available on a subset of 50 human liver samples (25 female, 25 male) and was used to validate the RNA-seq findings. Results: While STAT5A demonstrated significant negative correlation with expression levels of multiple hepatic transcription factors (including NR1I2 and HNF4A) and DMEs such as CYP3A4 and CYP2C19, STAT5B expression was observed to demonstrate positive associations with several CYPs and TFs analyzed. As STAT5A and STAT5B have been shown to be important in regulation of gender differences in CYPs, we also analyzed STAT5A and STAT5B associations with CYPs and TFs separately in males and females and observed gender dependent differential associations of STATs with several CYPs and TFs. Results from the real time PCR validation largely supported our RNA-seq findings. Conclusions: Using both RNA sequencing and real time PCR, we examined the association of STAT5A and STAT5B mRNA expression with CYP and TF gene expression. While STAT5A demonstrated significant negative correlations with expression levels of multiple hepatic TFs (including NR1I2 and HNF4α) and CYPs (e.g., CYP3A4, CYP2C19), STAT5B expression was observed to demonstrate positive association with most of the CYPs/TFs analyzed, suggesting that STAT5A and STAT5B have potentially different and distinct roles in regulating expression of hepatic drug response genes. Further studies are needed to elucidate the potential roles of STAT5A and STAT5B in regulation of CYPs/TFs and the potential implications of these findings.

  7. Transcriptional Profiling of Saccharomyces cerevisiae Reveals the Impact of Variation of a Single Transcription Factor on Differential Gene Expression in 4NQO, Fermentable, and Nonfermentable Carbon Sources

    PubMed Central

    Rong-Mullins, Xiaoqing; Ayers, Michael C.; Summers, Mahmoud; Gallagher, Jennifer E. G.

    2017-01-01

    Cellular metabolism can change the potency of a chemical’s tumorigenicity. 4-nitroquinoline-1-oxide (4NQO) is a tumorigenic drug widely used on animal models for cancer research. Polymorphisms of the transcription factor Yrr1 confer different levels of resistance to 4NQO in Saccharomyces cerevisiae. To study how different Yrr1 alleles regulate gene expression leading to resistance, transcriptomes of three isogenic S. cerevisiae strains carrying different Yrr1 alleles were profiled via RNA sequencing (RNA-Seq) and chromatin immunoprecipitation coupled with sequencing (ChIP-Seq) in the presence and absence of 4NQO. In response to 4NQO, all alleles of Yrr1 drove the expression of SNQ2 (a multidrug transporter), which was highest in the presence of 4NQO resistance-conferring alleles, and overexpression of SNQ2 alone was sufficient to overcome 4NQO-sensitive growth. Using shape metrics to refine the ChIP-Seq peaks, Yrr1 strongly associated with three loci including SNQ2. In addition to a known Yrr1 target SNG1, Yrr1 also bound upstream of RPL35B; however, overexpression of these genes did not confer 4NQO resistance. RNA-Seq data also implicated nucleotide synthesis pathways including the de novo purine pathway, and the ribonuclease reductase pathways were downregulated in response to 4NQO. Conversion of a 4NQO-sensitive allele to a 4NQO-resistant allele by a single point mutation mimicked the 4NQO-resistant allele in phenotype, and while the 4NQO resistant allele increased the expression of the ADE genes in the de novo purine biosynthetic pathway, the mutant Yrr1 increased expression of ADE genes even in the absence of 4NQO. These same ADE genes were only increased in the wild-type alleles in the presence of 4NQO, indicating that the point mutation activated Yrr1 to upregulate a pathway normally only activated in response to stress. The various Yrr1 alleles also influenced growth on different carbon sources by altering the function of the mitochondria. 
Hence, the complement to 4NQO resistance was poor growth on nonfermentable carbon sources, which in turn varied depending on the allele of Yrr1 expressed in the isogenic yeast. The oxidation state of the yeast affected the 4NQO toxicity by altering the reactive oxygen species (ROS) generated by cellular metabolism. The integration of RNA-Seq and ChIP-Seq elucidated how Yrr1 regulates global gene transcription in response to 4NQO and how various Yrr1 alleles confer differential resistance to 4NQO. This study provides guidance for further investigation into how Yrr1 regulates cellular responses to 4NQO, as well as transcriptomic resources for further analysis of transcription factor variation on carbon source utilization. PMID:29208650

  8. Global Analysis of Differentially Expressed Genes and Proteins in the Wheat Callus Infected by Agrobacterium tumefaciens

    PubMed Central

    Zhou, Xiaohong; Wang, Ke; Lv, Dongwen; Wu, Chengjun; Li, Jiarui; Zhao, Pei; Lin, Zhishan; Du, Lipu; Yan, Yueming; Ye, Xingguo

    2013-01-01

    Agrobacterium-mediated plant transformation is an extremely complex and evolved process involving genetic determinants of both the bacteria and the host plant cells. However, the mechanism of the determinants remains obscure, especially in some cereal crops such as wheat, which is recalcitrant to Agrobacterium-mediated transformation. In this study, differentially expressed genes (DEGs) and differentially expressed proteins (DEPs) were analyzed in wheat callus cells co-cultured with Agrobacterium by using RNA sequencing (RNA-seq) and two-dimensional electrophoresis (2-DE) in conjunction with mass spectrometry (MS). In total, 4,889 DEGs and 90 DEPs were identified. Most of them are related to metabolism, chromatin assembly or disassembly and immune defense. After comparative analysis, 24 of the 90 DEPs were detected in RNA-seq and proteomics datasets simultaneously. In addition, real-time RT-PCR experiments were performed to check the differential expression of the 24 genes, and the results were consistent with the RNA-seq data. According to gene ontology (GO) analysis, we found that a large proportion of these differentially expressed genes were related to the process of stress or immunity response. Several putative determinants and candidate effectors responsive to Agrobacterium-mediated transformation of wheat cells were discussed. We speculate that some of these genes are possibly related to Agrobacterium infection. Our results will help to understand the interaction between Agrobacterium and host cells, and may facilitate the development of efficient transformation strategies in cereal crops. PMID:24278131

  9. Obesity modulates inflammation and lipid metabolism oocyte gene expression: A single cell transcriptome perspective

    USDA-ARS's Scientific Manuscript database

    This study aimed to compare oocyte gene expression profiles and follicular fluid (FF) content from overweight/obese (OW) women and normal weight (NW) women who were undergoing fertility treatments. Using single cell transcriptomic analyses, we investigated oocyte gene expression using RNA-seq. Serum...

  10. Quantification of cytokine mRNA in peripheral blood mononuclear cells using branched DNA (bDNA) technology.

    PubMed

    Shen, L P; Sheridan, P; Cao, W W; Dailey, P J; Salazar-Gonzalez, J F; Breen, E C; Fahey, J L; Urdea, M S; Kolberg, J A

    1998-06-01

    Changes in the patterns of cytokine expression are thought to be of central importance in human infectious and inflammatory diseases. As such, there is a need for precise, reproducible assays for quantification of cytokine mRNA that are amenable to routine use in a clinical setting. In this report, we describe the design and performance of a branched DNA (bDNA) assay for the direct quantification of multiple cytokine mRNA levels in peripheral blood mononuclear cells (PBMCs). Oligonucleotide target probe sets were designed for several human cytokines, including TNFalpha, IL-2, IL-4, IL-6, IL-10, and IFNgamma. The bDNA assay yielded highly reproducible quantification of cytokine mRNAs, exhibited a broad linear dynamic range of over 3-log10, and showed a sensitivity sufficient to measure at least 3000 molecules. The potential clinical utility of the bDNA assay was explored by measuring cytokine mRNA levels in PBMCs from healthy and immunocompromised individuals. Cytokine expression levels in PBMCs from healthy blood donors were found to remain relatively stable over a one-month period of time. Elevated levels of IFNgamma mRNA were detected in PBMCs from HIV-1 seropositive individuals, but no differences in mean levels of TNFalpha or IL-6 mRNA were detected between seropositive and seronegative individuals. By providing a reproducible method for quantification of low abundance transcripts in clinical specimens, the bDNA assay may be useful for studies addressing the role of cytokine expression in disease.

  11. Genetic divergence in the transcriptional engram of chronic alcohol abuse: A laser-capture RNA-seq study of the mouse mesocorticolimbic system.

    PubMed

    Mulligan, Megan K; Mozhui, Khyobeni; Pandey, Ashutosh K; Smith, Maren L; Gong, Suzhen; Ingels, Jesse; Miles, Michael F; Lopez, Marcelo F; Lu, Lu; Williams, Robert W

    2017-02-01

    Genetic factors that influence the transition from initial drinking to dependence remain enigmatic. Recent studies have leveraged chronic intermittent ethanol (CIE) paradigms to measure changes in brain gene expression in a single strain at 0, 8, 72 h, and even 7 days following CIE. We extend these findings using LCM RNA-seq to profile expression in 11 brain regions in two inbred strains - C57BL/6J (B6) and DBA/2J (D2) - 72 h following multiple cycles of ethanol self-administration and CIE. Linear models identified differential expression based on treatment, region, strain, or interactions with treatment. Nearly 40% of genes showed a robust effect (FDR < 0.01) of region, and hippocampus CA1, cortex, bed nucleus stria terminalis, and nucleus accumbens core had the highest number of differentially expressed genes after treatment. Another 8% of differentially expressed genes demonstrated a robust effect of strain. As expected, based on similar studies in B6, treatment had a much smaller impact on expression; only 72 genes (p < 0.01) are modulated by treatment (independent of region or strain). Strikingly, many more genes (415) show a strain-specific and largely opposite response to treatment and are enriched in processes related to RNA metabolism, transcription factor activity, and mitochondrial function. Over 3 times as many changes in gene expression were detected in D2 compared to B6, and weighted gene co-expression network analysis (WGCNA) module comparison identified more modules enriched for treatment effects in D2. Substantial strain differences exist in the temporal pattern of transcriptional neuroadaptation to CIE, and these may drive individual differences in risk of addiction following excessive alcohol consumption. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. Whole-transcriptome brain expression and exon-usage profiling in major depression and suicide: evidence for altered glial, endothelial and ATPase activity

    PubMed Central

    Pantazatos, Spiro P.; Huang, Yung-yu; Rosoklija, Gorazd B.; Dwork, Andrew J.; Arango, Victoria; Mann, J. John

    2016-01-01

    Brain gene expression profiling studies of suicide and depression using oligonucleotide microarrays have often failed to distinguish these two phenotypes. Moreover, next generation sequencing (NGS) approaches are more accurate in quantifying gene expression and can detect alternative splicing. Using RNA-seq, we examined whole-exome gene and exon expression in non-psychiatric controls (CON, N=29), DSM-IV major depressive disorder suicides (MDD-S, N=21) and MDD non-suicides (MDD, N=9) in dorsal lateral prefrontal cortex (Brodmann Area 9) of sudden-death medication-free individuals postmortem. Using small RNA-seq, we also examined miRNA expression (9 samples per group). DESeq2 identified thirty-five genes differentially expressed between groups and surviving adjustment for false discovery rate (adjusted p<0.1). In depression, altered genes include humanin like-8 (MTRNRL8), interleukin-8 (IL8), and serpin peptidase inhibitor, clade H (SERPINH1) and chemokine ligand 4 (CCL4), while exploratory gene ontology (GO) analyses revealed lower expression of immune-related pathways such as chemokine receptor activity, chemotaxis and cytokine biosynthesis, and angiogenesis and vascular development in (adjusted p<0.1). Hypothesis-driven GO analysis suggests lower expression of genes involved in oligodendrocyte differentiation, regulation of glutamatergic neurotransmission, and oxytocin receptor expression in both suicide and depression, and provisional evidence for altered DNA-dependent ATPase expression in suicide only. DEXSeq analysis identified differential exon usage in ATPase, class II, type 9B (adjusted p<0.1) in depression. Differences in miRNA expression or structural gene variants were not detected. Results lend further support for models in which deficits in microglial, endothelial (blood-brain barrier), ATPase activity and astrocytic cell functions contribute to MDD and suicide, and identify putative pathways and mechanisms for further study in these disorders. PMID:27528462
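
The record above reports genes "surviving adjustment for false discovery rate (adjusted p<0.1)". As a rough illustration of what such an adjustment does, here is a minimal sketch of the Benjamini-Hochberg step-up procedure in plain Python. It assumes that the adjustment in question is Benjamini-Hochberg; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative sketch only: the function name and inputs are assumptions,
    not part of any published pipeline.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
# Genes whose adjusted p-value falls below the 0.1 threshold.
significant = [i for i, p in enumerate(adj) if p < 0.1]
```

With these made-up inputs, three of the four genes pass the 0.1 threshold after adjustment, while the fourth does not.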

  13. Whole-transcriptome brain expression and exon-usage profiling in major depression and suicide: evidence for altered glial, endothelial and ATPase activity.

    PubMed

    Pantazatos, S P; Huang, Y-Y; Rosoklija, G B; Dwork, A J; Arango, V; Mann, J J

    2017-05-01

    Brain gene expression profiling studies of suicide and depression using oligonucleotide microarrays have often failed to distinguish these two phenotypes. Moreover, next generation sequencing approaches are more accurate in quantifying gene expression and can detect alternative splicing. Using RNA-seq, we examined whole-exome gene and exon expression in non-psychiatric controls (CON, N=29), DSM-IV major depressive disorder suicides (MDD-S, N=21) and MDD non-suicides (MDD, N=9) in the dorsal lateral prefrontal cortex (Brodmann Area 9) of sudden death medication-free individuals post mortem. Using small RNA-seq, we also examined miRNA expression (nine samples per group). DESeq2 identified 35 genes differentially expressed between groups and surviving adjustment for false discovery rate (adjusted P<0.1). In depression, altered genes include humanin-like-8 (MTRNRL8), interleukin-8 (IL8), and serpin peptidase inhibitor, clade H (SERPINH1) and chemokine ligand 4 (CCL4), while exploratory gene ontology (GO) analyses revealed lower expression of immune-related pathways such as chemokine receptor activity, chemotaxis and cytokine biosynthesis, and angiogenesis and vascular development in (adjusted P<0.1). Hypothesis-driven GO analysis suggests lower expression of genes involved in oligodendrocyte differentiation, regulation of glutamatergic neurotransmission, and oxytocin receptor expression in both suicide and depression, and provisional evidence for altered DNA-dependent ATPase expression in suicide only. DEXSeq analysis identified differential exon usage in ATPase, class II, type 9B (adjusted P<0.1) in depression. Differences in miRNA expression or structural gene variants were not detected. Results lend further support for models in which deficits in microglial, endothelial (blood-brain barrier), ATPase activity and astrocytic cell functions contribute to MDD and suicide, and identify putative pathways and mechanisms for further study in these disorders.

  14. Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes.

    PubMed

    Lau, Billy T; Ji, Hanlee P

    2017-09-21

    RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes commonly in the form of random nucleotides were recently introduced to improve gene expression measures by detecting amplification duplicates, but are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. We experimentally showed random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. We described the first study to use transposable molecular barcodes and its use for studying random-mer molecular barcode errors. Extensive errors found in random-mer molecular barcodes may warrant the use of error correcting barcodes for transcriptome analysis as input amounts decrease.
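
To make the counting problem in the record above concrete, the sketch below shows how sequencing errors in random-mer barcodes can inflate molecule counts, and how a simple greedy collapse of barcodes within one mismatch of a more abundant barcode reduces that inflation. This is not the transposable error-correcting barcode scheme the authors describe; it is a generic heuristic, and the function names, the distance threshold, and the example reads are all assumptions.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_molecules(barcodes, max_dist=1):
    """Estimate the number of distinct molecules from observed barcodes.

    Barcodes are visited from most to least abundant; a barcode is merged
    into an already accepted one if it lies within max_dist mismatches.
    Hypothetical helper for illustration, not the published method.
    """
    counts = Counter(barcodes)
    kept = []
    # Sort by descending read count, then alphabetically for determinism.
    for bc, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        is_error = any(
            len(bc) == len(k) and hamming(bc, k) <= max_dist for k in kept
        )
        if not is_error:
            kept.append(bc)
    return len(kept)


# Made-up reads: "ACGA" is treated as a likely one-base error of "ACGT".
reads = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 3
naive = len(set(reads))             # counts every distinct barcode
collapsed = count_molecules(reads)  # merges near-identical barcodes
```

A heuristic of this kind can also merge genuinely distinct molecules whose barcodes happen to differ by one base, which is one reason designed error-correcting barcodes may be preferable when input amounts are low.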

  15. Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Version 3.0 User Guide

    EPA Science Inventory

    User Guide to describe the complete functionality of the Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) Version 3.0 online tool. The US Environmental Protection Agency Sequence Alignment to Predict Across Species Susceptibility tool (SeqAPASS; https://seqa...

  16. A resource for characterizing genome-wide binding and putative target genes of transcription factors expressed during secondary growth and wood formation in Populus.

    PubMed

    Liu, Lijun; Ramsay, Trevor; Zinkgraf, Matthew; Sundell, David; Street, Nathaniel Robert; Filkov, Vladimir; Groover, Andrew

    2015-06-01

    Identifying transcription factor target genes is essential for modeling the transcriptional networks underlying developmental processes. Here we report a chromatin immunoprecipitation sequencing (ChIP-seq) resource consisting of genome-wide binding regions and associated putative target genes for four Populus homeodomain transcription factors expressed during secondary growth and wood formation. Software code (programs and scripts) for processing the Populus ChIP-seq data is provided within a publicly available iPlant image, including tools for ChIP-seq data quality control and evaluation adapted from the human Encyclopedia of DNA Elements (ENCODE) project. Basic information on binding for each transcription factor (including members of the Class I KNOX, Class III HD ZIP, and BEL1-like families) is summarized, including the number and location of binding regions, distribution of binding regions relative to gene features, associated putative target genes, and enriched functional categories of putative target genes. These ChIP-seq data have been integrated within the Populus Genome Integrative Explorer (PopGenIE) where they can be analyzed using a variety of web-based tools. We present an example analysis that shows preferential binding of transcription factor ARBORKNOX1 to the nearest neighbor genes in a pre-calculated co-expression network module, and enrichment for meristem-related genes within this module including multiple orthologs of Arabidopsis KNOTTED-like Arabidopsis 2/6. © 2015 Society for Experimental Biology and John Wiley & Sons Ltd This article has been contributed to by US Government employees and their work is in the public domain in the USA.

  17. RNA sequencing enables systematic identification of platelet transcriptomic alterations in NSCLC patients.

    PubMed

    Zhang, Qun; Hu, Huan; Liu, Hongda; Jin, Jiajia; Zhu, Peiyuan; Wang, Shujun; Shen, Kaikai; Hu, Yangbo; Li, Zhou; Zhan, Ping; Zhu, Suhua; Fan, Hang; Zhang, Jianya; Lv, Tangfeng; Song, Yong

    2018-05-29

    Platelets are implicated as key players in the metastatic dissemination of tumor cells. Previous evidence demonstrated that platelets retain cytoplasmic RNAs with physiological activity, splicing pre-mRNA to mRNA and translating it into functional proteins in response to external stimulation. Recently, platelet gene profiles of healthy or diseased individuals have been characterized with the help of RNA sequencing (RNA-Seq) in some studies, leading to new insights into the mechanisms underlying disease pathogenesis. In this study, we performed RNA-seq in platelets from 7 healthy individuals and 15 non-small cell lung cancer (NSCLC) patients. Our data revealed a subset of near-universal differentially expressed gene (DEG) profiles in platelets of metastatic NSCLC compared to healthy individuals, including 626 up-regulated RNAs (mRNAs and ncRNAs) and 1497 down-regulated genes. The significantly over-expressed genes showed enrichment in focal adhesion, platelet activation, gap junction and adherens junction pathways. The DEGs also included previously reported tumor-related genes such as PDGFR, VEGF, EGF, etc., verifying the consistency and significance of platelet RNA-Seq in oncology studies. We also validated several up-regulated DEGs involved in tumor cell-induced platelet aggregation (TCIPA) and tumorigenesis. Additionally, transcriptomic comparison analyses of NSCLC subgroups were conducted. Between non-metastatic and metastatic NSCLC patients, 526 platelet DEGs were identified with the most altered expression. The outcomes from subgroup analysis between lung adenocarcinoma and lung squamous cell carcinoma demonstrated the diagnostic potential of platelet RNA-Seq for distinguishing tumor histological types. Copyright © 2018 Elsevier Masson SAS. All rights reserved.

  18. ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data.

    PubMed

    Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2017-01-04

    The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) contributes to various biological processes and is linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, which represents an approximately 20-fold expansion compared with the previously released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed a 'Regulator' module to predict hundreds of TFs and histone modifications that were involved in or affected transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulation of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. Cocaine dynamically regulates heterochromatin and repetitive element unsilencing in nucleus accumbens.

    PubMed

    Maze, Ian; Feng, Jian; Wilkinson, Matthew B; Sun, HaoSheng; Shen, Li; Nestler, Eric J

    2011-02-15

    Repeated cocaine exposure induces persistent alterations in genome-wide transcriptional regulatory networks, chromatin remodeling activity and, ultimately, gene expression profiles in the brain's reward circuitry. Virtually all previous investigations have centered on drug-mediated effects occurring throughout active euchromatic regions of the genome, with very little known concerning the impact of cocaine exposure on the regulation and maintenance of heterochromatin in adult brain. Here, we report that cocaine dramatically and dynamically alters heterochromatic histone H3 lysine 9 trimethylation (H3K9me3) in the nucleus accumbens (NAc), a key brain reward region. Furthermore, we demonstrate that repeated cocaine exposure causes persistent decreases in heterochromatization in this brain region, suggesting a potential role for heterochromatic regulation in the long-term actions of cocaine. To identify precise genomic loci affected by these alterations, chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-Seq) was performed on NAc. ChIP-Seq analyses confirmed the existence of the H3K9me3 mark mainly within intergenic regions of the genome and identified specific patterns of cocaine-induced H3K9me3 regulation at repetitive genomic sequences. Cocaine-mediated decreases in H3K9me3 enrichment at specific genomic repeats [e.g., long interspersed nuclear element (LINE)-1 repeats] were further confirmed by the increased expression of LINE-1 retrotransposon-associated repetitive elements in NAc. Such increases likely reflect global patterns of genomic destabilization in this brain region after repeated cocaine administration and open the door for future investigations into the epigenetic and genetic basis of drug addiction.

  20. VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis.

    PubMed

    Cornwell, MacIntosh; Vangala, Mahesh; Taing, Len; Herbert, Zachary; Köster, Johannes; Li, Bo; Sun, Hanfei; Li, Taiwen; Zhang, Jian; Qiu, Xintao; Pun, Matthew; Jeselsohn, Rinath; Brown, Myles; Liu, X Shirley; Long, Henry W

    2018-04-12

    RNA sequencing has become a ubiquitous technology used throughout life sciences as an effective method of measuring RNA abundance quantitatively in tissues and cells. The increase in use of RNA-seq technology has led to the continuous development of new tools for every step of analysis from alignment to downstream pathway analysis. However, effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts. Using the workflow management system Snakemake we have developed a user friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, into downstream differential expression and pathway analysis. VIPER has been created in a modular fashion to allow for the rapid incorporation of new tools to expand the capabilities. This capacity has already been exploited to include very recently developed tools that explore immune infiltrate and T-cell CDR (Complementarity-Determining Regions) reconstruction abilities. The pipeline has been conveniently packaged such that minimal computational skills are required to download and install the dozens of software packages that VIPER uses. VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively with a built-in capacity for customization and expansion.

  1. The genetic architecture of gene expression levels in wild baboons.

    PubMed

    Tung, Jenny; Zhou, Xiang; Alberts, Susan C; Stephens, Matthew; Gilad, Yoav

    2015-02-25

    Primate evolution has been argued to result, in part, from changes in how genes are regulated. However, we still know little about gene regulation in natural primate populations. We conducted an RNA sequencing (RNA-seq)-based study of baboons from an intensively studied wild population. We performed complementary expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses, discovering substantial evidence for, and surprising power to detect, genetic effects on gene expression levels in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes; interestingly, genes with eQTL significantly overlapped between baboons and a comparable human eQTL data set. Our results suggest that genes vary in their tolerance of genetic perturbation, and that this property may be conserved across species. Further, they establish the feasibility of eQTL mapping using RNA-seq data alone, and represent an important step towards understanding the genetic architecture of gene expression in primates.

  2. The genetic architecture of gene expression levels in wild baboons

    PubMed Central

    Tung, Jenny; Zhou, Xiang; Alberts, Susan C; Stephens, Matthew; Gilad, Yoav

    2015-01-01

    Primate evolution has been argued to result, in part, from changes in how genes are regulated. However, we still know little about gene regulation in natural primate populations. We conducted an RNA sequencing (RNA-seq)-based study of baboons from an intensively studied wild population. We performed complementary expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses, discovering substantial evidence for, and surprising power to detect, genetic effects on gene expression levels in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes; interestingly, genes with eQTL significantly overlapped between baboons and a comparable human eQTL data set. Our results suggest that genes vary in their tolerance of genetic perturbation, and that this property may be conserved across species. Further, they establish the feasibility of eQTL mapping using RNA-seq data alone, and represent an important step towards understanding the genetic architecture of gene expression in primates. DOI: http://dx.doi.org/10.7554/eLife.04729.001 PMID:25714927

  3. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.

    PubMed

    Haghverdi, Laleh; Lun, Aaron T L; Morgan, Michael D; Marioni, John C

    2018-06-01

    Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
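
The core idea in the record above, detecting mutual nearest neighbors (MNNs) between two batches, can be sketched in a few lines. The example below is a simplified illustration in plain Python, assuming Euclidean distance on small expression vectors and a hypothetical neighbor count `k`; it only finds the MNN pairs and does not implement the correction step or any preprocessing the authors may use.

```python
import math


def knn(queries, refs, k):
    """For each query point, return the set of indices of its k nearest refs."""
    result = []
    for q in queries:
        order = sorted(range(len(refs)), key=lambda j: math.dist(q, refs[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each among the other's k nearest neighbors across batches.

    Illustrative sketch; names and the default k are assumptions.
    """
    a_to_b = knn(batch_a, batch_b, k)
    b_to_a = knn(batch_b, batch_a, k)
    return [
        (i, j)
        for i in range(len(batch_a))
        for j in sorted(a_to_b[i])
        if i in b_to_a[j]
    ]


# Toy two-gene expression profiles. The third cell in batch_b stands in for
# a population that is absent from batch_a.
batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.5, 0.0), (10.0, 11.0), (50.0, 50.0)]
pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
```

In this toy case the cell unique to `batch_b` has a nearest neighbor in `batch_a`, but that relationship is not reciprocated, so it forms no pair. This matches the requirement stated in the abstract that only a subset of the population needs to be shared between batches.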

  4. RNA-Seq reveals 10 novel promising candidate genes affecting milk protein concentration in the Chinese Holstein population

    PubMed Central

    Li, Cong; Cai, Wentao; Zhou, Chenghao; Yin, Hongwei; Zhang, Ziqi; Loor, Juan J.; Sun, Dongxiao; Zhang, Qin; Liu, Jianfeng; Zhang, Shengli

    2016-01-01

    Paired-end RNA sequencing (RNA-Seq) was used to explore the bovine transcriptome from the mammary tissue of 12 Chinese Holstein cows with 6 extremely high and 6 low phenotypic values for milk protein percentage. We defined the differentially expressed transcripts between the two comparison groups, extremely high and low milk protein percentage during the peak lactation (HP vs LP) and during the non-lactating period (HD vs LD), respectively. Within the differentially expressed genes (DEGs), we detected 157 at peak lactation and 497 in the non-lactating period with a highly significant correlation with milk protein concentration. Integrated interpretation of differential gene expression indicated that SERPINA1, CLU, CNTFR, ERBB2, NEDD4L, ANG, GALE, HSPA8, LPAR6 and CD14 are the most promising candidate genes affecting milk protein concentration. Similarly, LTF, FCGR3A, MEGF10, RRM2 and UBE2C are the most promising candidates that in the non-lactating period could help the mammary tissue prevent issues with inflammation and udder disorders. Putative genes will be valuable resources for designing better breeding strategies to optimize the content of milk protein and also to provide new insights into regulation of lactogenesis. PMID:27254118

  5. RNA-Seq Profiling Reveals Novel Hepatic Gene Expression Pattern in Aflatoxin B1 Treated Rats

    PubMed Central

    Merrick, B. Alex; Phadke, Dhiral P.; Auerbach, Scott S.; Mav, Deepak; Stiegelmeyer, Suzy M.; Shah, Ruchir R.; Tice, Raymond R.

    2013-01-01

    Deep sequencing was used to investigate the subchronic effects of 1 ppm aflatoxin B1 (AFB1), a potent hepatocarcinogen, on the male rat liver transcriptome prior to onset of histopathological lesions or tumors. We hypothesized RNA-Seq would reveal more differentially expressed genes (DEG) than microarray analysis, including low copy and novel transcripts related to AFB1’s carcinogenic activity compared to feed controls (CTRL). Paired-end reads were mapped to the rat genome (Rn4) with TopHat and further analyzed by DESeq and Cufflinks-Cuffdiff pipelines to identify differentially expressed transcripts, new exons and unannotated transcripts. PCA and cluster analysis of DEGs showed clear separation between AFB1 and CTRL treatments and concordance among group replicates. qPCR of eight high and medium DEGs and three low DEGs showed good comparability among RNA-Seq and microarray transcripts. DESeq analysis identified 1,026 differentially expressed transcripts at greater than two-fold change (p<0.005) compared to 626 transcripts by microarray due to base pair resolution of transcripts by RNA-Seq, probe placement within transcripts or an absence of probes to detect novel transcripts, splice variants and exons. Pathway analysis among DEGs revealed signaling of Ahr, Nrf2, GSH, xenobiotic, cell cycle, extracellular matrix, and cell differentiation networks consistent with pathways leading to AFB1 carcinogenesis, including almost 200 upregulated transcripts controlled by E2f1-related pathways related to kinetochore structure, mitotic spindle assembly and tissue remodeling. We report 49 novel, differentially-expressed transcripts including confirmation by PCR-cloning of two unique, unannotated, hepatic AFB1-responsive transcripts (HAfT’s) on chromosomes 1.q55 and 15.q11, overexpressed by 10 to 25-fold. Several potentially novel exons were found and exon refinements were made including AFB1 exon-specific induction of homologous family members, Ugt1a6 and Ugt1a7c. 
    We find the rat transcriptome contains many previously unidentified, AFB1-responsive exons and transcripts supporting RNA-Seq’s capabilities to provide new insights into AFB1-mediated gene expression leading to hepatocellular carcinoma. PMID:23630614

  6. Dietary fat and fiber interact to uniquely modify global histone post-translational epigenetic programming in a rat colon cancer progression model.

    PubMed

    Triff, Karen; McLean, Mathew W; Callaway, Evelyn; Goldsby, Jennifer; Ivanov, Ivan; Chapkin, Robert S

    2018-04-16

    Dietary fermentable fiber generates short-chain fatty acids (SCFA), e.g., butyrate, in the colonic lumen which serves as a chemoprotective histone deacetylase inhibitor and/or as an acetylation substrate for histone acetylases. In addition, n-3 polyunsaturated fatty acids (n-3 PUFA) in fish oil can affect the chromatin landscape by acting as ligands for tumor suppressive nuclear receptors. In an effort to gain insight into the global dimension of post-translational modification of histones (including H3K4me3 and H3K9ac) and clarify the chemoprotective impact of dietary bioactive compounds on transcriptional control in a preclinical model of colon cancer, we generated high-resolution genome-wide RNA (RNA-Seq) and "chromatin-state" (H3K4me3-seq and H3K9ac-seq) maps for intestinal (epithelial colonocytes) crypts in rats treated with a colon carcinogen and fed diets containing bioactive (i) fish oil, (ii) fermentable fiber (a rich source of SCFA), (iii) a combination of fish oil plus pectin or (iv) control, devoid of fish oil or pectin. In general, poor correlation was observed between differentially transcribed (DE) and enriched genes (DERs) at multiple epigenetic levels. The combinatorial diet (fish oil + pectin) uniquely affected transcriptional profiles in the intestinal epithelium, e.g., upregulating lipid catabolism and beta-oxidation associated genes. These genes were linked to activated ligand-dependent nuclear receptors associated with n-3 PUFA and were also correlated with the mitochondrial L-carnitine shuttle and the inhibition of lipogenesis. These findings demonstrate that the chemoprotective fish oil + pectin combination diet uniquely induces global histone state modifications linked to the expression of chemoprotective genes. This article is protected by copyright. All rights reserved. © 2018 UICC.

  7. Transcriptomic profile analysis of mouse neural tube development by RNA-Seq.

    PubMed

    Yu, Juan; Mu, Jianbing; Guo, Qian; Yang, Lihong; Zhang, Juan; Liu, Zhizhen; Yu, Baofeng; Zhang, Ting; Xie, Jun

    2017-09-01

    The neural tube is the primordium of the central nervous system (CNS), but its development is not entirely understood. Understanding the cellular and molecular basis of neural tube development could, therefore, provide vital clues to the mechanism of neural tube defects (NTDs). Here, we investigated the gene expression profiles of mouse neural tube at three different time points (embryonic day (E) 8.5, 9.5 and 10.5) using an RNA-seq approach. About 391 differentially expressed genes (DEGs) were screened during mouse neural tube development, including 45 DEGs involved in CNS development, among which Bmp2, Ascl1, Olig2, Lhx1, Wnt7b and Eomes might play important roles. Of the 45 DEGs, Foxp2, Eomes, Hoxb3, Gpr56, Hap1, Nkx2-1, Sez6l2, Wnt7b, Tbx20, Nfib, Cntn1 and Dcx had different isoforms, and opposite expression patterns of different isoforms were observed for Gpr56, Nkx2-1 and Sez6l2. In addition, alternative splicing, such as mutually exclusive exon, retained intron, skipped exon and alternative 3' splice site, was identified in 10 neural-related differentially spliced genes, including Ngrn, Ddr1, Dctn1, Dnmt3b, Ect2, Map2, Mbnl1, Meis2, Vcan and App. Moreover, seven neural splicing factors, such as Nova1/2, nSR100/Srrm4, Elavl3/4, Celf3 and Rbfox1, were differentially expressed during mouse neural tube development. Interestingly, nine of the DEGs identified above were dysregulated in a retinoic acid-induced NTD model, indicating a possible important role of these genes in NTDs. Taken together, our study provides more comprehensive information on mouse neural tube development, which might provide new insights into NTD occurrence. © 2017 IUBMB Life, 69(9):706-719, 2017. © 2017 International Union of Biochemistry and Molecular Biology.

  8. Discovery of Azurin-Like Anticancer Bacteriocins from Human Gut Microbiome through Homology Modeling and Molecular Docking against the Tumor Suppressor p53.

    PubMed

    Nguyen, Chuong; Nguyen, Van Duy

    2016-01-01

    Azurin from Pseudomonas aeruginosa is a known anticancer bacteriocin, which can specifically penetrate human cancer cells and induce apoptosis. We hypothesized that pathogenic and commensal bacteria with long-term residence in the human body can produce azurin-like bacteriocins as a weapon against the invasion of cancers. In our previous work, putative bacteriocins were screened from the complete genomes of 66 dominant bacterial species in the human gut microbiota and subsequently characterized by subjecting them to functional annotation algorithms with azurin as control. We qualitatively predicted 14 putative bacteriocins that possessed functional properties very similar to those of azurin. In this work, we perform a number of quantitative and structure-based analyses, including hydrophobic percentage calculation, structural modeling, and molecular docking of the bacteriocins of interest against protein p53, a cancer target. Finally, we have identified 8 putative bacteriocins that bind p53 in the same manner as p28-azurin and azurin, of which 3 peptides (p1seq16, p2seq20, and p3seq24) are shared with our previous study and 5 novel ones (p1seq09, p2seq05, p2seq08, p3seq02, and p3seq17) are reported for the first time. These bacteriocins are suggested for further in vitro tests in different neoplastic cell lines.

  9. Discovery of Azurin-Like Anticancer Bacteriocins from Human Gut Microbiome through Homology Modeling and Molecular Docking against the Tumor Suppressor p53

    PubMed Central

    Nguyen, Chuong; Nguyen, Van Duy

    2016-01-01

    Azurin from Pseudomonas aeruginosa is a known anticancer bacteriocin, which can specifically penetrate human cancer cells and induce apoptosis. We hypothesized that pathogenic and commensal bacteria with long-term residence in the human body can produce azurin-like bacteriocins as a weapon against the invasion of cancers. In our previous work, putative bacteriocins were screened from the complete genomes of 66 dominant bacterial species in the human gut microbiota and subsequently characterized by subjecting them to functional annotation algorithms with azurin as control. We qualitatively predicted 14 putative bacteriocins that possessed functional properties very similar to those of azurin. In this work, we perform a number of quantitative and structure-based analyses, including hydrophobic percentage calculation, structural modeling, and molecular docking of the bacteriocins of interest against protein p53, a cancer target. Finally, we have identified 8 putative bacteriocins that bind p53 in the same manner as p28-azurin and azurin, of which 3 peptides (p1seq16, p2seq20, and p3seq24) are shared with our previous study and 5 novel ones (p1seq09, p2seq05, p2seq08, p3seq02, and p3seq17) are reported for the first time. These bacteriocins are suggested for further in vitro tests in different neoplastic cell lines. PMID:27239476

  10. RNA-Seq Reveals Dynamic Changes of Gene Expression in Key Stages of Intestine Regeneration in the Sea Cucumber Apostichopus japonicas

    PubMed Central

    Sun, Lina; Yang, Hongsheng; Chen, Muyan; Ma, Deyou; Lin, Chenggang

    2013-01-01

    Background: Sea cucumbers (Holothuroidea; Echinodermata) have the capacity to regenerate lost tissues and organs. Although the histological and cytological aspects of intestine regeneration have been extensively studied, little is known of the genetic mechanisms involved. There has, however, been a renewed effort to develop a database of Expressed Sequence Tags (ESTs) in Apostichopus japonicus, an economically-important species that occurs in China. This is important for studies on genetic breeding, molecular markers and special physiological phenomena. We have also constructed a library of ESTs obtained from the regenerative body wall and intestine of A. japonicus. The database has increased to ∼30000 ESTs. Results: We used RNA-Seq to determine gene expression profiles associated with intestinal regeneration in A. japonicus at 3, 7, 14 and 21 days post evisceration (dpe). This was compared to profiles obtained from a normally-functioning intestine. Approximately 5 million (M) reads were sequenced in every library. Over 2400 up-regulated genes (>10%) and over 1000 down-regulated genes (∼5%) were observed at 3 and 7 dpe (log2Ratio≥1, FDR≤0.001). Specific “GO terms” revealed that the DEGs (Differentially Expressed Genes) performed an important function at every regeneration stage. Besides some expected pathways (for example, Ribosome and Spliceosome pathway term), the “Notch signaling pathway,” the “ECM-receptor interaction” and the “Cytokine-cytokine receptor interaction” were significantly enriched. We also investigated the expression profiles of developmental genes, ECM-associated genes and Cytoskeletal genes. Twenty of the most important differentially expressed genes (DEGs) were verified by Real-time PCR, which resulted in a trend concordance of almost 100% between the two techniques. Conclusion: Our studies demonstrated dynamic changes in global gene expression during intestine regeneration and presented a series of candidate genes and enriched pathways that contribute to intestine regeneration in sea cucumbers. This provides a foundation for future studies on the genetics/molecular mechanisms associated with intestine regeneration. PMID:23936330
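    The selection criterion quoted in this abstract (log2Ratio≥1, FDR≤0.001) can be illustrated with a short filter. This is a sketch under stated assumptions rather than the authors' pipeline: it assumes the log2 ratio threshold is applied to the absolute value so that both up- and down-regulated genes are retained, it takes FDR values as given instead of computing them, and the gene names and numbers are hypothetical.

```python
def classify_genes(records, min_abs_log2_ratio=1.0, max_fdr=0.001):
    """Split genes into up- and down-regulated lists.

    `records` is an iterable of (gene, log2_ratio, fdr) tuples.
    Assumption: the log2 ratio threshold applies to the absolute value.
    """
    up, down = [], []
    for gene, log2_ratio, fdr in records:
        # Discard genes that fail either the significance or the effect-size cut.
        if fdr > max_fdr or abs(log2_ratio) < min_abs_log2_ratio:
            continue
        (up if log2_ratio > 0 else down).append(gene)
    return up, down


# Hypothetical values for illustration only.
records = [
    ("geneA", 2.3, 0.0001),   # passes both cuts, up-regulated
    ("geneB", -1.5, 0.0005),  # passes both cuts, down-regulated
    ("geneC", 0.4, 0.0001),   # change too small
    ("geneD", 3.0, 0.05),     # not significant
]
print(classify_genes(records))
```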

  11. Transcriptome profiling of the intoxication response of Tenebrio molitor larvae to Bacillus thuringiensis Cry3Aa protoxin.

    PubMed

    Oppert, Brenda; Dowd, Scot E; Bouffard, Pascal; Li, Lewyn; Conesa, Ana; Lorenzen, Marcé D; Toutges, Michelle; Marshall, Jeremy; Huestis, Diana L; Fabrick, Jeff; Oppert, Cris; Jurat-Fuentes, Juan Luis

    2012-01-01

    Bacillus thuringiensis (Bt) crystal (Cry) proteins are effective against a select number of insect pests, but improvements are needed to increase efficacy and decrease time to mortality for coleopteran pests. To gain insight into the Bt intoxication process in Coleoptera, we performed RNA-Seq on cDNA generated from the guts of Tenebrio molitor larvae that consumed either a control diet or a diet containing Cry3Aa protoxin. Approximately 134,090 and 124,287 sequence reads from the control and Cry3Aa-treated groups were assembled into 1,318 and 1,140 contigs, respectively. Enrichment analyses indicated that functions associated with mitochondrial respiration, signalling, maintenance of cell structure, membrane integrity, protein recycling/synthesis, and glycosyl hydrolases were significantly increased in Cry3Aa-treated larvae, whereas functions associated with many metabolic processes were reduced, especially glycolysis, tricarboxylic acid cycle, and fatty acid synthesis. Microarray analysis was used to evaluate temporal changes in gene expression after 6, 12 or 24 h of Cry3Aa exposure. Overall, microarray analysis indicated that transcripts related to allergens, chitin-binding proteins, glycosyl hydrolases, and tubulins were induced, and those related to immunity and metabolism were repressed in Cry3Aa-intoxicated larvae. The 24 h microarray data validated most of the RNA-Seq data. Of the three intoxication intervals, larvae demonstrated more differential expression of transcripts after 12 h exposure to Cry3Aa. Gene expression examined by three different methods in control vs. Cry3Aa-treated larvae at the 24 h time point indicated that transcripts encoding proteins with chitin-binding domain 3 were the most differentially expressed in Cry3Aa-intoxicated larvae. Overall, the data suggest that T. molitor larvae mount a complex response to Cry3Aa during the initial 24 h of intoxication. Data from this study represent the largest genetic sequence dataset for T. molitor to date. Furthermore, the methods in this study are useful for comparative analyses in organisms lacking a sequenced genome.

  12. Transcriptome Profiling of the Intoxication Response of Tenebrio molitor Larvae to Bacillus thuringiensis Cry3Aa Protoxin

    PubMed Central

    Oppert, Brenda; Dowd, Scot E.; Bouffard, Pascal; Li, Lewyn; Conesa, Ana; Lorenzen, Marcé D.; Toutges, Michelle; Marshall, Jeremy; Huestis, Diana L.; Fabrick, Jeff; Oppert, Cris; Jurat-Fuentes, Juan Luis

    2012-01-01

    Bacillus thuringiensis (Bt) crystal (Cry) proteins are effective against a select number of insect pests, but improvements are needed to increase efficacy and decrease time to mortality for coleopteran pests. To gain insight into the Bt intoxication process in Coleoptera, we performed RNA-Seq on cDNA generated from the guts of Tenebrio molitor larvae that consumed either a control diet or a diet containing Cry3Aa protoxin. Approximately 134,090 and 124,287 sequence reads from the control and Cry3Aa-treated groups were assembled into 1,318 and 1,140 contigs, respectively. Enrichment analyses indicated that functions associated with mitochondrial respiration, signalling, maintenance of cell structure, membrane integrity, protein recycling/synthesis, and glycosyl hydrolases were significantly increased in Cry3Aa-treated larvae, whereas functions associated with many metabolic processes were reduced, especially glycolysis, tricarboxylic acid cycle, and fatty acid synthesis. Microarray analysis was used to evaluate temporal changes in gene expression after 6, 12 or 24 h of Cry3Aa exposure. Overall, microarray analysis indicated that transcripts related to allergens, chitin-binding proteins, glycosyl hydrolases, and tubulins were induced, and those related to immunity and metabolism were repressed in Cry3Aa-intoxicated larvae. The 24 h microarray data validated most of the RNA-Seq data. Of the three intoxication intervals, larvae demonstrated more differential expression of transcripts after 12 h exposure to Cry3Aa. Gene expression examined by three different methods in control vs. Cry3Aa-treated larvae at the 24 h time point indicated that transcripts encoding proteins with chitin-binding domain 3 were the most differentially expressed in Cry3Aa-intoxicated larvae. Overall, the data suggest that T. molitor larvae mount a complex response to Cry3Aa during the initial 24 h of intoxication. Data from this study represent the largest genetic sequence dataset for T. molitor to date. Furthermore, the methods in this study are useful for comparative analyses in organisms lacking a sequenced genome. PMID:22558093

  13. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    PubMed

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.

  14. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    PubMed Central

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096

  15. The Physcomitrella patens gene atlas project: large-scale RNA-seq based expression data.

    PubMed

    Perroud, Pierre-François; Haas, Fabian B; Hiss, Manuel; Ullrich, Kristian K; Alboresi, Alessandro; Amirebrahimi, Mojgan; Barry, Kerrie; Bassi, Roberto; Bonhomme, Sandrine; Chen, Haodong; Coates, Juliet C; Fujita, Tomomichi; Guyon-Debast, Anouchka; Lang, Daniel; Lin, Junyan; Lipzen, Anna; Nogué, Fabien; Oliver, Melvin J; Ponce de León, Inés; Quatrano, Ralph S; Rameau, Catherine; Reiss, Bernd; Reski, Ralf; Ricca, Mariana; Saidi, Younousse; Sun, Ning; Szövényi, Péter; Sreedasyam, Avinash; Grimwood, Jane; Stacey, Gary; Schmutz, Jeremy; Rensing, Stefan A

    2018-07-01

    High-throughput RNA sequencing (RNA-seq) has recently become the method of choice to define and analyze transcriptomes. For the model moss Physcomitrella patens, although this method has been used to help analyze specific perturbations, no overall reference dataset has yet been established. In the framework of the Gene Atlas project, the Joint Genome Institute selected P. patens as a flagship genome, opening the way to generate the first comprehensive transcriptome dataset for this moss. The first round of sequencing described here is composed of 99 independent libraries spanning 34 different developmental stages and conditions. Upon dataset quality control and processing through read mapping, 28 509 of the 34 361 v3.3 gene models (83%) were detected to be expressed across the samples. Differentially expressed genes (DEGs) were calculated across the dataset to permit perturbation comparisons between conditions. The analysis of the three most distinct and abundant P. patens growth stages - protonema, gametophore and sporophyte - allowed us to define both general transcriptional patterns and stage-specific transcripts. As an example of variation of physico-chemical growth conditions, we detail here the impact of ammonium supplementation under standard growth conditions on the protonemal transcriptome. Finally, the cooperative nature of this project allowed us to analyze inter-laboratory variation, as 13 different laboratories around the world provided samples. We compare differences in the replication of experiments in a single laboratory and between different laboratories. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.

  16. Effects of RNA integrity on transcript quantification by total RNA sequencing of clinically collected human placental samples.

    PubMed

    Reiman, Mario; Laan, Maris; Rull, Kristiina; Sõber, Siim

    2017-08-01

    RNA degradation is a ubiquitous process that occurs in living and dead cells, as well as during handling and storage of extracted RNA. Reduced RNA quality caused by degradation is an established source of uncertainty for all RNA-based gene expression quantification techniques. RNA sequencing is an increasingly preferred method for transcriptome analyses, and dependence of its results on input RNA integrity is of significant practical importance. This study aimed to characterize the effects of varying input RNA integrity [estimated as RNA integrity number (RIN)] on transcript level estimates and delineate the characteristic differences between transcripts that differ in degradation rate. The study used ribodepleted total RNA sequencing data from a real-life clinically collected set (n = 32) of human solid tissue (placenta) samples. RIN-dependent alterations in gene expression profiles were quantified by using DESeq2 software. Our results indicate that small differences in RNA integrity affect gene expression quantification by introducing a moderate and pervasive bias in expression level estimates that significantly affected 8.1% of studied genes. The rapidly degrading transcript pool was enriched in pseudogenes, short noncoding RNAs, and transcripts with extended 3' untranslated regions. Typical slowly degrading transcripts (median length, 2389 nt) represented protein coding genes with 4-10 exons and high guanine-cytosine content. © FASEB.

  17. Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles.

    PubMed

    Gadala-Maria, Daniel; Yaari, Gur; Uduman, Mohamed; Kleinstein, Steven H

    2015-02-24

    Individual variation in germline and expressed B-cell immunoglobulin (Ig) repertoires has been associated with aging, disease susceptibility, and differential response to infection and vaccination. Repertoire properties can now be studied at large-scale through next-generation sequencing of rearranged Ig genes. Accurate analysis of these repertoire-sequencing (Rep-Seq) data requires identifying the germline variable (V), diversity (D), and joining (J) gene segments used by each Ig sequence. Current V(D)J assignment methods work by aligning sequences to a database of known germline V(D)J segment alleles. However, existing databases are likely to be incomplete and novel polymorphisms are hard to differentiate from the frequent occurrence of somatic hypermutations in Ig sequences. Here we develop a Tool for Ig Genotype Elucidation via Rep-Seq (TIgGER). TIgGER analyzes mutation patterns in Rep-Seq data to identify novel V segment alleles, and also constructs a personalized germline database containing the specific set of alleles carried by a subject. This information is then used to improve the initial V segment assignments from existing tools, like IMGT/HighV-QUEST. The application of TIgGER to Rep-Seq data from seven subjects identified 11 novel V segment alleles, including at least one in every subject examined. These novel alleles constituted 13% of the total number of unique alleles in these subjects, and impacted 3% of V(D)J segment assignments. These results reinforce the highly polymorphic nature of human Ig V genes, and suggest that many novel alleles remain to be discovered. The integration of TIgGER into Rep-Seq processing pipelines will increase the accuracy of V segment assignments, thus improving B-cell repertoire analyses.

  18. A demonstration of the H3 trimethylation ChIP-seq analysis of galline follicular mesenchymal cells and male germ cells.

    PubMed

    Chokeshaiusaha, Kaj; Puthier, Denis; Nguyen, Catherine; Sananmuang, Thanida

    2018-06-01

    Trimethylation of histone 3 (H3) at the 4th lysine of the N-terminus (H3K4me3) in gene promoter regions is a universal marker of active genes specific to cell lineage. In contrast, coexistence of trimethylation at the 27th lysine (H3K27me3) at the same loci, the bivalent H3K4me3/H3K27me3 state, is known to suspend gene transcription in germ cells and can also be inherited by the developed stem cell. For galline species, a thorough example of H3K4me3 and H3K27me3 ChIP-seq analysis has not yet been provided. We therefore designed and demonstrated such procedures using ChIP-seq and mRNA-seq data of chicken follicular mesenchymal cells and male germ cells. An analytical workflow was designed and is provided in this study. ChIP-seq and RNA-seq datasets of follicular mesenchymal cells and male germ cells were acquired and preprocessed. Peak calling by Model-based Analysis of ChIP-seq 2 (MACS2) was performed to identify H3K4me3 or H3K27me3 enriched regions (Fold-change≥2, FDR≤0.01) in gene promoter regions. Integrative Genomics Viewer was used to explore the cellular retinoic acid binding protein 1 (CRABP1), growth differentiation factor 10 (GDF10), and gremlin 1 (GREM1) genes. The acquired results indicated that follicular mesenchymal cells and germ cells shared several unique gene promoter regions enriched with H3K4me3 (5,704 peaks) and also unique regions of bivalent H3K4me3/H3K27me3 shared between all cell types and germ cells (1,909 peaks). Subsequent observation of the follicular mesenchyme-specific genes CRABP1, GDF10, and GREM1 correctly revealed vigorous transcription of these genes in follicular mesenchymal cells. As expected, the bivalent H3K4me3/H3K27me3 pattern was manifested in the gene promoter regions of germ cells, thus suspending their transcription. According to the results, an example of chicken H3K4me3/H3K27me3 ChIP-seq data analysis was successfully demonstrated in this study. The provided methodology should be useful for galline ChIP-seq data analysis in the future.

  19. Large scale systematic proteomic quantification from non-metastatic to metastatic colorectal cancer

    NASA Astrophysics Data System (ADS)

    Yin, Xuefei; Zhang, Yang; Guo, Shaowen; Jin, Hong; Wang, Wenhai; Yang, Pengyuan

    2015-07-01

    A systematic proteomic quantification of formalin-fixed, paraffin-embedded (FFPE) colorectal cancer tissues from stage I to stage IIIC was performed on a large scale. With the label-free method, 1017 proteins were identified, of which 338 showed quantitative changes, while with the iTRAQ method 341 of 6294 proteins were quantified with significant expression changes. According to gene ontology (GO) annotation and ingenuity pathway analysis (IPA), the expression of proteins related to migration increased and that of proteins related to binding and adhesion decreased during colorectal cancer development. We focused on integrin alpha 5 (ITA5), a member of the integrin family, which was consistent with the metastasis-related pathway. The expression level of ITA5 decreased in metastatic tissues, and the result was further verified by Western blotting. Two other cell migration-related proteins, vitronectin (VTN) and actin-related protein (ARP3), were also shown to be up-regulated by both mass spectrometry (MS) based quantification and Western blotting. To date, our results represent one of the largest datasets in colorectal cancer proteomics research. Our strategy reveals a disease-driven omics pattern for metastatic colorectal cancer.

  20. Transcriptomic profiling provides molecular insights into hydrogen peroxide-induced adventitious rooting in mung bean seedlings.

    PubMed

    Li, Shi-Weng; Leng, Yan; Shi, Rui-Fang

    2017-02-17

    Hydrogen peroxide (H2O2) has been known to function as a signalling molecule involved in the modulation of various physiological processes in plants. H2O2 has been shown to act as a promoter during adventitious root formation in hypocotyl cuttings. In this study, RNA-Seq was performed to reveal the molecular mechanisms underlying H2O2-induced adventitious rooting. RNA-Seq data revealed that H2O2 treatment greatly increased the numbers of clean reads and expressed genes and the abundance of gene expression relative to the water treatment. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses indicated that a profound change in gene function occurred in the 6-h H2O2 treatment and that H2O2 mainly enhanced gene expression levels at the 6-h time point but reduced gene expression levels at the 24-h time point compared with the water treatment. In total, 4579 differentially expressed (2-fold change > 2) unigenes (DEGs), of which 78.3% were up-regulated and 21.7% were down-regulated; 3525 DEGs, of which 64.0% were up-regulated and 36.0% were down-regulated; and 7383 DEGs, of which 40.8% were up-regulated and 59.2% were down-regulated, were selected in the 6-h, 24-h, and from 6- to 24-h treatments, respectively. The number of DEGs in the 6-h treatment was 29.9% higher than that in the 24-h treatment. The functions of the most highly regulated genes were associated with stress response, cell redox homeostasis and oxidative stress response, cell wall loosening and modification, metabolic processes, and transcription factors (TFs), as well as plant hormone signalling, including auxin, ethylene, cytokinin, gibberellin, and abscisic acid pathways. Notably, a large number of genes encoding heat shock proteins (HSPs) and heat shock transcription factors (HSFs) were significantly up-regulated during H2O2 treatments. Furthermore, real-time quantitative PCR (qRT-PCR) results showed that, during H2O2 treatments, the expression levels of ARFs, IAAs, AUXs, NACs, RD22, AHKs, MYBs, PIN1, AUX15A, LBD29, LBD41, ADH1b, and QORL were significantly up-regulated at the 6- and/or 24-h time points. In contrast, PER1 and PER2 were significantly down-regulated by H2O2 treatment. These qRT-PCR results strongly correlated with the RNA-Seq data. Using RNA-Seq and qRT-PCR techniques, we analysed the global changes in gene expression and functional profiling during H2O2-induced adventitious rooting in mung bean seedlings. These results strengthen the current understanding of H2O2-induced adventitious rooting and the molecular traits of H2O2 priming in plants.
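
    The DEG figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the totals and percentages stated above; the helper names are hypothetical, and the up/down counts it derives are approximations reconstructed from rounded percentages, not values reported by the authors.

```python
# Totals and up-regulated percentages as stated in the abstract.
degs = {
    "6h": (4579, 78.3),
    "24h": (3525, 64.0),
    "6h_to_24h": (7383, 40.8),
}


def up_down_counts(total, pct_up):
    """Approximate up/down counts from a total and a rounded percentage."""
    up = round(total * pct_up / 100)
    return up, total - up


def percent_higher(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


# Approximate counts per comparison (derived, not reported).
approx_counts = {name: up_down_counts(total, pct) for name, (total, pct) in degs.items()}

# Relative excess of 6-h DEGs over 24-h DEGs: (4579 - 3525) / 3525.
excess_6h_vs_24h = percent_higher(degs["6h"][0], degs["24h"][0])
print(f"6-h vs 24-h DEG excess: {excess_6h_vs_24h:.1f}%")
```

    The computed excess agrees with the 29.9% stated in the abstract, which confirms that the percentage is expressed relative to the 24-h DEG count.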
