Science.gov

Sample records for a-rich rna sequences

  1. Translational control by lysine-encoding A-rich sequences

    PubMed Central

    Arthur, Laura L.; Pavlovic-Djuranovic, Slavica; Koutmou, Kristin S.; Green, Rachel; Szczesny, Pawel; Djuranovic, Sergej

    2015-01-01

    Regulation of gene expression involves a wide array of cellular mechanisms that control the abundance of the RNA or protein products of that gene. We describe a gene regulatory mechanism that is based on polyadenylate [poly(A)] tracks that stall the translation apparatus. We show that creating longer or shorter runs of adenosine nucleotides, without changes in the amino acid sequence, alters the protein output and the stability of mRNA. Sometimes, these changes result in the production of an alternative “frameshifted” protein product. These observations are corroborated using reporter constructs and in the context of recombinant gene sequences. About 2% of genes in the human genome may be subject to this uncharacterized yet fundamental form of gene regulation. The potential pool of regulated genes encodes many proteins involved in nucleic acid binding. We hypothesize that the genes we identify are part of a large network whose expression is fine-tuned by poly(A) tracks, and we provide a mechanism through which synonymous mutations may influence gene expression in pathological states. PMID:26322332
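
    The central manipulation in this abstract, lengthening or shortening a run of adenosines without changing the encoded protein, can be illustrated with a minimal Python sketch. It relies only on the standard genetic code, in which AAA and AAG both encode lysine. The function names (`longest_a_run`, `recode_lysines`) and the toy coding sequence are hypothetical and are not taken from the paper.

```python
import re

# Both codons encode lysine in the standard genetic code, so swapping
# one for the other is a synonymous change.
LYSINE_CODONS = {"AAA", "AAG"}


def longest_a_run(seq: str) -> int:
    """Return the length of the longest uninterrupted run of A in seq."""
    runs = re.findall(r"A+", seq.upper())
    return max((len(run) for run in runs), default=0)


def recode_lysines(cds: str, codon: str) -> str:
    """Rewrite every in-frame lysine codon of a DNA coding sequence as `codon`.

    Hypothetical helper: assumes `cds` starts in frame at position 0.
    A trailing partial codon, if any, is passed through unchanged.
    """
    if codon not in LYSINE_CODONS:
        raise ValueError("codon must be AAA or AAG")
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(codon if c in LYSINE_CODONS else c for c in codons)


if __name__ == "__main__":
    toy_cds = "ATGAAGAAGAAGAAGTGA"       # Met-Lys-Lys-Lys-Lys-stop (made up)
    a_rich = recode_lysines(toy_cds, "AAA")
    print(longest_a_run(toy_cds))        # 2
    print(longest_a_run(a_rich))         # 12
```

    The two sequences encode the same peptide but differ sixfold in their longest A run, which is the kind of synonymous variation the abstract links to changes in protein output and mRNA stability.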

  2. Nuclear RNA Isolation and Sequencing.

    PubMed

    Dhaliwal, Navroop K; Mitchell, Jennifer A

    2016-01-01

    Most transcriptome studies involve sequencing and quantification of steady-state mRNA by isolating and sequencing poly (A) RNA. Although this type of sequencing data is informative for determining steady-state mRNA levels, it does not provide information on transcriptional output and thus may not always reflect changes in transcriptional regulation of gene expression. Furthermore, sequencing poly (A) RNA may miss transcribed regions of the genome not usually modified by polyadenylation, which include many long noncoding RNAs. Here, we describe nuclear-RNA sequencing (nucRNA-seq), which investigates the transcriptional landscape through sequencing and quantification of nuclear RNAs, which include both unspliced and spliced transcripts for protein-coding genes and nuclear-retained long noncoding RNAs.

  3. AMPLIFICATION OF RIBOSOMAL RNA SEQUENCES

    EPA Science Inventory

    This book chapter offers an overview of the use of ribosomal RNA sequences. A history of the technology traces the evolution of techniques to measure bacterial phylogenetic relationships and recent advances in obtaining rRNA sequence information. The manual also describes procedu...

  5. Compilation of small RNA sequences.

    PubMed

    Shumyatsky, G; Reddy, R

    1992-05-11

    This is an update containing small RNA sequences published during 1991. Approximately two hundred small RNA sequences are available in this and earlier compilations. The hard copy print out of this set will be available directly from us (inquiries should be addressed to R. Reddy). These files are also available on GenBank computer. Sequences from various sources covered in earlier compilations (see Reddy, R. Nucl. Acids Res. 16:r71; Reddy, R. and Gupta, S. Nucl Acids Res. 1990 Supplement, 18:2231 and 1991 Supplement, 19:2073) are not included in this update but are listed below.

  6. Deciphering the RNA landscape by RNAome sequencing

    PubMed Central

    Derks, Kasper WJ; Misovic, Branislav; van den Hout, Mirjam CGN; Kockx, Christel EM; Payan Gomez, Cesar; Brouwer, Rutger WW; Vrieling, Harry; Hoeijmakers, Jan HJ; van IJcken, Wilfred FJ; Pothof, Joris

    2015-01-01

    Current RNA expression profiling methods rely on enrichment steps for specific RNA classes, thereby not detecting all RNA species in an unperturbed manner. We report strand-specific RNAome sequencing that determines expression of small and large RNAs from rRNA-depleted total RNA in a single sequence run. Since current analysis pipelines cannot reliably analyze small and large RNAs simultaneously, we developed TRAP, Total Rna Analysis Pipeline, a robust interface that is also compatible with existing RNA sequencing protocols. RNAome sequencing quantitatively preserved all RNA classes, allowing cross-class comparisons that facilitate the identification of relationships between different RNA classes. We demonstrate the strength of RNAome sequencing in mouse embryonic stem cells treated with cisplatin. MicroRNA and mRNA expression in RNAome sequencing significantly correlated between replicates and was in concordance with both existing RNA sequencing methods and gene expression arrays generated from the same samples. Moreover, RNAome sequencing also detected additional RNA classes such as enhancer RNAs, anti-sense RNAs, novel RNA species and numerous differentially expressed RNAs undetectable by other methods. At the level of complete RNA classes, RNAome sequencing also identified a specific global repression of the microRNA and microRNA isoform classes after cisplatin treatment whereas all other classes such as mRNAs were unchanged. These characteristics of RNAome sequencing will significantly improve expression analysis as well as studies on RNA biology not covered by existing methods. PMID:25826412

  7. RNA sequence analysis using covariance models.

    PubMed Central

    Eddy, S R; Durbin, R

    1994-01-01

    We describe a general approach to several RNA sequence analysis problems using probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family. We call these models 'covariance models'. A covariance model of tRNA sequences is an extremely sensitive and discriminative tool for searching for additional tRNAs and tRNA-related sequences in sequence databases. A model can be built automatically from an existing sequence alignment. We also describe an algorithm for learning a model and hence a consensus secondary structure from initially unaligned example sequences and no prior structural information. Models trained on unaligned tRNA examples correctly predict tRNA secondary structure and produce high-quality multiple alignments. The approach may be applied to any family of small RNA sequences. PMID:8029015

  8. antaRNA: ant colony-based RNA sequence design

    PubMed Central

    Kleinkauf, Robert; Mann, Martin; Backofen, Rolf

    2015-01-01

    Motivation: RNA sequence design has been studied at least as long as the classical folding problem. Although for the latter the functional fold of an RNA molecule is to be found, inverse folding tries to identify RNA sequences that fold into a function-specific target structure. In combination with RNA-based biotechnology and synthetic biology, reliable RNA sequence design becomes a crucial step to generate novel biochemical components. Results: In this article, the computational tool antaRNA is presented. It is capable of compiling RNA sequences for a given structure that comply in addition with an adjustable full range objective GC-content distribution, specific sequence constraints and additional fuzzy structure constraints. antaRNA applies ant colony optimization meta-heuristics and its superior performance is shown on biological datasets. Availability and implementation: http://www.bioinf.uni-freiburg.de/Software/antaRNA Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26023105
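
    Two of the constraints named in this abstract, compatibility with a target structure and a target GC content, can be checked for a candidate sequence with a short Python sketch. This is not antaRNA and does not implement ant colony optimization or any folding; it only tests whether a sequence could form the base pairs of a dot-bracket structure using canonical pairs (A-U, G-C, G-U) and reports its GC fraction. All function names and example sequences are hypothetical.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def paired_positions(structure: str) -> list[tuple[int, int]]:
    """Return (i, j) index pairs for a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def is_compatible(seq: str, structure: str) -> bool:
    """True if every paired position holds a canonical base pair."""
    seq = seq.upper()
    if len(seq) != len(structure):
        return False
    return all((seq[i], seq[j]) in CANONICAL_PAIRS
               for i, j in paired_positions(structure))


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    target = "((((....))))"                      # made-up hairpin target
    print(is_compatible("GGCAGAAAUGCC", target))  # True
    print(is_compatible("GGCAGAAAAGCC", target))  # False (A-A at the inner pair)
    print(round(gc_content("GGCAGAAAUGCC"), 3))   # 0.583
```

    Compatibility is only a necessary condition: a compatible sequence may still fold into a different minimum-energy structure, which is why design tools pair such checks with a folding model.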

  9. Experimental investigation of an RNA sequence space

    NASA Astrophysics Data System (ADS)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-12-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs. This approach will allow direct study of the constraints governing RNA evolution and allow inquiry into how the last common ancestor of extant life apparently came to have very complex ribosomal RNAs that subsequently were very conserved.

  10. Experimental investigation of an RNA sequence space.

    PubMed

    Lee, Y H; Dsouza, L; Fox, G E

    1993-12-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs. This approach will allow direct study of the constraints governing RNA evolution and allow inquiry into how the last common ancestor of extant life apparently came to have very complex ribosomal RNAs that subsequently were very conserved.

  11. Experimental investigation of an RNA sequence space

    NASA Technical Reports Server (NTRS)

    Lee, Youn-Hyung; Dsouza, Lisa; Fox, George E.

    1993-01-01

    Modern rRNAs are the historic consequence of an ongoing evolutionary exploration of a sequence space. These extant sequences belong to a special subset of the sequence space that is comprised only of those primary sequences that can validly perform the biological function(s) required of the particular RNA. If it were possible to readily identify all such valid sequences, stochastic predictions could be made about the relative likelihood of various evolutionary pathways available to an RNA. Herein an experimental system which can assess whether a particular sequence is likely to have validity as a eubacterial 5S rRNA is described. A total of ten naturally occurring, and hence known to be valid, sequences and two point mutants of unknown validity were used to test the usefulness of the approach. Nine of the ten valid sequences tested positive whereas both mutants tested as clearly defective. The tenth valid sequence gave results that would be interpreted as reflecting a borderline status were the answer not known. These results demonstrate that it is possible to experimentally determine which sequences in local regions of the sequence space are potentially valid 5S rRNAs.

  13. Analysis of Pteridium ribosomal RNA sequences by rapid direct sequencing.

    PubMed

    Tan, M K

    1991-08-01

    A total of 864 bases from 5 regions interspersed in the 18S and 26S rRNA molecules from various clones of Pteridium covering the general geographical distribution of the genus was analysed using a rapid rRNA sequencing technique. No base difference has been detected amongst the three major lineages, two of which apparently separated before the breakup of the ancient supercontinent, Pangaea. These regions of the rRNA sequences have thus been conserved for at least 160 million years and are here compared with other eukaryotic, especially plant rRNAs.

  14. Discovering New Biology through Sequencing of RNA.

    PubMed

    Weber, Andreas P M

    2015-11-01

    Sequencing of RNA (RNA-Seq) was invented approximately 1 decade ago and has since revolutionized biological research. This update provides a brief historic perspective on the development of RNA-Seq and then focuses on the application of RNA-Seq in qualitative and quantitative analyses of transcriptomes. Particular emphasis is given to aspects of data analysis. Since the wet-lab and data analysis aspects of RNA-Seq are still rapidly evolving and novel applications are continuously reported, a printed review will be rapidly outdated and can only serve to provide some examples and general guidelines for planning and conducting RNA-Seq studies. Hence, selected references to frequently updated online resources are given. © 2015 American Society of Plant Biologists. All Rights Reserved.

  15. Alternative applications for distinct RNA sequencing strategies

    PubMed Central

    Han, Leng; Vickers, Kasey C.; Samuels, David C.

    2015-01-01

    Recent advances in RNA library preparation methods, platform accessibility and cost efficiency have allowed high-throughput RNA sequencing (RNAseq) to replace conventional hybridization microarray platforms as the method of choice for mRNA profiling and transcriptome analyses. RNAseq is a powerful technique to profile both long and short RNA expression, and the depth of information gained from distinct RNAseq methods is striking and facilitates discovery. In addition to expression analysis, distinct RNAseq approaches also allow investigators the ability to assess transcriptional elongation, DNA variance and exogenous RNA content. Here we review the current state of the art in transcriptome sequencing and address epigenetic regulation, quantification of transcription activation, RNAseq output and a diverse set of applications for RNAseq data. We detail how RNAseq can be used to identify allele-specific expression, single-nucleotide polymorphisms and somatic mutations and discuss the benefits and limitations of using RNAseq to monitor DNA characteristics. Moreover, we highlight the power of combining RNA- and DNAseq methods for genomic analysis. In summary, RNAseq provides the opportunity to gain greater insight into transcriptional regulation and output than simply miRNA and mRNA profiling. PMID:25246237

  16. Sequence Fingerprints of MicroRNA Conservation

    PubMed Central

    Shi, Bing; Gao, Wei; Wang, Juan

    2012-01-01

    It is known that the conservation of protein-coding genes is associated with their sequences in various species, such as animals and plants. However, the association between microRNA (miRNA) conservation and their sequences in various species remains unexplored. Here we report the association of miRNA conservation with its sequence features, such as base content and cleavage sites, suggesting that miRNA sequences contain the fingerprints for miRNA conservation. More interestingly, different species show different and even opposite patterns between miRNA conservation and sequence features. For example, mammalian miRNAs show a positive/negative correlation between conservation and AU/GC content, whereas plant miRNAs show a negative/positive correlation between conservation and AU/GC content. Further analysis puts forward the hypothesis that the introns of protein-coding genes may be a main driving force for the origin and evolution of mammalian miRNAs. At the 5′ end, conserved miRNAs have a preference for base U, while less-conserved miRNAs have a preference for a non-U base in mammals. This difference does not exist in insects and plants, in which both conserved miRNAs and less-conserved miRNAs have a preference for base U at the 5′ end. We further revealed that the non-U preference at the 5′ end of less-conserved mammalian miRNAs is associated with miRNA function diversity, which may have evolved from the pressure of a highly sophisticated environmental stimulus the mammals encountered during evolution. These results indicated that miRNA sequences contain the fingerprints for conservation, and these fingerprints vary according to species. More importantly, the results suggest that although species share common mechanisms by which miRNAs originate and evolve, mammals may develop a novel mechanism for miRNA origin and evolution. In addition, the fingerprint found in this study can be a predictor of miRNA conservation, and the findings are helpful in achieving a

  17. Epitranscriptome sequencing technologies: decoding RNA modifications.

    PubMed

    Li, Xiaoyu; Xiong, Xushen; Yi, Chengqi

    2016-12-29

    In recent years, major breakthroughs in RNA-modification-mediated regulation of gene expression have been made, leading to the emerging field of epitranscriptomics. Our understanding of the distribution, regulation and function of these dynamic RNA modifications is based on sequencing technologies. In this Review, we focus on the major mRNA modifications in the transcriptome of eukaryotic cells: N6-methyladenosine, N6, 2'-O-dimethyladenosine, 5-methylcytidine, 5-hydroxylmethylcytidine, inosine, pseudouridine and N(1)-methyladenosine. We discuss the sequencing technologies used to profile these epitranscriptomic marks, including scale, resolution, quantitative feature, pre-enrichment capability and the corresponding bioinformatics tools. We also discuss the challenges of epitranscriptome profiling and highlight the prospect of future detection tools. We aim to guide the choice of different detection methods and inspire new ideas in RNA biology.

  18. RNA-RNA interaction prediction based on multiple sequence alignments.

    PubMed

    Li, Andrew X; Marz, Manja; Qin, Jing; Reidys, Christian M

    2011-02-15

    Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, O(N(6)) time and O(N(4)) space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate the knowledge concerning related sequences, thus relevant evolutionary information is often neglected from the structure determination. Therefore, it is of considerable practical interest to introduce a method taking into consideration both: thermodynamic stability as well as sequence/structure covariation. We present the a priori folding algorithm ripalign, whose input consists of two (given) multiple sequence alignments (MSA). ripalign outputs (i) the partition function, (ii) base pairing probabilities, (iii) hybrid probabilities and (iv) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible to the alignments. Compared to the single sequence-pair folding algorithm rip, ripalign requires negligible additional memory resource but offers much better sensitivity and specificity, once alignments of suitable quality are given. ripalign additionally allows to incorporate structure constraints as input parameters. The algorithm described here is implemented in C as part of the rip package.

  19. Probabilistic error correction for RNA sequencing.

    PubMed

    Le, Hai-Son; Schulz, Marcel H; McCauley, Brenna M; Hinman, Veronica F; Bar-Joseph, Ziv

    2013-05-01

    Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)-based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/.

  20. Probabilistic error correction for RNA sequencing

    PubMed Central

    Le, Hai-Son; Schulz, Marcel H.; McCauley, Brenna M.; Hinman, Veronica F.; Bar-Joseph, Ziv

    2013-01-01

    Sequencing of RNAs (RNA-Seq) has revolutionized the field of transcriptomics, but the reads obtained often contain errors. Read error correction can have a large impact on our ability to accurately assemble transcripts. This is especially true for de novo transcriptome analysis, where a reference genome is not available. Current read error correction methods, developed for DNA sequence data, cannot handle the overlapping effects of non-uniform abundance, polymorphisms and alternative splicing. Here we present SEquencing Error CorrEction in Rna-seq data (SEECER), a hidden Markov Model (HMM)–based method, which is the first to successfully address these problems. SEECER efficiently learns hundreds of thousands of HMMs and uses these to correct sequencing errors. Using human RNA-Seq data, we show that SEECER greatly improves on previous methods in terms of quality of read alignment to the genome and assembly accuracy. To illustrate the usefulness of SEECER for de novo transcriptome studies, we generated new RNA-Seq data to study the development of the sea cucumber Parastichopus parvimensis. Our corrected assembled transcripts shed new light on two important stages in sea cucumber development. Comparison of the assembled transcripts to known transcripts in other species has also revealed novel transcripts that are unique to sea cucumber, some of which we have experimentally validated. Supporting website: http://sb.cs.cmu.edu/seecer/. PMID:23558750

  1. Detection theory in identification of RNA-DNA sequence differences using RNA-sequencing.

    PubMed

    Toung, Jonathan M; Lahens, Nicholas; Hogenesch, John B; Grant, Gregory

    2014-01-01

    Advances in sequencing technology have allowed for detailed analyses of the transcriptome at single-nucleotide resolution, facilitating the study of RNA editing or sequence differences between RNA and DNA genome-wide. In humans, two types of post-transcriptional RNA editing processes are known to occur: A-to-I deamination by ADAR and C-to-U deamination by APOBEC1. In addition to these sequence differences, researchers have reported the existence of all 12 types of RNA-DNA sequence differences (RDDs); however, the validity of these claims is debated, as many studies claim that technical artifacts account for the majority of these non-canonical sequence differences. In this study, we used a detection theory approach to evaluate the performance of RNA-Sequencing (RNA-Seq) and associated aligners in accurately identifying RNA-DNA sequence differences. By generating simulated RNA-Seq datasets containing RDDs, we assessed the effect of alignment artifacts and sequencing error on the sensitivity and false discovery rate of RDD detection. Overall, we found that even in the presence of sequencing errors, false negative and false discovery rates of RDD detection can be contained below 10% with relatively lenient thresholds. We also assessed the ability of various filters to target false positive RDDs and found them to be effective in discriminating between true and false positives. Lastly, we used the optimal thresholds we identified from our simulated analyses to identify RDDs in a human lymphoblastoid cell line. We found approximately 6,000 RDDs, the majority of which are A-to-G edits and likely to be mediated by ADAR. Moreover, we found the majority of non A-to-G RDDs to be associated with poorer alignments and conclude from these results that the evidence for widespread non-canonical RDDs in humans is weak. Overall, we found RNA-Seq to be a powerful technique for surveying RDDs genome-wide when coupled with the appropriate thresholds and filters.
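
    The performance measures named in this abstract (sensitivity, false negative rate and false discovery rate) can be computed from a simulated truth set and a set of called sites, as in the following minimal Python sketch. It uses the standard definitions of these rates; the function name, the site encoding and the example sites are hypothetical and do not reproduce the authors' simulation or thresholds.

```python
def rdd_detection_metrics(true_sites: set, called_sites: set) -> dict:
    """Compare called RDD sites against a simulated truth set.

    Sites can be any hashable identifiers, e.g. (chrom, pos, change) tuples.
    Returns sensitivity, false negative rate and false discovery rate,
    using 0.0 whenever a denominator would be zero.
    """
    true_pos = len(true_sites & called_sites)    # simulated and called
    false_pos = len(called_sites - true_sites)   # called but not simulated
    false_neg = len(true_sites - called_sites)   # simulated but missed

    n_true = true_pos + false_neg
    n_called = true_pos + false_pos
    sensitivity = true_pos / n_true if n_true else 0.0
    return {
        "sensitivity": sensitivity,
        "false_negative_rate": false_neg / n_true if n_true else 0.0,
        "false_discovery_rate": false_pos / n_called if n_called else 0.0,
    }


if __name__ == "__main__":
    # Made-up sites for illustration only.
    simulated = {("chr1", 100, "A>G"), ("chr1", 250, "A>G"),
                 ("chr2", 75, "C>T"), ("chr3", 10, "A>G")}
    called = {("chr1", 100, "A>G"), ("chr1", 250, "A>G"),
              ("chr2", 75, "C>T"), ("chr5", 42, "T>G")}
    print(rdd_detection_metrics(simulated, called))
    # sensitivity 0.75, false negative rate 0.25, false discovery rate 0.25
```

    Note that the false discovery rate is taken over the called sites while the false negative rate is taken over the true sites, so the two can move independently as calling thresholds are tightened or relaxed.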

  2. Size and distribution of polyadenylic acid sequences in Drosophila polytene DNA and RNA.

    PubMed

    Alonso, C; Pages, M; García, M L

    1977-12-02

    [3H]Poly(U) hybridizes very rapidly to polytene DNA from Drosophila hydei. When hybridization is performed at 30 degrees C in 2 X SSC to a large excess of DNA, 95% of the poly(U) becomes ribonuclease resistant. Also, complementary RNA transcribed in vitro from polytene DNA hybridizes to poly(U). 0.23--0.25% of the DNA is composed of (dA)-rich sequences and 0.23--0.31% of cRNA hybridizes to [3H]poly(U). The length of the (dA)-rich sequences on the DNA and cRNA is 40 nucleotides. The Tm value of the hybrids formed between DNA or cRNA and poly(U) is 45 degrees C. The poly(A) fragments from cytoplasmic RNA ranged from 80 to 170 nucleotides in length, and migrated in polyacrylamide gels as a broad peak. The average sizes of the poly(A) fragments from the poly(A)-containing RNA transcribed by nuclei isolated from salivary glands in vivo or in vitro were 40, 70, 170 and 70 nucleotides, respectively. Hybridization in situ of [3H]-poly(U) to chromosome squashes indicated that the (dA)-rich sequences are randomly distributed over the whole genome.

  3. Predicting pseudoknotted structures across two RNA sequences

    PubMed Central

    Sperschneider, Jana; Datta, Amitava; Wise, Michael J.

    2012-01-01

    Motivation: Laboratory RNA structure determination is demanding and costly, and thus computational structure prediction is an important task. Single sequence methods for RNA secondary structure prediction are limited by the accuracy of the underlying folding model; if a structure is supported by a family of evolutionarily related sequences, one can be more confident that the prediction is accurate. RNA pseudoknots are functional elements, which have highly conserved structures. However, few comparative structure prediction methods can handle pseudoknots due to the computational complexity. Results: A comparative pseudoknot prediction method called DotKnot-PW is introduced based on structural comparison of secondary structure elements and H-type pseudoknot candidates. DotKnot-PW outperforms other methods from the literature on a hand-curated test set of RNA structures with experimental support. Availability: DotKnot-PW and the RNA structure test set are available at the web site http://dotknot.csse.uwa.edu.au/pw. Contact: janaspe@csse.uwa.edu.au Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23044552

  4. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes

    NASA Technical Reports Server (NTRS)

    Vossbrinck, C. R.; Maddox, J. V.; Friedman, S.; Debrunner-Vossbrinck, B. A.; Woese, C. R.

    1987-01-01

    A comparative sequence analysis of the 18S small subunit ribosomal RNA (rRNA) of the microsporidium Vairimorpha necatrix is presented. The results show that this rRNA sequence is more unlike those of other eukaryotes than any known eukaryote rRNA sequence. It is concluded that the lineage leading to microsporidia branched very early from that leading to other eukaryotes.

  6. Polyadenylation of RNA transcribed from mammalian SINEs by RNA polymerase III: Complex requirements for nucleotide sequences.

    PubMed

    Borodulina, Olga R; Golubchikova, Julia S; Ustyantsev, Ilia G; Kramerov, Dmitri A

    2016-02-01

    It is generally accepted that only transcripts synthesized by RNA polymerase II (e.g., mRNA) are subject to AAUAAA-dependent polyadenylation. However, we previously showed that RNA transcribed by RNA polymerase III (pol III) from mouse B2 SINE could be polyadenylated in an AAUAAA-dependent manner. Many species of mammalian SINEs end with the pol III transcriptional terminator (TTTTT) and contain hexamers AATAAA in their A-rich tail. Such SINEs were united into Class T(+), whereas SINEs lacking the terminator and AATAAA sequences were classified as T(-). Here we studied the structural features of SINE pol III transcripts that are necessary for their polyadenylation. Eight and six SINE families from classes T(+) and T(-), respectively, were analyzed. The replacement of AATAAA with AACAAA in T(+) SINEs abolished the RNA polyadenylation. Interestingly, insertion of the polyadenylation signal (AATAAA) and pol III transcription terminator in T(-) SINEs did not result in polyadenylation. The detailed analysis of three T(+) SINEs (B2, DIP, and VES) revealed areas important for the polyadenylation of their pol III transcripts: the polyadenylation signal and terminator in the A-rich tail, the β region positioned immediately downstream of the box B of the pol III promoter, and the τ region located upstream of the tail. In DIP and VES (but not in B2), the τ region is a polypyrimidine motif which is also characteristic of many other T(+) SINEs. Most likely, SINEs of different mammals acquired these structural features independently as a result of parallel evolution. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. Advanced Applications of RNA Sequencing and Challenges

    PubMed Central

    Han, Yixing; Gao, Shouguo; Muegge, Kathrin; Zhang, Wei; Zhou, Bing

    2015-01-01

    Next-generation sequencing technologies have greatly advanced sequence-based research through high throughput, high sensitivity, and high speed. RNA-seq is now widely used to uncover multiple facets of the transcriptome and to facilitate biological applications. However, the large-scale data analyses associated with RNA-seq pose challenges. In this study, we present a detailed overview of the applications of this technology and the challenges that need to be addressed, including data preprocessing, differential gene expression analysis, alternative splicing analysis, variant detection and allele-specific expression, pathway analysis, co-expression network analysis, and applications combining various experimental procedures beyond the achievements that have been made. Specifically, we discuss essential principles of computational methods that are required to meet the key challenges of RNA-seq data analysis, the development of various bioinformatics tools, challenges associated with RNA-seq applications, and examples that represent the advances made so far in the characterization of the transcriptome. PMID:26609224

  8. Detection Theory in Identification of RNA-DNA Sequence Differences Using RNA-Sequencing

    PubMed Central

    Toung, Jonathan M.; Lahens, Nicholas; Hogenesch, John B.; Grant, Gregory

    2014-01-01

    Advances in sequencing technology have allowed for detailed analyses of the transcriptome at single-nucleotide resolution, facilitating the study of RNA editing or sequence differences between RNA and DNA genome-wide. In humans, two types of post-transcriptional RNA editing processes are known to occur: A-to-I deamination by ADAR and C-to-U deamination by APOBEC1. In addition to these sequence differences, researchers have reported the existence of all 12 types of RNA-DNA sequence differences (RDDs); however, the validity of these claims is debated, as many studies claim that technical artifacts account for the majority of these non-canonical sequence differences. In this study, we used a detection theory approach to evaluate the performance of RNA-Sequencing (RNA-Seq) and associated aligners in accurately identifying RNA-DNA sequence differences. By generating simulated RNA-Seq datasets containing RDDs, we assessed the effect of alignment artifacts and sequencing error on the sensitivity and false discovery rate of RDD detection. Overall, we found that even in the presence of sequencing errors, false negative and false discovery rates of RDD detection can be contained below 10% with relatively lenient thresholds. We also assessed the ability of various filters to target false positive RDDs and found them to be effective in discriminating between true and false positives. Lastly, we used the optimal thresholds we identified from our simulated analyses to identify RDDs in a human lymphoblastoid cell line. We found approximately 6,000 RDDs, the majority of which are A-to-G edits and likely to be mediated by ADAR. Moreover, we found the majority of non A-to-G RDDs to be associated with poorer alignments and conclude from these results that the evidence for widespread non-canonical RDDs in humans is weak. Overall, we found RNA-Seq to be a powerful technique for surveying RDDs genome-wide when coupled with the appropriate thresholds and filters. PMID:25396741
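
    The sensitivity and false discovery rate mentioned in this abstract follow from standard definitions once simulated (true) RDD sites are compared with called sites. The sketch below is an illustration under that assumption; the site tuples and the function name are hypothetical, and it is not the authors' pipeline.

```python
def rdd_detection_metrics(true_sites, called_sites):
    """Compare called RDD sites against the simulated truth set.

    Each site can be any hashable key, e.g. (chrom, position, change).
    Returns sensitivity, false negative rate, and false discovery rate.
    """
    truth = set(true_sites)
    called = set(called_sites)
    tp = len(truth & called)   # true RDDs that were called
    fn = len(truth - called)   # true RDDs that were missed
    fp = len(called - truth)   # calls with no simulated RDD behind them
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else float("nan")
    return {
        "sensitivity": sensitivity,
        "false_negative_rate": 1.0 - sensitivity,
        "false_discovery_rate": fdr,
    }


if __name__ == "__main__":
    truth = [("chr1", 100, "A>G"), ("chr1", 200, "A>G"),
             ("chr2", 50, "C>T"), ("chr2", 75, "A>G")]
    calls = [("chr1", 100, "A>G"), ("chr1", 200, "A>G"),
             ("chr2", 50, "C>T"), ("chr3", 10, "G>C")]
    print(rdd_detection_metrics(truth, calls))
```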

  9. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-03-19

    Members of the genus Endornavirus are double-stranded RNA viruses that infect a wide range of hosts. In this study, we report the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our results demonstrate the successful application of RNA-Seq to obtain a complete viral genome sequence from transcriptome data.

  10. Studies of RNA Sequence and Structure Using Nanopores

    PubMed Central

    Henley, Robert Y.; Carson, Spencer; Wanunu, Meni

    2016-01-01

    Nanopores are powerful single-molecule sensors with nanometer scale dimensions suitable for detection, quantification, and characterization of nucleic acids and proteins. Beyond sequencing applications, both biological and solid-state nanopores hold great promise as tools for studying the biophysical properties of RNA. In this review, we highlight selected landmark nanopore studies with regards to RNA sequencing, microRNA detection, RNA/ligand interactions, and RNA structural/conformational analysis. PMID:26970191

  11. Studies of RNA Sequence and Structure Using Nanopores.

    PubMed

    Henley, Robert Y; Carson, Spencer; Wanunu, Meni

    2016-01-01

    Nanopores are powerful single-molecule sensors with nanometer scale dimensions suitable for detection, quantification, and characterization of nucleic acids and proteins. Beyond sequencing applications, both biological and solid-state nanopores hold great promise as tools for studying the biophysical properties of RNA. In this review, we highlight selected landmark nanopore studies with regards to RNA sequencing, microRNA detection, RNA/ligand interactions, and RNA structural/conformational analysis. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. RNA-sequencing from single nuclei.

    PubMed

    Grindberg, Rashel V; Yee-Greenbaum, Joyclyn L; McConnell, Michael J; Novotny, Mark; O'Shaughnessy, Andy L; Lambert, Georgina M; Araúzo-Bravo, Marcos J; Lee, Jun; Fishman, Max; Robbins, Gillian E; Lin, Xiaoying; Venepally, Pratap; Badger, Jonathan H; Galbraith, David W; Gage, Fred H; Lasken, Roger S

    2013-12-03

    It has recently been established that synthesis of double-stranded cDNA can be done from a single cell for use in DNA sequencing. Global gene expression can be quantified from the number of reads mapping to each gene, and mutations and mRNA splicing variants determined from the sequence reads. Here we demonstrate that this method of transcriptomic analysis can be done using the extremely low levels of mRNA in a single nucleus, isolated from a mouse neural progenitor cell line and from dissected hippocampal tissue. This method is characterized by excellent coverage and technical reproducibility. On average, more than 16,000 of the 24,057 mouse protein-coding genes were detected from single nuclei, and the amount of gene-expression variation was similar when measured between single nuclei and single cells. Several major advantages of the method exist: first, nuclei, compared with whole cells, have the advantage of being easily isolated from complex tissues and organs, such as those in the CNS. Second, the method can be widely applied to eukaryotic species, including those of different kingdoms. The method also provides insight into regulatory mechanisms specific to the nucleus. Finally, the method enables dissection of regulatory events at the single-cell level; pooling of 10 nuclei or 10 cells obscures some of the variability measured in transcript levels, implying that single nuclei and cells will be extremely useful in revealing the physiological state and interconnectedness of gene regulation in a manner that avoids the masking inherent to conventional transcriptomics using bulk cells or tissues.

  13. RNA-sequencing from single nuclei

    PubMed Central

    Grindberg, Rashel V.; Yee-Greenbaum, Joyclyn L.; McConnell, Michael J.; Novotny, Mark; O’Shaughnessy, Andy L.; Lambert, Georgina M.; Araúzo-Bravo, Marcos J.; Lee, Jun; Fishman, Max; Robbins, Gillian E.; Lin, Xiaoying; Venepally, Pratap; Badger, Jonathan H.; Galbraith, David W.; Gage, Fred H.; Lasken, Roger S.

    2013-01-01

    It has recently been established that synthesis of double-stranded cDNA can be done from a single cell for use in DNA sequencing. Global gene expression can be quantified from the number of reads mapping to each gene, and mutations and mRNA splicing variants determined from the sequence reads. Here we demonstrate that this method of transcriptomic analysis can be done using the extremely low levels of mRNA in a single nucleus, isolated from a mouse neural progenitor cell line and from dissected hippocampal tissue. This method is characterized by excellent coverage and technical reproducibility. On average, more than 16,000 of the 24,057 mouse protein-coding genes were detected from single nuclei, and the amount of gene-expression variation was similar when measured between single nuclei and single cells. Several major advantages of the method exist: first, nuclei, compared with whole cells, have the advantage of being easily isolated from complex tissues and organs, such as those in the CNS. Second, the method can be widely applied to eukaryotic species, including those of different kingdoms. The method also provides insight into regulatory mechanisms specific to the nucleus. Finally, the method enables dissection of regulatory events at the single-cell level; pooling of 10 nuclei or 10 cells obscures some of the variability measured in transcript levels, implying that single nuclei and cells will be extremely useful in revealing the physiological state and interconnectedness of gene regulation in a manner that avoids the masking inherent to conventional transcriptomics using bulk cells or tissues. PMID:24248345

  14. Nucleotide sequence of Neurospora crassa cytoplasmic initiator tRNA.

    PubMed Central

    Gillum, A M; Hecker, L I; Silberklang, M; Schwartzbach, S D; RajBhandary, U L; Barnett, W E

    1977-01-01

    Initiator methionine tRNA from the cytoplasm of Neurospora crassa has been purified and sequenced. The sequence is: pAGCUGCAUm1GGCGCAGCGGAAGCGCM22GCY*GGGCUCAUt6AACCCGGAGm7GU (or D) - CACUCGAUCGm1AAACGAG*UUGCAGCUACCAOH. Similar to initiator tRNAs from the cytoplasm of other eukaryotes, this tRNA also contains the sequence -AUCG- instead of the usual -TphiCG (or A)- found in loop IV of other tRNAs. The sequence of the N. crassa cytoplasmic initiator tRNA is quite different from that of the corresponding mitochondrial initiator tRNA. Comparison of the sequence of N. crassa cytoplasmic initiator tRNA to those of yeast, wheat germ and vertebrate cytoplasmic initiator tRNA indicates that the sequences of the two fungal tRNAs are no more similar to each other than they are to those of other initiator tRNAs. PMID:146192

  15. Short RNA indicator sequences are not completely degraded by autoclaving

    PubMed Central

    Unnithan, Veena V.; Unc, Adrian; Joe, Valerisa; Smith, Geoffrey B.

    2014-01-01

    Short indicator RNA sequences (<100 bp) persist after autoclaving and are recovered intact by molecular amplification. Primers targeting longer sequences are most likely to produce false positives due to amplification errors, which are easily verified by melting curve analyses. If short indicator RNA sequences are used for virus identification and quantification, then post-autoclave RNA degradation methodology should be employed, which may include further autoclaving. PMID:24518856

  16. Concentrations of individual RNA sequences in polyadenylated nuclear and cytoplasmic RNA populations of Drosophila cells.

    PubMed Central

    Biessmann, H

    1980-01-01

    Steady state concentrations of individual RNA sequences in poly(A) nuclear and cytoplasmic RNA populations of Drosophila Kc cells were determined using cloned cDNA fragments. These cDNAs represent poly(A) RNA sequences of different abundance in the cytoplasm of Kc cells, but their steady state concentrations in poly(A) hnRNA were always lower. Of ten different sequences analysed, eight showed some four-fold lower concentration in hnRNA than in mRNA, and two were underrepresented in hnRNA relative to the others. The obvious clustering of mRNA/hnRNA ratios is discussed in relation to sequence complexity and turnover rates of these RNA populations. PMID:6162158

  17. Depletion of Ribosomal RNA Sequences from Single-Cell RNA-Sequencing Library.

    PubMed

    Fang, Nan; Akinci-Tolun, Rumeysa

    2016-07-01

    Recent advances in single-cell RNA sequencing technologies have revealed high heterogeneity of gene expression profiles in individual cells. However, most current single-cell RNA-seq methods use oligo-dT priming in the reverse transcription step and therefore detect only polyA-positive transcripts (which include some polyA-positive noncoding RNAs), not other important RNA species such as polyA-negative noncoding RNA. Reverse transcription using random oligos enables detection of not only the noncoding RNA species without polyA tails, but also ribosomal RNA (rRNA). rRNA comprises more than 90% of the total RNA and should be depleted from the RNA-seq library to ensure efficient usage of the sequencing capacity. Commonly used hybridization-based rRNA depletion methods can preserve noncoding RNA in the standard RNA-seq library. However, such rRNA depletion methods require high input amounts of total RNA and do not work at the single-cell level or with limited input DNA. This unit describes a novel procedure for RNA-seq library construction from single cells or a minimal amount of RNA. A thermostable duplex-specific nuclease is used in this method to effectively remove ribosomal RNA sequences following whole-transcriptome amplification and sequencing library construction. © 2016 by John Wiley & Sons, Inc.

  18. Approaching marine bioprospecting in hexacorals by RNA deep sequencing.

    PubMed

    Johansen, Steinar D; Emblem, Ase; Karlsen, Bård Ove; Okkenhaug, Siri; Hansen, Hilde; Moum, Truls; Coucheron, Dag H; Seternes, Ole Morten

    2010-07-31

    RNA deep sequencing represents a new complementary approach in marine bioprospecting. Next-generation sequencing platforms have recently been developed for de novo whole transcriptome analysis, small RNA discovery and gene expression profiling. Deep sequencing transcriptomics (sequencing the complete set of cellular transcripts at a specific stage or condition) leads to sequential identification of all expressed genes in a sample. When combined with high-throughput bioinformatics and protein synthesis, RNA deep sequencing represents a powerful new approach in gene product discovery and bioprospecting. Here we summarize recent progress in the analyses of hexacoral transcriptomes with the focus on cold-water sea anemones and related organisms.

  19. Empirical insights into the stochasticity of small RNA sequencing

    NASA Astrophysics Data System (ADS)

    Qin, Li-Xuan; Tuschl, Thomas; Singer, Samuel

    2016-04-01

    The choice of stochasticity distribution for modeling the noise distribution is a fundamental assumption for the analysis of sequencing data and consequently is critical for the accurate assessment of biological heterogeneity and differential expression. The stochasticity of RNA sequencing has been assumed to follow Poisson distributions. We collected microRNA sequencing data and observed that its stochasticity is better approximated by gamma distributions, likely because of the stochastic nature of exponential PCR amplification. We validated our findings with two independent datasets, one for microRNA sequencing and another for RNA sequencing. Motivated by the gamma distributed stochasticity, we provided a simple method for the analysis of RNA sequencing data and showed its superiority to three existing methods for differential expression analysis using three data examples of technical replicate data and biological replicate data.
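
    The contrast between Poisson and gamma noise drawn in this abstract can be made concrete with a moment-based check on replicate counts: under a Poisson model the variance equals the mean, whereas a gamma distribution with shape k and scale θ has mean kθ and variance kθ². The sketch below is an illustrative assumption, not the authors' analysis method, and the function name is hypothetical.

```python
from statistics import mean, variance


def moment_summary(counts):
    """Summarize replicate counts by their first two moments.

    Returns the Fano factor (variance / mean, which is 1 under Poisson noise)
    and method-of-moments gamma parameters (shape = mean^2 / variance,
    scale = variance / mean).
    """
    m = mean(counts)
    v = variance(counts)  # unbiased sample variance; needs >= 2 values
    if m <= 0 or v <= 0:
        raise ValueError("counts need a positive mean and variance")
    return {
        "mean": m,
        "variance": v,
        "fano_factor": v / m,
        "gamma_shape": m * m / v,
        "gamma_scale": v / m,
    }


if __name__ == "__main__":
    # Made-up technical-replicate counts for one microRNA.
    print(moment_summary([10, 20, 30]))
```

    A Fano factor well above 1 across many features would argue against a plain Poisson model, although a formal comparison would require fitting and testing both distributions.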

  20. Empirical insights into the stochasticity of small RNA sequencing.

    PubMed

    Qin, Li-Xuan; Tuschl, Thomas; Singer, Samuel

    2016-04-07

    The choice of stochasticity distribution for modeling the noise distribution is a fundamental assumption for the analysis of sequencing data and consequently is critical for the accurate assessment of biological heterogeneity and differential expression. The stochasticity of RNA sequencing has been assumed to follow Poisson distributions. We collected microRNA sequencing data and observed that its stochasticity is better approximated by gamma distributions, likely because of the stochastic nature of exponential PCR amplification. We validated our findings with two independent datasets, one for microRNA sequencing and another for RNA sequencing. Motivated by the gamma distributed stochasticity, we provided a simple method for the analysis of RNA sequencing data and showed its superiority to three existing methods for differential expression analysis using three data examples of technical replicate data and biological replicate data.

  1. BS-RNA: An efficient mapping and annotation tool for RNA bisulfite sequencing data.

    PubMed

    Liang, Fang; Hao, Lili; Wang, Jinyue; Shi, Shuo; Xiao, Jingfa; Li, Rujiao

    2016-12-01

    Cytosine methylation is one of the most important RNA epigenetic modifications. With the development of experimental technology, scientists attach more importance to RNA cytosine methylation and find bisulfite sequencing is an effective experimental method for RNA cytosine methylation study. However, only a few tools can directly deal with RNA bisulfite sequencing data efficiently. Herein, we developed a specialized tool, BS-RNA, which can analyze cytosine methylation of RNA based on bisulfite sequencing data and supports both paired-end and single-end sequencing reads from directional bisulfite libraries. For paired-end reads, simply removing the biased positions from the 5' end may result in "dovetailing" reads, where one or both reads seem to extend past the start of the mate read. BS-RNA could map "dovetailing" reads successfully. The annotation result of BS-RNA is exported in BED (.bed) format, including locations, sequence context types (CG/CHG/CHH, H=A,T, or C), reference sequencing depths, cytosine sequencing depths, and methylation levels of covered cytosine sites on both Watson and Crick strands. BS-RNA is an efficient, specialized and highly automated mapping and annotation tool for RNA bisulfite sequencing data. It performs better than the existing program in terms of accuracy and efficiency. BS-RNA is developed in Perl, and the source code of this tool is freely available from the website: http://bs-rna.big.ac.cn. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
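
    The annotation fields listed in this abstract (context type and methylation level per cytosine) can be sketched as follows. This is written in Python rather than the tool's Perl, the function names are hypothetical, and the methylation level is assumed to be the cytosine depth divided by the reference depth, which the abstract does not state explicitly.

```python
H_BASES = {"A", "C", "T"}  # H = A, T, or C, as defined in the abstract


def cytosine_context(seq: str, pos: int):
    """Return 'CG', 'CHG', or 'CHH' for the cytosine at 0-based index pos.

    Returns None when the context cannot be determined, for example near
    the sequence end or next to an ambiguous base.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    first = seq[pos + 1:pos + 2]
    second = seq[pos + 2:pos + 3]
    if first == "G":
        return "CG"
    if first in H_BASES and second == "G":
        return "CHG"
    if first in H_BASES and second in H_BASES:
        return "CHH"
    return None


def methylation_level(cytosine_depth: int, reference_depth: int):
    """Assumed definition: reads retaining C divided by all covering reads."""
    if reference_depth <= 0:
        return None
    return cytosine_depth / reference_depth


if __name__ == "__main__":
    print(cytosine_context("ACAGT", 1), methylation_level(3, 12))
```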

  2. RNAcentral: a comprehensive database of non-coding RNA sequences

    PubMed Central

    2017-01-01

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. The website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality. All RNAcentral data is provided for free and is available for browsing, bulk downloads, and programmatic access at http://rnacentral.org/. PMID:27794554

  3. miRBase: the microRNA sequence database.

    PubMed

    Griffiths-Jones, Sam

    2006-01-01

    The miRBase Sequence database is the primary repository for published microRNA (miRNA) sequence and annotation data. miRBase provides a user-friendly web interface for miRNA data, allowing the user to search using key words or sequences, trace links to the primary literature referencing the miRNA discoveries, analyze genomic coordinates and context, and mine relationships between miRNA sequences. miRBase also provides a confidential gene-naming service, assigning official miRNA names to novel genes before their publication. The methods outlined in this chapter describe these functions. miRBase is freely available to all at http://microrna.sanger.ac.uk/.

  4. RNAcentral: A comprehensive database of non-coding RNA sequences

    DOE PAGES

    Williams, Kelly Porter; Lau, Britney Yan

    2016-10-28

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality.

  5. RNAcentral: A comprehensive database of non-coding RNA sequences

    SciTech Connect

    Williams, Kelly Porter; Lau, Britney Yan

    2016-10-28

    RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within a context of single species. Furthermore, the website has been subject to continuous improvements focusing on text and sequence similarity searches as well as genome browsing functionality.

  6. Novel Approach to Analyzing MFE of Noncoding RNA Sequences

    PubMed Central

    George, Tina P.; Thomas, Tessamma

    2016-01-01

    Genomic studies have become noncoding RNA (ncRNA) centric after the study of different genomes provided enormous information on ncRNA over the past decades. The function of ncRNA is decided by its secondary structure, and across organisms, the secondary structure is more conserved than the sequence itself. In this study, the optimal secondary structure or the minimum free energy (MFE) structure of ncRNA was found based on the thermodynamic nearest neighbor model. MFE of over 2600 ncRNA sequences was analyzed in view of its signal properties. Mathematical models linking MFE to the signal properties were found for each of the four classes of ncRNA analyzed. MFE values computed with the proposed models were in concordance with those obtained with the standard web servers. A total of 95% of the sequences analyzed had deviation of MFE values within ±15% relative to those obtained from standard web servers. PMID:27695341

  7. Sequence coevolution between RNA and protein characterized by mutual information between residue triplets.

    PubMed

    Brandman, Relly; Brandman, Yigal; Pande, Vijay S

    2012-01-01

    Coevolving residues in a multiple sequence alignment provide evolutionary clues of biophysical interactions in 3D structure. Despite a rich literature describing amino acid coevolution within or between proteins and nucleic acid coevolution within RNA, to date there has been no direct evidence of coevolution between protein and RNA. The ribosome, a structurally conserved macromolecular machine composed of over 50 interacting protein and RNA chains, provides a natural example of RNA/protein interactions that likely coevolved. We provide the first direct evidence of RNA/protein coevolution by characterizing the mutual information in residue triplets from a multiple sequence alignment of ribosomal protein L22 and neighboring 23S RNA. We define residue triplets as three positions in the multiple sequence alignment, where one position is from the 23S RNA and two positions are from the L22 protein. We show that residue triplets with high mutual information are more likely than residue doublets to be proximal in 3D space. Some high mutual information residue triplets cluster in a connected series across the L22 protein structure, similar to patterns seen in protein coevolution. We also describe RNA nucleotides for which switching from one nucleotide to another (or between purines and pyrimidines) results in a change in amino acid distribution for proximal amino acid positions. Multiple crystal structures for evolutionarily distinct ribosome species can provide structural evidence for these differences. For one residue triplet, a pyrimidine in one species is a purine in another, and RNA/protein hydrogen bonds are present in one species but not the other. The results provide the first direct evidence of RNA/protein coevolution by using higher order mutual information, suggesting that biophysical constraints on interacting RNA and protein chains are indeed a driving force in their evolution.

  8. Sequence Coevolution between RNA and Protein Characterized by Mutual Information between Residue Triplets

    PubMed Central

    Brandman, Relly; Brandman, Yigal; Pande, Vijay S.

    2012-01-01

    Coevolving residues in a multiple sequence alignment provide evolutionary clues of biophysical interactions in 3D structure. Despite a rich literature describing amino acid coevolution within or between proteins and nucleic acid coevolution within RNA, to date there has been no direct evidence of coevolution between protein and RNA. The ribosome, a structurally conserved macromolecular machine composed of over 50 interacting protein and RNA chains, provides a natural example of RNA/protein interactions that likely coevolved. We provide the first direct evidence of RNA/protein coevolution by characterizing the mutual information in residue triplets from a multiple sequence alignment of ribosomal protein L22 and neighboring 23S RNA. We define residue triplets as three positions in the multiple sequence alignment, where one position is from the 23S RNA and two positions are from the L22 protein. We show that residue triplets with high mutual information are more likely than residue doublets to be proximal in 3D space. Some high mutual information residue triplets cluster in a connected series across the L22 protein structure, similar to patterns seen in protein coevolution. We also describe RNA nucleotides for which switching from one nucleotide to another (or between purines and pyrimidines) results in a change in amino acid distribution for proximal amino acid positions. Multiple crystal structures for evolutionarily distinct ribosome species can provide structural evidence for these differences. For one residue triplet, a pyrimidine in one species is a purine in another, and RNA/protein hydrogen bonds are present in one species but not the other. The results provide the first direct evidence of RNA/protein coevolution by using higher order mutual information, suggesting that biophysical constraints on interacting RNA and protein chains are indeed a driving force in their evolution. PMID:22279560
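
    One plausible way to compute mutual information for a residue triplet is to treat the two protein columns as a joint variable and measure its mutual information with the RNA column. The abstract does not give the exact formula, so the sketch below is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from two aligned columns."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def triplet_mi(rna_col, prot_col_a, prot_col_b):
    """MI between one RNA column and the joint state of two protein columns."""
    if len(prot_col_a) != len(prot_col_b):
        raise ValueError("protein columns must have equal length")
    return mutual_information(list(rna_col), list(zip(prot_col_a, prot_col_b)))


if __name__ == "__main__":
    # Toy alignment columns, one character per species (made-up data).
    print(triplet_mi("AAGG", "KKRR", "DDEE"))  # perfectly coupled columns
    print(triplet_mi("AAGG", "KRKR", "DDDD"))  # independent columns
```

    Plug-in estimates such as this one are biased upward for small alignments, so real analyses typically compare against shuffled-column controls.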

  9. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
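
    The first two steps described in this abstract (cleanup and clustering of a tab-delimited tag/count file) can be sketched as below. The adaptor sequence is a hypothetical placeholder, the handling of poly-A/T/C/G/N tags is simplified to dropping pure homopolymers, and none of this reproduces DSAP's actual implementation.

```python
from collections import Counter

ADAPTOR = "TCGTATGCCG"  # hypothetical 3' adaptor fragment, for illustration only


def clean_tag(tag: str, adaptor: str = ADAPTOR):
    """Trim everything from the adaptor onward; drop empty or homopolymer tags."""
    tag = tag.upper()
    cut = tag.find(adaptor)
    if cut != -1:
        tag = tag[:cut]
    if not tag or len(set(tag)) == 1:  # e.g. AAAA..., NNNN...
        return None
    return tag


def cluster_tags(lines, adaptor: str = ADAPTOR) -> Counter:
    """Group cleaned tags into unique sequences and sum their copy numbers.

    Each input line is expected as '<sequence>\t<count>'.
    """
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tag, count = line.split("\t")
        cleaned = clean_tag(tag, adaptor)
        if cleaned is not None:
            clusters[cleaned] += int(count)
    return clusters


if __name__ == "__main__":
    demo = [
        "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCG\t10",
        "TGAGGTAGTAGGTTGTATAGTT\t5",
        "AAAAAAAAAAAAAAAA\t7",
    ]
    print(cluster_tags(demo))
```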

  10. Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.

    PubMed

    Bauer, Markus; Klau, Gunnar W; Reinert, Knut

    2007-07-27

    The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of input sequences. Our program LARA is freely available for academic purposes from http://www.planet-lisa.net.

  11. RNase P-Mediated Sequence-Specific Cleavage of RNA by Engineered External Guide Sequences.

    PubMed

    Derksen, Merel; Mertens, Vicky; Pruijn, Ger J M

    2015-11-09

    The RNA cleavage activity of RNase P can be employed to decrease the levels of specific RNAs and to study their function or even to eradicate pathogens. Two different technologies have been developed to use RNase P as a tool for RNA knockdown. In one of these, an external guide sequence, which mimics a tRNA precursor, a well-known natural RNase P substrate, is used to target an RNA molecule for cleavage by endogenous RNase P. Alternatively, a guide sequence can be attached to M1 RNA, the (catalytic) RNase P RNA subunit of Escherichia coli. The guide sequence is specific for an RNA target, which is subsequently cleaved by the bacterial M1 RNA moiety. These approaches are applicable in both bacteria and eukaryotes. In this review, we will discuss the two technologies in which RNase P is used to reduce RNA expression levels.

  12. RNase P-Mediated Sequence-Specific Cleavage of RNA by Engineered External Guide Sequences

    PubMed Central

    Derksen, Merel; Mertens, Vicky; Pruijn, Ger J.M.

    2015-01-01

    The RNA cleavage activity of RNase P can be employed to decrease the levels of specific RNAs and to study their function or even to eradicate pathogens. Two different technologies have been developed to use RNase P as a tool for RNA knockdown. In one of these, an external guide sequence, which mimics a tRNA precursor, a well-known natural RNase P substrate, is used to target an RNA molecule for cleavage by endogenous RNase P. Alternatively, a guide sequence can be attached to M1 RNA, the (catalytic) RNase P RNA subunit of Escherichia coli. The guide sequence is specific for an RNA target, which is subsequently cleaved by the bacterial M1 RNA moiety. These approaches are applicable in both bacteria and eukaryotes. In this review, we will discuss the two technologies in which RNase P is used to reduce RNA expression levels. PMID:26569326

  13. Unbiased Deep Sequencing of RNA Viruses from Clinical Samples

    PubMed Central

    Matranga, Christian B.; Gladden-Young, Adrianne; Qu, James; Winnicki, Sarah; Nosamiefan, Dolo; Levin, Joshua Z.; Sabeti, Pardis C.

    2016-01-01

    Here we outline a next-generation RNA sequencing protocol that enables de novo assemblies and intra-host variant calls of viral genomes collected from clinical and biological sources. The method is unbiased and universal; it uses random primers for cDNA synthesis and requires no prior knowledge of the viral sequence content. Before library construction, selective RNase H-based digestion is used to deplete unwanted RNA, including poly(rA) carrier and ribosomal RNA, from the viral RNA sample. Selective depletion improves both the data quality and the number of unique reads in viral RNA sequencing libraries. Moreover, a transposase-based 'tagmentation' step is used in the protocol as it reduces overall library construction time. The protocol has enabled rapid deep sequencing of over 600 Lassa and Ebola virus samples, including collections from both blood and tissue isolates, and is broadly applicable to other microbial genomics studies. PMID:27403729

  14. Noncoding RNA gene detection using comparative sequence analysis

    PubMed Central

    Rivas, Elena; Eddy, Sean R

    2001-01-01

    Background: Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Noncoding RNA gene sequences do not have strong statistical signals, unlike protein coding genes. A reliable general purpose computational genefinder for noncoding RNA genes has been elusive. Results: We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g. from a BLASTN comparison of two related genomes) we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class. Conclusions: We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability. PMID:11801179
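
    The final classification step described above can be sketched as follows. This is a minimal illustration, not QRNA itself: the three log-likelihoods are assumed to have been computed elsewhere by the pair-SCFG and pair-HMM models, the numeric scores are invented, and the uniform prior is an assumption.

```python
import math

def classify_alignment(log_likelihoods, priors=None):
    """Pick the class with the highest posterior probability.

    log_likelihoods maps a class name to log P(alignment | model).
    priors maps a class name to P(model); uniform if omitted (an assumption).
    """
    classes = list(log_likelihoods)
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}
    # Work in log space and subtract the maximum to avoid underflow.
    log_joint = {c: log_likelihoods[c] + math.log(priors[c]) for c in classes}
    top = max(log_joint.values())
    log_norm = top + math.log(sum(math.exp(v - top) for v in log_joint.values()))
    posteriors = {c: math.exp(v - log_norm) for c, v in log_joint.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Hypothetical scores for one pairwise alignment (illustrative values only).
scores = {"coding": -120.0, "rna": -115.0, "null": -125.0}
best_class, posterior = classify_alignment(scores)
```

    Normalizing in log space keeps the computation stable when the alignment is long and the raw likelihoods are extremely small.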

  15. Simulations Using Random-Generated DNA and RNA Sequences

    ERIC Educational Resources Information Center

    Bryce, C. F. A.

    1977-01-01

    Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
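
    The kind of exercise described above might look like the following sketch. It is written in Python rather than the BASIC of the original program, which is not reproduced in this record; all function names are illustrative.

```python
import random
from collections import Counter

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def random_sequence(length, alphabet="ACGT", seed=None):
    """Draw each base independently and uniformly from the alphabet."""
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(length))

def complement(seq):
    """Base-by-base complement of a DNA sequence (not reversed)."""
    return "".join(DNA_COMPLEMENT[base] for base in seq)

def base_composition(seq):
    """Fraction of the sequence made up by each base."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in sorted(counts)}

def codon_counts(seq):
    """Count non-overlapping triplets, reading from the first base."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return Counter(codons)

dna = random_sequence(30, seed=1)
partner = complement(dna)
composition = base_composition(dna)
triplets = codon_counts(dna)
```

    Passing alphabet="ACGU" yields RNA sequences instead; the complement table would then need U in place of T.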

  16. Quantifying RNA allelic ratios by microfluidic multiplex PCR and sequencing.

    PubMed

    Zhang, Rui; Li, Xin; Ramaswami, Gokul; Smith, Kevin S; Turecki, Gustavo; Montgomery, Stephen B; Li, Jin Billy

    2014-01-01

    We developed a targeted RNA sequencing method that couples microfluidics-based multiplex PCR and deep sequencing (mmPCR-seq) to uniformly and simultaneously amplify up to 960 loci in 48 samples independently of their gene expression levels and to accurately and cost-effectively measure allelic ratios even for low-quantity or low-quality RNA samples. We applied mmPCR-seq to RNA editing and allele-specific expression studies. mmPCR-seq complements RNA-seq for studying allelic variations in the transcriptome.
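
    An allelic ratio of the kind measured here can be computed from per-site read counts, as in the sketch below. The minimum-depth filter and the locus names are assumptions for illustration; the record does not state the thresholds used by mmPCR-seq.

```python
def allelic_ratio(ref_reads, alt_reads, min_depth=10):
    """Fraction of reads carrying the alternative allele.

    Returns None when coverage is below min_depth (an assumed cutoff),
    because a ratio from very few reads is unreliable.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    return alt_reads / depth

def ratios_by_locus(counts, min_depth=10):
    """counts maps a locus name to a (ref_reads, alt_reads) pair."""
    return {
        locus: allelic_ratio(ref, alt, min_depth)
        for locus, (ref, alt) in counts.items()
    }

# Hypothetical read counts for two amplified loci.
example = ratios_by_locus({"site_a": (50, 50), "site_b": (1, 0)})
```

    For an RNA editing site, the same ratio with edited reads as the alternative allele gives the editing level.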

  17. RNAcentral: an international database of ncRNA sequences

    DOE PAGES

    Williams, Kelly Porter

    2014-10-28

    The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.

  18. RNAcentral: an international database of ncRNA sequences

    SciTech Connect

    Williams, Kelly Porter

    2014-10-28

    The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.

  19. RNAcentral: an international database of ncRNA sequences

    PubMed Central

    2015-01-01

    The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species. PMID:25352543

  20. Nucleotide sequence of a human tRNA gene heterocluster

    SciTech Connect

    Chang, Y.N.; Pirtle, I.L.; Pirtle, R.M.

    1986-05-01

    Leucine tRNA from bovine liver was used as a hybridization probe to screen a human gene library harbored in Charon-4A of bacteriophage lambda. The human DNA inserts from plaque-pure clones were characterized by restriction endonuclease mapping and Southern hybridization techniques, using both (3'-³²P)-labeled bovine liver leucine tRNA and total tRNA as hybridization probes. An 8-kb Hind III fragment of one of these γ-clones was subcloned into the Hind III site of pBR322. Subsequent fine restriction mapping and DNA sequence analysis of this plasmid DNA indicated the presence of four tRNA genes within the 8-kb DNA fragment. A leucine tRNA gene with an anticodon of AAG and a proline tRNA gene with an anticodon of AGG are in a 1.6-kb subfragment. A threonine tRNA gene with an anticodon of UGU and an as yet unidentified tRNA gene are located in a 1.1-kb subfragment. These two different subfragments are separated by 2.8 kb. The coding regions of the three sequenced genes contain characteristic internal split promoter sequences and do not have intervening sequences. The 3'-flanking regions of these three genes have typical RNA polymerase III termination sites of at least four consecutive T residues.
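
    The anticodons quoted above can be related to the codons they read by taking the reverse complement, as in this small sketch. It assumes strict Watson-Crick pairing and ignores wobble pairing and modified bases, both of which matter for real tRNAs.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codon_for_anticodon(anticodon):
    """Return the fully complementary codon, written 5' to 3'.

    The anticodon is also given 5' to 3', so the codon is its
    reverse complement. Wobble pairing is deliberately ignored.
    """
    return "".join(RNA_COMPLEMENT[base] for base in reversed(anticodon))

# Anticodons named in the abstract and the amino acids assigned to them there.
for anticodon, amino_acid in [("AAG", "Leu"), ("AGG", "Pro"), ("UGU", "Thr")]:
    print(anticodon, "->", codon_for_anticodon(anticodon), amino_acid)
```

    The resulting codons (CUU, CCU and ACA) encode leucine, proline and threonine in the standard genetic code, consistent with the gene assignments in the abstract.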

  1. Compilation of 5S rRNA and 5S rRNA gene sequences

    PubMed Central

    Specht, Thomas; Wolters, Jörn; Erdmann, Volker A.

    1990-01-01

    The BERLIN RNA DATABANK as of December 31, 1989, contains a total of 667 sequences of 5S rRNAs or their genes, which is an increase of 114 new sequence entries over the last compilation (1). It covers sequences from 44 archaebacteria, 267 eubacteria, 20 plastids, 6 mitochondria, 319 eukaryotes and 11 eukaryotic pseudogenes. The hardcopy shows only the list (Table 1) of those organisms whose sequences have been determined. The BERLIN RNA DATABANK uses the format of the EMBL Nucleotide Sequence Data Library complemented by a Sequence Alignment (SA) field including secondary structure information. PMID:1692116

  2. Translating RNA sequencing into clinical diagnostics: opportunities and challenges.

    PubMed

    Byron, Sara A; Van Keuren-Jensen, Kendall R; Engelthaler, David M; Carpten, John D; Craig, David W

    2016-05-01

    With the emergence of RNA sequencing (RNA-seq) technologies, RNA-based biomolecules hold expanded promise for their diagnostic, prognostic and therapeutic applicability in various diseases, including cancers and infectious diseases. Detection of gene fusions and differential expression of known disease-causing transcripts by RNA-seq represent some of the most immediate opportunities. However, it is the diversity of RNA species detected through RNA-seq that holds new promise for the multi-faceted clinical applicability of RNA-based measures, including the potential of extracellular RNAs as non-invasive diagnostic indicators of disease. Ongoing efforts towards the establishment of benchmark standards, assay optimization for clinical conditions and demonstration of assay reproducibility are required to expand the clinical utility of RNA-seq.

  3. A Detailed Protocol for Subcellular RNA Sequencing (subRNA-seq).

    PubMed

    Mayer, Andreas; Churchman, L Stirling

    2017-10-02

    In eukaryotic cells, RNAs at various maturation and processing levels are distributed across cellular compartments. The standard approach to determine transcript abundance and identity in vivo is RNA sequencing (RNA-seq). RNA-seq relies on RNA isolation from whole-cell lysates and thus mainly captures fully processed, stable, and more abundant cytoplasmic RNAs over nascent, unstable, and nuclear RNAs. Here, we provide a step-by-step protocol for subcellular RNA-seq (subRNA-seq). subRNA-seq allows the quantitative measurement of RNA polymerase II-generated RNAs from the chromatin, nucleoplasm, and cytoplasm of mammalian cells. This approach relies on cell fractionation prior to RNA isolation and sequencing library preparation. High-throughput sequencing of the subcellular RNAs can then be used to reveal the identity, abundance, and subcellular distribution of transcripts, thus providing insights into RNA processing and maturation. Deep sequencing of the chromatin-associated RNAs further offers the opportunity to study nascent RNAs. Subcellular RNA-seq libraries are obtained within 5 days. © 2017 by John Wiley & Sons, Inc.

  4. Tuning RNA Flexibility with Helix Length and Junction Sequence.

    PubMed

    Sutton, Julie L; Pollack, Lois

    2015-12-15

    The increasing awareness of RNA's central role in biology calls for a new understanding of how RNAs, like proteins, recognize biological partners. Because RNA is inherently flexible, it assumes a variety of conformations. This conformational flexibility can be a critical aspect of how RNA attracts and binds molecular partners. Structurally, RNA consists of rigid basepaired duplexes, separated by flexible non-basepaired regions. Here, using an RNA system consisting of two short helices, connected by a single-stranded (non-basepaired) junction, we explore the role of helix length and junction sequence in determining the range of conformations available to a model RNA. Single-molecule Förster resonance energy transfer reports on the RNA conformation as a function of either mono- or divalent ion concentration. Electrostatic repulsion between helices dominates at low salt concentration, whereas junction sequence effects determine the conformations at high salt concentration. Near physiological salt concentrations, RNA conformation is sensitive to both helix length and junction sequence, suggesting a means for sensitively tuning RNA conformations. Copyright © 2015 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  5. FLDS: A Comprehensive dsRNA Sequencing Method for Intracellular RNA Virus Surveillance

    PubMed Central

    Urayama, Syun-ichi; Takaki, Yoshihiro; Nunoura, Takuro

    2016-01-01

    Knowledge of the distribution and diversity of RNA viruses is still limited in spite of their possible environmental and epidemiological impacts because RNA virus-specific metagenomic methods have not yet been developed. We herein constructed an effective metagenomic method for RNA viruses by targeting long double-stranded (ds)RNA in cellular organisms, which is a hallmark of infection, or the replication of dsRNA and single-stranded (ss)RNA viruses, except for retroviruses. This novel dsRNA targeting metagenomic method is characterized by an extremely high recovery rate of viral RNA sequences, the retrieval of terminal sequences, and uniform read coverage, which has not previously been reported in other metagenomic methods targeting RNA viruses. This method revealed a previously unidentified viral RNA diversity of more than 20 complete RNA viral genomes including dsRNA and ssRNA viruses associated with an environmental diatom colony. Our approach will be a powerful tool for cataloging RNA viruses associated with organisms of interest. PMID:26877136

  6. The chemical structure of DNA sequence signals for RNA transcription

    NASA Technical Reports Server (NTRS)

    George, D. G.; Dayhoff, M. O.

    1982-01-01

    The proposed recognition sites for RNA transcription for E. coli RNA polymerase, bacteriophage T7 RNA polymerase, and eukaryotic RNA polymerase Pol II are evaluated in the light of the requirements for efficient recognition. It is shown that although there is good experimental evidence that specific nucleic acid sequence patterns are involved in transcriptional regulation in bacteria and bacterial viruses, among the sequences now available, only in the case of the promoters recognized by bacteriophage T7 polymerase does it seem likely that the pattern is sufficient. It is concluded that the eukaryotic pattern that is investigated is not restrictive enough to serve as a recognition site.

  7. The chemical structure of DNA sequence signals for RNA transcription

    NASA Technical Reports Server (NTRS)

    George, D. G.; Dayhoff, M. O.

    1982-01-01

    The proposed recognition sites for RNA transcription for E. coli RNA polymerase, bacteriophage T7 RNA polymerase, and eukaryotic RNA polymerase Pol II are evaluated in the light of the requirements for efficient recognition. It is shown that although there is good experimental evidence that specific nucleic acid sequence patterns are involved in transcriptional regulation in bacteria and bacterial viruses, among the sequences now available, only in the case of the promoters recognized by bacteriophage T7 polymerase does it seem likely that the pattern is sufficient. It is concluded that the eukaryotic pattern that is investigated is not restrictive enough to serve as a recognition site.

  8. TARDIS, a targeted RNA directional sequencing method for rare RNA discovery.

    PubMed

    Portal, Maximiliano M; Pavet, Valeria; Erb, Cathie; Gronemeyer, Hinrich

    2015-12-01

    High-throughput transcriptional analysis has unveiled a myriad of novel RNAs. However, technical constraints in RNA sequencing library preparation and platform performance hamper the identification of rare transcripts contained within the RNA repertoire. Herein we present targeted-RNA directional sequencing (TARDIS), a hybridization-based method that allows subsets of RNAs contained within the transcriptome to be interrogated independently of transcript length, function, the presence or absence of poly-A tracts, or the mechanism of biogenesis. TARDIS is a modular protocol that is subdivided into four main phases, including the generation of random DNA traps covering the region of interest, purification of input RNA material, DNA trap-based RNA capture, and finally RNA-sequencing library construction. Importantly, coupling RNA capture to strand-specific RNA sequencing enables robust identification and reconstruction of novel transcripts, the definition of sense and antisense RNA pairs and, by the concomitant analysis of long and natural small RNA pools, it allows the user to infer potential precursor-product relations. TARDIS takes ∼10 d to implement.

  9. RNA sequencing analysis of the developing chicken retina

    PubMed Central

    Langouet-Astrie, Christophe J.; Meinsen, Annamarie L.; Grunwald, Emily R.; Turner, Stephen D.; Enke, Raymond A.

    2016-01-01

    RNA sequencing transcriptome analysis using massively parallel next generation sequencing technology provides the capability to understand global changes in gene expression throughout a range of tissue samples. Development of the vertebrate retina requires complex temporal orchestration of transcriptional activation and repression. The chicken embryo (Gallus gallus) is a classic model system for studying developmental biology and retinogenesis. Existing retinal transcriptome projects have been critical to the vision research community for studying aspects of murine and human retinogenesis, however, there are currently no publicly available data sets describing the developing chicken retinal transcriptome. Here we used Illumina RNA sequencing (RNA-seq) analysis to characterize the mRNA transcriptome of the developing chicken retina in an effort to identify genes critical for retinal development in this important model organism. These data will be valuable to the vision research community for characterizing global changes in gene expression between ocular tissues and critical developmental time points during retinogenesis in the chicken retina. PMID:27996968

  10. Tuning RNA Flexibility with Helix Length and Junction Sequence

    PubMed Central

    Sutton, Julie L.; Pollack, Lois

    2015-01-01

    The increasing awareness of RNA’s central role in biology calls for a new understanding of how RNAs, like proteins, recognize biological partners. Because RNA is inherently flexible, it assumes a variety of conformations. This conformational flexibility can be a critical aspect of how RNA attracts and binds molecular partners. Structurally, RNA consists of rigid basepaired duplexes, separated by flexible non-basepaired regions. Here, using an RNA system consisting of two short helices, connected by a single-stranded (non-basepaired) junction, we explore the role of helix length and junction sequence in determining the range of conformations available to a model RNA. Single-molecule Förster resonance energy transfer reports on the RNA conformation as a function of either mono- or divalent ion concentration. Electrostatic repulsion between helices dominates at low salt concentration, whereas junction sequence effects determine the conformations at high salt concentration. Near physiological salt concentrations, RNA conformation is sensitive to both helix length and junction sequence, suggesting a means for sensitively tuning RNA conformations. PMID:26682821

  11. Nuclear RNA sequencing of the mouse erythroid cell transcriptome.

    PubMed

    Mitchell, Jennifer A; Clay, Ieuan; Umlauf, David; Chen, Chih-Yu; Moir, Catherine A; Eskiw, Christopher H; Schoenfelder, Stefan; Chakalova, Lyubomira; Nagano, Takashi; Fraser, Peter

    2012-01-01

    In addition to protein coding genes a substantial proportion of mammalian genomes are transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq) in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq) of active RNA polymerase II, we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A)-enriched RNA-seq. Highly expressed protein coding genes showed good correlation between RNAPII occupancy and transcriptional output; however, genome-wide we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII which correspond with transcription factor bound regulatory regions and a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs.

  12. Identifying novel sequence variants of RNA 3D motifs

    PubMed Central

    Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

    2015-01-01

    Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723

  13. IVT-seq reveals extreme bias in RNA sequencing

    PubMed Central

    2014-01-01

    Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results. PMID:24981968
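
    A within-transcript fold difference in coverage could be summarized along the following lines. The abstract does not define the exact statistic, so the max/min ratio with a pseudocount used here is an assumption, and the coverage values are invented.

```python
def coverage_fold_difference(coverage, pseudocount=1.0):
    """Ratio of the highest to the lowest per-base coverage in one transcript.

    The pseudocount (an assumed choice) avoids division by zero at
    positions with no reads.
    """
    if not coverage:
        raise ValueError("coverage must contain at least one position")
    return (max(coverage) + pseudocount) / (min(coverage) + pseudocount)

def fraction_exceeding(transcripts, threshold):
    """Share of transcripts whose fold difference is above the threshold."""
    folds = [coverage_fold_difference(cov) for cov in transcripts.values()]
    return sum(fold > threshold for fold in folds) / len(folds)

# Hypothetical per-base read depths for four transcripts.
depths = {
    "t1": [9, 9, 9],
    "t2": [9, 19, 39],
    "t3": [0, 99],
    "t4": [4, 9],
}
over_two_fold = fraction_exceeding(depths, 2.0)
over_ten_fold = fraction_exceeding(depths, 10.0)
```

    With uniformly represented transcripts, as expected for full-length in vitro transcribed RNA, every fold difference would be close to 1.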

  14. Library preparation for highly accurate population sequencing of RNA viruses

    PubMed Central

    Acevedo, Ashley; Andino, Raul

    2015-01-01

    Circular resequencing (CirSeq) is a novel technique for efficient and highly accurate next-generation sequencing (NGS) of RNA virus populations. The foundation of this approach is the circularization of fragmented viral RNAs, which are then redundantly encoded into tandem repeats by ‘rolling-circle’ reverse transcription. When sequenced, the redundant copies within each read are aligned to derive a consensus sequence of their initial RNA template. This process yields sequencing data with error rates far below the variant frequencies observed for RNA viruses, facilitating ultra-rare variant detection and accurate measurement of low-frequency variants. Although library preparation takes ~5 d, the high-quality data generated by CirSeq simplifies downstream data analysis, making this approach substantially more tractable for experimentalists. PMID:24967624
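
    The consensus step can be illustrated with a simple majority vote across the tandem copies in one read. This sketch assumes the repeat length is already known and that copies start at the first base; the published pipeline's repeat detection and use of base qualities are not reproduced here.

```python
from collections import Counter

def repeat_consensus(read, repeat_length):
    """Majority-vote consensus over tandem copies of a template in one read.

    Positions without a strict majority are reported as 'N'.
    """
    copies = [
        read[i:i + repeat_length]
        for i in range(0, len(read) - repeat_length + 1, repeat_length)
    ]
    if len(copies) < 2:
        raise ValueError("need at least two full copies to build a consensus")
    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count > len(copies) / 2 else "N")
    return "".join(consensus)

# Three copies of a 6-nt template; the third copy carries one sequencing error.
read = "ACGUAC" + "ACGUAC" + "ACGAAC"
template = repeat_consensus(read, 6)
```

    Because an error must recur at the same position in most copies to survive the vote, independent sequencing errors are largely filtered out, which is the source of the low error rate described above.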

  15. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications

    PubMed Central

    Herzog, Michel; Maroteaux, Luc

    1986-01-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795

  16. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications.

    PubMed

    Herzog, M; Maroteaux, L

    1986-11-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage.

  17. Comparison of ribosomal RNA removal methods for transcriptome sequencing workflows in teleost fish

    USDA-ARS's Scientific Manuscript database

    RNA sequencing (RNA-Seq) is becoming the standard for transcriptome analysis. Removal of contaminating ribosomal RNA (rRNA) is a priority in the preparation of libraries suitable for sequencing. rRNAs are commonly removed from total RNA via either mRNA selection or rRNA depletion. These methods have...

  18. Nucleotide sequence of papaya mosaic virus RNA.

    PubMed

    Sit, T L; Abouhaidar, M G; Holy, S

    1989-09-01

    The RNA genome of papaya mosaic virus is 6656 nucleotides long [excluding the poly(A) tail] with six open reading frames (ORFs) more than 200 nucleotides long. The four nearest the 5' end each overlap with adjacent ORFs and could code for proteins with Mr 176307, 26248, 11949 and 7224 (ORFs 1 to 4). The fifth ORF produces the capsid protein of Mr 23043 and the sixth ORF, located completely within ORF1, could code for a protein with Mr 14113. The translation products of ORFs 1 to 3 show strong similarity with those of other potexviruses but the ORF 4 protein has only limited similarity with the other potexvirus ORF 4 proteins of 7K to 11K.

  19. Discovering New Biology through Sequencing of RNA

    PubMed Central

    Weber, Andreas P.M.

    2015-01-01

    Sequencing of RNA (RNA-Seq) was invented approximately 1 decade ago and has since revolutionized biological research. This update provides a brief historic perspective on the development of RNA-Seq and then focuses on the application of RNA-Seq in qualitative and quantitative analyses of transcriptomes. Particular emphasis is given to aspects of data analysis. Since the wet-lab and data analysis aspects of RNA-Seq are still rapidly evolving and novel applications are continuously reported, a printed review will be rapidly outdated and can only serve to provide some examples and general guidelines for planning and conducting RNA-Seq studies. Hence, selected references to frequently updated online resources are given. PMID:26353759

  20. Cell growth inhibition by sequence-specific RNA minihelices.

    PubMed

    Hipps, D; Schimmel, P

    1995-08-15

    RNA minihelices which reconstruct the 12 base pair acceptor-TψC domains of transfer RNAs interact with their cognate tRNA synthetases. These substrates lack the anticodons of the genetic code and, therefore, cannot participate in steps of protein synthesis subsequent to aminoacylation. We report here that expression in Escherichia coli of either of two minihelices, each specific for a different amino acid, inhibited cell growth. Inhibition appears to be due to direct competition between the minihelix and its related tRNA for binding to their common synthetase. This competition, in turn, sharply lowers the pool of the specific charged tRNA for protein synthesis. Inhibition is relieved by single nucleotide changes which disrupt the minihelix-synthetase interaction. The results suggest that sequence-specific RNA minihelix substrates bind to cognate synthetases in vivo and can, in principle, act as cell growth regulators. Naturally occurring non-tRNA substrates for aminoacylation may serve a similar purpose.

  1. Nucleotide sequences important for translation initiation of enterovirus RNA.

    PubMed Central

    Iizuka, N; Yonekawa, H; Nomoto, A

    1991-01-01

    An infectious cDNA clone was constructed from the genome of coxsackievirus B1 strain. A number of RNA transcripts that have mutations in the 5' noncoding region were synthesized in vitro from the modified cDNA clones and examined for their abilities to act as mRNAs in a cell-free translation system prepared from HeLa S3 cells. RNAs that lack nucleotide sequences at positions 568 to 726 and 565 to 726 were found to be less efficient and inactive mRNAs, respectively. To understand the biological significance of this region of RNA, small deletions and point mutations were introduced in the nucleotide sequence between positions 538 and 601. Except for a nucleotide substitution at 592 (U→C) within the 7-base conserved sequence, mutations introduced in the sequence downstream of position 568 did not affect much, if any, of the ability of RNA to act as mRNA. Except for a point mutation at 558 (C→U), mutations upstream of position 567 appeared to inactivate the mRNA. In the upstream region, a sequence consisting of 21 nucleotides at positions 546 to 566 is perfectly conserved in the 5' noncoding regions of enterovirus and rhinovirus genomes. These results suggest that the 7-base conserved sequence functions to maintain the efficiency of translation initiation and that the nucleotide sequence upstream of position 567, including the 21-base conserved sequence, plays essential roles in translation initiation. A deletion mutant whose genome lacks the nucleotide sequence at positions 568 to 726 showed a small-plaque phenotype and less virulence against suckling mice than the wild-type virus. Thus, reduction of the efficiency of translation initiation may result in the construction of enteroviruses with the lower-virulence phenotype. PMID:1651409

  2. Rattus norvegicus BN/SHR liver and heart left ventricle ribosomal RNA depleted directional RNA sequencing.

    PubMed

    Wyler, Emanuel; van Heesch, Sebastiaan; Adami, Eleonora; Hubner, Norbert; Landthaler, Markus

    2017-08-11

    The spontaneously hypertensive rat strain is a frequently used disease model. In a previous study, we measured translational efficiency from this strain and BN-Lx animals. Here, we describe long RNA sequencing reads from ribosomal RNA depleted samples from the same animals. This data can be used to investigate splicing-related events. RNA was extracted from rat liver and heart left ventricle from BN-Lx and SHR/Ola rats in biological replicates. Ribosomal RNA was removed and the samples subjected to directional high-throughput RNA-sequencing. Read and alignment statistics indicate high quality of the data. The raw sequencing reads are freely available on the NCBI short read archive and can be used for further research on tissue and strain differences, or analysed together with other published high-throughput data from the same animals.

  3. Local sequence and sequencing depth dependent accuracy of RNA-seq reads.

    PubMed

    Cai, Guoshuai; Liang, Shoudan; Zheng, Xiaofeng; Xiao, Feifei

    2017-08-09

    Many biases and spurious effects are inherent in RNA-seq technology, resulting in a non-uniform distribution of sequencing read counts for each base position in a gene. Therefore, a base-level strategy is required to model the non-uniformity. Also, the properties of sequencing read counts can be leveraged to achieve a more precise estimation of the mean and variance of measurement. In this study, we aimed to unveil the effects on RNA-seq accuracy from multiple factors and develop accurate modeling of RNA-seq reads in comparison. We found that the overdispersion rate decreased when sequencing depth increased on the base level. Moreover, the influence of local sequence(s) on the overdispersion rate was notable but no longer significant after adjusting the effect from sequencing depth. Based on these findings, we propose a desirable beta-binomial model with a dynamic overdispersion rate on the base-level proportion of sequencing read counts from two samples. The current study provides thorough insights into the impact of overdispersion at the position level and especially into its relationship with sequencing depth, local sequence, and preparation protocol. These properties of RNA-seq will aid in improvement of the quality control procedure and development of statistical methods for RNA-seq downstream analyses.
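
The beta-binomial idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mean/overdispersion parameterization (`p`, `rho`) is one common convention, and `depth_dependent_rho` with its `rho_max` and `half_depth` parameters is a hypothetical stand-in for a "dynamic" overdispersion rate that shrinks as sequencing depth grows.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, p, rho):
    """Log-probability that k of the n reads covering a base come from sample 1.

    p   : expected proportion of reads from sample 1 (0 < p < 1)
    rho : overdispersion (0 < rho < 1); values near 0 approach a plain binomial
    """
    if not (0.0 < p < 1.0 and 0.0 < rho < 1.0):
        raise ValueError("p and rho must lie strictly between 0 and 1")
    alpha = p * (1.0 - rho) / rho
    beta = (1.0 - p) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def depth_dependent_rho(n, rho_max=0.2, half_depth=50.0):
    """Hypothetical rule (an assumption): overdispersion decays with depth n."""
    return rho_max * half_depth / (half_depth + n)


if __name__ == "__main__":
    depth = 40
    rho = depth_dependent_rho(depth)
    print(beta_binomial_logpmf(25, depth, 0.5, rho))
```

Working in log space with `math.lgamma` keeps the calculation stable at the high read depths where binomial coefficients would otherwise overflow.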

  4. RNAcentral: A vision for an international database of RNA sequences

    PubMed Central

    Bateman, Alex; Agrawal, Shipra; Birney, Ewan; Bruford, Elspeth A.; Bujnicki, Janusz M.; Cochrane, Guy; Cole, James R.; Dinger, Marcel E.; Enright, Anton J.; Gardner, Paul P.; Gautheret, Daniel; Griffiths-Jones, Sam; Harrow, Jen; Herrero, Javier; Holmes, Ian H.; Huang, Hsien-Da; Kelly, Krystyna A.; Kersey, Paul; Kozomara, Ana; Lowe, Todd M.; Marz, Manja; Moxon, Simon; Pruitt, Kim D.; Samuelsson, Tore; Stadler, Peter F.; Vilella, Albert J.; Vogel, Jan-Hinnerk; Williams, Kelly P.; Wright, Mathew W.; Zwieb, Christian

    2011-01-01

    During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor. PMID:21940779

  5. RNAcentral: A vision for an international database of RNA sequences.

    PubMed

    Bateman, Alex; Agrawal, Shipra; Birney, Ewan; Bruford, Elspeth A; Bujnicki, Janusz M; Cochrane, Guy; Cole, James R; Dinger, Marcel E; Enright, Anton J; Gardner, Paul P; Gautheret, Daniel; Griffiths-Jones, Sam; Harrow, Jen; Herrero, Javier; Holmes, Ian H; Huang, Hsien-Da; Kelly, Krystyna A; Kersey, Paul; Kozomara, Ana; Lowe, Todd M; Marz, Manja; Moxon, Simon; Pruitt, Kim D; Samuelsson, Tore; Stadler, Peter F; Vilella, Albert J; Vogel, Jan-Hinnerk; Williams, Kelly P; Wright, Mathew W; Zwieb, Christian

    2011-11-01

    During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.

  6. Integrated bioinformatics analysis of chromatin regulator EZH2 in regulating mRNA and lncRNA expression by ChIP sequencing and RNA sequencing

    PubMed Central

    Li, Yuan; Luo, Mei; Shi, Xuejiao; Lu, Zhiliang; Sun, Shouguo; Huang, Jianbing; Chen, Zhaoli; He, Jie

    2016-01-01

    Enhancer of zeste homolog 2 (EZH2), a dynamic chromatin regulator in cancer, represents a potential therapeutic target showing early signs of promise in clinical trials. EZH2 ChIP sequencing data in 19 cell lines and RNA sequencing data in ten cancer types were downloaded from GEO and TCGA, respectively. Integrated ChIP sequencing analysis and co-expression analysis were conducted and both mRNA and long noncoding RNA (lncRNA) targets were detected. We detected a median of 4,672 mRNA targets and 4,024 lncRNA targets regulated by EZH2 in 19 cell lines. Twenty mRNA targets and 27 lncRNA targets were found in all 19 cell lines. These mRNA targets were enriched in pathways in cancer and in the Hippo, Wnt, MAPK and PI3K-Akt pathways. Co-expression analysis confirmed numerous targets: mRNA genes (RRAS, TGFBR2, NUF2 and PRC1) and lncRNA genes (LINC00261, DIO3OS, RP11-307C12.11 and RP11-98D18.9) were potential targets and were significantly correlated with EZH2. We predicted genome-wide potential targets and the role of EZH2 as a transcriptional suppressor or activator, which could pave the way for mechanism studies and the targeted therapy of EZH2 in cancer. PMID:27835578

  7. Evaluation of commercially available RNA amplification kits for RNA sequencing using very low input amounts of total RNA.

    PubMed

    Shanker, Savita; Paulson, Ariel; Edenberg, Howard J; Peak, Allison; Perera, Anoja; Alekseyev, Yuriy O; Beckloff, Nicholas; Bivens, Nathan J; Donnelly, Robert; Gillaspy, Allison F; Grove, Deborah; Gu, Weikuan; Jafari, Nadereh; Kerley-Hamilton, Joanna S; Lyons, Robert H; Tepper, Clifford; Nicolet, Charles M

    2015-04-01

    This article includes supplemental data. Please visit http://www.fasebj.org to obtain this information. Multiple recent publications on RNA sequencing (RNA-seq) have demonstrated the power of next-generation sequencing technologies in whole-transcriptome analysis. Vendor-specific protocols used for RNA library construction often require at least 100 ng total RNA. However, under certain conditions, much less RNA is available for library construction. In these cases, effective transcriptome profiling requires amplification of subnanogram amounts of RNA. Several commercial RNA amplification kits are available for amplification prior to library construction for next-generation sequencing, but these kits have not been comprehensively field evaluated for accuracy and performance of RNA-seq for picogram amounts of RNA. To address this, 4 types of amplification kits were tested with 3 different concentrations, from 5 ng to 50 pg, of a commercially available RNA. Kits were tested at multiple sites to assess reproducibility and ease of use. The human total reference RNA used was spiked with a control pool of RNA molecules in order to further evaluate quantitative recovery of input material. Additional control data sets were generated from libraries constructed following polyA selection or ribosomal depletion using established kits and protocols. cDNA was collected from the different sites, and libraries were synthesized at a single site using established protocols. Sequencing runs were carried out on the Illumina platform. Numerous metrics were compared among the kits and dilutions used. Overall, no single kit appeared to meet all the challenges of small input material. However, it is encouraging that excellent data can be recovered with even the 50 pg input total RNA.

  8. Method for rapid base sequencing in DNA and RNA

    DOEpatents

    Jett, J.H.; Keller, R.A.; Martin, J.C.; Moyzis, R.K.; Ratliff, R.L.; Shera, E.B.; Stewart, C.C.

    1990-10-09

    A method is provided for the rapid base sequencing of DNA or RNA fragments wherein a single fragment of DNA or RNA is provided with identifiable bases and suspended in a moving flow stream. An exonuclease sequentially cleaves individual bases from the end of the suspended fragment. The moving flow stream maintains the cleaved bases in an orderly train for subsequent detection and identification. In a particular embodiment, individual bases forming the DNA or RNA fragments are individually tagged with a characteristic fluorescent dye. The train of bases is then excited to fluorescence with an output spectrum characteristic of the individual bases. Accordingly, the base sequence of the original DNA or RNA fragment can be reconstructed. 2 figs.

  9. Method for rapid base sequencing in DNA and RNA

    DOEpatents

    Jett, James H.; Keller, Richard A.; Martin, John C.; Moyzis, Robert K.; Ratliff, Robert L.; Shera, E. Brooks; Stewart, Carleton C.

    1990-01-01

    A method is provided for the rapid base sequencing of DNA or RNA fragments wherein a single fragment of DNA or RNA is provided with identifiable bases and suspended in a moving flow stream. An exonuclease sequentially cleaves individual bases from the end of the suspended fragment. The moving flow stream maintains the cleaved bases in an orderly train for subsequent detection and identification. In a particular embodiment, individual bases forming the DNA or RNA fragments are individually tagged with a characteristic fluorescent dye. The train of bases is then excited to fluorescence with an output spectrum characteristic of the individual bases. Accordingly, the base sequence of the original DNA or RNA fragment can be reconstructed.

  10. Method for rapid base sequencing in DNA and RNA

    DOEpatents

    Jett, J.H.; Keller, R.A.; Martin, J.C.; Moyzis, R.K.; Ratliff, R.L.; Shera, E.B.; Stewart, C.C.

    1987-10-07

    A method is provided for the rapid base sequencing of DNA or RNA fragments wherein a single fragment of DNA or RNA is provided with identifiable bases and suspended in a moving flow stream. An exonuclease sequentially cleaves individual bases from the end of the suspended fragment. The moving flow stream maintains the cleaved bases in an orderly train for subsequent detection and identification. In a particular embodiment, individual bases forming the DNA or RNA fragments are individually tagged with a characteristic fluorescent dye. The train of bases is then excited to fluorescence with an output spectrum characteristic of the individual bases. Accordingly, the base sequence of the original DNA or RNA fragment can be reconstructed. 2 figs.

  11. Complete nucleotide sequence of tobacco streak virus RNA 3.

    PubMed Central

    Cornelissen, B J; Janssen, H; Zuidema, D; Bol, J F

    1984-01-01

    Double-stranded cDNA of in vitro polyadenylated tobacco streak virus (TSV) RNA 3 has been cloned and sequenced. The complete primary structure of 2,205 nucleotides reveals two open reading frames flanked by a leader sequence of 210 bases, an intercistronic region of 123 nucleotides and a 3'-extracistronic sequence of 288 nucleotides. The 5'-terminal open reading frame codes for a Mr 31,742 protein, which probably corresponds to the only in vitro translation product of TSV RNA 3. The 3'-terminal coding region predicts a Mr 26,346 protein, probably the viral coat protein, which is the translation product of the subgenomic messenger, RNA 4. Although the coat proteins of alfalfa mosaic virus (AlMV) and TSV are functionally equivalent in activating their own and each other's genomes, no homology between the primary structures of those two proteins is detectable. PMID:6546793

  12. Phylogenetic relationships of Cryptosporidium determined by ribosomal RNA sequence comparison.

    PubMed

    Johnson, A M; Fielke, R; Lumb, R; Baverstock, P R

    1990-04-01

    Reverse transcription of total cellular RNA was used to obtain a partial sequence of the small subunit ribosomal RNA of Cryptosporidium, a protist currently placed in the phylum Apicomplexa. The semi-conserved regions were aligned with homologous sequences in a range of other eukaryotes, and the evolutionary relationships of Cryptosporidium were determined by two different methods of phylogenetic analysis. The prokaryotes Escherichia coli and Halobacterium cuti were included as outgroups. The results do not show an especially close relationship of Cryptosporidium to other members of the phylum Apicomplexa.

  13. Normalizing single-cell RNA sequencing data: Challenges and opportunities

    PubMed Central

    Dudoit, Sandrine; Marioni, John C.

    2017-01-01

    Single-cell transcriptomics is becoming an important component of the molecular biologist’s toolkit. A critical step when analyzing this type of data is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, whose suitability for single-cell transcriptomics has not been assessed. In this perspective, we discuss commonly used normalization approaches and illustrate how these can lead to misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users. PMID:28504683

  14. Multiple structural alignment and clustering of RNA sequences.

    PubMed

    Torarinsson, Elfar; Havgaard, Jakob H; Gorodkin, Jan

    2007-04-15

    An apparent paradox in computational RNA structure prediction is that many methods require, in advance, a multiple alignment of a set of related sequences when searching for a common structure between them. However, such a multiple alignment is hard to obtain even for a few sequences with low sequence similarity without simultaneously folding and aligning them. Furthermore, it is of interest to conduct a multiple alignment of RNA sequence candidates found from searching as few as two genomic sequences. Here, based on the PMcomp program, we present a global multiple alignment program, foldalignM, which performs especially well on a few sequences with low sequence similarity, and is comparable in performance with state-of-the-art programs in general. In addition, it can cluster sequences based on sequence and structure similarity and output a multiple alignment for each cluster. Furthermore, preliminary results with local datasets indicate that the program is useful for post-processing foldalign pairwise scans. The program foldalignM is implemented in JAVA and is, along with some accompanying PERL scripts, available at http://foldalign.ku.dk/

  15. Splatter: simulation of single-cell RNA sequencing data.

    PubMed

    Zappia, Luke; Phipson, Belinda; Oshlack, Alicia

    2017-09-12

    As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.
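
To make the gamma-Poisson idea mentioned above concrete, here is a minimal sketch in plain Python. It is not Splatter's API (Splatter is a Bioconductor/R package) and it omits most of what the Splat simulation does. The function names and the parameters `mean_shape`, `mean_scale`, and `dispersion` are assumptions chosen for illustration: gene means are drawn from a gamma distribution, each cell's expected count is drawn from a second gamma centred on the gene mean, and observed counts are Poisson draws.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Suitable for small to moderate lam only (exp(-lam) underflows for very
    large lam), which is enough for this illustration.
    """
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_counts(n_genes, n_cells, mean_shape=0.6, mean_scale=5.0,
                    dispersion=0.1, seed=0):
    """Return an n_genes x n_cells list of gamma-Poisson counts (hypothetical helper)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_genes):
        # Gene-level mean expression; floor avoids a zero scale parameter below.
        gene_mean = max(rng.gammavariate(mean_shape, mean_scale), 1e-12)
        row = []
        for _ in range(n_cells):
            # Gamma with mean gene_mean and squared coefficient of variation = dispersion.
            cell_mean = rng.gammavariate(1.0 / dispersion, gene_mean * dispersion)
            row.append(sample_poisson(cell_mean, rng))
        counts.append(row)
    return counts


if __name__ == "__main__":
    for row in simulate_counts(n_genes=3, n_cells=6, seed=42):
        print(row)
```

Fixing the seed makes a simulated dataset reproducible, which is the property the abstract identifies as missing from many ad hoc simulations.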

  16. Single-cell sequencing of the small-RNA transcriptome.

    PubMed

    Faridani, Omid R; Abdullayev, Ilgar; Hagemann-Jensen, Michael; Schell, John P; Lanner, Fredrik; Sandberg, Rickard

    2016-12-01

    Little is known about the heterogeneity of small-RNA expression as small-RNA profiling has so far required large numbers of cells. Here we present a single-cell method for small-RNA sequencing and apply it to naive and primed human embryonic stem cells and cancer cells. Analysis of microRNAs and fragments of tRNAs and small nucleolar RNAs (snoRNAs) reveals the potential of microRNAs as markers for different cell types and states.

  17. Probing dimensionality beyond the linear sequence of mRNA.

    PubMed

    Del Campo, Cristian; Ignatova, Zoya

    2016-05-01

    mRNA is a nexus entity between DNA and translating ribosomes. Recent developments in deep sequencing technologies coupled with structural probing have revealed new insights beyond the classic role of mRNA and place it more centrally as a direct effector of a variety of processes, including translation, cellular localization, and mRNA degradation. Here, we highlight emerging approaches to probe mRNA secondary structure on a global transcriptome-wide level and compare their potential and resolution. Combined approaches deliver a richer and more complex picture. While our understanding of the effect of secondary structure on various cellular processes is quite advanced, the next challenge is to unravel more complex mRNA architectures and tertiary interactions.

  18. Comparative Analysis of Single-Cell RNA Sequencing Methods.

    PubMed

    Ziegenhain, Christoph; Vieth, Beate; Parekh, Swati; Reinius, Björn; Guillaumet-Adkins, Amy; Smets, Martha; Leonhardt, Heinrich; Heyn, Holger; Hellmann, Ines; Enard, Wolfgang

    2017-02-16

    Single-cell RNA sequencing (scRNA-seq) offers new possibilities to address biological and medical questions. However, systematic comparisons of the performance of diverse scRNA-seq protocols are lacking. We generated data from 583 mouse embryonic stem cells to evaluate six prominent scRNA-seq methods: CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2. While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs). Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells. Our quantitative comparison offers the basis for an informed choice among six prominent scRNA-seq methods, and it provides a framework for benchmarking further improvements of scRNA-seq protocols.
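
The reason unique molecular identifiers (UMIs) reduce amplification noise can be shown with a small sketch: reads that share a cell barcode, gene, and UMI are treated as PCR copies of one original molecule and counted once. The function name and the `(cell, gene, umi)` input layout are assumptions for illustration, and real pipelines typically also correct sequencing errors in the UMIs, which this sketch does not.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads to molecule counts per (cell, gene).

    reads: iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    Returns a dict mapping (cell_barcode, gene) to the number of distinct UMIs.
    """
    umis_seen = defaultdict(set)
    for cell, gene, umi in reads:
        umis_seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


if __name__ == "__main__":
    reads = [
        ("cell1", "Actb", "AAAC"),
        ("cell1", "Actb", "AAAC"),  # PCR duplicate of the read above
        ("cell1", "Actb", "GGTT"),
        ("cell2", "Actb", "AAAC"),
    ]
    print(count_molecules(reads))
```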

  19. Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

    PubMed

    Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

    2017-10-06

    Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.

  20. MicroRNA target prediction using thermodynamic and sequence curves.

    PubMed

    Ghoshal, Asish; Shankar, Raghavendran; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali

    2015-11-25

    MicroRNAs (miRNAs) are small regulatory RNAs that mediate RNA interference by binding to various mRNA target regions. There have been several computational methods for the identification of target mRNAs for miRNAs. However, these have considered all contributory features as scalar representations, primarily, as thermodynamic or sequence-based features. Further, a majority of these methods solely target canonical sites, which are sites with "seed" complementarity. Here, we present a machine-learning classification scheme, titled Avishkar, which captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves, separately for various input features, such as thermodynamic and sequence features. Further, we use a principled approach to uniformly model canonical and non-canonical seed matches, using a novel seed enrichment metric. We demonstrate that a large number of seed-match patterns have high enrichment values, conserved across species, and that the majority of miRNA binding sites involve non-canonical matches, corroborating recent findings. Using spatial curves and popular categorical features, such as target site length and location, we train a linear SVM model, utilizing experimental CLIP-seq data. Our model significantly outperforms all established methods, for both canonical and non-canonical sites. We achieve this while using a much larger candidate miRNA-mRNA interaction set than prior work. We have developed an efficient SVM-based model for miRNA target prediction using recent CLIP-seq data, demonstrating superior performance, evaluated using ROC curves, specifically about 20% better than the state-of-the-art, for different species (human or mouse), or different target types (canonical or non-canonical). To the best of our knowledge we provide the first distributed framework for microRNA target prediction based on Apache Hadoop and Spark. All source code and data are publicly available at https://bitbucket.org/cellsandmachines/avishkar.
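
As a point of reference for what a "canonical" site means, the sketch below scans an mRNA for perfect matches to a miRNA seed. It is not part of Avishkar. The seed is taken as miRNA positions 2 to 8 (a widely used convention, assumed here), and the function names are hypothetical. The example uses the let-7a sequence, whose seed GAGGUAG pairs with the target motif CUACCUC.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, start=1, end=8):
    """Return the mRNA motif (5'->3') that pairs perfectly with the miRNA seed.

    start/end are 0-based slice bounds; the defaults select positions 2-8.
    """
    seed = mirna[start:end]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_canonical_sites(mirna, mrna):
    """Return 0-based start positions of perfect seed matches in the mRNA."""
    motif = seed_site(mirna)
    positions = []
    index = mrna.find(motif)
    while index != -1:
        positions.append(index)
        index = mrna.find(motif, index + 1)
    return positions


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    utr = "AAACUACCUCAGGG"  # toy 3' UTR fragment made up for the example
    print(seed_site(let7a), find_canonical_sites(let7a, utr))
```

Non-canonical sites, which the abstract reports as the majority, would not be found by an exact-match scan like this one, which is why the authors model seed matches with an enrichment metric instead.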

  1. Replication and packaging of Turnip yellow mosaic virus RNA containing Flock house virus RNA1 sequence.

    PubMed

    Kim, Hui-Bae; Kim, Do-Yeong; Cho, Tae-Ju

    2014-06-01

    Turnip yellow mosaic virus (TYMV) is a spherical plant virus that has a single 6.3 kb positive strand RNA as a genome. In this study, RNA1 sequence of Flock house virus (FHV) was inserted into the TYMV genome to test whether TYMV can accommodate and express another viral entity. In the resulting construct, designated TY-FHV, the FHV RNA1 sequence was expressed as a TYMV subgenomic RNA. Northern analysis of the Nicotiana benthamiana leaves agroinfiltrated with the TY-FHV showed that both genomic and subgenomic FHV RNAs were abundantly produced. This indicates that the FHV RNA1 sequence was correctly expressed and translated to produce a functional FHV replicase. Although these FHV RNAs were not encapsidated, the FHV RNA having a TYMV CP sequence at the 3'-end was efficiently encapsidated. When an eGFP gene was inserted into the B2 ORF of the FHV sequence, a fusion protein of B2-eGFP was produced as expected.

  2. Coding and 3' non-coding nucleotide sequence of chalcone synthase mRNA and assignment of amino acid sequence of the enzyme

    PubMed Central

    Reimold, Ursula; Kröger, Manfred; Kreuzaler, Fritz; Hahlbrock, Klaus

    1983-01-01

    The nucleotide sequence of an almost complete cDNA copy of chalcone synthase mRNA from cultured parsley cells (Petroselinum hortense) has been determined. The cDNA copy comprised the complete coding sequence for chalcone synthase, a short A-rich stretch of the 5' non-coding region and the complete 3' non-coding region including a poly(A) tail. The amino acid sequence deduced from the nucleotide sequence of the cDNA is consistent with a partial N-terminal sequence analysis, the total amino acid composition, the cyanogen bromide cleavage pattern, and the apparent mol. wt. of the subunit of the purified enzyme. PMID:16453477

  3. KnotSeeker: heuristic pseudoknot detection in long RNA sequences.

    PubMed

    Sperschneider, Jana; Datta, Amitava

    2008-04-01

    Pseudoknots are folded structures in RNA molecules that perform essential functions as part of cellular transcription machinery and regulatory processes. The prediction of these structures in RNA molecules has important implications in antiviral drug design. It has been shown that the prediction of pseudoknots is an NP-complete problem. Practical structure prediction algorithms based on free energy minimization employ a restricted problem class and dynamic programming. However, these algorithms are computationally very expensive, and their accuracy deteriorates if the input sequence containing the pseudoknot is too long. Heuristic methods can be more efficient, but do not guarantee an optimal solution with regard to the minimum free energy model. We present KnotSeeker, a new heuristic algorithm for the detection of pseudoknots in RNA sequences as a preliminary step for structure prediction. Our method uses a hybrid sequence matching and free energy minimization approach to perform a screening of the primary sequence. We select short sequence fragments as possible candidates that may contain pseudoknots and verify them by using an existing dynamic programming algorithm and a minimum weight independent set calculation. KnotSeeker is significantly more accurate in detecting pseudoknots compared to other common methods as reported in the literature. It is very efficient and therefore a practical tool, especially for long sequences. The algorithm has been implemented in Python and it also uses C/C++ code from several other known techniques. The code is available from http://www.csse.uwa.edu.au/~datta/pseudoknot.
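
For readers unfamiliar with the term, a structure is usually called pseudoknotted when two of its base pairs cross, that is, pairs (i, j) and (k, l) with i < k < j < l. The sketch below checks an already-known list of base pairs for that condition. It only illustrates the definition and is not KnotSeeker's detection algorithm, which works from sequence alone; the function name is hypothetical.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l).

    pairs: iterable of (i, j) index tuples, one per base pair, in any order.
    """
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                return True
    return False


if __name__ == "__main__":
    nested = [(0, 9), (1, 8), (2, 7)]    # a plain hairpin stem
    crossing = [(0, 5), (1, 4), (3, 8)]  # pair (3, 8) crosses pair (0, 5)
    print(has_pseudoknot(nested), has_pseudoknot(crossing))
```

The pairwise check is quadratic in the number of base pairs, which is acceptable for inspecting a single predicted structure but is not a substitute for the screening heuristics the abstract describes.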

  4. Studying RNA homology and conservation with Infernal: from single sequences to RNA families

    PubMed Central

    Barquist, Lars; Burge, Sarah W.; Gardner, Paul P.

    2016-01-01

    Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This protocol introduces methods developed by the Rfam database for identifying “families” of homologous ncRNAs starting from single “seed” sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs, then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. PMID:27322404

  5. Studying RNA Homology and Conservation with Infernal: From Single Sequences to RNA Families.

    PubMed

    Barquist, Lars; Burge, Sarah W; Gardner, Paul P

    2016-06-20

    Emerging high-throughput technologies have led to a deluge of putative non-coding RNA (ncRNA) sequences identified in a wide variety of organisms. Systematic characterization of these transcripts will be a tremendous challenge. Homology detection is critical to making maximal use of functional information gathered about ncRNAs: identifying homologous sequence allows us to transfer information gathered in one organism to another quickly and with a high degree of confidence. ncRNA presents a challenge for homology detection, as the primary sequence is often poorly conserved and de novo secondary structure prediction and search remain difficult. This unit introduces methods developed by the Rfam database for identifying "families" of homologous ncRNAs starting from single "seed" sequences, using manually curated sequence alignments to build powerful statistical models of sequence and structure conservation known as covariance models (CMs), implemented in the Infernal software package. We provide a step-by-step iterative protocol for identifying ncRNA homologs and then constructing an alignment and corresponding CM. We also work through an example for the bacterial small RNA MicA, discovering a previously unreported family of divergent MicA homologs in genus Xenorhabdus in the process. © 2016 by John Wiley & Sons, Inc.

  6. Specific alignment of structured RNA: stochastic grammars and sequence annealing.

    PubMed

    Bradley, Robert K; Pachter, Lior; Holmes, Ian

    2008-12-01

    Whole-genome screens suggest that eukaryotic genomes are dense with non-coding RNAs (ncRNAs). We introduce a novel approach to RNA multiple alignment which couples a generative probabilistic model of sequence and structure with an efficient sequence annealing approach for exploring the space of multiple alignments. This leads to a new software program, Stemloc-AMA, that is both accurate and specific in the alignment of multiple related RNA sequences. When tested on the benchmark datasets BRalibase II and BRalibase 2.1, Stemloc-AMA has comparable sensitivity to and better specificity than the best competing methods. We use a large-scale random sequence experiment to show that while most alignment programs maximize sensitivity at the expense of specificity, even to the point of giving complete alignments of non-homologous sequences, Stemloc-AMA aligns only sequences with detectable homology and leaves unrelated sequences largely unaligned. Such accurate and specific alignments are crucial for comparative-genomics analysis, from inferring phylogeny to estimating substitution rates across different lineages. Stemloc-AMA is available from http://biowiki.org/StemLocAMA as part of the dart software package for sequence analysis.

  7. RNA-DNA sequence differences spell genetic code ambiguities

    PubMed Central

    Nielsen, Michael L.

    2011-01-01

    A recent paper in Science by Li et al. (2011) reports widespread sequence differences in the human transcriptome between RNAs and their encoding genes, termed RNA-DNA differences (RDDs). The findings could add a new layer of complexity to gene expression, but the study has been criticized. PMID:22567189

  8. Sequence and structural conservation in RNA ribose zippers

    SciTech Connect

    Tamura, Makio; Holbrook, Stephen R.

    2002-03-01

    The ribose zipper, an important element of RNA tertiary structure, is characterized by consecutive hydrogen-bonding interactions between ribose 2'-hydroxyls from different regions of an RNA chain or between RNA chains. These tertiary contacts have previously been observed to also involve base-backbone and base-base interactions (A-minor type). We searched for ribose zipper tertiary interactions in the crystal structures of the large ribosomal subunit RNAs of Haloarcula marismortui and Deinococcus radiodurans, and the small ribosomal subunit RNA of Thermus thermophilus, and identified a total of 97 ribose zippers. Of these, 20 were found in T. thermophilus 16S rRNA, 44 in H. marismortui 23S rRNA (plus 2 bridging 5S and 23S rRNAs) and 30 in D. radiodurans 23S rRNA (plus 1 bridging 5S and 23S rRNAs). These were analyzed in terms of sequence conservation, structural conservation and stability, location in secondary structure, and phylogenetic conservation. Eleven types of ribose zippers were defined based on ribose-base interactions. Of these 11, seven were observed in the ribosomal RNAs. The most common of these is the canonical ribose zipper, originally observed in the P4-P6 group I intron fragment. All ribose zippers were formed by antiparallel chain interactions and only a single example extended beyond two residues, forming an overlapping ribose zipper of three consecutive residues near the small subunit A-site. Almost all ribose zippers link stem (Watson-Crick duplex) or stem-like (base-paired) chain segments with loop (external, internal, or junction) chain segments. About two-thirds of the observed ribose zippers interact with ribosomal proteins. Most of these ribosomal proteins bridge the ribose zipper chain segments with basic amino acid residues hydrogen bonding to the RNA backbone. Proteins involved in crucial ribosome function and in early stages of ribosomal assembly also stabilize ribose zipper interactions. All ribose zippers show strong sequence conservation

  9. High Throughput Sequencing of Extracellular RNA from Human Plasma

    PubMed Central

    Danielson, Kirsty M.; Rubio, Renee; Abderazzaq, Fieda; Das, Saumya; Wang, Yaoyu E.

    2017-01-01

    The presence and relative stability of extracellular RNAs (exRNAs) in biofluids has led to an emerging recognition of their promise as ‘liquid biopsies’ for diseases. Most prior studies on discovery of exRNAs as disease-specific biomarkers have focused on microRNAs (miRNAs) using technologies such as qRT-PCR and microarrays. The recent application of next-generation sequencing to discovery of exRNA biomarkers has revealed the presence of potential novel miRNAs as well as other RNA species such as tRNAs, snoRNAs, piRNAs and lncRNAs in biofluids. At the same time, the use of RNA sequencing for biofluids poses unique challenges, including low amounts of input RNAs, the presence of exRNAs in different compartments with varying degrees of vulnerability to isolation techniques, and the high abundance of specific RNA species (thereby limiting the sensitivity of detection of less abundant species). Moreover, discovery in human diseases often relies on archival biospecimens of varying age and limiting amounts of samples. In this study, we have tested RNA isolation methods to optimize profiling exRNAs by RNA sequencing in individuals without any known diseases. Our findings are consistent with other recent studies that detect microRNAs and ribosomal RNAs as the major exRNA species in plasma. Similar to other recent studies, we found that the landscape of biofluid microRNA transcriptome is dominated by several abundant microRNAs that appear to comprise conserved extracellular miRNAs. There is reasonable correlation of sets of conserved miRNAs across biological replicates, and even across other data sets obtained at different investigative sites. Conversely, the detection of less abundant miRNAs is far more dependent on the exact methodology of RNA isolation and profiling. This study highlights the challenges in detecting and quantifying less abundant plasma miRNAs in health and disease using RNA sequencing platforms. PMID:28060806

  10. Single Molecule Electrical Sequencing of DNA and RNA

    NASA Astrophysics Data System (ADS)

    Taniguchi, Masateru

    2013-03-01

    Gating nanopore devices are composed of nanopores with embedded nanoelectrodes, and they are expected to be one of the core devices used to realize label-free, low-cost DNA sequencing, subsequently leading to 1000-genome sequencing technologies. The operating principle of these nanodevices is based on identifying single base molecules of a single DNA passing through a nanopore using a tunneling current between nanoelectrodes. We successfully identified single base molecules of DNA and RNA using tunneling currents. To make gating nanopore devices fit for practical use, core technologies should be integrated on one device chip. One core technology is the identification of single DNA and RNA composed of many base molecules using tunneling currents. We have succeeded in the single-molecule electrical sequencing of DNA and RNA formed by 3 and 7 base molecules, respectively, using a hybrid method of identifying single base molecules via a tunneling current and random sequencing. A method that controls the speed of a single DNA passing through a nanopore is one core technology that determines the speed and accuracy of sequencing. We successfully developed a method that controls the translocation speed of a single DNA by three orders of magnitude using a voltage between nanoelectrodes.

  11. Learning to Predict miRNA-mRNA Interactions from AGO CLIP Sequencing and CLASH Data

    PubMed Central

    Lu, Yuheng; Leslie, Christina S.

    2016-01-01

    Recent technologies like AGO CLIP sequencing and CLASH enable direct transcriptome-wide identification of AGO binding and miRNA target sites, but the most widely used miRNA target prediction algorithms do not exploit these data. Here we use discriminative learning on AGO CLIP and CLASH interactions to train a novel miRNA target prediction model. Our method combines two SVM classifiers, one to predict miRNA-mRNA duplexes and a second to learn a binding model of AGO’s local UTR sequence preferences and positional bias in 3’UTR isoforms. The duplex SVM model enables the prediction of non-canonical target sites and more accurately resolves miRNA interactions from AGO CLIP data than previous methods. The binding model is trained using a multi-task strategy to learn context-specific and common AGO sequence preferences. The duplex and common AGO binding models together outperform existing miRNA target prediction algorithms on held-out binding data. Open source code is available at https://bitbucket.org/leslielab/chimiric. PMID:27438777

  12. rnaSeqMap: a Bioconductor package for RNA sequencing data exploration

    PubMed Central

    2011-01-01

    Background: The throughput of commercially available sequencers has recently increased significantly. It has reached the point where measuring RNA expression by depth of coverage has become feasible even for the largest genomes. The development of software tools is constantly following the progress of biological hardware. In particular, genome browsers, exon junction tools and statistical tools operating on counts of reads in predefined regions can all be regarded as RNA sequencing software. The library rnaSeqMap, freely available via Bioconductor, is RNA sequencing software that is independent of any biological hardware platform. It is based upon standard Bioconductor infrastructure for sequencing data and includes several novel features focused on deeper understanding of coverage expression profiles and discovery of novel transcription regions. Results: rnaSeqMap is a toolbox for analyses that may be performed with the use of gene annotations or alternatively, in an unsupervised mode, on any genomic region to find novel or non-standard transcripts. The data back-end may be a MySQL database or a set of files in standard BAM format. The processing in R can be run on a machine without any particular hardware requirements, and scales linearly with the number of genomic loci and number of samples analyzed. The main features of rnaSeqMap include coverage operations, discovering irreducible regions of high expression, significance search and splicing analyses with nucleotide granularity. Conclusions: This software may be used for a range of applications related to RNA sequencing by building customized analysis pipelines. The applicability and precision are expected to increase in parallel with the progress of the genome coverage in sequencers. PMID:21612622

  13. Prediction and prioritization of neoantigens: integration of RNA sequencing data with whole-exome sequencing.

    PubMed

    Karasaki, Takahiro; Nagayama, Kazuhiro; Kuwano, Hideki; Nitadori, Jun-Ichi; Sato, Masaaki; Anraku, Masaki; Hosoi, Akihiro; Matsushita, Hirokazu; Takazawa, Masaki; Ohara, Osamu; Nakajima, Jun; Kakimi, Kazuhiro

    2017-02-01

    The importance of neoantigens for cancer immunity is now well-acknowledged. However, there are diverse strategies for predicting and prioritizing candidate neoantigens, and thus reported neoantigen loads vary a great deal. To clarify this issue, we compared the numbers of neoantigen candidates predicted by four currently utilized strategies. Whole-exome sequencing and RNA sequencing (RNA-Seq) of four non-small-cell lung cancer patients was carried out. We identified 361 somatic missense mutations from which 224 candidate neoantigens were predicted using MHC class I binding affinity prediction software (strategy I). Of these, 207 exceeded the set threshold of gene expression (fragments per kilobase of transcript per million fragments mapped ≥1), resulting in 124 candidate neoantigens (strategy II). To verify mutant mRNA expression, sequencing of amplicons from tumor cDNA including each mutation was undertaken; 204 of the 207 mutations were successfully sequenced, yielding 121 mutant mRNA sequences, resulting in 75 candidate neoantigens (strategy III). Sequence information was extracted from RNA-Seq to confirm the presence of mutated mRNA. Variant allele frequencies ≥0.04 in RNA-Seq were found for 117 of the 207 mutations and regarded as expressed in the tumor, and finally, 72 candidate neoantigens were predicted (strategy IV). Without additional amplicon sequencing of cDNA, strategy IV was comparable to strategy III. We therefore propose strategy IV as a practical and appropriate strategy to predict candidate neoantigens fully utilizing currently available information. It is of note that different neoantigen loads were deduced from the same tumors depending on the strategies applied.
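
    The filtering logic behind the four strategies can be illustrated with a minimal sketch. Only the two thresholds (FPKM ≥1 and RNA-Seq variant allele frequency ≥0.04) come from the abstract; the `Mutation` record, its field names, the `count_candidates` helper and the toy data are hypothetical and are not the authors' pipeline. Strategy III (amplicon sequencing of cDNA) is a wet-lab step and is not modeled.

```python
from dataclasses import dataclass

# Thresholds quoted in the abstract.
FPKM_MIN = 1.0      # gene expression threshold (strategy II)
RNA_VAF_MIN = 0.04  # RNA-Seq variant allele frequency threshold (strategy IV)


@dataclass
class Mutation:
    """Hypothetical record for one somatic missense mutation."""
    gene: str
    fpkm: float        # expression of the mutated gene
    rna_vaf: float     # variant allele frequency observed in RNA-Seq
    n_candidates: int  # predicted MHC class I binding peptides


def count_candidates(mutations, min_fpkm=None, min_rna_vaf=None):
    """Sum candidate neoantigens over mutations that pass the given filters."""
    total = 0
    for m in mutations:
        if min_fpkm is not None and m.fpkm < min_fpkm:
            continue
        if min_rna_vaf is not None and m.rna_vaf < min_rna_vaf:
            continue
        total += m.n_candidates
    return total


# Toy data, invented for illustration only.
toy = [
    Mutation("GENE_A", fpkm=5.0, rna_vaf=0.30, n_candidates=2),
    Mutation("GENE_B", fpkm=0.2, rna_vaf=0.00, n_candidates=1),
    Mutation("GENE_C", fpkm=3.0, rna_vaf=0.01, n_candidates=1),
    Mutation("GENE_D", fpkm=12.0, rna_vaf=0.10, n_candidates=0),
]

strategy_i = count_candidates(toy)                       # binding prediction only
strategy_ii = count_candidates(toy, min_fpkm=FPKM_MIN)   # plus expression filter
strategy_iv = count_candidates(toy, min_fpkm=FPKM_MIN,
                               min_rna_vaf=RNA_VAF_MIN)  # plus mutant-read filter
```

    As in the abstract, each added filter can only shrink the candidate list, which is why the reported neoantigen load depends on the strategy chosen.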

  14. Exploring Connectivity in Sequence Space of Functional RNA

    NASA Technical Reports Server (NTRS)

    Wei, Chenyu; Pohorille, Andrzej; Popovic, Milena; Ditzler, Mark

    2017-01-01

    Emergence of replicable genetic molecules was one of the marking points in the origin of life, evolution of which can be conceptualized as a walk through the space of all possible sequences. A theoretical concept of fitness landscape helps to understand evolutionary processes through assigning a value of fitness to each genotype. Then, evolution of a phenotype is viewed as a series of consecutive, single-point mutations. Natural selection biases evolution toward peaks of high fitness and away from valleys of low fitness, whereas neutral drift occurs in the sequence space without direction as mutations are introduced at random. Large networks of neutral or near-neutral mutations on a fitness landscape, especially for sufficiently long genomes, are possible or even inevitable. Their detection in experiments, however, has been elusive. Although a few near-neutral evolutionary pathways have been found, recent experimental evidence indicates landscapes consist of largely isolated islands. The generality of these results, however, is not clear, as the genome length or the fraction of functional molecules in the genotypic space might have been insufficient for the emergence of large, neutral networks. Thorough investigation of the structure of the fitness landscape is essential to understand the mechanisms of evolution of early genomes. RNA molecules are commonly assumed to play the pivotal role in the origin of genetic systems. They are widely believed to be early, if not the earliest, genetic and catalytic molecules, with abundant biochemical activities as aptamers and ribozymes, i.e. RNA molecules capable, respectively, of binding small molecules or catalyzing chemical reactions. Here, we present results of our recent studies on the structure of the sequence space of RNA ligase ribozymes selected through in vitro evolution. Several hundred thousand sequences active to a different degree were obtained by way of deep sequencing. Analysis of these sequences revealed

  15. siRNA release from pri-miRNA scaffolds is controlled by the sequence and structure of RNA.

    PubMed

    Galka-Marciniak, Paulina; Olejniczak, Marta; Starega-Roslan, Julia; Szczesniak, Michal W; Makalowska, Izabela; Krzyzosiak, Wlodzimierz J

    2016-04-01

    shmiRs are pri-miRNA-based RNA interference triggers from which exogenous siRNAs are expressed in cells to silence target genes. These reagents are very promising tools in RNAi in vivo applications due to their good activity profile and lower toxicity than observed for other vector-based reagents such as shRNAs. In this study, using high-resolution northern blotting and small RNA sequencing, we investigated the precision with which RNases Drosha and Dicer process shmiRs. The fidelity of siRNA release from the commonly used pri-miRNA shuttles was found to depend on both the siRNA insert and the pri-miR scaffold. Then, we searched for specific factors that may affect the precision of siRNA release and found that both the structural features of shmiR hairpins and the nucleotide sequence at Drosha and Dicer processing sites contribute to cleavage site selection and cleavage precision. An analysis of multiple shRNA intermediates generated from several reagents revealed the complexity of shmiR processing by Drosha and demonstrated that Dicer selects substrates for further processing. Aside from providing new basic knowledge regarding the specificity of nucleases involved in miRNA biogenesis, our results facilitate the rational design of more efficient genetic reagents for RNAi technology. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data.

    PubMed

    An, Jiyuan; Lai, John; Lehman, Melanie L; Nelson, Colleen C

    2013-01-01

    miRDeep and its varieties are widely used to quantify known and novel micro RNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled off miRDeep, but the precision of detecting novel miRNAs is improved by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphic interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star.

  17. Quantitative assessment of RNA-protein interactions with high-throughput sequencing-RNA affinity profiling.

    PubMed

    Ozer, Abdullah; Tome, Jacob M; Friedman, Robin C; Gheba, Dan; Schroth, Gary P; Lis, John T

    2015-08-01

    Because RNA-protein interactions have a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay that couples sequencing on an Illumina GAIIx genome analyzer with the quantitative assessment of protein-RNA interactions. This assay is able to analyze interactions between one or possibly several proteins with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of the EGFP and negative elongation factor subunit E (NELF-E) proteins with their corresponding canonical and mutant RNA aptamers. Here we provide a detailed protocol for HiTS-RAP that can be completed in about a month (8 d hands-on time). This includes the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, HiTS and protein binding with a GAIIx instrument, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, quantitative analysis of RNA on a massively parallel array (RNA-MaP) and RNA Bind-n-Seq (RBNS), for quantitative analysis of RNA-protein interactions.

  18. Maize Gene Atlas Developed by RNA Sequencing and Comparative Evaluation of Transcriptomes Based on RNA Sequencing and Microarrays

    PubMed Central

    Sekhon, Rajandeep S.; Briskine, Roman; Hirsch, Candice N.; Myers, Chad L.; Springer, Nathan M.; Buell, C. Robin; de Leon, Natalia; Kaeppler, Shawn M.

    2013-01-01

    Transcriptome analysis is a valuable tool for identification and characterization of genes and pathways underlying plant growth and development. We previously published a microarray-based maize gene atlas from the analysis of 60 unique spatially and temporally separated tissues from 11 maize organs [1]. To enhance the coverage and resolution of the maize gene atlas, we have analyzed 18 selected tissues representing five organs using RNA sequencing (RNA-Seq). For a direct comparison of the two methodologies, the same RNA samples originally used for our microarray-based atlas were evaluated using RNA-Seq. Both technologies produced similar transcriptome profiles as evident from high Pearson's correlation statistics ranging from 0.70 to 0.83, and from nearly identical clustering of the tissues. RNA-Seq provided enhanced coverage of the transcriptome, with 82.1% of the filtered maize genes detected as expressed in at least one tissue by RNA-Seq compared to only 56.5% detected by microarrays. Further, from the set of 465 maize genes that have been historically well characterized by mutant analysis, 427 show significant expression in at least one tissue by RNA-Seq compared to 390 by microarray analysis. RNA-Seq provided higher resolution for identifying tissue-specific expression as well as for distinguishing the expression profiles of closely related paralogs as compared to microarray-derived profiles. Co-expression analysis derived from the microarray and RNA-Seq data revealed that broadly similar networks result from both platforms, and that co-expression estimates are stable even when constructed from mixed data including both RNA-Seq and microarray expression data. The RNA-Seq information provides a useful complement to the microarray-based maize gene atlas and helps to further understand the dynamics of transcription during maize development. PMID:23637782
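
    The platform comparison above rests on Pearson's correlation between paired expression measurements. A minimal sketch of that statistic follows; the `pearson` function and the toy vectors are illustrative assumptions, and the abstract does not say how expression values were transformed before the correlation was computed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance numerator and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Toy per-gene expression values for one tissue (invented for illustration).
microarray = [2.1, 5.4, 7.9, 3.3, 6.0]
rnaseq = [1.8, 5.9, 8.4, 2.7, 6.6]
r = pearson(microarray, rnaseq)
```

    A value of `r` close to 1 means the two platforms rank and scale genes similarly in that tissue.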

  19. Structurally complex and highly active RNA ligases derived from random RNA sequences

    NASA Technical Reports Server (NTRS)

    Ekland, E. H.; Szostak, J. W.; Bartel, D. P.

    1995-01-01

    Seven families of RNA ligases, previously isolated from random RNA sequences, fall into three classes on the basis of secondary structure and regiospecificity of ligation. Two of the three classes of ribozymes have been engineered to act as true enzymes, catalyzing the multiple-turnover transformation of substrates into products. The most complex of these ribozymes has a minimal catalytic domain of 93 nucleotides. An optimized version of this ribozyme has a kcat exceeding one per second, a value far greater than that of most natural RNA catalysts and approaching that of comparable protein enzymes. The fact that such a large and complex ligase emerged from a very limited sampling of sequence space implies the existence of a large number of distinct RNA structures of equivalent complexity and activity.

  20. Single-cell RNA sequencing identifies diverse roles of epithelial cells in idiopathic pulmonary fibrosis

    PubMed Central

    Mizuno, Takako; Sridharan, Anusha; Du, Yina; Guo, Minzhe; Wikenheiser-Brokamp, Kathryn A.; Perl, Anne-Karina T.; Funari, Vincent A.; Gokey, Jason J.; Stripp, Barry R.; Whitsett, Jeffrey A.

    2016-01-01

    Idiopathic pulmonary fibrosis (IPF) is a lethal interstitial lung disease characterized by airway remodeling, inflammation, alveolar destruction, and fibrosis. We utilized single-cell RNA sequencing (scRNA-seq) to identify epithelial cell types and associated biological processes involved in the pathogenesis of IPF. Transcriptomic analysis of normal human lung epithelial cells defined gene expression patterns associated with highly differentiated alveolar type 2 (AT2) cells, indicated by enrichment of RNAs critical for surfactant homeostasis. In contrast, scRNA-seq of IPF cells identified 3 distinct subsets of epithelial cell types with characteristics of conducting airway basal and goblet cells and an additional atypical transitional cell that contributes to pathological processes in IPF. Individual IPF cells frequently coexpressed alveolar type 1 (AT1), AT2, and conducting airway selective markers, demonstrating “indeterminate” states of differentiation not seen in normal lung development. Pathway analysis predicted aberrant activation of canonical signaling via TGF-β, HIPPO/YAP, P53, WNT, and AKT/PI3K. Immunofluorescence confocal microscopy identified the disruption of alveolar structure and loss of the normal proximal-peripheral differentiation of pulmonary epithelial cells. scRNA-seq analyses identified loss of normal epithelial cell identities and unique contributions of epithelial cells to the pathogenesis of IPF. The present study provides a rich data source to further explore lung health and disease. PMID:27942595

  1. Simultaneous rapid sequencing of multiple RNA virus genomes.

    PubMed

    Neill, John D; Bayles, Darrell O; Ridpath, Julia F

    2014-06-01

    Comparing sequences of archived viruses collected over many years to the present allows the study of viral evolution and contributes to the design of new vaccines. However, the difficulty, time and expense of generating full-length sequences individually from each archived sample have hampered these studies. Next generation sequencing technologies have been utilized for analysis of clinical and environmental samples to identify viral pathogens that may be present. This has led to the discovery of many new, uncharacterized viruses from a number of viral families. Use of these sequencing technologies would be advantageous in examining viral evolution. In this study, a sequencing procedure was used to sequence simultaneously and rapidly multiple archived samples using a single standard protocol. This procedure utilized primers composed of 20 bases of known sequence with 8 random bases at the 3'-end; the known sequence also served as an identifying barcode that allowed the differentiation of each viral library following pooling and sequencing. This conferred sequence independence by random priming both first and second strand cDNA synthesis. Viral stocks were treated with a nuclease cocktail to reduce the presence of host nucleic acids. Viral RNA was extracted, followed by single tube random-primed double-stranded cDNA synthesis. The resultant cDNAs were amplified by primer-specific PCR, pooled, size fractionated and sequenced on the Ion Torrent PGM platform. The individual virus genomes were readily assembled by both de novo and template-assisted assembly methods. This procedure consistently resulted in near full length, if not full-length, genomic sequences and was used to sequence multiple bovine pestivirus and coronavirus isolates simultaneously.
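
    The barcoding scheme described above (a 20-base known sequence followed by 8 random bases) suggests a simple way to separate pooled reads by library. The sketch below is an illustration only: it assumes each read starts with the full primer, uses exact barcode matching, and the barcode sequences and library names are invented. The study's actual demultiplexing software is not described in the abstract.

```python
BARCODE_LEN = 20  # known primer bases that identify the library (from the abstract)
RANDOM_LEN = 8    # random bases at the primer's 3'-end (from the abstract)


def demultiplex(reads, barcodes):
    """Assign reads to libraries by their first 20 bases and trim the primer.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping 20-base barcode -> library name
    Returns (dict of library name -> trimmed reads, list of unassigned reads).
    """
    libraries = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        name = barcodes.get(read[:BARCODE_LEN])
        if name is None:
            unassigned.append(read)
        else:
            # Drop the barcode and the random 8-mer, keeping only viral sequence.
            libraries[name].append(read[BARCODE_LEN + RANDOM_LEN:])
    return libraries, unassigned


# Hypothetical barcodes and reads, invented for illustration.
barcodes = {
    "ACGTACGTACGTACGTACGT": "pestivirus_isolate_1",
    "TTGGCCAATTGGCCAATTGG": "coronavirus_isolate_1",
}
reads = [
    "ACGTACGTACGTACGTACGT" + "ACGTTGCA" + "GATTACAGATTACA",
    "TTGGCCAATTGGCCAATTGG" + "GGGTTTAA" + "CCCTAGGTA",
    "GGGGGGGGGGGGGGGGGGGG" + "ACGTTGCA" + "TTTT",
]
libraries, unassigned = demultiplex(reads, barcodes)
```

    A production pipeline would normally tolerate sequencing errors in the barcode; exact matching is used here to keep the idea visible.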

  2. Using Small RNA Deep Sequencing Data to Detect Human Viruses.

    PubMed

    Wang, Fang; Sun, Yu; Ruan, Jishou; Chen, Rui; Chen, Xin; Chen, Chengjie; Kreuze, Jan F; Fei, ZhangJun; Zhu, Xiao; Gao, Shan

    2016-01-01

    Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without the necessity to have any prior knowledge or specialized sample preparation. The sRNA-seq method was initially used for viral detection and identification in plants and then in invertebrates and fungi. However, it is still controversial to use sRNA-seq in the detection of mammalian or human viruses. In this study, we used 931 sRNA-seq runs of data from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly from some clinical samples. Six viruses including HPV-18, HBV, HCV, HIV-1, SMRV, and EBV were detected from 36 runs of data. Four viruses were consistent with the annotations from the previous studies. HIV-1 was found in clinical samples without the HIV-positive reports, and SMRV was found in Diffuse Large B-Cell Lymphoma cells for the first time. In conclusion, these results suggest the sRNA-seq can be used to detect viruses in mammals and humans.

  3. Using Small RNA Deep Sequencing Data to Detect Human Viruses

    PubMed Central

    Wang, Fang; Sun, Yu; Ruan, Jishou; Chen, Rui; Chen, Xin; Chen, Chengjie; Kreuze, Jan F.; Fei, ZhangJun; Zhu, Xiao

    2016-01-01

    Small RNA sequencing (sRNA-seq) can be used to detect viruses in infected hosts without the necessity to have any prior knowledge or specialized sample preparation. The sRNA-seq method was initially used for viral detection and identification in plants and then in invertebrates and fungi. However, it is still controversial to use sRNA-seq in the detection of mammalian or human viruses. In this study, we used 931 sRNA-seq runs of data from the NCBI SRA database to detect and identify viruses in human cells or tissues, particularly from some clinical samples. Six viruses including HPV-18, HBV, HCV, HIV-1, SMRV, and EBV were detected from 36 runs of data. Four viruses were consistent with the annotations from the previous studies. HIV-1 was found in clinical samples without the HIV-positive reports, and SMRV was found in Diffuse Large B-Cell Lymphoma cells for the first time. In conclusion, these results suggest the sRNA-seq can be used to detect viruses in mammals and humans. PMID:27066498

  4. Cell growth inhibition by sequence-specific RNA minihelices.

    PubMed Central

    Hipps, D; Schimmel, P

    1995-01-01

    RNA minihelices which reconstruct the 12 base pair acceptor-TψC domains of transfer RNAs interact with their cognate tRNA synthetases. These substrates lack the anticodons of the genetic code and, therefore, cannot participate in steps of protein synthesis subsequent to aminoacylation. We report here that expression in Escherichia coli of either of two minihelices, each specific for a different amino acid, inhibited cell growth. Inhibition appears to be due to direct competition between the minihelix and its related tRNA for binding to their common synthetase. This competition, in turn, sharply lowers the pool of the specific charged tRNA for protein synthesis. Inhibition is relieved by single nucleotide changes which disrupt the minihelix-synthetase interaction. The results suggest that sequence-specific RNA minihelix substrates bind to cognate synthetases in vivo and can, in principle, act as cell growth regulators. Naturally occurring non-tRNA substrates for aminoacylation may serve a similar purpose. PMID:7664744

  5. Diversified sequences of peptide epitope for same-RNA recognition.

    PubMed Central

    Kim, S; Ribas de Pouplana, L; Schimmel, P

    1993-01-01

    We replaced an essential RNA-binding, 30-amino acid helix-loop in an Escherichia coli tRNA synthetase with an inactive and simplified "generic" sequence having 23 of the 30 amino acids as alanine and serine. Wild-type residues were restored in random combinations to generate a library with a sequence complexity of about 1.9 × 10^7. Active molecules were obtained by genetic selection at a frequency of approximately 1% and contained variants with as many as 11 alanine/serine replacements and a total of 17 alanine/serine residues. These variants have activities which are thermodynamically competitive with that of the native protein and therefore are functionally and, most likely, conformationally equivalent. PMID:7694278

  6. Power analysis of single-cell RNA-sequencing experiments.

    PubMed

    Svensson, Valentine; Natarajan, Kedar Nath; Ly, Lam-Ha; Miragaia, Ricardo J; Labalette, Charlotte; Macaulay, Iain C; Cvejic, Ana; Teichmann, Sarah A

    2017-04-01

    Single-cell RNA sequencing (scRNA-seq) has become an established and powerful method to investigate transcriptomic cell-to-cell variation, thereby revealing new cell types and providing insights into developmental processes and transcriptional stochasticity. A key question is how the variety of available protocols compare in terms of their ability to detect and accurately quantify gene expression. Here, we assessed the protocol sensitivity and accuracy of many published data sets, on the basis of spike-in standards and uniform data processing. For our workflow, we developed a flexible tool for counting the number of unique molecular identifiers (https://github.com/vals/umis/). We compared 15 protocols computationally and 4 protocols experimentally for batch-matched cell populations, in addition to investigating the effects of spike-in molecular degradation. Our analysis provides an integrated framework for comparing scRNA-seq protocols.
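
    Counting unique molecular identifiers (UMIs) amounts to collapsing reads that share the same cell, gene and UMI into a single molecule. The sketch below illustrates that idea only; it is not the interface of the `umis` tool cited above, and the `count_umis` function, the tuple layout and the toy reads are assumptions. It also ignores sequencing errors within UMIs.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell, gene) pair.

    reads -- iterable of (cell_barcode, gene, umi) tuples, one per aligned read
    Returns a dict mapping (cell_barcode, gene) -> number of unique UMIs.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        # PCR duplicates share cell, gene and UMI, so the set absorbs them.
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy reads, invented for illustration.
reads = [
    ("cell1", "GeneA", "AACGT"),
    ("cell1", "GeneA", "AACGT"),  # PCR duplicate of the read above
    ("cell1", "GeneA", "TTGCA"),
    ("cell1", "GeneB", "AACGT"),
    ("cell2", "GeneA", "AACGT"),
]
counts = count_umis(reads)
```

    Here five reads collapse to four molecules, because the two identical `cell1`/`GeneA`/`AACGT` reads are counted once.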

  7. Experimental design, preprocessing, normalization and differential expression analysis of small RNA sequencing experiments

    PubMed Central

    2011-01-01

    Prior to the advent of new, deep sequencing methods, small RNA (sRNA) discovery was dependent on Sanger sequencing, which was time-consuming and limited knowledge to only the most abundant sRNA. The innovation of large-scale, next-generation sequencing has exponentially increased knowledge of the biology, diversity and abundance of sRNA populations. In this review, we discuss issues involved in the design of sRNA sequencing experiments, including choosing a sequencing platform, inherent biases that affect sRNA measurements and replication. We outline the steps involved in preprocessing sRNA sequencing data and review both the principles behind and the current options for normalization. Finally, we discuss differential expression analysis in the absence and presence of biological replicates. While our focus is on sRNA sequencing experiments, many of the principles discussed are applicable to the sequencing of other RNA populations. PMID:21356093

  8. Toward Rare Blood Cell Preservation for RNA Sequencing.

    PubMed

    Vickovic, Sanja; Ahmadian, Afshin; Lewensohn, Rolf; Lundeberg, Joakim

    2015-07-01

    Cancer is driven by various events leading to cell differentiation and disease progression. Molecular tools are powerful approaches for describing how and why these events occur. With the growing field of next-generation DNA sequencing, there is an increasing need for high-quality nucleic acids derived from human cells and tissues, a prerequisite for successful cell profiling. Although advances in RNA preservation have been made, some of the largest biobanks still do not employ RNA blood preservation as standard because of limitations in low blood-input volume and RNA stability over the whole gene body. Therefore, we have developed a robust protocol for blood preservation and long-term storage while maintaining RNA integrity. Furthermore, we explored the possibility of using the protocol for preserving rare cell samples, such as circulating tumor cells. The results of our study confirmed that gene expression was not impacted by the preservation procedure (r² > 0.88) or by long-term storage (r² = 0.95), with RNA integrity number values averaging over 8. Similarly, cell surface antigens were still available for antibody selection (r² = 0.95). Lastly, data mining for fusion events showed that it was possible to detect rare tumor cells among a background of other cells present in blood irrespective of fixation. Thus, the developed protocol would be suitable for rare blood cell preservation followed by RNA sequencing analysis. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  9. The distribution of RNA motifs in natural sequences.

    PubMed

    Bourdeau, V; Ferbeyre, G; Pageau, M; Paquin, B; Cedergren, R

    1999-11-15

    Functional analysis of genome sequences has largely ignored RNA genes and their structures. We introduce here the notion of 'ribonomics' to describe the search for the distribution of and eventually the determination of the physiological roles of these RNA structures found in the sequence databases. The utility of this approach is illustrated here by the identification in the GenBank database of RNA motifs having known binding or chemical activity. The frequency of these motifs indicates that most have originated from evolutionary drift and are selectively neutral. On the other hand, their distribution among species and their location within genes suggest that the destiny of these motifs may be more elaborate. For example, the hammerhead motif has a skewed organismal presence, is phylogenetically stable and recent work on a schistosome version confirms its in vivo biological activity. The under-representation of the valine-binding motif and the Rev-binding element in GenBank hints at a detrimental effect on cell growth or viability. Data on the presence and the location of these motifs may provide critical guidance in the design of experiments directed towards the understanding and the manipulation of RNA complexes and activities in vivo.

  10. RNA sequencing of archived neonatal dried blood spots.

    PubMed

    Bybjerg-Grauholm, Jonas; Hagen, Christian Munch; Khoo, Sok Kean; Johannesen, Maria Louise; Hansen, Christine Søholm; Bækvad-Hansen, Marie; Christiansen, Michael; Hougaard, David Michael; Hollegaard, Mads V

    2017-03-01

    Neonatal dried blood spots (DBS) are routinely collected on standard Guthrie cards for all-comprising national newborn screening programs for inborn errors of metabolism, hypothyroidism and other diseases. In Denmark, the Guthrie cards are stored at -20 °C in the Danish Neonatal Screening Biobank and each sample is linked to elaborate social and medical registries. This provides a unique biospecimen repository to enable large population research at a perinatal level. Here, we demonstrate the feasibility of obtaining gene expression data from DBS using next-generation RNA sequencing (RNA-seq). RNA-seq was performed on five males and five females. Sequencing results have an average of > 30 million reads per sample. 26,799 annotated features can be identified, with 64% of features detectable without a fragments per kilobase of transcript per million mapped reads (FPKM) cutoff; the number of detectable features dropped to 18% when FPKM ≥ 1. Sex can be discriminated using a blood-based sex-specific gene set identified by the Genotype-Tissue Expression consortium. Here, we demonstrate the feasibility of acquiring biologically relevant gene expression from DBS using RNA-seq, which provides a new avenue to investigate perinatal diseases in a high-throughput manner.
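    For readers unfamiliar with the FPKM unit used above, the following sketch shows the standard definition and how a detection cutoff changes the fraction of features counted as detectable. The function names and the toy values are illustrative assumptions, not data from the study.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def fraction_detected(values, cutoff=None):
    """Fraction of features detected.

    With no cutoff, any feature with FPKM > 0 counts as detected;
    otherwise a feature must reach FPKM >= cutoff.
    """
    if not values:
        return 0.0
    if cutoff is None:
        hits = sum(1 for v in values if v > 0)
    else:
        hits = sum(1 for v in values if v >= cutoff)
    return hits / len(values)


# 150 fragments on a 1,000 bp transcript in a 30-million-fragment library.
example = fpkm(150, 1000, 30_000_000)

# Toy FPKM values (made up) to show the effect of a cutoff.
toy = [0.0, 0.2, 0.8, 1.5, 12.0]
no_cutoff = fraction_detected(toy)
with_cutoff = fraction_detected(toy, cutoff=1.0)
```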

  11. Statistical mechanics of secondary structures formed by random RNA sequences.

    PubMed

    Bundschuh, R; Hwa, T

    2002-03-01

    The formation of secondary structures by a random RNA sequence is studied as a model system for the sequence-structure problem omnipresent in biopolymers. Several toy energy models are introduced to allow detailed analytical and numerical studies. First, a two-replica calculation is performed. By mapping the two-replica problem to the denaturation of a single homogeneous RNA molecule in six-dimensional embedding space, we show that sequence disorder is perturbatively irrelevant, i.e., an RNA molecule with weak sequence disorder is in a molten phase where many secondary structures with comparable total energy coexist. A numerical study of various models at high temperature reproduces behaviors characteristic of the molten phase. On the other hand, a scaling argument based on the extremal statistics of rare regions can be constructed to show that the low-temperature phase is unstable to sequence disorder. We performed a detailed numerical study of the low-temperature phase using the droplet theory as a guide, and characterized the statistics of large-scale, low-energy excitations of the secondary structures from the ground state structure. We find the excitation energy to grow very slowly (i.e., logarithmically) with the length scale of the excitation, suggesting the existence of a marginal glass phase. The transition between the low-temperature glass phase and the high-temperature molten phase is also characterized numerically. It is revealed by a change in the coefficient of the logarithmic excitation energy, from being disorder dominated to being entropy dominated.

  12. Bacteriorhodopsin: partial sequence of mRNA provides amino acid sequence in the precursor region.

    PubMed Central

    Chang, S H; Majumdar, A; Dunn, R; Makabe, O; RajBhandary, U L; Khorana, H G; Ohtsuka, E; Tanaka, T; Taniyama, Y O; Ikehara, M

    1981-01-01

    mRNA for bacteriorhodopsin from Halobacterium halobium has been partially purified. By using this mRNA as template in the presence of reverse transcriptase (RNA-dependent DNA nucleotidyltransferase) and a 5'-[32P] synthetic oligodeoxyribonucleotide corresponding to amino acids 9-12 of bacteriorhodopsin as primer, we have isolated the major 5'-[32P]cDNA product, approximately 80 nucleotides long, and determined its sequence. Based on the cDNA sequence, the 5'-proximal sequence of bacteriorhodopsin mRNA is G-C-A-U-G-U-U-G-G-A-G-U-U-A-U-U-G-C-C-A-A-C-A-G-C-A-G-U-G-G-A-G-G-G-G-G-U-A-U-C-G-C-A-G-G-C-C-C-A-G-A-U-C-A-C-C-G-G-A-C-G-U-C-C-G. This includes the expected sequence for amino acids 1-8 and shows that bacteriorhodopsin is synthesized as a precursor that is at least 13 amino acids longer (Met-Leu-Glu-Leu-Leu-Pro-Thr-Ala-Val-Glu-Gly-Val-Ser) at the NH2 terminus. Agarose/urea gel electrophoresis of the partially purified mRNA showed several bands; of these, a major one hybridized with 5'-[32P]cDNA. These results suggest that the bacteriorhodopsin mRNA in the partially purified preparation is homogeneous in size and that it constitutes a substantial portion of the RNA preparation subjected to electrophoresis. PMID:6943548
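    The reading of the 5'-proximal mRNA sequence can be checked with a few lines of code. The sketch below translates the sequence quoted in the abstract from its first AUG using the standard genetic code; the helper name is hypothetical, and starting at the first AUG is an assumption of this illustration.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered with U, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_aug(rna):
    """Translate an RNA string in the frame set by its first AUG codon."""
    rna = rna.replace("-", "").replace(" ", "").upper()
    start = rna.find("AUG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        residue = CODON_TABLE[rna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


# 5'-proximal sequence from the abstract, split only for readability.
MRNA_5P = (
    "GCAUG" "UUGGAG" "UUAUUG" "CCAACA" "GCAGUG" "GAGGGG"
    "GUAUCG" "CAGGCC" "CAGAUC" "ACCGGA" "CGUCCG"
)
protein = translate_from_first_aug(MRNA_5P)
print(protein)  # MLELLPTAVEGVSQAQITGRP
```

    The first 13 residues (MLELLPTAVEGVS) correspond to the precursor extension listed in the abstract, and the remaining eight codons cover the positions the abstract assigns to amino acids 1-8.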

  13. Bias detection and correction in RNA-Sequencing data.

    PubMed

    Zheng, Wei; Chung, Lisa M; Zhao, Hongyu

    2011-07-19

    High throughput sequencing technology provides us unprecedented opportunities to study transcriptome dynamics. Compared to microarray-based gene expression profiling, RNA-Seq has many advantages, such as high resolution, low background, and ability to identify novel transcripts. Moreover, for genes with multiple isoforms, expression of each isoform may be estimated from RNA-Seq data. Despite these advantages, recent work revealed that base level read counts from RNA-Seq data may not be randomly distributed and can be affected by local nucleotide composition. It was not clear though how the base level read count bias may affect gene level expression estimates. In this paper, by using five published RNA-Seq data sets from different biological sources and with different data preprocessing schemes, we showed that commonly used estimates of gene expression levels from RNA-Seq data, such as reads per kilobase of gene length per million reads (RPKM), are biased in terms of gene length, GC content and dinucleotide frequencies. We directly examined the biases at the gene-level, and proposed a simple generalized-additive-model based approach to correct different sources of biases simultaneously. Compared to previously proposed base level correction methods, our method reduces bias in gene-level expression estimates more effectively. Our method identifies and corrects different sources of biases in gene-level expression measures from RNA-Seq data, and provides more accurate estimates of gene expression levels from RNA-Seq. This method should prove useful in meta-analysis of gene expression levels using different platforms or experimental protocols.
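    The sequence covariates named above (GC content and dinucleotide frequencies) are simple to compute per gene. The sketch below shows one way to do so; it is not the authors' generalized-additive-model correction, and the function names and the toy sequence are assumptions for illustration.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide."""
    seq = seq.upper()
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in pairs.items()}


toy_gene = "ACGTGC"  # made-up sequence
gc = gc_content(toy_gene)
dinuc = dinucleotide_frequencies(toy_gene)
```

    Values like these, together with gene length, would serve as predictors in a bias model of the kind the abstract describes.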

  14. Cryptic anuran biodiversity in Bangladesh revealed by mitochondrial 16S rRNA gene sequences.

    PubMed

    Hasan, Mahmudul; Islam, Mohammed Mafizul; Khan, Mukhlesur Rahman; Alam, Mohammad Shafiqul; Kurabayashi, Atsushi; Igawa, Takeshi; Kuramoto, Mitsuru; Sumida, Masayuki

    2012-03-01

    To survey the diversity of anuran species in Bangladesh, we compared mitochondrial 16S rRNA gene sequences (approximately 1.4 kbp) from 107 Bangladesh frog specimens. The results of genetic divergence and phylogenetic analyses incorporating data from related species revealed the occurrence of at least eight cryptic species. Hoplobatrachus tigerinus from two districts diverged considerably, indicating the involvement of a cryptic species. Two Fejervarya sp. (large and medium types) and Hylarana cf. taipehensis formed lineages distinct from related species and are probably new species. Microhyla cf. ornata differed from M. ornata with respect to type locality area and involved two distinct species. In addition, we found that Hylarana sp. and Microhyla sp. did not match congeners examined to date in either morphology or 16S rRNA sequence. The occurrence of M. fissipes was tentatively suggested. Consequently, at least, 19 species were found from Bangladesh in this study. These findings revealed a rich anuran biodiversity in Bangladesh, which is unexpected considering the rather simple topographic features of the country.

  15. tRNA-Related Sequences Trigger Systemic mRNA Transport in Plants

    PubMed Central

    Zhang, Wenna; Kollwig, Gregor; Apelt, Federico; Walther, Dirk

    2016-01-01

    In plants, protein-coding mRNAs can move via the phloem vasculature to distant tissues, where they may act as non-cell-autonomous signals. Emerging work has identified many phloem-mobile mRNAs, but little is known regarding RNA motifs triggering mobility, the extent of mRNA transport, and the potential of transported mRNAs to be translated into functional proteins after transport. To address these aspects, we produced reporter transcripts harboring tRNA-like structures (TLSs) that were found to be enriched in the phloem stream and in mRNAs moving over chimeric graft junctions. Phenotypic and enzymatic assays on grafted plants indicated that mRNAs harboring a distinctive TLS can move from transgenic roots into wild-type leaves and from transgenic leaves into wild-type flowers or roots; these mRNAs can also be translated into proteins after transport. In addition, we provide evidence that dicistronic mRNA:tRNA transcripts are frequently produced in Arabidopsis thaliana and are enriched in the population of graft-mobile mRNAs. Our results suggest that tRNA-derived sequences with predicted stem-bulge-stem-loop structures are sufficient to mediate mRNA transport and seem to be necessary for the mobility of a large number of endogenous transcripts that can move through graft junctions. PMID:27268430

  16. Adenylylation of small RNA sequencing adapters using the TS2126 RNA ligase I.

    PubMed

    Lama, Lodoe; Ryan, Kevin

    2016-01-01

    Many high-throughput small RNA next-generation sequencing protocols use 5' preadenylylated DNA oligonucleotide adapters during cDNA library preparation. Preadenylylation of the DNA adapter's 5' end frees from ATP-dependence the ligation of the adapter to RNA collections, thereby avoiding ATP-dependent side reactions. However, preadenylylation of the DNA adapters can be costly and difficult. The currently available method for chemical adenylylation of DNA adapters is inefficient and uses techniques not typically practiced in laboratories profiling cellular RNA expression. An alternative enzymatic method using a commercial RNA ligase was recently introduced, but this enzyme works best as a stoichiometric adenylylating reagent rather than a catalyst and can therefore prove costly when several variant adapters are needed or during scale-up or high-throughput adenylylation procedures. Here, we describe a simple, scalable, and highly efficient method for the 5' adenylylation of DNA oligonucleotides using the thermostable RNA ligase 1 from bacteriophage TS2126. Adapters with 3' blocking groups are adenylylated at >95% yield at catalytic enzyme-to-adapter ratios and need not be gel purified before ligation to RNA acceptors. Experimental conditions are also reported that enable DNA adapters with free 3' ends to be 5' adenylylated at >90% efficiency. © 2015 Lama and Ryan; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  17. Sequence and expression of ferredoxin mRNA in barley

    SciTech Connect

    Zielinski, R.; Funder, P.M.; Ling, V.

    1990-05-01

    We have isolated and structurally characterized a full-length cDNA clone encoding ferredoxin from a λgt10 cDNA library prepared from barley leaf mRNA. The ferredoxin clone (pBFD-1) was fused head-to-head with a partial-length cDNA clone encoding calmodulin, and was fortuitously isolated by screening the library with a calmodulin-specific oligonucleotide probe. The mRNA sequence from which pBFD-1 was derived is expressed exclusively in the leaf tissues of 7-d old barley seedlings. Barley pre-ferredoxin has a predicted size of 15.3 kDal, of which 4.6 kDal are accounted for by the transit peptide. The polypeptide encoded by pBFD-1 is identical to wheat ferredoxin, and shares slightly more amino acid sequence similarity with spinach ferredoxin I than with ferredoxin II. Ferredoxin mRNA levels are rapidly increased 10-fold by white light in etiolated barley leaves.

  18. RNA sequence and secondary structure participate in high-affinity CsrA-RNA interaction.

    PubMed

    Dubey, Ashok K; Baker, Carol S; Romeo, Tony; Babitzke, Paul

    2005-10-01

    The global Csr regulatory system controls bacterial gene expression post-transcriptionally. CsrA of Escherichia coli is an RNA binding protein that plays a central role in repressing several stationary phase processes and activating certain exponential phase functions. CsrA regulates translation initiation of several genes by binding to the mRNA leaders and blocking ribosome binding. CsrB and CsrC are noncoding regulatory RNAs that are capable of sequestering CsrA and antagonizing its activity. Each of the known target transcripts contains multiple CsrA binding sites, although considerable sequence variation exists among these RNA targets, with GGA being the most highly conserved element. High-affinity RNA ligands containing single CsrA binding sites were identified from a combinatorial library using systematic evolution of ligands by exponential enrichment (SELEX). The SELEX-derived consensus was determined as RUACARGGAUGU, with the ACA and GGA motifs being 100% conserved and the GU sequence being present in all but one ligand. The majority (51/55) of the RNAs contained GGA in the loop of a hairpin within the most stable predicted structure, an arrangement similar to several natural CsrA binding sites. Strikingly, the identity of several nucleotides that were predicted to form base pairs in each stem were 100% conserved, suggesting that primary sequence information was embedded within the base-paired region. The affinity of CsrA for several selected ligands was measured using quantitative gel mobility shift assays. A mutational analysis of one selected ligand confirmed that the conserved ACA, GGA, and GU residues were critical for CsrA binding and that RNA secondary structure participates in CsrA-RNA recognition.
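    The SELEX-derived consensus RUACARGGAUGU uses the IUPAC code R for a purine (A or G), so it can be turned into a regular expression and used to scan an RNA for candidate sites. The sketch below does only that; it ignores the hairpin context the abstract reports as important, and the function names and test sequences are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes, written for RNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "N": "[ACGU]",
}


def consensus_to_pattern(consensus):
    """Convert an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[ch] for ch in consensus.upper())


def find_sites(rna, consensus="RUACARGGAUGU"):
    """Return 0-based start positions of all (overlapping) matches."""
    pattern = consensus_to_pattern(consensus)
    # A lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", rna.upper())]


# Made-up test sequences: one with an intact site, one with GGA mutated to GCA.
intact = "CC" + "AUACAAGGAUGU" + "GG"
mutant = "CC" + "AUACAAGCAUGU" + "GG"
sites_intact = find_sites(intact)
sites_mutant = find_sites(mutant)
```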

  19. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    DTIC Science & Technology

    2015-09-01

    AWARD NUMBER: W81XWH-14-1-0080. TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. PRINCIPAL INVESTIGATOR... Aug 2015 ... analysis of expression data on the entire range of informative RNA categories, including mRNA, miRNA, and lncRNA, as well as their splice variants

  20. How to analyze gene expression using RNA-sequencing data.

    PubMed

    Ramsköld, Daniel; Kavak, Ersen; Sandberg, Rickard

    2012-01-01

    RNA-Seq is emerging as a powerful method for transcriptome analyses that will eventually make microarrays obsolete for gene expression analyses. Improvements in high-throughput sequencing and efficient sample barcoding are now enabling tens of samples to be run in a cost-effective manner, competing with microarrays in price and excelling in performance. Still, most studies use microarrays, partly due to the ease of data analyses using programs and modules that quickly turn raw microarray data into spreadsheets of gene expression values and significant differentially expressed genes. In contrast, RNA-Seq data analysis is still in its infancy, and researchers face new challenges and have to combine different tools to carry out an analysis. In this chapter, we provide a tutorial on RNA-Seq data analysis to enable researchers to quantify gene expression, identify splice junctions, and find novel transcripts using publicly available software. We focus on the analyses performed in organisms where a reference genome is available and discuss issues with current methodology that have to be solved before RNA-Seq data can be used to its full potential.

  1. Building an RNA Sequencing Transcriptome of the Central Nervous System

    PubMed Central

    Dong, Xiaomin; You, Yanan; Wu, Jia Qian

    2015-01-01

    The composition and function of the central nervous system (CNS) is extremely complex. In addition to hundreds of subtypes of neurons, other cell types, including glia (astrocytes, oligodendrocytes, and microglia) and vascular cells (endothelial cells and pericytes) also play important roles in CNS function. Such heterogeneity makes the study of gene transcription in CNS challenging. Transcriptomic studies, namely the analyses of the expression levels and structures of all genes, are essential for interpreting the functional elements and understanding the molecular constituents of the CNS. Microarray has been a predominant method for large-scale gene expression profiling in the past. However, RNA-sequencing (RNA-Seq) technology developed in recent years has many advantages over microarrays, and has enabled building more quantitative, accurate, and comprehensive transcriptomes of the CNS and other systems. The discovery of novel genes, diverse alternative splicing events, and noncoding RNAs has remarkably expanded the complexity of gene expression profiles and will help us to understand intricate neural circuits. Here, we discuss the procedures and advantages of RNA-Seq technology in mammalian CNS transcriptome construction, and review the approaches of sample collection as well as recent progress in building RNA-Seq-based transcriptomes from tissue samples and specific cell types. PMID:26463470

  2. Chaining sequence/structure seeds for computing RNA similarity.

    PubMed

    Bourgeade, Laetitia; Chauve, Cédric; Allali, Julien

    2015-03-01

    We describe a new method to compare a query RNA with a static set of target RNAs. Our method is based on (i) a static indexing of the sequence/structure seeds of the target RNAs; (ii) searching the target RNAs by detecting seeds of the query present in the target, chaining these seeds in promising candidate homologs; and then (iii) completing the alignment using an anchor-based exact alignment algorithm. We apply our method on the benchmark Bralibase2.1 and compare its accuracy and efficiency with the exact method LocARNA and its recent seeds-based speed-up ExpLoc-P. Our pipeline RNA-unchained greatly improves computation time of LocARNA and is comparable to the one of ExpLoc-P, while improving the overall accuracy of the final alignments.

  3. Optimizing RNA structures by sequence extensions using RNAcop

    PubMed Central

    Hecker, Nikolai; Christensen-Dalsgaard, Mikkel; Seemann, Stefan E.; Havgaard, Jakob H.; Stadler, Peter F.; Hofacker, Ivo L.; Nielsen, Henrik; Gorodkin, Jan

    2015-01-01

    A key aspect of RNA secondary structure prediction is the identification of novel functional elements. This is a challenging task because these elements typically are embedded in longer transcripts where the borders between the element and flanking regions have to be defined. The flanking sequences impact the folding of the functional elements both at the level of computational analyses and when the element is extracted as a transcript for experimental analysis. Here, we analyze how different flanking region lengths impact folding into a constrained structure by computing probabilities of folding for different sizes of flanking regions. Our method, RNAcop (RNA context optimization by probability), is tested on known and de novo predicted structures. In vitro experiments support the computational analysis and suggest that for a number of structures, choosing proper lengths of flanking regions is critical. RNAcop is available as web server and stand-alone software via http://rth.dk/resources/rnacop. PMID:26283181

  4. Prediction of uridine modifications in tRNA sequences.

    PubMed

    Panwar, Bharat; Raghava, Gajendra P S

    2014-10-02

    In the past, a number of methods have been developed for predicting post-translational modifications in proteins. In contrast, only limited attempts have been made to understand post-transcriptional modifications. Recently it has been shown that tRNA modifications play a direct role in genome structure and codon usage. This study is an attempt to understand kingdom-wise tRNA modifications, particularly uridine modifications (UMs), as the majority of modifications are uridine-derived. A three-step strategy has been applied to develop an efficient method for the prediction of UMs. In the first step, we developed a common prediction model for all the kingdoms using a dataset from MODOMICS-2008. Support Vector Machine (SVM) based prediction models were developed and evaluated by a five-fold cross-validation technique. Different approaches were applied, and we found that a hybrid approach of binary and structural information achieved the highest area under the curve (AUC) of 0.936. In the second step, we used newly added tRNA sequences (as an independent dataset) of MODOMICS-2012 for the kingdom-wise prediction performance evaluation of the previously developed (in the first step) common model and achieved AUCs between 0.910 and 0.949. In the third and last step, we used different datasets from MODOMICS-2012 to develop kingdom-wise individual prediction models and achieved AUCs between 0.915 and 0.987. The hybrid approach is efficient not only in predicting kingdom-wise modifications but also in classifying them into the two most prominent UMs: Pseudouridine (Y) and Dihydrouridine (D). A webserver called tRNAmod (http://crdd.osdd.net/raghava/trnamod/) has been developed, which predicts UMs from both tRNA sequences and whole genomes.

  5. Legume genomics: understanding biology through DNA and RNA sequencing

    PubMed Central

    O'Rourke, Jamie A.; Bolon, Yung-Tsi; Bucciarelli, Bruna; Vance, Carroll P.

    2014-01-01

    Background: The legume family (Leguminosae) consists of approx. 17 000 species. A few of these species, including, but not limited to, Phaseolus vulgaris, Cicer arietinum and Cajanus cajan, are important dietary components, providing protein for approx. 300 million people worldwide. Additional species, including soybean (Glycine max) and alfalfa (Medicago sativa), are important crops utilized mainly in animal feed. In addition, legumes are important contributors to biological nitrogen, forming symbiotic relationships with rhizobia to fix atmospheric N2 and providing up to 30 % of available nitrogen for the next season of crops. The application of high-throughput genomic technologies including genome sequencing projects, genome re-sequencing (DNA-seq) and transcriptome sequencing (RNA-seq) by the legume research community has provided major insights into genome evolution, genomic architecture and domestication. Scope and Conclusions: This review presents an overview of the current state of legume genomics and explores the role that next-generation sequencing technologies play in advancing legume genomics. The adoption of next-generation sequencing and implementation of associated bioinformatic tools has allowed researchers to turn each species of interest into their own model organism. To illustrate the power of next-generation sequencing, an in-depth overview of the transcriptomes of both soybean and white lupin (Lupinus albus) is provided. The soybean transcriptome focuses on analysing seed development in two near-isogenic lines, examining the role of transporters, oil biosynthesis and nitrogen utilization. The white lupin transcriptome analysis examines how phosphate deficiency alters gene expression patterns, inducing the formation of cluster roots. Such studies illustrate the power of next-generation sequencing and bioinformatic analyses in elucidating the gene networks underlying biological processes. PMID:24769535

  6. Determining breast cancer histological grade from RNA-sequencing data.

    PubMed

    Wang, Mei; Klevebring, Daniel; Lindberg, Johan; Czene, Kamila; Grönberg, Henrik; Rantalainen, Mattias

    2016-05-10

    The histologic grade (HG) of breast cancer is an established prognostic factor. The grade is usually reported on a scale ranging from 1 to 3, where grade 3 tumours are the most aggressive. However, grade 2 is associated with an intermediate risk of recurrence, and carries limited information for clinical decision-making. Patients classified as grade 2 are at risk of both under- and over-treatment. RNA-sequencing analysis was conducted in a cohort of 275 women diagnosed with invasive breast cancer. Multivariate prediction models were developed to classify tumours into high and low transcriptomic grade (TG) based on gene- and isoform-level expression data from RNA-sequencing. HG2 tumours were reclassified according to the prediction model and a recurrence-free survival analysis was performed by the multivariate Cox proportional hazards regression model to assess to what extent the TG model could be used to stratify patients. The prediction model was validated in N=487 breast cancer cases from The Cancer Genome Atlas (TCGA) data set. Differentially expressed genes and isoforms associated with HGs were analysed using linear models. The classification of grade 1 and grade 3 tumours based on RNA-sequencing data achieved high accuracy (area under the receiver operating characteristic curve = 0.97). The association between recurrence-free survival rate and HGs was confirmed in the study population (hazard ratio of grade 3 versus 1 was 2.62 with 95 % confidence interval = 1.04-6.61). The TG model enabled us to reclassify grade 2 tumours as high TG and low TG gene or isoform grade. The risk of recurrence in the high TG group of grade 2 tumours was higher than in the low TG group (hazard ratio = 2.43, 95 % confidence interval = 1.13-5.20). We found 8200 genes and 13,809 isoforms that were differentially expressed between HG1 and HG3 breast cancer tumours. Gene- and isoform-level expression data from RNA-sequencing could be utilised to differentiate HG1 and HG3 tumours with

  7. Approaches to sequence analysis of 125I-labeled RNA.

    PubMed Central

    Dickson, E; Pape, L K; Robertson, H D

    1979-01-01

    A method is described for the initial steps of sequence analysis of RNase T1- and pancreatic RNase-resistant oligonucleotides of RNA containing cytidylate residues labeled in vitro with 125I. In many cases an oligonucleotide sequence can be deduced from a consideration of (i) its relative position in the two-dimensional fingerprint (with DEAE thin layer homochromatographic second dimension), (ii) its electrophoretic mobility on DEAE paper at pH 1.9, and (iii) identification of its products of further enzymatic digestion by comparison with a set of marker oligonucleotides. Additional methods including analysis of oligonucleotides following chemical blocking of uridylate residues with CMCT and analysis of products of incomplete enzymatic digestion are also discussed. PMID:106369

  8. Tetramerization of an RNA oligonucleotide containing a GGGG sequence.

    PubMed

    Kim, J; Cheong, C; Moore, P B

    1991-05-23

    Poly rG can form four-stranded helices. The Hoogsteen-paired quartets of G residues on which such structures depend are so stable that they will form in 5'-GMP solutions, provided that Na+ or K+ are present (see for example, refs 2-4). Telomeric DNA sequences, which are G-rich, adopt four-stranded antiparallel G-quartet conformations in vitro, and parallel tetramerization of G-rich sequences may be involved in meiosis. Here we show that RNAs containing short runs of Gs can also tetramerize. A 19-base oligonucleotide derived from the 5S RNA of Escherichia coli (strand III), 5'GCCGAUGGUAGUGUGGGGU3', forms a K+-stabilized tetrameric aggregate that depends on the G residues at its 3' end. This complex is so stable that it would be surprising if similar structures do not occur in nature.
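    Locating the G run on which the tetramer depends is a simple pattern search. The sketch below finds runs of at least four consecutive G residues in the 19-base oligonucleotide quoted in the abstract; the function name and the `min_len` parameter are illustrative choices, and a run of Gs is only a necessary feature here, not a prediction that tetramerization occurs.

```python
import re


def g_runs(rna, min_len=4):
    """Return (start, length) for each run of at least `min_len` G residues.

    Start positions are 0-based from the 5' end.
    """
    pattern = "G{%d,}" % min_len
    return [(m.start(), len(m.group())) for m in re.finditer(pattern, rna.upper())]


OLIGO = "GCCGAUGGUAGUGUGGGGU"  # 19-mer from the abstract, written 5' to 3'
runs = g_runs(OLIGO)
print(runs)  # [(14, 4)]
```

    The single GGGG run sits one base from the 3' end, consistent with the abstract's statement that the aggregate depends on the G residues at the 3' end.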

  9. Gene regulation: ancient microRNA target sequences in plants.

    PubMed

    Floyd, Sandra K; Bowman, John L

    2004-04-01

    MicroRNAs are an abundant class of small RNAs that are thought to regulate the expression of protein-coding genes in plants and animals. Here we show that the target sequence of two microRNAs, known to regulate genes in the class-III homeodomain-leucine zipper (HD-Zip) gene family of the flowering plant Arabidopsis, is conserved in homologous sequences from all lineages of land plants, including bryophytes, lycopods, ferns and seed plants. We also find that the messenger RNAs from these genes are cleaved within the same microRNA-binding site in representatives of each land-plant group, as they are in Arabidopsis. Our results indicate not only that microRNAs mediate gene regulation in non-flowering as well as flowering plants, but also that the regulation of this class of plant genes dates back more than 400 million years.

  10. Translational regulation of human beta interferon mRNA: association of the 3' AU-rich sequence with the poly(A) tail reduces translation efficiency in vitro.

    PubMed Central

    Grafi, G; Sela, I; Galili, G

    1993-01-01

    The 3' AU-rich region of human beta-1 interferon (hu-IFN beta) mRNA was found to act as a translational inhibitory element. The translational regulation of this 3' AU-rich sequence and the effect of its association with the poly(A) tail were studied in cell-free rabbit reticulocyte lysate. A poly(A)-rich hu-IFN beta mRNA (110 A residues) served as an inefficient template for protein synthesis. However, translational efficiency was considerably improved when the poly(A) tract was shortened (11 A residues) or when the 3' AU-rich sequence was deleted, indicating that interaction between these two regions was responsible for the reduced translation of the poly(A)-rich hu-IFN beta mRNA. Differences in translational efficiency of the various hu-IFN beta mRNAs correlated well with their polysomal distribution. The poly(A)-rich hu-IFN beta mRNA failed to form large polysomes, while its counterpart bearing a short poly(A) tail was recruited more efficiently into large polysomes. The AU-rich sequence-binding activity was reduced when the RNA probe contained both the 3' AU-rich sequence and a long poly(A) tail, supporting a physical association between these two regions. Further evidence for this interaction was obtained by an RNase H protection assay. We suggest that the 3' AU-rich sequence may regulate the translation of hu-IFN beta mRNA by interacting with the poly(A) tail. PMID:7684500

  11. Assessing long-distance RNA sequence connectivity via RNA-templated DNA–DNA ligation

    PubMed Central

    Roy, Christian K; Olson, Sara; Graveley, Brenton R; Zamore, Phillip D; Moore, Melissa J

    2015-01-01

    Many RNAs, including pre-mRNAs and long non-coding RNAs, can be thousands of nucleotides long and undergo complex post-transcriptional processing. Multiple sites of alternative splicing within a single gene exponentially increase the number of possible spliced isoforms, with most human genes currently estimated to express at least ten. To understand the mechanisms underlying these complex isoform expression patterns, methods are needed that faithfully maintain long-range exon connectivity information in individual RNA molecules. In this study, we describe SeqZip, a methodology that uses RNA-templated DNA–DNA ligation to retain and compress connectivity between distant sequences within single RNA molecules. Using this assay, we test proposed coordination between distant sites of alternative exon utilization in mouse Fn1, and we characterize the extraordinary exon diversity of Drosophila melanogaster Dscam1. DOI: http://dx.doi.org/10.7554/eLife.03700.001 PMID:25866926

  12. Long Non-Coding RNA and Alternative Splicing Modulations in Parkinson's Leukocytes Identified by RNA Sequencing

    PubMed Central

    Soreq, Lilach; Guffanti, Alessandro; Salomonis, Nathan; Simchovitz, Alon; Israel, Zvi; Bergman, Hagai; Soreq, Hermona

    2014-01-01

    The continuously prolonged human lifespan is accompanied by an increase in the incidence of neurodegenerative diseases, calling for the development of inexpensive blood-based diagnostics. Analyzing blood cell transcripts by RNA-Seq is a robust means to identify novel biomarkers that is rapidly becoming commonplace. However, there is a lack of tools to discover novel exons, junctions and splicing events and to precisely and sensitively assess differential splicing through RNA-Seq data analysis and across RNA-Seq platforms. Here, we present a new and comprehensive computational workflow for whole-transcriptome RNA-Seq analysis, using an updated version of the software AltAnalyze, to identify both known and novel high-confidence alternative splicing events, and to integrate them with both protein-domain and microRNA-binding annotations. We applied the novel workflow to RNA-Seq data from Parkinson's disease (PD) patients' leukocytes pre- and post-Deep Brain Stimulation (DBS) treatment and compared them to healthy controls. Disease-mediated changes included decreased usage of alternative promoters and N-termini, 5′-end variations and mutually exclusive exons. The PD-regulated FUS and HNRNP A/B included regulated prion-like domain regions. We also present here a workflow to identify and analyze long non-coding RNAs (lncRNAs) via RNA-Seq data. We identified reduced lncRNA expression and selective PD-induced changes in 13 of over 6,000 detected leukocyte lncRNAs, four of which were inversely altered post-DBS. These included the U1 spliceosomal lncRNA and RP11-462G22.1, each entailing sequence complementarity to numerous microRNAs. Analysis of RNA-Seq from PD and unaffected control brains revealed over 7,000 brain-expressed lncRNAs, of which 3,495 were co-expressed in the leukocytes including U1, which showed both leukocyte and brain increases. Furthermore, qRT-PCR validations confirmed these co-increases in PD leukocytes and two brain regions, the amygdala and substantia

  13. SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

    PubMed

    Jeong, Seongmun; Kim, Jiwoong; Park, Won; Jeon, Hongmin; Kim, Namshin

    2017-01-01

    Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.
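    The hit-counting idea in the SEXCMD abstract above can be illustrated with a minimal sketch. This is not the SEXCMD code: the function names, the toy marker sequences, exact substring matching, and the fixed `min_fraction` cutoff are all assumptions made for illustration (the abstract says the real pipeline is trained on a known dataset rather than using a hard-coded threshold).

```python
def count_marker_hits(reads, markers):
    """Count how many reads contain each marker sequence (exact substring match)."""
    counts = {name: 0 for name in markers}
    for read in reads:
        for name, seq in markers.items():
            if seq in read:
                counts[name] += 1
    return counts


def call_sex(counts, hetero_markers, min_fraction=0.05):
    """Classify a sample from marker hit counts.

    hetero_markers: names of markers specific to the Y (or W) chromosome.
    min_fraction:   hypothetical cutoff on the share of Y/W hits; a real
                    pipeline would learn this from samples of known sex.
    """
    total = sum(counts.values())
    if total == 0:
        return "undetermined"
    hetero = sum(counts[name] for name in hetero_markers)
    # XX/ZZ samples are expected to show (almost) no Y/W marker hits.
    return "heterogametic" if hetero / total >= min_fraction else "homogametic"


# Toy marker sequences, invented for this example only.
markers = {"X_marker": "ACGTACGTAC", "Y_marker": "TTGGCCAATT"}

reads_xy = ["GGACGTACGTACGG", "CCTTGGCCAATTCC", "AAAA"]
reads_xx = ["GGACGTACGTACGG", "ACGTACGTACTT"]

counts_xy = count_marker_hits(reads_xy, markers)
counts_xx = count_marker_hits(reads_xx, markers)
print(call_sex(counts_xy, ["Y_marker"]))
print(call_sex(counts_xx, ["Y_marker"]))
```

    In an XY system "heterogametic" corresponds to male and in a ZW system to female; the sketch leaves that mapping to the caller.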

  14. Nascent RNA sequencing reveals distinct features in plant transcription

    PubMed Central

    Hetzel, Jonathan; Duttke, Sascha H.; Benner, Christopher; Chory, Joanne

    2016-01-01

    Transcriptional regulation of gene expression is a major mechanism used by plants to confer phenotypic plasticity, and yet compared with other eukaryotes or bacteria, little is known about the design principles. We generated an extensive catalog of nascent and steady-state transcripts in Arabidopsis thaliana seedlings using global nuclear run-on sequencing (GRO-seq), 5′GRO-seq, and RNA-seq and reanalyzed published maize data to capture characteristics of plant transcription. De novo annotation of nascent transcripts accurately mapped start sites and unstable transcripts. Examining the promoters of coding and noncoding transcripts identified comparable chromatin signatures, a conserved “TGT” core promoter motif and unreported transcription factor-binding sites. Mapping of engaged RNA polymerases showed a lack of enhancer RNAs, promoter-proximal pausing, and divergent transcription in Arabidopsis seedlings and maize, which are commonly present in yeast and humans. In contrast, Arabidopsis and maize genes accumulate RNA polymerases in proximity of the polyadenylation site, a trend that coincided with longer genes and CpG hypomethylation. Lack of promoter-proximal pausing and a higher correlation of nascent and steady-state transcripts indicate Arabidopsis may regulate transcription predominantly at the level of initiation. Our findings provide insight into plant transcription and eukaryotic gene expression as a whole. PMID:27729530

  15. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences

    PubMed Central

    Laslett, Dean; Canback, Bjorn

    2004-01-01

    A computer program, ARAGORN, identifies tRNA and tmRNA genes. The program employs heuristic algorithms to predict tRNA secondary structure, based on homology with recognized tRNA consensus sequences and ability to form a base-paired cloverleaf. tmRNA genes are identified using a modified version of the BRUCE program. ARAGORN achieves a detection sensitivity of 99% from a set of 1290 eubacterial, eukaryotic and archaeal tRNA genes and detects all complete tmRNA sequences in the tmRNA database, improving on the performance of the BRUCE program. Recently discovered tmRNA genes in the chloroplasts of two species from the ‘green’ algae lineage are detected. The output of the program reports the proposed tRNA secondary structure and, for tmRNA genes, the secondary structure of the tRNA domain, the tmRNA gene sequence, the tag peptide and a list of organisms with matching tmRNA peptide tags. PMID:14704338

  16. Analysis of sequencing data for probing RNA secondary structures and protein-RNA binding in studying posttranscriptional regulations.

    PubMed

    Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y

    2016-11-01

    High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which in turn benefits from sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alters the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations.

  17. Use of Unamplified RNA/cDNA–Hybrid Nanopore Sequencing for Rapid Detection and Characterization of RNA Viruses

    PubMed Central

    Kilianski, Andy; Roth, Pierce A.; Liem, Alvin T.; Hill, Jessica M.; Willis, Kristen L.; Rossmaier, Rebecca D.; Marinich, Andrew V.; Maughan, Michele N.; Karavis, Mark A.; Kuhn, Jens H.; Honko, Anna N.

    2016-01-01

    Nanopore sequencing, a novel genomics technology, has potential applications for routine biosurveillance, clinical diagnosis, and outbreak investigation of virus infections. Using rapid sequencing of unamplified RNA/cDNA hybrids, we identified Venezuelan equine encephalitis virus and Ebola virus in 3 hours from sample receipt to data acquisition, demonstrating a fieldable technique for RNA virus characterization. PMID:27191483

  18. Assessing the 5S ribosomal RNA heterogeneity in Arabidopsis thaliana using short RNA next generation sequencing data.

    PubMed

    Szymanski, Maciej; Karlowski, Wojciech M

    2016-01-01

    In eukaryotes, ribosomal 5S rRNAs are products of multigene families organized within clusters of tandemly repeated units. Accumulation of genomic data obtained from a variety of organisms demonstrated that the potential 5S rRNA coding sequences show a large number of variants, often incompatible with folding into a correct secondary structure. Here, we present results of an analysis of a large set of short RNA sequences generated by the next generation sequencing techniques, to address the problem of heterogeneity of the 5S rRNA transcripts in Arabidopsis and identification of potentially functional rRNA-derived fragments.

  19. Improved definition of the mouse transcriptome via targeted RNA sequencing.

    PubMed

    Bussotti, Giovanni; Leonardi, Tommaso; Clark, Michael B; Mercer, Tim R; Crawford, Joanna; Malquori, Lorenzo; Notredame, Cedric; Dinger, Marcel E; Mattick, John S; Enright, Anton J

    2016-05-01

    Targeted RNA sequencing (CaptureSeq) uses oligonucleotide probes to capture RNAs for sequencing, providing enriched read coverage, accurate measurement of gene expression, and quantitative expression data. We applied CaptureSeq to refine transcript annotations in the current murine GRCm38 assembly. More than 23,000 regions corresponding to putative or annotated long noncoding RNAs (lncRNAs) and 154,281 known splicing junction sites were selected for targeted sequencing across five mouse tissues and three brain subregions. The results illustrate that the mouse transcriptome is considerably more complex than previously thought. We assemble more complete transcript isoforms than GENCODE, expand transcript boundaries, and connect interspersed islands of mapped reads. We describe a novel filtering pipeline that identifies previously unannotated but high-quality transcript isoforms. In this set, 911 GENCODE neighboring genes are condensed into 400 expanded gene models. Additionally, 594 GENCODE lncRNAs acquire an open reading frame (ORF) when their structure is extended with CaptureSeq. Finally, we validate our observations using current FANTOM and Mouse ENCODE resources. © 2016 Bussotti et al.; Published by Cold Spring Harbor Laboratory Press.

  1. In vitro DNA dependent synthesis of globin RNA sequences from erythroleukemic cell chromatin.

    PubMed Central

    Reff, M E; Davidson, R L

    1979-01-01

    Murine erythroleukemic cells in culture accumulate cytoplasmic globin mRNA during differentiation induced by dimethyl sulfoxide (DMSO). Chromatin was prepared from DMSO-induced erythroleukemic cells that were transcribing globin RNA in order to determine whether in vitro synthesis of globin RNA sequences was possible from chromatin. RNA was synthesized in vitro using 5-mercuriuridine triphosphate and exogenous Escherichia coli RNA polymerase. Newly synthesized mercurated RNA was purified from endogenous chromatin-associated RNA by affinity chromatography on a sepharose sulfhydryl column, and the globin RNA sequence content of the mercurated RNA was assayed by hybridization to cDNA globin. The synthesis of globin RNA sequences was shown to occur and to be sensitive to actinomycin and rifampicin and insensitive to alpha-amanitin. In contrast, synthesis of globin RNA sequences was not detected in significant amounts from chromatin prepared from uninduced erythroleukemic cells, nor from uninduced cell chromatin to which globin RNA was added prior to transcription. Isolated RNA:cDNA globin hybrids were shown to contain mercurated RNA by affinity chromatography. These results indicated that synthesis of globin RNA sequences from chromatin can be performed by E. coli RNA polymerase. PMID:284320

  2. Deciphering Poxvirus Gene Expression by RNA Sequencing and Ribosome Profiling

    PubMed Central

    Cao, Shuai; Martens, Craig A.; Porcella, Stephen F.; Xie, Zhi; Ma, Ming; Shen, Ben

    2015-01-01

    ABSTRACT The more than 200 closely spaced annotated open reading frames, extensive transcriptional read-through, and numerous unpredicted RNA start sites have made the analysis of vaccinia virus gene expression challenging. Genome-wide ribosome profiling provided an unprecedented assessment of poxvirus gene expression. By 4 h after infection, approximately 80% of the ribosome-associated mRNA was viral. Ribosome-associated mRNAs were detected for most annotated early genes at 2 h and for most intermediate and late genes at 4 and 8 h. Cluster analysis identified a subset of early mRNAs that continued to be translated at the later times. At 2 h, there was excellent correlation between the abundance of individual mRNAs and the numbers of associated ribosomes, indicating that expression was primarily transcriptionally regulated. However, extensive transcriptional read-through invalidated similar correlations at later times. The mRNAs with the highest density of ribosomes had host response, DNA replication, and transcription roles at early times and were virion components at late times. Translation inhibitors were used to map initiation sites at single-nucleotide resolution at the start of most annotated open reading frames although in some cases a downstream methionine was used instead. Additional putative translational initiation sites with AUG or alternative codons occurred mostly within open reading frames, and fewer occurred in untranslated leader sequences, antisense strands, and intergenic regions. However, most open reading frames associated with these additional translation initiation sites were short, raising questions regarding their biological roles. The data were used to construct a high-resolution genome-wide map of the vaccinia virus translatome. IMPORTANCE This report contains the first genome-wide, high-resolution analysis of poxvirus gene expression at both transcriptional and translational levels. The study was made possible by recent methodological

  3. Transcriptional profiling of bovine milk using RNA sequencing

    PubMed Central

    2012-01-01

    Background: Cow milk is a complex bioactive fluid consumed by humans beyond infancy. Even though the chemical and physical properties of cow milk are well characterized, very limited research has been done on characterizing the milk transcriptome. This study performs a comprehensive expression profiling of genes expressed in milk somatic cells of transition (day 15), peak (day 90) and late (day 250) lactation Holstein cows by RNA sequencing. Milk samples were collected from Holstein cows at 15, 90 and 250 days of lactation, and RNA was extracted from the pelleted milk cells. Gene expression analysis was conducted by Illumina RNA sequencing. Sequence reads were assembled and analyzed in CLC Genomics Workbench. Gene Ontology (GO) and pathway analysis were performed using the Blast2GO program and the GeneGo application of the MetaCore program. Results: A total of 16,892 genes were expressed in transition lactation, 19,094 genes were expressed in peak lactation and 18,070 genes were expressed in late lactation. Regardless of the lactation stage, approximately 9,000 genes showed ubiquitous expression. Genes encoding caseins, whey proteins and enzymes in the lactose synthesis pathway showed higher expression in early lactation. The majority of genes in the fat metabolism pathway had high expression in transition and peak lactation milk. Most of the genes encoding endogenous proteases and enzymes in the ubiquitin-proteasome pathway showed higher expression along the course of lactation. Conclusions: This is the first study to describe the comprehensive bovine milk transcriptome in Holstein cows. The results revealed that 69% of NCBI Btau 4.0 annotated genes are expressed in bovine milk somatic cells. Most of the genes were ubiquitously expressed in all three stages of lactation. However, a fraction of the milk transcriptome has genes devoted to specific functions unique to the lactation stage. This indicates the ability of milk somatic cells to adapt to different molecular functions

  4. A Mammalian microRNA Expression Atlas Based on Small RNA Library Sequencing

    PubMed Central

    Landgraf, Pablo; Rusu, Mirabela; Sheridan, Robert; Sewer, Alain; Iovino, Nicola; Aravin, Alexei; Pfeffer, Sébastien; Rice, Amanda; Kamphorst, Alice O.; Landthaler, Markus; Lin, Carolina; Socci, Nicholas D.; Hermida, Leandro; Fulci, Valerio; Chiaretti, Sabina; Foà, Robin; Schliwka, Julia; Fuchs, Uta; Novosel, Astrid; Müller, Roman-Ulrich; Schermer, Bernhard; Bissels, Ute; Inman, Jason; Phan, Quang; Chien, Minchen; Weir, David B.; Choksi, Ruchi; De Vita, Gabriella; Frezzetti, Daniela; Trompeter, Hans-Ingo; Hornung, Veit; Teng, Grace; Hartmann, Gunther; Palkovits, Miklos; Di Lauro, Roberto; Wernet, Peter; Macino, Giuseppe; Rogler, Charles E.; Nagle, James W.; Ju, Jingyue; Papavasiliou, F. Nina; Benzing, Thomas; Lichter, Peter; Tam, Wayne; Brownstein, Michael J.; Bosio, Andreas; Borkhardt, Arndt; Russo, James J.; Sander, Chris; Zavolan, Mihaela; Tuschl, Thomas

    2007-01-01

    Summary: MicroRNAs (miRNAs) are small non-coding regulatory RNAs that reduce stability and/or translation of fully or partially sequence-complementary target mRNAs. In order to identify miRNAs and to assess their expression patterns, we sequenced over 250 small RNA libraries from 26 different organ systems and cell types of humans and rodents, enriched in neuronal as well as normal and malignant hematopoietic cells and tissues. We present expression profiles derived from clone count data and provide novel computational tools for their analysis. Unexpectedly, a relatively small set of miRNAs, many of which are ubiquitously expressed, accounts for most of the difference in miRNA profiles between cell lineages and tissues. This broad survey also provides detailed and accurate information about mature sequences, precursors, genome locations, maturation processes, inferred transcriptional units and conservation patterns. We also propose a subclassification scheme for miRNAs to assist future experimental and computational functional analyses. PMID:17604727

  5. RNA expression profile of calcified bicuspid, tricuspid, and normal human aortic valves by RNA sequencing.

    PubMed

    Guauque-Olarte, Sandra; Droit, Arnaud; Tremblay-Marchand, Joël; Gaudreault, Nathalie; Kalavrouziotis, Dimitri; Dagenais, Francois; Seidman, Jonathan G; Body, Simon C; Pibarot, Philippe; Mathieu, Patrick; Bossé, Yohan

    2016-10-01

    The molecular mechanisms leading to premature development of aortic valve stenosis (AS) in individuals with a bicuspid aortic valve are unknown. The objective of this study was to identify genes differentially expressed between calcified bicuspid aortic valves (BAVc) and tricuspid valves with (TAVc) and without (TAVn) AS using RNA sequencing (RNA-Seq). We collected 10 human BAVc and nine TAVc from men who underwent primary aortic valve replacement. Eight TAVn were obtained from men who underwent heart transplantation. mRNA levels were measured by RNA-Seq and compared between valve groups. Two genes were upregulated, and none were downregulated in BAVc compared with TAVc, suggesting a similar gene expression response to AS in individuals with bicuspid and tricuspid valves. There were 462 genes upregulated and 282 downregulated in BAVc compared with TAVn. In TAVc compared with TAVn, 329 genes were up- and 170 were downregulated. A total of 273 upregulated and 147 downregulated genes were concordantly altered between BAVc vs. TAVn and TAVc vs. TAVn, which represent 56 and 84% of significant genes in the first and second comparisons, respectively. This indicates that extra genes and pathways were altered in BAVc. Shared pathways between calcified (BAVc and TAVc) and normal (TAVn) aortic valves were also more extensively altered in BAVc. The top pathway enriched for genes differentially expressed in calcified compared with normal valves was fibrosis, which supports the remodeling process as a therapeutic target. These findings are relevant to understanding the molecular basis of AS in patients with bicuspid and tricuspid valves.
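    The 56% and 84% figures in the abstract above follow directly from the gene counts it reports. The short check below recomputes them; the variable names are ours, and only the counts come from the abstract.

```python
# Differentially expressed genes reported in the abstract (up + down).
bavc_vs_tavn = 462 + 282   # BAVc compared with TAVn
tavc_vs_tavn = 329 + 170   # TAVc compared with TAVn
concordant = 273 + 147     # altered in the same direction in both comparisons

# Share of each comparison's significant genes that is concordant.
share_first = 100 * concordant / bavc_vs_tavn
share_second = 100 * concordant / tavc_vs_tavn

print(f"{bavc_vs_tavn} genes, {share_first:.1f}% concordant")
print(f"{tavc_vs_tavn} genes, {share_second:.1f}% concordant")
```

    The larger denominator for BAVc vs. TAVn (744 versus 499 genes) is what makes the concordant share smaller in the first comparison, consistent with the abstract's statement that additional genes were altered in BAVc.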

  7. Comparative RNA sequencing reveals substantial genetic variation in endangered primates

    PubMed Central

    Perry, George H.; Melsted, Páll; Marioni, John C.; Wang, Ying; Bainer, Russell; Pickrell, Joseph K.; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D.; Stephens, Matthew; Pritchard, Jonathan K.; Gilad, Yoav

    2012-01-01

    Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success. PMID:22207615

  8. Optimization of shRNA inhibitors by variation of the terminal loop sequence.

    PubMed

    Schopman, Nick C T; Liu, Ying Poi; Konstantinova, Pavlina; ter Brake, Olivier; Berkhout, Ben

    2010-05-01

    Gene silencing by RNA interference (RNAi) can be achieved by intracellular expression of a short hairpin RNA (shRNA) that is processed into the effective small interfering RNA (siRNA) inhibitor by the RNAi machinery. Previous studies indicate that shRNA molecules do not always reflect the activity of corresponding synthetic siRNAs that attack the same target sequence. One obvious difference between these two effector molecules is the hairpin loop of the shRNA. Most studies use the original shRNA design of the pSuper system, but no extensive study regarding optimization of the shRNA loop sequence has been performed. We tested the impact of different hairpin loop sequences, varying in size and structure, on the activity of a set of shRNAs targeting HIV-1. We were able to transform weak inhibitors into intermediate or even strong shRNA inhibitors by replacing the loop sequence. We demonstrate that the efficacy of these optimized shRNA inhibitors is improved significantly in different cell types due to increased siRNA production. These results indicate that the loop sequence is an essential part of the shRNA design. The optimized shRNA loop sequence is generally applicable for RNAi knockdown studies, and will allow us to develop a more potent gene therapy against HIV-1.

  9. Modulations of RNA sequences by cytokinin in pumpkin cotyledons

    SciTech Connect

    Chang, C.; Ertl, J.; Chen, C.

    1987-04-01

    Polyadenylated mRNAs from excised pumpkin cotyledons treated with or without 10⁻⁴ M benzyladenine (BA) for various time periods in suspension culture were assayed by in vitro translation in the presence of [³⁵S]methionine. The radioactive polypeptides were analyzed by one- and two-dimensional polyacrylamide gel electrophoresis. Specific sequences of mRNAs were enhanced, reduced, induced, or suppressed by the hormone within 60 min of the application of BA to the cotyledons. Four independent cDNA clones of cytokinin-modulated mRNAs have been selected and characterized. RNA blot hybridization using the four cDNA probes also indicates that the levels of specific mRNAs are modulated upward or downward by the hormone.

  10. Genome-wide analyses of Epstein-Barr virus reveal conserved RNA structures and a novel stable intronic sequence RNA

    PubMed Central

    2013-01-01

    Background: Epstein-Barr virus (EBV) is a human herpesvirus implicated in cancer and autoimmune disorders. Little is known concerning the roles of RNA structure in this important human pathogen. This study provides the first comprehensive genome-wide survey of RNA and RNA structure in EBV. Results: Novel EBV RNAs and RNA structures were identified by computational modeling and RNA-Seq analyses of EBV. Scans of the genomic sequences of four EBV strains (EBV-1, EBV-2, GD1, and GD2) and of the closely related Macacine herpesvirus 4 using the RNAz program discovered 265 regions with high probability of forming conserved RNA structures. Secondary structure models are proposed for these regions based on a combination of free energy minimization and comparative sequence analysis. The analysis of RNA-Seq data uncovered the first observation of a stable intronic sequence RNA (sisRNA) in EBV. The abundance of this sisRNA rivals that of the well-known and highly expressed EBV-encoded non-coding RNAs (EBERs). Conclusion: This work identifies regions of the EBV genome likely to generate functional RNAs and RNA structures, provides structural models for these regions, and discusses potential functions suggested by the modeled structures. Enhanced understanding of the EBV transcriptome will guide future experimental analyses of the discovered RNAs and RNA structures. PMID:23937650

  11. Frequency distribution of pre-messenger RNA sequences in polyadenylated and non-polyadenylated nuclear RNA from Friend cells.

    PubMed Central

    Balmain, A; Minty, A J; Birnie, G D

    1980-01-01

    Hybridisation of cDNA probes for abundant and rare polysomal polyadenylated RNAs with polyadenylated and non-polyadenylated nuclear RNA from Friend cells indicated that the abundant polysomal polyadenylated RNA sequences were present at a higher concentration in the nucleus than rare polysomal sequences, but at a reduced range of concentrations. The ratio of the concentrations of abundant and rare sequences was about 3 in non-polyadenylated nuclear RNA, 9 in polyadenylated nuclear RNA and 13 in polysomal polyadenylated RNA. This suggests that polyadenylation may play a role in the quantitative selection of sequences for transport to the cytoplasm. Polyadenylation cannot be the only signal for transport, since a highly complex population of nucleus-confined polyadenylated molecules exists, each of which is present on average at less than one copy per cell. PMID:7433127

  12. A minimal ribosomal RNA: sequence and secondary structure of the 9S kinetoplast ribosomal RNA from Leishmania tarentolae.

    PubMed Central

    de la Cruz, V F; Lake, J A; Simpson, A M; Simpson, L

    1985-01-01

    The portion of the Leishmania tarentolae kinetoplast maxicircle DNA encoding the 9S RNA gene was sequenced, and the 5' and 3' ends of the transcript were determined. A secondary structure for the 9S RNA was determined based on the Escherichia coli 16S model. The 610-nucleotide 9S RNA exhibits a minimal secondary structure in which all four domains of the E. coli 16S structure are preserved. Within domains, however, some stems and loops have been greatly reduced or eliminated entirely. It is presumed that these reduced domains represent the minimal essential small ribosomal RNA secondary structures necessary for a functional ribosome. Alignment of the L. tarentolae 9S rRNA sequence with the published Trypanosoma brucei 9S rRNA sequence shows a nucleotide similarity of 84% and a transversion/transition ratio of 1.66. PMID:3856267
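    The two summary statistics quoted in this record (percent nucleotide similarity and transversion/transition ratio) can be illustrated with a minimal Python sketch. The function name `compare_aligned`, the gap character `-`, and the choice to skip gapped or ambiguous columns are assumptions made for illustration; the record does not describe how the authors handled gaps or computed their figures.

```python
# Minimal sketch: similarity and transversion/transition ratio for two
# pre-aligned nucleotide sequences. Gap handling is an assumption.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}
VALID = PURINES | PYRIMIDINES


def compare_aligned(seq_a, seq_b):
    """Return (similarity, transitions, transversions, tv_ts_ratio)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Treat DNA and RNA alphabets alike by mapping T to U.
    seq_a = seq_a.upper().replace("T", "U")
    seq_b = seq_b.upper().replace("T", "U")
    compared = matches = transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        # Skip columns with a gap or an ambiguous base in either sequence.
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            matches += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    similarity = matches / compared if compared else 0.0
    ratio = transversions / transitions if transitions else float("inf")
    return similarity, transitions, transversions, ratio
```

    A call such as `compare_aligned(seq1, seq2)` on two aligned rRNA sequences returns the fraction of identical columns together with the substitution counts, from which figures of the kind reported above could be derived.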

  13. Sequence of the 16S ribosomal RNA from Halobacterium volcanii, an archaebacterium

    NASA Technical Reports Server (NTRS)

    Gupta, R.; Lanter, J. M.; Woese, C. R.

    1983-01-01

    The sequence of the 16S ribosomal RNA (rRNA) from the archaebacterium Halobacterium volcanii has been determined by DNA sequencing methods. The archaebacterial rRNA is similar to its eubacterial counterpart in secondary structure. Although it is closer in sequence to the eubacterial 16S rRNA than to the eukaryotic 16S-like rRNA, the H. volcanii sequence also shows certain points of specific similarity to its eukaryotic counterpart. Since the H. volcanii sequence is closer to both the eubacterial and the eukaryotic sequences than these two are to one another, it follows that the archaebacterial sequence resembles their common ancestral sequence more closely than does either of the other two versions.

  15. Sequence of the 16S Ribosomal RNA from Halobacterium volcanii, an Archaebacterium.

    PubMed

    Gupta, R; Lanter, J M; Woese, C R

    1983-08-12

    The sequence of the 16S ribosomal RNA (rRNA) from the archaebacterium Halobacterium volcanii has been determined by DNA sequencing methods. The archaebacterial rRNA is similar to its eubacterial counterpart in secondary structure. Although it is closer in sequence to the eubacterial 16S rRNA than to the eukaryotic 16S-like rRNA, the H. volcanii sequence also shows certain points of specific similarity to its eukaryotic counterpart. Since the H. volcanii sequence is closer to both the eubacterial and the eukaryotic sequences than these two are to one another, it follows that the archaebacterial sequence resembles their common ancestral sequence more closely than does either of the other two versions.

  16. Tracking Cryptosporidium parvum by sequence analysis of small double-stranded RNA.

    PubMed Central

    Xiao, L.; Limor, J.; Bern, C.; Lal, A. A.

    2001-01-01

    We sequenced a 173-nucleotide fragment of the small double-stranded viruslike RNA of Cryptosporidium parvum isolates from 23 calves and 38 humans. Sequence diversity was detected at 17 sites. Isolates from the same outbreak had identical double-stranded RNA sequences, suggesting that this technique may be useful for tracking Cryptosporidium infection sources. PMID:11266306

  17. High-throughput illumina strand-specific RNA sequencing library preparation

    USDA-ARS's Scientific Manuscript database

    Conventional Illumina RNA-Seq does not have the resolution to decode the complex eukaryote transcriptome due to the lack of RNA polarity information. Strand-specific RNA sequencing (ssRNA-Seq) can overcome these limitations and as such is better suited for genome annotation, de novo transcriptome as...

  18. FASTR: A novel data format for concomitant representation of RNA sequence and secondary structure information.

    PubMed

    Bose, Tungadri; Dutta, Anirban; Mh, Mohammed; Gandhi, Hemang; Mande, Sharmila S

    2015-09-01

    Given the importance of RNA secondary structures in defining their biological role, it would be convenient for researchers seeking RNA data if both sequence and structural information pertaining to RNA molecules were made available together. Current nucleotide data repositories archive only RNA sequence data. Furthermore, storage formats that can frugally represent RNA sequence as well as structure data in a single file are currently unavailable. This article proposes a novel storage format, 'FASTR', for concomitant representation of RNA sequence and structure. The storage efficiency of the proposed FASTR format has been evaluated using RNA data from various microorganisms. Results indicate that the sizes of FASTR-formatted files (containing both RNA sequence and structure information) are equivalent to those of FASTA-format files, which contain only RNA sequence information. RNA secondary structure is typically represented using a string of nucleotide characters along with the corresponding dot-bracket notation indicating structural attributes. 'FASTR', the novel storage format proposed in the present study, enables a frugal representation of both RNA sequence and structural information in the form of a single string. In spite of having a relatively smaller storage footprint, the resultant 'fastr' string(s) retain all sequence as well as secondary structural information that could be stored using a dot-bracket notation. An implementation of the 'FASTR' methodology is available for download at http://metagenomics.atc.tcs.com/compression/fastr.
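    As an illustration of the conventional two-string representation this record mentions (a nucleotide string plus a dot-bracket string), the following minimal Python sketch recovers base pairs from dot-bracket notation. It does not implement the FASTR encoding itself, which the abstract does not specify; the function name `dot_bracket_pairs` is hypothetical, and only the simple `(`, `)`, `.` alphabet without pseudoknots is assumed.

```python
# Minimal sketch: recover base pairs from a sequence and its dot-bracket
# structure string. Pseudoknot brackets such as [ ] are not handled.
def dot_bracket_pairs(sequence, structure):
    """Return sorted (i, j, bases) tuples, 0-based, with i < j."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    open_positions = []   # stack of indices of unmatched '('
    pairs = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(j)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {j}")
            i = open_positions.pop()
            pairs.append((i, j, sequence[i] + sequence[j]))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if open_positions:
        raise ValueError("unmatched '(' in structure")
    return sorted(pairs)
```

    For a small hairpin such as `dot_bracket_pairs("GGGAAACCC", "(((...)))")`, the function pairs each opening bracket with its matching closing bracket, so the two separate strings together carry the full secondary structure that a combined format would need to preserve.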

  19. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  20. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons.

    PubMed

    Olson, Nathan D; Lund, Steven P; Zook, Justin M; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B

    2015-03-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies.

  1. Determination of sequence and structural requirements for pathogenicity of a cucumber mosaic virus satellite RNA (Y-satRNA).

    PubMed

    Masuta, C; Takanami, Y

    1989-12-01

    We describe the use of biologically active cDNA clones to investigate genetic determinants of a satellite RNA that modulates symptoms normally induced by its helper virus, cucumber mosaic virus (CMV). For this purpose, we have investigated a CMV satellite RNA (Y-satRNA) that induces bright yellow symptoms on tobacco and necrosis on tomato. To determine the pathogenicity-modulating domain of Y-satRNA, several insertion and deletion mutants were created by using various restriction sites in the cDNA of Y-satRNA, and RNA transcripts derived from the clones were mixed with CMV and used to inoculate plants. Although the satellite RNA was able to tolerate small insertions (as much as 4 bases at present), small deletions were deleterious, indicating that the sequence requirements for viability of the satellite RNA are relatively inflexible. Biological activity assays of chimeric satellite RNAs between Y-satRNA and a non-necrogenic satellite RNA, T73-satRNA, suggested that only two (or at least one of two) specific bases (positions 318 and 325) in the 3' region direct the necrogenic property of Y-satRNA. Sequences involved in production of yellow symptoms were investigated by constructing chimeras between Y-sat cDNA and cDNA of a satellite RNA designated S19-satRNA. S19-satRNA has considerable homology to Y-satRNA but does not elicit yellow symptoms on tobacco. Chimeric clones were constructed by using a BstXI site that cuts within a stable secondary structure in the region between positions 100 and 200 (region Y). The results of infectivity tests with RNA transcripts suggest that formation of a secondary structure in region Y may be involved in induction of yellow symptoms as well as viability of Y-satRNA.

  2. Pea enation mosaic virus genome RNA contains no polyadenylate sequences and cannot be aminoacylated.

    PubMed

    German, T L; De Zoeten, G A; Hall, T C

    1978-01-01

    An active synthetase enzyme preparation from peas (Pisum sativum L.) did not catalyze the aminoacylation of pea enation mosaic virus RNA. The viral RNA was shown not to contain polyadenylic acid sequences.

  3. Targeted RNA Sequencing Assay to Characterize Gene Expression and Genomic Alterations

    PubMed Central

    Martin, Dorrelyn P.; Miya, Jharna; Reeser, Julie W.; Roychowdhury, Sameek

    2017-01-01

    RNA sequencing (RNAseq) is a versatile method that can be utilized to detect and characterize gene expression, mutations, gene fusions, and noncoding RNAs. Standard RNAseq requires 30 – 100 million sequencing reads and can include multiple RNA products such as mRNA and noncoding RNAs. We demonstrate how targeted RNAseq (capture) permits a focused study on selected RNA products using a desktop sequencer. RNAseq capture can characterize unannotated, low, or transiently expressed transcripts that may otherwise be missed using traditional RNAseq methods. Here we describe the extraction of RNA from cell lines, ribosomal RNA depletion, cDNA synthesis, preparation of barcoded libraries, hybridization and capture of targeted transcripts and multiplex sequencing on a desktop sequencer. We also outline the computational analysis pipeline, which includes quality control assessment, alignment, fusion detection, gene expression quantification and identification of single nucleotide variants. This assay allows for targeted transcript sequencing to characterize gene expression, gene fusions, and mutations. PMID:27585245

  4. Presence of tadpole and adult globin RNA sequences in oocytes of Xenopus laevis

    PubMed Central

    Perlman, S. M.; Ford, P. J.; Rosbash, M. M.

    1977-01-01

    Complementary DNA transcribed from adult Xenopus laevis globin mRNA was used to assay ovary RNA from Xenopus for the presence of globin sequences by RNA·cDNA hybridization. These sequences are present at approximately the same concentration as the majority of poly(A)-containing ovary sequences. The sequences are also found at approximately 200,000 copies per cell in poly(A)-containing RNA extracted from mature oocytes. To rule out contamination of the oocytes with somatic cells, two additional experiments were performed. First, RNA isolated from ovulated unfertilized eggs, which are devoid of somatic cells, was also shown to contain the globin sequences. Second, globin mRNA was isolated from Xenopus tadpoles. Adult globin mRNA is free of the tadpole sequence and no homology was detected between adult and tadpole globin RNA. The ovary was shown to contain tadpole globin RNA at nearly the same concentration as the adult sequences. Thus, the results cannot be explained by contamination with erythroid cells, which should contain only the adult sequence. The swimming tadpole, which possesses an active circulatory system, was also assayed for the tadpole and adult globin sequences. Whereas the adult sequences are present at approximately the same concentration as in the mature oocyte, the concentration of the tadpole sequences increases at least 300-fold in the first 3 days following fertilization. PMID:269434

  5. RNA Sequencing and Co-expressed Long Non-coding RNA in Modern and Wild Wheats.

    PubMed

    Cagirici, Halise Busra; Alptekin, Burcu; Budak, Hikmet

    2017-09-06

    There is an urgent need for the improvement of drought-tolerant bread and durum wheat. The huge and complex genome of bread wheat (BBAADD genome) stands as a vital obstruction for understanding the molecular mechanism underlying drought tolerance. However, tetraploid wheat (Triticum turgidum ssp., BBAA genome) is an ancestor of modern bread wheat and offers an important model for studying the drought response due to its less complex genome. Additionally, several wild relatives of tetraploid wheat have already shown a significant drought tolerance. We sequenced the root transcriptomes of three tetraploid wheat varieties with varying stress tolerance profiles, and built a differential expression library of their transcripts under control and drought conditions. More than 5,000 differentially expressed transcripts were identified from each genotype. Functional characterization of transcripts specific to the drought-tolerant genotype revealed their association with osmolyte production and secondary metabolite pathways. Comparative analysis of differentially expressed genes and their non-coding RNA partners, long noncoding RNAs and microRNAs, provided valuable insight into gene expression regulation in response to drought stress. LncRNAs as well as coding transcripts share similar structural features in different tetraploid species; yet, lncRNAs slightly differ from coding transcripts. Several miRNA-lncRNA target pairs were detected as differentially expressed in drought stress. Overall, this study suggested an important pool of transcripts whose manipulation may confer better performance of wheat varieties under drought stress.

  6. Mutations in the yeast RNA14 and RNA15 genes result in an abnormal mRNA decay rate; sequence analysis reveals an RNA-binding domain in the RNA15 protein.

    PubMed Central

    Minvielle-Sebastia, L; Winsor, B; Bonneaud, N; Lacroute, F

    1991-01-01

    In Saccharomyces cerevisiae, temperature-sensitive mutations in the genes RNA14 and RNA15 correlate with a reduction of mRNA stability and poly(A) tail length. Although mRNA transcription is not abolished in these mutants, the transcripts are rapidly deadenylated as in a strain carrying an RNA polymerase B(II) temperature-sensitive mutation. This suggests that the primary defect could be in the control of the poly(A) status of the mRNAs and that the fast decay rate may be due to the loss of this control. By complementation of their temperature-sensitive phenotype, we have cloned the wild-type genes. They are essential for cell viability and are unique in the haploid genome. The RNA14 gene, located on chromosome H, is transcribed as three mRNAs, one major and two minor, which are 2.2, 1.5, and 1.1 kb in length. The RNA15 gene gives rise to a single 1.2-kb transcript and maps to chromosome XVI. Sequence analysis indicates that RNA14 encodes a 636-amino-acid protein with a calculated molecular weight of 75,295. No homology was found between RNA14 and RNA15 or between RNA14 and other proteins contained in data banks. The RNA15 DNA sequence predicts a protein of 296 amino acids with a molecular weight of 32,770. Sequence comparison reveals an N-terminal putative RNA-binding domain in the RNA15-encoded protein, followed by a glutamine and asparagine stretch similar to the opa sequences. Both RNA14 and RNA15 wild-type genes, when cloned on a multicopy plasmid, are able to suppress the temperature-sensitive phenotype of strains bearing either the rna14 or the rna15 mutation, suggesting that the encoded proteins could interact with each other. PMID:1674817

  7. RNA reprogramming of alpha-mannosidase mRNA sequences in vitro by myxomycete group IC1 and IE ribozymes.

    PubMed

    Fiskaa, Tonje; Lundblad, Eirik W; Henriksen, Jørn R; Johansen, Steinar D; Einvik, Christer

    2006-06-01

    Trans-splicing group I ribozymes have been introduced in order to mediate RNA reprogramming (including RNA repair) of therapeutically relevant RNA transcripts. Efficient RNA reprogramming depends on the appropriate efficiency of the reaction, and several attempts, including optimization of target recognition and ribozyme catalysis, have been performed. In most studies, the Tetrahymena group IC1 ribozyme has been applied. Here we investigate the potential of group IC1 and group IE intron ribozymes, derived from the myxomycetes Didymium and Fuligo, in addition to the Tetrahymena ribozyme, for RNA reprogramming of a mutated alpha-mannosidase mRNA sequence. Randomized internal guide sequences were introduced for all four ribozymes and used to select accessible sites within isolated mutant alpha-mannosidase mRNA from mammalian COS-7 cells. Two accessible sites common to all the group I ribozymes were identified and further investigated in RNA reprogramming by trans-splicing analyses. All the myxomycete ribozymes performed the trans-splicing reaction with high fidelity, resulting in the conversion of mutated alpha-mannosidase RNA into wild-type sequence. RNA protection analysis revealed that the myxomycete ribozymes perform trans-splicing at approximately similar efficiencies as the Tetrahymena ribozyme. Interestingly, the relative efficiency among the ribozymes tested correlates with structural features of the P4-P6-folding domain, consistent with the fact that efficient folding is essential for group I intron trans-splicing.

  8. Depletion of tRNA-halves enables effective small RNA sequencing of low-input murine serum samples

    PubMed Central

    Van Goethem, Alan; Yigit, Nurten; Everaert, Celine; Moreno-Smith, Myrthala; Mus, Liselot M.; Barbieri, Eveline; Speleman, Frank; Mestdagh, Pieter; Shohet, Jason; Van Maerken, Tom; Vandesompele, Jo

    2016-01-01

    The ongoing ascent of sequencing technologies has enabled researchers to gain unprecedented insights into the RNA content of biological samples. MiRNAs, a class of small non-coding RNAs, play a pivotal role in regulating gene expression. The discovery that miRNAs are stably present in circulation has spiked interest in their potential use as minimally-invasive biomarkers. However, sequencing of blood-derived samples (serum, plasma) is challenging due to the often low RNA concentration, poor RNA quality and the presence of highly abundant RNAs that dominate sequencing libraries. In murine serum for example, the high abundance of tRNA-derived small RNAs called 5′ tRNA halves hampers the detection of other small RNAs, like miRNAs. We therefore evaluated two complementary approaches for targeted depletion of 5′ tRNA halves in murine serum samples. Using a protocol based on biotinylated DNA probes and streptavidin coated magnetic beads we were able to selectively deplete 95% of the targeted 5′ tRNA half molecules. This allowed an unbiased enrichment of the miRNA fraction resulting in a 6-fold increase of mapped miRNA reads and 60% more unique miRNAs detected. Moreover, when comparing miRNA levels in tumor-carrying versus tumor-free mice, we observed a three-fold increase in differentially expressed miRNAs. PMID:27901112

  9. A Modified RNA-Seq Approach for Whole Genome Sequencing of RNA Viruses from Faecal and Blood Samples

    PubMed Central

    Argoud, Karène; Attar, Moustafa; Buck, David; Ip, Camilla L. C.; Golubchik, Tanya; Cule, Madeleine; Bowden, Rory; Manganis, Charis; Klenerman, Paul; Barnes, Eleanor; Walker, A. Sarah; Wyllie, David H.; Wilson, Daniel J.; Dingle, Kate E.; Peto, Tim E. A.

    2013-01-01

    To date, very large scale sequencing of many clinically important RNA viruses has been complicated by their high population molecular variation, which creates challenges for polymerase chain reaction and sequencing primer design. Many RNA viruses are also difficult or currently not possible to culture, severely limiting the amount and purity of available starting material. Here, we describe a simple, novel, high-throughput approach to Norovirus and Hepatitis C virus whole genome sequence determination based on RNA shotgun sequencing (also known as RNA-Seq). We demonstrate the effectiveness of this method by sequencing three Norovirus samples from faeces and two Hepatitis C virus samples from blood, on an Illumina MiSeq benchtop sequencer. More than 97% of reference genomes were recovered. Compared with Sanger sequencing, our method had no nucleotide differences in 14,019 nucleotides (nt) for Noroviruses (from a total of 2 Norovirus genomes obtained with Sanger sequencing), and 8 variants in 9,542 nt for Hepatitis C virus (1 variant per 1,193 nt). The three Norovirus samples had 2, 3, and 2 distinct positions called as heterozygous, while the two Hepatitis C virus samples had 117 and 131 positions called as heterozygous. To confirm that our sample and library preparation could be scaled to true high-throughput, we prepared and sequenced an additional 77 Norovirus samples in a single batch on an Illumina HiSeq 2000 sequencer, recovering >90% of the reference genome in all but one sample. No discrepancies were observed across 118,757 nt compared between Sanger and our custom RNA-Seq method in 16 samples. By generating viral genomic sequences that are not biased by primer-specific amplification or enrichment, this method offers the prospect of large-scale, affordable studies of RNA viruses which could be adapted to routine diagnostic laboratory workflows in the near future, with the potential to directly characterize within-host viral diversity. PMID:23762474

  10. Rational experiment design for sequencing-based RNA structure mapping.

    PubMed

    Aviran, Sharon; Pachter, Lior

    2014-12-01

    Structure mapping is a classic experimental approach for determining nucleic acid structure that has gained renewed interest in recent years following advances in chemistry, genomics, and informatics. The approach encompasses numerous techniques that use different means to introduce nucleotide-level modifications in a structure-dependent manner. Modifications are assayed via cDNA fragment analysis, using electrophoresis or next-generation sequencing (NGS). The recent advent of NGS has dramatically increased the throughput, multiplexing capacity, and scope of RNA structure mapping assays, thereby opening new possibilities for genome-scale, de novo, and in vivo studies. From an informatics standpoint, NGS is more informative than prior technologies by virtue of delivering direct molecular measurements in the form of digital sequence counts. Motivated by these new capabilities, we introduce a novel model-based in silico approach for quantitative design of large-scale multiplexed NGS structure mapping assays, which takes advantage of the direct and digital nature of NGS readouts. We use it to characterize the relationship between controllable experimental parameters and the precision of mapping measurements. Our results highlight the complexity of these dependencies and shed light on relevant tradeoffs and pitfalls, which can be difficult to discern by intuition alone. We demonstrate our approach by quantitatively assessing the robustness of SHAPE-Seq measurements, obtained by multiplexing SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) chemistry in conjunction with NGS. We then utilize it to elucidate design considerations in advanced genome-wide approaches for probing the transcriptome, which recently obtained in vivo information using dimethyl sulfate (DMS) chemistry.

  11. RNA Silencing Induced by an Artificial Sequence That Prevents Proper Transcription Termination in Rice

    PubMed Central

    Kawakatsu, Taiji; Wakasa, Yuhya; Yasuda, Hiroshi; Takaiwa, Fumio

    2012-01-01

    Posttranscriptional gene silencing (PTGS) is a sequence-specific mRNA degradation caused by small RNA, such as microRNA (miRNA) and small interfering RNA (siRNA). miRNAs are generated from MIRNA loci, whereas siRNAs originate from various sources of double-stranded RNA. In this study, an artificial RNA silencing inducible sequence (RSIS) was identified in rice (Oryza sativa). This sequence causes PTGS of 5′ or 3′ flanking-sequence-containing genes. Interestingly, two target genes can be simultaneously suppressed by linking a unique target sequence to either the 5′ or 3′ end of RSIS. Multiple gene suppression can also be achieved through a single transformation event by incorporating the multisite gateway system. Moreover, RSIS-mediated PTGS occurs in nuclei. Deep sequencing of small RNAs reveals that siRNAs are produced from RSIS-expressing cassettes and transitive siRNAs are produced from endogenous target genes. Furthermore, siRNAs are typically generated from untranscribed transgene terminator regions. The read-through transcripts from the RSIS-expression cassette were consistently observed, and most of these sequences were not polyadenylated. Collectively, these data indicate that RSIS inhibits proper transcription termination. The resulting transcripts are not polyadenylated. These transcripts containing RSIS become templates for double-stranded RNA synthesis in nuclei. This is followed by siRNA production and degradation of target genes. PMID:22843666

  12. RNA silencing induced by an artificial sequence that prevents proper transcription termination in rice.

    PubMed

    Kawakatsu, Taiji; Wakasa, Yuhya; Yasuda, Hiroshi; Takaiwa, Fumio

    2012-10-01

    Posttranscriptional gene silencing (PTGS) is sequence-specific mRNA degradation caused by small RNAs, such as microRNA (miRNA) and small interfering RNA (siRNA). miRNAs are generated from MIRNA loci, whereas siRNAs originate from various sources of double-stranded RNA. In this study, an artificial RNA silencing inducible sequence (RSIS) was identified in rice (Oryza sativa). This sequence causes PTGS of 5' or 3' flanking-sequence-containing genes. Interestingly, two target genes can be simultaneously suppressed by linking a unique target sequence to either the 5' or 3' end of RSIS. Multiple gene suppression can also be achieved through a single transformation event by incorporating the multisite gateway system. Moreover, RSIS-mediated PTGS occurs in nuclei. Deep sequencing of small RNAs reveals that siRNAs are produced from RSIS-expressing cassettes and transitive siRNAs are produced from endogenous target genes. Furthermore, siRNAs are typically generated from untranscribed transgene terminator regions. The read-through transcripts from the RSIS-expression cassette were consistently observed, and most of these sequences were not polyadenylated. Collectively, these data indicate that RSIS inhibits proper transcription termination. The resulting transcripts are not polyadenylated. These transcripts containing RSIS become templates for double-stranded RNA synthesis in nuclei. This is followed by siRNA production and degradation of target genes.

  13. Equally parsimonious pathways through an RNA sequence space are not equally likely

    NASA Technical Reports Server (NTRS)

    Lee, Y. H.; DSouza, L. M.; Fox, G. E.

    1997-01-01

    An experimental system for determining the potential ability of sequences resembling 5S ribosomal RNA (rRNA) to perform as functional 5S rRNAs in vivo in the Escherichia coli cellular environment was devised previously. Presumably, the only 5S rRNA sequences that would have been fixed by ancestral populations are ones that were functionally valid, and hence the actual historical paths taken through RNA sequence space during 5S rRNA evolution would have most likely utilized valid sequences. Herein, we examine the potential validity of all sequence intermediates along alternative equally parsimonious trajectories through RNA sequence space which connect two pairs of sequences that had previously been shown to behave as valid 5S rRNAs in E. coli. The first trajectory requires a total of four changes. The 14 sequence intermediates provide 24 apparently equally parsimonious paths by which the transition could occur. The second trajectory involves three changes, six intermediate sequences, and six potentially equally parsimonious paths. In total, only eight of the 20 sequence intermediates were found to be clearly invalid. As a consequence of the position of these invalid intermediates in the sequence space, seven of the 30 possible paths consisted of exclusively valid sequences. In several cases, the apparent validity/invalidity of the intermediate sequences could not be anticipated on the basis of current knowledge of the 5S rRNA structure. This suggests that the interdependencies in RNA sequence space may be more complex than currently appreciated. If ancestral sequences predicted by parsimony are to be regarded as actual historical sequences, then the present results would suggest that they should also satisfy a validity requirement and that, in at least limited cases, this conjecture can be tested experimentally.
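    The intermediate and path counts quoted in this abstract are consistent with simple combinatorics, assuming each trajectory consists of n single-site changes that can be applied one at a time in any order: that gives 2^n - 2 intermediates and n! orderings. A minimal Python sketch follows; the function name and the 0/1 encoding of sequence states are illustrative assumptions, not taken from the paper.

```python
from itertools import permutations


def count_intermediates_and_paths(n_changes):
    """Count intermediates and paths between two sequences differing at n sites.

    Each state is a tuple of 0/1 flags (0 = starting residue, 1 = final
    residue at that site). This encoding is a hypothetical illustration.
    """
    start = tuple([0] * n_changes)
    end = tuple([1] * n_changes)
    intermediates = set()
    n_paths = 0
    # Every ordering of the n single-site changes is one equally short path.
    for order in permutations(range(n_changes)):
        state = list(start)
        for site in order:
            state[site] = 1
            current = tuple(state)
            if current != end:
                intermediates.add(current)
        n_paths += 1
    return len(intermediates), n_paths


four = count_intermediates_and_paths(4)   # first trajectory: four changes
three = count_intermediates_and_paths(3)  # second trajectory: three changes
print(four, three)
print(four[0] + three[0], four[1] + three[1])  # totals across both trajectories
```

    Under these assumptions the totals come to 20 intermediates and 30 paths, matching the figures in the abstract. Which intermediates are functionally valid is an experimental result and cannot be derived from the enumeration.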

  15. Escherichia coli 16S rRNA 3'-end formation requires a distal transfer RNA sequence at a proper distance.

    PubMed Central

    Srivastava, A K; Schlessinger, D

    1989-01-01

    The 16S rRNA species in bacterial precursor rRNAs is followed by two evolutionarily conserved features: (i) a double-stranded stem formed by complementary sequences adjacent to the 5' and 3' ends of the 16S rRNA; and (ii) a 3'-transfer RNA sequence. To assess the possible role of these features, plasmid constructs with precursor-specific features deleted were tested for their capacity to form mature rRNA. Stem-forming sequences were dispensable for both 5' and 3' terminus formation; whereas an intact spacer tRNA positioned greater than 24 nucleotides downstream of the 16S RNA sequence was required for correct 3'-end maturation. These results suggest that spacer tRNA at an appropriate location helps form a conformation obligate for pre-rRNA processing, perhaps by binding to a nascent binding site in preribosomes. Thus, spacer tRNAs may be an obligate participant in ribosome formation. PMID:2684637

  16. Identification and characterization of microRNA sequences from bovine mammary epithelial cells.

    PubMed

    Bu, D P; Nan, X M; Wang, F; Loor, J J; Wang, J Q

    2015-03-01

    The bovine mammary gland is composed of various cell types including bovine mammary epithelial cells (BMEC). The use of BMEC to uncover the microRNA (miRNA) profile would allow us to obtain a more specific profile of miRNA sequences that could be associated with lactation and avoid interference from other cell types. The objective of this study was to characterize the miRNA sequences expressed in isolated BMEC. The miRNA were identified by Solexa sequencing technology (Illumina Inc., San Diego, CA). Furthermore, novel miRNA were uncovered by stem-loop reverse transcription-PCR and sequencing of PCR products. To detect tissue specificity, expression of novel miRNA sequences was measured by stem-loop RT-PCR and sequencing of PCR products in mammary gland, liver, adipose, ileum, spleen and kidney tissue from 3 lactating Holstein cows (50±10 d postpartum). After bioinformatics analysis, 12,323,451 reads were obtained by Solexa sequencing, of which 11,979,706 were clean reads, matching the bovine genome. Among clean reads, 9,428,122 belonged to miRNA sequences. Further analysis revealed that the miRNA bta-mir-184 had the most abundant expression, and 388 loci possessed the typical stem-loop structures matching known miRNA hairpins. In total, 38 loci with novel hairpins were identified as novel miRNA and were numbered from bta-U1 to bta-U38. One novel miRNA (bta-U21) was specific to mammary gland. Seven novel miRNA, including bta-U21, had tissue-restricted distribution. Uncovering the specific roles of these novel miRNA during lactation appears warranted.
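    The read counts reported in this abstract can be turned into proportions with a few lines of arithmetic. The Python sketch below uses only the three counts quoted above; the variable names are illustrative.

```python
# Read counts as quoted in the abstract.
total_reads = 12_323_451
clean_reads = 11_979_706
mirna_reads = 9_428_122

# Share of all reads that passed cleaning and matched the bovine genome.
clean_fraction = clean_reads / total_reads
# Share of clean reads assigned to miRNA sequences.
mirna_fraction = mirna_reads / clean_reads

print(f"clean reads: {clean_fraction:.1%} of total")
print(f"miRNA reads: {mirna_fraction:.1%} of clean reads")
```

    On these figures, roughly 97% of reads were clean and roughly 79% of clean reads belonged to miRNA sequences.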

  17. Maternal Plasma DNA and RNA Sequencing for Prenatal Testing.

    PubMed

    Tamminga, Saskia; van Maarle, Merel; Henneman, Lidewij; Oudejans, Cees B M; Cornel, Martina C; Sistermans, Erik A

    2016-01-01

    Cell-free DNA (cfDNA) testing has recently become indispensable in diagnostic testing and screening. In the prenatal setting, this type of testing is often called noninvasive prenatal testing (NIPT). With a number of techniques, using either next-generation sequencing or single nucleotide polymorphism-based approaches, fetal cfDNA in maternal plasma can be analyzed to screen for rhesus D genotype, common chromosomal aneuploidies, and increasingly for testing other conditions, including monogenic disorders. With regard to screening for common aneuploidies, challenges arise when implementing NIPT in current prenatal settings. Depending on the method used (targeted or nontargeted), chromosomal anomalies other than trisomy 21, 18, or 13 can be detected, either of fetal or maternal origin, also referred to as unsolicited or incidental findings. For various biological reasons, there is a small chance of having either a false-positive or false-negative NIPT result, or no result, also referred to as a "no-call." Both pre- and posttest counseling for NIPT should include discussing potential discrepancies. Since NIPT remains a screening test, a positive NIPT result should be confirmed by invasive diagnostic testing (either by chorionic villus biopsy or by amniocentesis). As the scope of NIPT is widening, professional guidelines need to discuss the ethics of what to offer and how to offer. In this review, we discuss the current biochemical, clinical, and ethical challenges of cfDNA testing in the prenatal setting and its future perspectives including novel applications that target RNA instead of DNA.

  18. Single-Cell RNA Sequencing of Human T Cells.

    PubMed

    Villani, Alexandra-Chloé; Shekhar, Karthik

    2017-01-01

    Understanding how populations of human T cells leverage cellular heterogeneity, plasticity, and diversity to achieve a wide range of functional flexibility, particularly during dynamic processes such as development, differentiation, and antigenic response, is a core challenge that is well suited for single-cell analysis. Hypothesis-free evaluation of cellular states and subpopulations by transcriptional profiling of single T cells can identify relationships that may be obscured by targeted approaches such as FACS sorting on cell-surface antigens, or bulk expression analysis. While this approach is relevant to all cell types, it is of particular interest in the study of T cells for which classical phenotypic criteria are now viewed as insufficient for distinguishing different T cell subtypes and transitional states, and defining the changes associated with dysfunctional T cell states in autoimmunity and tumor-related exhaustion. This unit describes a protocol to generate single-cell transcriptomic libraries of human blood CD4(+) and CD8(+) T cells, and also introduces the basic bioinformatic steps to process the resulting sequence data for further computational analysis. We show how cellular subpopulations can be identified from transcriptional data, and derive characteristic gene expression signatures that distinguish these states. We believe single-cell RNA-seq is a powerful technique to study the cellular heterogeneity in complex tissues, a paradigm that will be of great value for the immune system.

  19. Annealing to sequences within the primer binding site loop promotes an HIV-1 RNA conformation favoring RNA dimerization and packaging

    PubMed Central

    Seif, Elias; Niu, Meijuan; Kleiman, Lawrence

    2013-01-01

    The 5′ untranslated region (5′ UTR) of HIV-1 genomic RNA (gRNA) includes structural elements that regulate reverse transcription, transcription, translation, tRNALys3 annealing to the gRNA, and gRNA dimerization and packaging into viruses. It has been reported that gRNA dimerization and packaging are regulated by changes in the conformation of the 5′-UTR RNA. In this study, we show that annealing of tRNALys3 or a DNA oligomer complementary to sequences within the primer binding site (PBS) loop of the 5′ UTR enhances its dimerization in vitro. Structural analysis of the 5′-UTR RNA using selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) shows that the annealing promotes a conformational change of the 5′ UTR that has been previously reported to favor gRNA dimerization and packaging into virus. The model predicted by SHAPE analysis is supported by antisense experiments designed to test which annealed sequences will promote or inhibit gRNA dimerization. Based on reports showing that the gRNA dimerization favors its incorporation into viruses, we tested the ability of a mutant gRNA unable to anneal to tRNALys3 to be incorporated into virions. We found a ∼60% decrease in mutant gRNA packaging compared with wild-type gRNA. Together, these data further support a model for viral assembly in which the initial annealing of tRNALys3 to gRNA is cytoplasmic, which in turn aids in the promotion of gRNA dimerization and its incorporation into virions. PMID:23960173

  20. Diversity of thermophiles in a Malaysian hot spring determined using 16S rRNA and shotgun metagenome sequencing.

    PubMed

    Chan, Chia Sing; Chan, Kok-Gan; Tay, Yea-Ling; Chua, Yi-Heng; Goh, Kian Mau

    2015-01-01

    The Sungai Klah (SK) hot spring is the second hottest geothermal spring in Malaysia. This hot spring is a shallow, 150-m-long, fast-flowing stream, with temperatures varying from 50 to 110°C and a pH range of 7.0-9.0. Hidden within a wooded area, the SK hot spring is continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). In this study, a sample taken from the middle of the stream was analyzed at the 16S rRNA V3-V4 region by amplicon metagenome sequencing. Over 35 phyla were detected by analyzing the 16S rRNA data. Firmicutes and Proteobacteria represented approximately 57% of the microbiome. Approximately 70% of the detected thermophiles were strict anaerobes; however, Hydrogenobacter spp., obligate chemolithotrophic thermophiles, represented one of the major taxa. Several thermophilic photosynthetic microorganisms and acidothermophiles were also detected. Most of the phyla identified by 16S rRNA were also found using the shotgun metagenome approaches. The carbon, sulfur, and nitrogen metabolism within the SK hot spring community were evaluated by shotgun metagenome sequencing, and the data revealed diversity in terms of metabolic activity and dynamics. This hot spring has a rich diversified phylogenetic community partly due to its natural environment (plant litter, high TOC, and a shallow stream) and geochemical parameters (broad temperature and pH range). It is speculated that symbiotic relationships occur between the members of the community.

  1. Evaluating Methods for Isolating Total RNA and Predicting the Success of Sequencing Phylogenetically Diverse Plant Transcriptomes

    PubMed Central

    Bruskiewich, Richard; Burris, Jason N.; Carrigan, Charlotte T.; Chase, Mark W.; Clarke, Neil D.; Covshoff, Sarah; dePamphilis, Claude W.; Edger, Patrick P.; Goh, Falicia; Graham, Sean; Greiner, Stephan; Hibberd, Julian M.; Jordon-Thaden, Ingrid; Kutchan, Toni M.; Leebens-Mack, James; Melkonian, Michael; Miles, Nicholas; Myburg, Henrietta; Patterson, Jordan; Pires, J. Chris; Ralph, Paula; Rolf, Megan; Sage, Rowan F.; Soltis, Douglas; Soltis, Pamela; Stevenson, Dennis; Stewart, C. Neal; Surek, Barbara; Thomsen, Christina J. M.; Villarreal, Juan Carlos; Wu, Xiaolei; Zhang, Yong; Deyholos, Michael K.; Wong, Gane Ka-Shu

    2012-01-01

    Next-generation sequencing plays a central role in the characterization and quantification of transcriptomes. Although numerous metrics are purported to quantify the quality of RNA, there have been no large-scale empirical evaluations of the major determinants of sequencing success. We used a combination of existing and newly developed methods to isolate total RNA from 1115 samples from 695 plant species in 324 families, which represents >900 million years of phylogenetic diversity from green algae through flowering plants, including many plants of economic importance. We then sequenced 629 of these samples on Illumina GAIIx and HiSeq platforms and performed a large comparative analysis to identify predictors of RNA quality and the diversity of putative genes (scaffolds) expressed within samples. Tissue types (e.g., leaf vs. flower) varied in RNA quality, sequencing depth and the number of scaffolds. Tissue age also influenced RNA quality but not the number of scaffolds ≥1000 bp. Overall, 36% of the variation in the number of scaffolds was explained by metrics of RNA integrity (RIN score), RNA purity (OD 260/230), sequencing platform (GAIIx vs HiSeq) and the amount of total RNA used for sequencing. However, our results show that the most commonly used measures of RNA quality (e.g., RIN) are weak predictors of the number of scaffolds because Illumina sequencing is robust to variation in RNA quality. These results provide novel insight into the methods that are most important in isolating high quality RNA for sequencing and assembling plant transcriptomes. The methods and recommendations provided here could increase the efficiency and decrease the cost of RNA sequencing for individual labs and genome centers. PMID:23185583

  2. Characterising the Canine Oral Microbiome by Direct Sequencing of Reverse-Transcribed rRNA Molecules.

    PubMed

    McDonald, James E; Larsen, Niels; Pennington, Andrea; Connolly, John; Wallis, Corrin; Rooks, David J; Hall, Neil; McCarthy, Alan J; Allison, Heather E

    2016-01-01

    PCR amplification and sequencing of phylogenetic markers, primarily Small Sub-Unit ribosomal RNA (SSU rRNA) genes, has been the paradigm for defining the taxonomic composition of microbiomes. However, 'universal' SSU rRNA gene PCR primer sets are likely to miss much of the diversity therein. We sequenced a library comprising purified and reverse-transcribed SSU rRNA (RT-SSU rRNA) molecules from the canine oral microbiome and compared it to a general bacterial 16S rRNA gene PCR amplicon library generated from the same biological sample. In addition, we have developed BIONmeta, a novel, open-source, computer package for the processing and taxonomic classification of the randomly fragmented RT-SSU rRNA reads produced. Direct RT-SSU rRNA sequencing revealed that 16S rRNA molecules belonging to the bacterial phyla Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria and Spirochaetes, were most abundant in the canine oral microbiome (92.5% of total bacterial SSU rRNA). The direct rRNA sequencing approach detected greater taxonomic diversity (1 additional phylum, 2 classes, 1 order, 10 families and 61 genera) when compared with general bacterial 16S rRNA amplicons from the same sample, simultaneously provided SSU rRNA gene inventories of Bacteria, Archaea and Eukarya, and detected significant numbers of sequences not recognised by 'universal' primer sets. Proteobacteria and Spirochaetes were found to be under-represented by PCR-based analysis of the microbiome, and this was due to primer mismatches and taxon-specific variations in amplification efficiency, validated by qPCR analysis of 16S rRNA amplicons from a mock community. This demonstrated the veracity of direct RT-SSU rRNA sequencing for molecular microbial ecology.

  3. Characterising the Canine Oral Microbiome by Direct Sequencing of Reverse-Transcribed rRNA Molecules

    PubMed Central

    McDonald, James E.; Larsen, Niels; Pennington, Andrea; Connolly, John; Wallis, Corrin; Rooks, David J.; Hall, Neil; McCarthy, Alan J.; Allison, Heather E.

    2016-01-01

    PCR amplification and sequencing of phylogenetic markers, primarily Small Sub-Unit ribosomal RNA (SSU rRNA) genes, has been the paradigm for defining the taxonomic composition of microbiomes. However, ‘universal’ SSU rRNA gene PCR primer sets are likely to miss much of the diversity therein. We sequenced a library comprising purified and reverse-transcribed SSU rRNA (RT-SSU rRNA) molecules from the canine oral microbiome and compared it to a general bacterial 16S rRNA gene PCR amplicon library generated from the same biological sample. In addition, we have developed BIONmeta, a novel, open-source, computer package for the processing and taxonomic classification of the randomly fragmented RT-SSU rRNA reads produced. Direct RT-SSU rRNA sequencing revealed that 16S rRNA molecules belonging to the bacterial phyla Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria and Spirochaetes, were most abundant in the canine oral microbiome (92.5% of total bacterial SSU rRNA). The direct rRNA sequencing approach detected greater taxonomic diversity (1 additional phylum, 2 classes, 1 order, 10 families and 61 genera) when compared with general bacterial 16S rRNA amplicons from the same sample, simultaneously provided SSU rRNA gene inventories of Bacteria, Archaea and Eukarya, and detected significant numbers of sequences not recognised by ‘universal’ primer sets. Proteobacteria and Spirochaetes were found to be under-represented by PCR-based analysis of the microbiome, and this was due to primer mismatches and taxon-specific variations in amplification efficiency, validated by qPCR analysis of 16S rRNA amplicons from a mock community. This demonstrated the veracity of direct RT-SSU rRNA sequencing for molecular microbial ecology. PMID:27276347

  4. Distinct tmRNA sequence elements facilitate RNase R engagement on rescued ribosomes for selective nonstop mRNA decay

    PubMed Central

    Venkataraman, Krithika; Zafar, Hina; Karzai, A. Wali

    2014-01-01

    trans-Translation, orchestrated by SmpB and tmRNA, is the principal eubacterial pathway for resolving stalled translation complexes. RNase R, the leading nonstop mRNA surveillance factor, is recruited to stalled ribosomes in a trans-translation dependent process. To elucidate the contributions of SmpB and tmRNA to RNase R recruitment, we evaluated Escherichia coli–Francisella tularensis chimeric variants of tmRNA and SmpB. This evaluation showed that while the hybrid tmRNA supported nascent polypeptide tagging and ribosome rescue, it suffered defects in facilitating RNase R recruitment to stalled ribosomes. To gain further insights, we used established tmRNA and SmpB variants that impact distinct stages of the trans-translation process. Analysis of select tmRNA variants revealed that the sequence composition and positioning of the ultimate and penultimate codons of the tmRNA ORF play a crucial role in recruiting RNase R to rescued ribosomes. Evaluation of defined SmpB C-terminal tail variants highlighted the importance of establishing the tmRNA reading frame, and provided valuable clues into the timing of RNase R recruitment to rescued ribosomes. Taken together, these studies demonstrate that productive RNase R-ribosomes engagement requires active trans-translation, and suggest that RNase R captures the emerging nonstop mRNA at an early stage after establishment of the tmRNA ORF as the surrogate mRNA template. PMID:25200086

  5. Distinct tmRNA sequence elements facilitate RNase R engagement on rescued ribosomes for selective nonstop mRNA decay.

    PubMed

    Venkataraman, Krithika; Zafar, Hina; Karzai, A Wali

    2014-01-01

    trans-Translation, orchestrated by SmpB and tmRNA, is the principal eubacterial pathway for resolving stalled translation complexes. RNase R, the leading nonstop mRNA surveillance factor, is recruited to stalled ribosomes in a trans-translation dependent process. To elucidate the contributions of SmpB and tmRNA to RNase R recruitment, we evaluated Escherichia coli-Francisella tularensis chimeric variants of tmRNA and SmpB. This evaluation showed that while the hybrid tmRNA supported nascent polypeptide tagging and ribosome rescue, it suffered defects in facilitating RNase R recruitment to stalled ribosomes. To gain further insights, we used established tmRNA and SmpB variants that impact distinct stages of the trans-translation process. Analysis of select tmRNA variants revealed that the sequence composition and positioning of the ultimate and penultimate codons of the tmRNA ORF play a crucial role in recruiting RNase R to rescued ribosomes. Evaluation of defined SmpB C-terminal tail variants highlighted the importance of establishing the tmRNA reading frame, and provided valuable clues into the timing of RNase R recruitment to rescued ribosomes. Taken together, these studies demonstrate that productive RNase R-ribosomes engagement requires active trans-translation, and suggest that RNase R captures the emerging nonstop mRNA at an early stage after establishment of the tmRNA ORF as the surrogate mRNA template.

  6. Transcription profile of boar spermatozoa as revealed by RNA-sequencing

    USDA-ARS's Scientific Manuscript database

    High-throughput RNA sequencing (RNA-Seq) overcomes the limitations of the current hybridization-based techniques to detect the actual pool of RNA transcripts in spermatozoa. The application of this technology in livestock can speed the discovery of potential predictors of male fertility. As a first ...

  7. Identifying proteins that bind a known RNA sequence using the yeast three-hybrid system.

    PubMed

    Koh, Yvonne Y; Wickens, Marvin

    2014-01-01

    The yeast three-hybrid system can be used to identify a protein partner of a known RNA sequence by screening a cDNA library fused to a transcription activation domain, with a hybrid RNA as 'bait.' Most commonly, such screens are performed to identify proteins that interact with a given RNA in vivo.

  8. Evaluating Quality of Aged Archival Formalin-Fixed Paraffin-Embedded Samples for RNA-Sequencing

    EPA Science Inventory

    Archival formalin-fixed paraffin-embedded (FFPE) samples offer a vast, untapped source of genomic data for biomarker discovery. However, the quality of FFPE samples is often highly variable, and conventional methods to assess RNA quality for RNA-sequencing (RNA-seq) are not infor...

  9. Rapid Amplification of cDNA Ends for RNA Transcript Sequencing in Staphylococcus.

    PubMed

    Miller, Eric

    2016-01-01

    Rapid amplification of cDNA ends (RACE) is a technique that was developed to swiftly and efficiently amplify full-length RNA molecules in which the terminal ends have not been characterized. Current usage of this procedure has been more focused on sequencing and characterizing RNA 5' and 3' untranslated regions. Herein is described an adapted RACE protocol to amplify bacterial RNA transcripts.

  11. In silico detection of tRNA sequence features characteristic to aminoacyl-tRNA synthetase class membership

    PubMed Central

    Jakó, Éena; Ittzés, Péter; Szenes, Áron; Kun, Ádám; Szathmáry, Eörs; Pál, Gábor

    2007-01-01

    Aminoacyl tRNA synthetases (aaRS) are grouped into Class I and II based on primary and tertiary structure and enzyme properties suggesting two independent phylogenetic lineages. Analogously, tRNA molecules can also form two respective classes, based on the class membership of their corresponding aaRS. Although some aaRS–tRNA interactions are not extremely specific and require editing mechanisms to avoid misaminoacylation, most aaRS–tRNA interactions are rather stereospecific. Thus, class-specific aaRS features could be mirrored by class-specific tRNA features. However, previous investigations failed to detect conserved class-specific nucleotides. Here we introduce a discrete mathematical approach that evaluates not only class-specific ‘strictly present’, but also ‘strictly absent’ nucleotides. The disjoint subsets of these elements compose a unique partition, named extended consensus partition (ECP). By analyzing the ECP for both Class I and II tDNA sets from 50 (13 archaeal, 30 bacterial and 7 eukaryotic) species, we could demonstrate that class-specific tRNA sequence features do exist, although not in terms of strictly conserved nucleotides as it had previously been anticipated. This finding demonstrates that important information was hidden in tRNA sequences inaccessible for traditional statistical methods. The ECP analysis might contribute to the understanding of tRNA evolution and could enrich the sequence analysis tool repertoire. PMID:17704131
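    The idea of tracking "strictly absent" as well as "strictly present" nucleotides can be illustrated with a small Python sketch. It is not the ECP construction from the paper, whose exact definition is not given in the abstract. It assumes pre-aligned, equal-length tDNA sequences, and the function names and toy sequences are hypothetical.

```python
def strictly_present_absent(aligned_seqs, alphabet="ACGT"):
    """Per alignment column, return nucleotides seen in every sequence
    (strictly present) and nucleotides seen in no sequence (strictly absent)."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    letters = set(alphabet)
    present, absent = [], []
    for pos in range(length):
        column = {seq[pos] for seq in aligned_seqs}
        # A nucleotide is strictly present only if it is the sole symbol seen.
        present.append(column & letters if len(column) == 1 else set())
        absent.append(letters - column)
    return present, absent


def class_specific_absences(class_i, class_ii, alphabet="ACGT"):
    """List (position, nucleotide, label) where a nucleotide is strictly
    absent in one class but occurs at that position in the other."""
    _, absent_i = strictly_present_absent(class_i, alphabet)
    _, absent_ii = strictly_present_absent(class_ii, alphabet)
    features = []
    for pos, (a1, a2) in enumerate(zip(absent_i, absent_ii)):
        features += [(pos, nt, "absent in class I only") for nt in sorted(a1 - a2)]
        features += [(pos, nt, "absent in class II only") for nt in sorted(a2 - a1)]
    return features


# Toy aligned sequences, invented purely for illustration.
toy_class_i = ["ACGT", "ACGA"]
toy_class_ii = ["ACTT", "GCTT"]
features = class_specific_absences(toy_class_i, toy_class_ii)
for feature in features:
    print(feature)
```

    In the toy data, position 3 has no strictly conserved nucleotide in class I, yet A is strictly absent from class II there. This is the kind of signal that a search for conserved nucleotides alone would miss.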

  12. Optimized approach for Ion Proton RNA sequencing reveals details of RNA splicing and editing features of the transcriptome.

    PubMed

    Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A

    2017-01-01

    RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.

  13. Sequences controlling histone H4 mRNA abundance.

    PubMed Central

    Capasso, O; Bleecker, G C; Heintz, N

    1987-01-01

    The post-transcriptional regulation of histone mRNA abundance is manifest both by accumulation of histone mRNA during the S phase, and by the rapid degradation of mature histone mRNA following the inhibition of DNA synthesis. We have constructed a comprehensive series of substitution mutants within a human H4 histone gene, introduced them into the mouse L cell genome, and analyzed their effects on the post-transcriptional control of the H4 mRNA. Our results demonstrate that most of the H4 mRNA is dispensable for proper regulation of histone mRNA abundance. However, recognition of the 3' terminus of the mature H4 mRNA is critically important for regulating its cytoplasmic half-life. Thus, this region of the mRNA functions both in the nucleus as a signal for proper processing of the mRNA terminus, and in the cytoplasm as an essential element in the control of mRNA stability. PMID:3608993

  14. Analysis of long noncoding RNA and mRNA using RNA sequencing during the differentiation of intramuscular preadipocytes in chicken

    PubMed Central

    Zhang, Tao; Zhang, Xiangqian; Han, Kunpeng; Zhang, Genxi; Wang, Jinyu; Xie, Kaizhou; Xue, Qian; Fan, Xiaomei

    2017-01-01

    Long noncoding RNAs (lncRNAs) regulate metabolic tissue development and function, including adipogenesis. However, little is known about the function and profile of lncRNAs in intramuscular preadipocyte differentiation in chicken. Here, we identified lncRNAs in chicken intramuscular preadipocytes at different differentiation stages using RNA sequencing. A total of 1,311,382,604 clean reads and 25,435 lncRNAs were obtained from 12 samples. In total, 7,433 differentially expressed genes (4,698 lncRNAs and 2,735 mRNAs) were identified by pairwise comparison. These 7,433 differentially expressed genes were grouped into 11 clusters based on their expression patterns by K-means clustering. Using Weighted Gene Coexpression Network Analysis, we identified four stage-specific modules positively related to I0, I2, I4, and I6 stages and two stage-specific modules negatively related to I0 and I2 stages, respectively. Many well-known and novel pathways associated with intramuscular preadipocyte differentiation were identified. We also identified hub genes in each stage-specific module and visualized them in Cytoscape. Our analysis revealed many highly-connected genes, including XLOC_058593, BMP3, MYOD1, and LAMP3. This study provides a valuable resource for chicken lncRNA study and improves our understanding of the biology of preadipocyte differentiation in chicken. PMID:28199418

  15. Analysis of long noncoding RNA and mRNA using RNA sequencing during the differentiation of intramuscular preadipocytes in chicken.

    PubMed

    Zhang, Tao; Zhang, Xiangqian; Han, Kunpeng; Zhang, Genxi; Wang, Jinyu; Xie, Kaizhou; Xue, Qian; Fan, Xiaomei

    2017-01-01

    Long noncoding RNAs (lncRNAs) regulate metabolic tissue development and function, including adipogenesis. However, little is known about the function and profile of lncRNAs in intramuscular preadipocyte differentiation in chicken. Here, we identified lncRNAs in chicken intramuscular preadipocytes at different differentiation stages using RNA sequencing. A total of 1,311,382,604 clean reads and 25,435 lncRNAs were obtained from 12 samples. In total, 7,433 differentially expressed genes (4,698 lncRNAs and 2,735 mRNAs) were identified by pairwise comparison. These 7,433 differentially expressed genes were grouped into 11 clusters based on their expression patterns by K-means clustering. Using Weighted Gene Coexpression Network Analysis, we identified four stage-specific modules positively related to I0, I2, I4, and I6 stages and two stage-specific modules negatively related to I0 and I2 stages, respectively. Many well-known and novel pathways associated with intramuscular preadipocyte differentiation were identified. We also identified hub genes in each stage-specific module and visualized them in Cytoscape. Our analysis revealed many highly-connected genes, including XLOC_058593, BMP3, MYOD1, and LAMP3. This study provides a valuable resource for chicken lncRNA study and improves our understanding of the biology of preadipocyte differentiation in chicken.

  16. RNA sequencing of the exercise transcriptome in equine athletes.

    PubMed

    Capomaccio, Stefano; Vitulo, Nicola; Verini-Supplizi, Andrea; Barcaccia, Gianni; Albiero, Alessandro; D'Angelo, Michela; Campagna, Davide; Valle, Giorgio; Felicetti, Michela; Silvestrelli, Maurizio; Cappelli, Katia

    2013-01-01

    The horse is an optimal model organism for studying the genomic response to exercise-induced stress, due to its natural aptitude for athletic performance and the relative homogeneity of its genetic and environmental backgrounds. Here, we applied RNA-sequencing analysis through the use of SOLiD technology in an experimental framework centered on exercise-induced stress during endurance races in equine athletes. We monitored the transcriptional landscape by comparing gene expression levels between animals at rest and after competition. Overall, we observed a shift from coding to non-coding regions, suggesting that the stress response involves the differential expression of not annotated regions. Notably, we observed significant post-race increases of reads that correspond to repeats, especially the intergenic and intronic L1 and L2 transposable elements. We also observed increased expression of the antisense strands compared to the sense strands in intronic and regulatory regions (1 kb up- and downstream) of the genes, suggesting that antisense transcription could be one of the main mechanisms for transposon regulation in the horse under stress conditions. We identified a large number of transcripts corresponding to intergenic and intronic regions putatively associated with new transcriptional elements. Gene expression and pathway analysis allowed us to identify several biological processes and molecular functions that may be involved with exercise-induced stress. Ontology clustering reflected mechanisms that are already known to be stress activated (e.g., chemokine-type cytokines, Toll-like receptors, and kinases), as well as "nucleic acid binding" and "signal transduction activity" functions. There was also a general and transient decrease in the global rates of protein synthesis, which would be expected after strenuous global stress. In sum, our network analysis points toward the involvement of specific gene clusters in equine exercise-induced stress, including

  17. Predicting RNA-binding residues from evolutionary information and sequence conservation

    PubMed Central

    2010-01-01

    Background: RNA-binding proteins (RBPs) play crucial roles in post-transcriptional control of RNA. RBPs are designed to efficiently recognize specific RNA sequences after they are derived from the DNA sequence. To satisfy diverse functional requirements, RNA-binding proteins are composed of multiple blocks of RNA-binding domains (RBDs) presented in various structural arrangements to provide versatile functions. The ability to computationally predict RNA-binding residues in an RNA-binding protein can help biologists choose important sites for site-directed mutagenesis in wet-lab experiments. Results: The proposed prediction framework, named “ProteRNA”, combines an SVM-based classifier with conserved residue discovery by WildSpan to identify the residues that interact with RNA in an RNA-binding protein. Although these conserved residues can be either functionally conserved residues or structurally conserved residues, they provide clues on the important residues in a protein sequence. In the independent testing dataset, ProteRNA delivered an overall accuracy of 89.78%, MCC of 0.2628, F-score of 0.3075, and F0.5-score of 0.3546. Conclusions: This article presents the design of a sequence-based predictor aiming to identify the RNA-binding residues in an RNA-binding protein by combining machine learning and pattern mining approaches. RNA-binding proteins have diverse functions while interacting with different categories of RNAs because these proteins are composed of multiple copies of RNA-binding domains presented in various structural arrangements to expand the functional repertoire of RNA-binding proteins. Furthermore, predicting RNA-binding residues in an RNA-binding protein can help biologists choose important sites for site-directed mutagenesis in wet-lab experiments. PMID:21143803
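
    The figures reported for ProteRNA pair a high overall accuracy with a low MCC, a pattern that typically arises when the positive class (here, RNA-binding residues) is a small minority. As a minimal sketch of how such metrics are commonly computed from a confusion matrix, the Python function below uses the standard textbook definitions. The function name and the example counts are hypothetical and are not taken from the ProteRNA evaluation.

```python
import math

def classification_metrics(tp, fp, tn, fn, beta=1.0):
    """Return accuracy, F-beta and MCC from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives. beta < 1 weights
    precision more heavily than recall (beta=0.5 gives the F0.5-score).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "f_beta": f_beta, "mcc": mcc}

# Hypothetical, imbalanced example: 60 binding residues among 1000.
example = classification_metrics(tp=30, fp=40, tn=900, fn=30)
example_f05 = classification_metrics(tp=30, fp=40, tn=900, fn=30, beta=0.5)
```

    With counts like these, accuracy is dominated by the many correctly rejected non-binding residues, whereas MCC and the F-scores also reflect how well the rare binding residues are recovered.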

  18. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases.

    PubMed

    Qin, Yidan; Yao, Jun; Wu, Douglas C; Nottingham, Ryan M; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. © 2015 Qin et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  19. Integration of Expressed Sequence Tag Data Flanking Predicted RNA Secondary Structures Facilitates Novel Non-Coding RNA Discovery

    PubMed Central

    Krzyzanowski, Paul M.; Price, Feodor D.; Muro, Enrique M.; Rudnicki, Michael A.; Andrade-Navarro, Miguel A.

    2011-01-01

    Many computational methods have been used to predict novel non-coding RNAs (ncRNAs), but none, to our knowledge, have explicitly investigated the impact of integrating existing cDNA-based Expressed Sequence Tag (EST) data that flank structural RNA predictions. To determine whether flanking EST data can assist in microRNA (miRNA) prediction, we identified genomic sites encoding putative miRNAs by combining functional RNA predictions with flanking ESTs data in a model consistent with miRNAs undergoing cleavage during maturation. In both human and mouse genomes, we observed that the inclusion of flanking ESTs adjacent to and not overlapping predicted miRNAs significantly improved the performance of various methods of miRNA prediction, including direct high-throughput sequencing of small RNA libraries. We analyzed the expression of hundreds of miRNAs predicted to be expressed during myogenic differentiation using a customized microarray and identified several known and predicted myogenic miRNA hairpins. Our results indicate that integrating ESTs flanking structural RNA predictions improves the quality of cleaved miRNA predictions and suggest that this strategy can be used to predict other non-coding RNAs undergoing cleavage during maturation. PMID:21698286

  20. RNA2DNAlign: nucleotide resolution allele asymmetries through quantitative assessment of RNA and DNA paired sequencing data.

    PubMed

    Movassagh, Mercedeh; Alomran, Nawaf; Mudvari, Prakriti; Dede, Merve; Dede, Cem; Kowsari, Kamran; Restrepo, Paula; Cauley, Edmund; Bahl, Sonali; Li, Muzi; Waterhouse, Wesley; Tsaneva-Atanasova, Krasimira; Edwards, Nathan; Horvath, Anelia

    2016-12-15

    We introduce RNA2DNAlign, a computational framework for quantitative assessment of allele counts across paired RNA and DNA sequencing datasets. RNA2DNAlign is based on quantitation of the relative abundance of variant and reference read counts, followed by binomial tests for genotype and allelic status at SNV positions between compatible sequences. RNA2DNAlign detects positions with differential allele distribution, suggesting asymmetries due to regulatory/structural events. Based on the type of asymmetry, RNA2DNAlign outlines positions likely to be implicated in RNA editing, allele-specific expression or loss, somatic mutagenesis or loss-of-heterozygosity (the first three also in a tumor-specific setting). We applied RNA2DNAlign on 360 matching normal and tumor exomes and transcriptomes from 90 breast cancer patients from TCGA. Under high-confidence settings, RNA2DNAlign identified 2038 distinct SNV sites associated with one of the aforementioned asymmetries, the majority of which have not been linked to functionality before. The performance assessment shows very high specificity and sensitivity, due to the corroboration of signals across multiple matching datasets. RNA2DNAlign is freely available from http://github.com/HorvathLab/NGS as a self-contained binary package for 64-bit Linux systems.
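
    The abstract does not spell out the exact tests or thresholds RNA2DNAlign applies, so the following is only a generic sketch of an exact two-sided binomial test on variant versus reference read counts, written in plain Python. The function name, the expected variant fraction of 0.5 for a heterozygous site, and the read counts are illustrative assumptions, not the tool's actual implementation.

```python
from math import comb

def binomial_two_sided_p(variant_reads, total_reads, expected_fraction=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed variant read count under the expected variant fraction.
    """
    p = expected_fraction
    probs = [
        comb(total_reads, i) * p**i * (1 - p) ** (total_reads - i)
        for i in range(total_reads + 1)
    ]
    observed = probs[variant_reads]
    # Small relative tolerance so ties are counted despite rounding error.
    tail = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, tail)

# Hypothetical SNV site: balanced alleles in DNA, skewed alleles in RNA.
p_dna = binomial_two_sided_p(variant_reads=20, total_reads=40)
p_rna = binomial_two_sided_p(variant_reads=38, total_reads=40)
```

    A large p-value in DNA combined with a very small one in RNA is the kind of asymmetry that might point to allele-specific expression, although any real analysis would also need multiple-testing correction and coverage filters.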

  1. 16S rRNA gene sequencing on a benchtop sequencer: accuracy for identification of clinically important bacteria.

    PubMed

    Watts, George S; Youens-Clark, Ken; Slepian, Marvin J; Wolk, Donna M; Oshiro, Marc M; Metzger, Gregory S; Dhingra, Dalia; Cranmer, Lee D; Hurwitz, Bonnie L

    2017-09-20

    We tested the effect of the choice of 16S rRNA gene amplicon and data analysis method on the accuracy of identification of clinically important bacteria utilizing a benchtop sequencer. Nine 16S rRNA amplicons were tested on an Ion Torrent PGM to identify 41 strains of clinical importance. The V1-V2 region identified 40 of 41 isolates to the species level. Three data analysis methods were tested, finding that the Ribosomal Database Project's SequenceMatch outperformed BLAST and the Ion Reporter Metagenomics analysis pipeline. Lastly, 16S rRNA gene sequencing of mixtures of four species through a six log range of dilution showed species were identifiable even when present as 0.1% of the mixture. Sequencing the V1-V2 16S rRNA gene region, made possible by the increased read length of the Ion Torrent PGM sequencer's 400 base pair chemistry, may be a better choice over other commonly used regions for identifying clinically important bacteria. In addition, the SequenceMatch algorithm, freely available from the Ribosomal Database Project, is a good choice for matching filtered reads to organisms. Lastly, 16S rRNA gene sequencing's sensitivity to the presence of a bacterial species at 0.1% of a mixture suggests it has sufficient sensitivity for samples in which important bacteria may be rare. We have validated 16S rRNA gene sequencing on a benchtop sequencer including simple mixtures of organisms; however, our results highlight deficits for clinical application in place of current identification methods. This article is protected by copyright. All rights reserved.

  2. Sequence-specific cleavage of dsRNA by Mini-III RNase.

    PubMed

    Głów, Dawid; Pianka, Dariusz; Sulej, Agata A; Kozłowski, Łukasz P; Czarnecka, Justyna; Chojnowski, Grzegorz; Skowronek, Krzysztof J; Bujnicki, Janusz M

    2015-03-11

    Ribonucleases (RNases) play a critical role in RNA processing and degradation by hydrolyzing phosphodiester bonds (exo- or endonucleolytically). Many RNases that cut RNA internally exhibit substrate specificity, but their target sites are usually limited to one or a few specific nucleotides in single-stranded RNA and often in a context of a particular three-dimensional structure of the substrate. Thus far, no RNase counterparts of restriction enzymes have been identified which could cleave double-stranded RNA (dsRNA) in a sequence-specific manner. Here, we present evidence for a sequence-dependent cleavage of long dsRNA by RNase Mini-III from Bacillus subtilis (BsMiniIII). Analysis of the sites cleaved by this enzyme in limited digest of bacteriophage Φ6 dsRNA led to the identification of a consensus target sequence. We defined nucleotide residues within the preferred cleavage site that affected the efficiency of the cleavage and were essential for the discrimination of cleavable versus non-cleavable dsRNA sequences. We have also determined that the loop α5b-α6, a distinctive structural element in Mini-III RNases, is crucial for the specific cleavage, but not for dsRNA binding. Our results suggest that BsMiniIII may serve as a prototype of a sequence-specific dsRNase that could possibly be used for targeted cleavage of dsRNA.

  3. Discriminative Prediction of A-To-I RNA Editing Events from DNA Sequence

    PubMed Central

    Sun, Jiangming; Singh, Pratibha; Bagge, Annika; Valtat, Bérengère; Vikman, Petter; Spégel, Peter; Mulder, Hindrik

    2016-01-01

    RNA editing is a post-transcriptional alteration of RNA sequences that, via insertions, deletions or base substitutions, can affect protein structure as well as RNA and protein expression. Recently, it has been suggested that RNA editing may be more frequent than previously thought. A great impediment, however, to a deeper understanding of this process is the paramount sequencing effort that needs to be undertaken to identify RNA editing events. Here, we describe an in silico approach, based on machine learning, that ameliorates this problem. Using 41-nucleotide-long DNA sequences, we show that novel A-to-I RNA editing events can be predicted from known A-to-I RNA editing events intra- and interspecies. The validity of the proposed method was verified in an independent experimental dataset. Using our approach, 203,202 putative A-to-I RNA editing events were predicted in the whole human genome. Out of these, 9% were previously reported. The remaining sites require further validation, e.g., by targeted deep sequencing. In conclusion, the approach described here is a useful tool to identify potential A-to-I RNA editing events without the requirement of extensive RNA sequencing. PMID:27764195
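
    To make the input representation concrete, the sketch below extracts 41-nucleotide DNA windows and one-hot encodes them for a classifier. The abstract gives only the window length; centring each window on the candidate adenosine (20 flanking bases on each side), the one-hot encoding, and all function names are assumptions made here for illustration.

```python
def adenosine_windows(dna, flank=20):
    """Return (position, window) pairs for every 'A' that has `flank`
    bases available on both sides; each window is 2 * flank + 1 long."""
    dna = dna.upper()
    windows = []
    for i in range(flank, len(dna) - flank):
        if dna[i] == "A":
            windows.append((i, dna[i - flank : i + flank + 1]))
    return windows

def one_hot(window, alphabet="ACGT"):
    """Flatten a window into a 0/1 feature vector, four entries per base.
    Bases outside the alphabet (e.g. 'N') encode as all zeros."""
    return [1 if base == letter else 0 for base in window for letter in alphabet]

# Hypothetical sequence with a single candidate adenosine in the middle.
sequence = "C" * 20 + "A" + "G" * 20
candidates = adenosine_windows(sequence)
features = [one_hot(window) for _, window in candidates]
```

    Feature vectors built this way from known edited and unedited sites could then be passed to any standard classifier; which learning method the authors actually used is not stated in the abstract.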

  4. Two accurate sequence, structure, and phylogenetic template-based RNA alignment systems.

    PubMed

    Shang, Lei; Gardner, David P; Xu, Weijia; Cannone, Jamie J; Miranker, Daniel P; Ozer, Stuart; Gutell, Robin R

    2013-01-01

    The analysis of RNA sequences, once a small niche field for a small collection of scientists whose primary emphasis was the structure and function of a few RNA molecules, has grown most significantly with the realizations that 1) RNA is implicated in many more functions within the cell, and 2) the analysis of ribosomal RNA sequences is revealing more about the microbial ecology within all biological and environmental systems. The accurate and rapid alignment of these RNA sequences is essential to decipher the maximum amount of information from this data. Two computer systems that utilize the Gutell lab's RNA Comparative Analysis Database (rCAD) were developed to align sequences to an existing template alignment available at the Gutell lab's Comparative RNA Web (CRW) Site. Multiple dimensions of cross-indexed information are contained within the relational database--rCAD, including sequence alignments, the NCBI phylogenetic tree, and comparative secondary structure information for each aligned sequence. The first program, CRWAlign-1 creates a phylogenetic-based sequence profile for each column in the alignment. The second program, CRWAlign-2 creates a profile based on phylogenetic, secondary structure, and sequence information. Both programs utilize their profiles to align new sequences into the template alignment. The accuracies of the two CRWAlign programs were compared with the best template-based rRNA alignment programs and the best de-novo alignment programs. We have compared our programs with a total of eight alternative alignment methods on different sets of 16S rRNA alignments with sequence percent identities ranging from 50% to 100%. Both CRWAlign programs were superior to these other programs in accuracy and speed. Both CRWAlign programs can be used to align the very extensive amount of RNA sequencing that is generated due to the rapid next-generation sequencing technology. This latter technology is augmenting the new paradigm that RNA is intimately

  5. Two accurate sequence, structure, and phylogenetic template-based RNA alignment systems

    PubMed Central

    2013-01-01

    Background: The analysis of RNA sequences, once a small niche field for a small collection of scientists whose primary emphasis was the structure and function of a few RNA molecules, has grown most significantly with the realizations that 1) RNA is implicated in many more functions within the cell, and 2) the analysis of ribosomal RNA sequences is revealing more about the microbial ecology within all biological and environmental systems. The accurate and rapid alignment of these RNA sequences is essential to decipher the maximum amount of information from this data. Methods: Two computer systems that utilize the Gutell lab's RNA Comparative Analysis Database (rCAD) were developed to align sequences to an existing template alignment available at the Gutell lab's Comparative RNA Web (CRW) Site. Multiple dimensions of cross-indexed information are contained within the relational database - rCAD, including sequence alignments, the NCBI phylogenetic tree, and comparative secondary structure information for each aligned sequence. The first program, CRWAlign-1 creates a phylogenetic-based sequence profile for each column in the alignment. The second program, CRWAlign-2 creates a profile based on phylogenetic, secondary structure, and sequence information. Both programs utilize their profiles to align new sequences into the template alignment. Results: The accuracies of the two CRWAlign programs were compared with the best template-based rRNA alignment programs and the best de-novo alignment programs. We have compared our programs with a total of eight alternative alignment methods on different sets of 16S rRNA alignments with sequence percent identities ranging from 50% to 100%. Both CRWAlign programs were superior to these other programs in accuracy and speed. Conclusions: Both CRWAlign programs can be used to align the very extensive amount of RNA sequencing that is generated due to the rapid next-generation sequencing technology. This latter technology is augmenting the

  6. Globin mRNA reduction for whole-blood transcriptome sequencing

    PubMed Central

    Krjutškov, Kaarel; Koel, Mariann; Roost, Anne Mari; Katayama, Shintaro; Einarsdottir, Elisabet; Jouhilahti, Eeva-Mari; Söderhäll, Cilla; Jaakma, Ülle; Plaas, Mario; Vesterlund, Liselotte; Lohi, Hannes; Salumets, Andres; Kere, Juha

    2016-01-01

    The transcriptome analysis of whole-blood RNA by sequencing holds promise for the identification and tracking of biomarkers; however, the high globin mRNA (gmRNA) content of erythrocytes hampers whole-blood and buffy coat analyses. We introduce a novel gmRNA locking assay (GlobinLock, GL) as a robust and simple gmRNA reduction tool to preserve RNA quality, save time and cost. GL consists of a pair of gmRNA-specific oligonucleotides in RNA initial denaturation buffer that is effective immediately after RNA denaturation and adds only ten minutes of incubation to the whole cDNA synthesis procedure when compared to non-blood RNA analysis. We show that GL is fully effective not only for human samples but also for mouse and rat, and so far incompletely studied cow, dog and zebrafish. PMID:27515369

  7. Selecting effective siRNA target sequences by using Bayes' theorem.

    PubMed

    Takasaki, Shigeru

    2009-10-01

    Short interfering RNA (siRNA) has been widely used for studying gene functions in mammalian cells but varies markedly in its gene silencing efficacy. Although many design rules/guidelines for effective siRNAs based on various criteria have been reported recently, there are few consistencies among them. This makes it difficult to select effective siRNA sequences in mammalian genes. Another shortcoming of most previously reported methods is that they cannot estimate the probability that a candidate sequence will silence the target gene. The analytical prediction method proposed in the present study uses Bayes' theorem to select effective siRNA target sequences from many possible candidate sequences. It is quite different from the previous score-based siRNA design techniques and can predict the probability that a candidate siRNA sequence will be effective. The results of evaluating it by applying it to recently reported effective and ineffective siRNA sequences for various genes indicate that it would be useful for many other genes. It should therefore be useful for selecting siRNA sequences effective for mammalian genes.
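
    The abstract does not say which sequence features enter the calculation, so the following is only a sketch of how Bayes' theorem can turn training examples into a probability that a candidate is effective. It assumes position-specific nucleotide frequencies that are treated as independent (a naive Bayes model) with add-one smoothing. The function names and the toy 5-nt sequences are hypothetical and far shorter than real siRNAs.

```python
from collections import Counter

def train_position_tables(sequences, alphabet="ACGU"):
    """Per-position nucleotide probabilities with add-one smoothing.
    All sequences are assumed to have the same length."""
    tables = []
    for pos in range(len(sequences[0])):
        counts = Counter(seq[pos] for seq in sequences)
        total = len(sequences) + len(alphabet)
        tables.append({base: (counts[base] + 1) / total for base in alphabet})
    return tables

def posterior_effective(candidate, eff_tables, ineff_tables, prior=0.5):
    """P(effective | candidate) via Bayes' theorem, assuming positions
    contribute independently. `prior` is P(effective) before seeing the
    sequence."""
    like_eff = prior
    like_ineff = 1.0 - prior
    for pos, base in enumerate(candidate):
        like_eff *= eff_tables[pos][base]
        like_ineff *= ineff_tables[pos][base]
    return like_eff / (like_eff + like_ineff)

# Toy training data (hypothetical, not from any published siRNA set).
effective = ["AUGCA", "AUGGA", "AUCCA"]
ineffective = ["GCGCU", "GCAUU", "CCGCU"]
eff_tables = train_position_tables(effective)
ineff_tables = train_position_tables(ineffective)
score = posterior_effective("AUGCA", eff_tables, ineff_tables)
```

    Unlike an additive score, the returned value is a probability between 0 and 1, which is the property the abstract highlights as the advantage over score-based design rules.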

  8. RNA Secondary Structures Having a Compatible Sequence of Certain Nucleotide Ratios.

    PubMed

    Barrett, Christopher L; Li, Thomas J X; Reidys, Christian M

    2016-11-01

    Given a random RNA secondary structure, S, we study RNA sequences having fixed ratios of nucleotides that are compatible with S. We perform this analysis for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. Our main result reads as follows: in the simplex of nucleotide ratios, there exists a convex region, in which, in the limit of long sequences, a random structure asymptotically almost surely (a.a.s.) has compatible sequence with these ratios and outside of which a.a.s. a random structure has no such compatible sequence. We localize this region for RNA secondary structures subject to various base-pairing rules and minimum arc- and stack-length restrictions. In particular, for GC-sequences (GC denoting the nucleotides guanine and cytosine, respectively) having a ratio of G nucleotides smaller than 1/3, a random RNA secondary structure without any minimum arc- and stack-length restrictions has a.a.s. no such compatible sequence. For sequences having a ratio of G nucleotides larger than 1/3, a random RNA secondary structure has a.a.s. such compatible sequences. We discuss our results in the context of various families of RNA structures.
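
    The notion of a sequence being compatible with a structure can be illustrated with a small check: every pair of positions joined by an arc must hold nucleotides that are allowed to pair. The sketch below assumes dot-bracket notation for the structure and Watson-Crick pairs only, whereas the paper considers several base-pairing rules. The function names are illustrative.

```python
WATSON_CRICK = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}

def parse_pairs(structure):
    """Return the (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs

def is_compatible(sequence, structure, allowed=WATSON_CRICK):
    """True if every paired position holds an allowed nucleotide pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return all(
        (sequence[i], sequence[j]) in allowed for i, j in parse_pairs(structure)
    )

def g_ratio(sequence):
    """Fraction of G nucleotides in the sequence."""
    return sequence.count("G") / len(sequence)

# A GC-sequence and a structure with no minimum arc-length restriction.
compatible = is_compatible("GGGCCC", "((()))")
ratio = g_ratio("GGGCCC")
```

    The paper's result concerns whether, for given nucleotide ratios such as the value returned by `g_ratio`, a compatible sequence exists at all for a random long structure. This check only verifies compatibility for one given sequence.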

  9. A structural and primary sequence comparison of the viral RNA-dependent RNA polymerases

    PubMed Central

    Bruenn, Jeremy A.

    2003-01-01

    A systematic bioinformatic approach to identifying the evolutionarily conserved regions of proteins has verified the universality of a newly described conserved motif in RNA-dependent RNA polymerases (motif F). In combination with structural comparisons, this approach has defined two regions that may be involved in unwinding double-stranded RNA (dsRNA) for transcription. One of these is the N-terminal portion of motif F and the second is a large insertion in motif F present in the RNA-dependent RNA polymerases of some dsRNA viruses. PMID:12654997

  10. Use of 16S Ribosomal RNA Sequences to Infer Relationships among Archaebacteria.

    DTIC Science & Technology

    1987-04-16

    Subject terms: Archaebacteria; Eubacteria; Eukaryotes; 16S Ribosomal RNA; Phylogeny; rRNA; RNA Sequencing; Molecular Clock; Urkingdoms; r... 16S rRNA data were used to infer the relationships among the archaebacteria, and of the archaebacteria to the eubacteria and eukaryotes. ur programs for... been published (1, 2, 16, 18). The analyses render untenable the suggestions of Lake and colleagues (Lake et al., 1985) that the eubacteria derive from

  11. Autocatalytic cyclization of an excised intervening sequence RNA is a cleavage-ligation reaction.

    PubMed

    Zaug, A J; Grabowski, P J; Cech, T R

    The intervening sequence (IVS) of the Tetrahymena ribosomal RNA precursor is excised as a linear RNA molecule which subsequently cyclizes itself in a protein-independent reaction. Cyclization involves cleavage of the linear IVS RNA 15 nucleotides from its 5' end and formation of a phosphodiester bond between the new 5' phosphate and the original 3'-hydroxyl terminus of the IVS. This recombination mechanism is analogous to that by which splicing of the precursor RNA is achieved. The circular molecules appear to have no direct function in RNA splicing, and we propose the cyclization serves to prevent unwanted RNA from driving the splicing reactions backwards.

  12. Melting temperature highlights functionally important RNA structure and sequence elements in yeast mRNA coding regions.

    PubMed

    Qi, Fei; Frishman, Dmitrij

    2017-03-07

    Secondary structure elements in the coding regions of mRNAs play an important role in gene expression and regulation, but distinguishing functional from non-functional structures remains challenging. Here we investigate the dependence of sequence-structure relationships in the coding regions on temperature based on the recent PARTE data by Wan et al. Our main finding is that the regions with high and low thermostability (high Tm and low Tm regions) are under evolutionary pressure to preserve RNA secondary structure and primary sequence, respectively. Sequences of low Tm regions display a higher degree of evolutionary conservation compared to high Tm regions. Low Tm regions are under strong synonymous constraint, while high Tm regions are not. These findings imply that high Tm regions contain thermo-stable functionally important RNA structures, which impose relaxed evolutionary constraint on sequence as long as the base-pairing patterns remain intact. By contrast, low thermostability regions contain single-stranded functionally important conserved RNA sequence elements accessible for binding by other molecules. We also find that theoretically predicted structures of paralogous mRNA pairs become more similar with growing temperature, while experimentally measured structures tend to diverge, which implies that the melting pathways of RNA structures cannot be fully captured by current computational approaches.

  13. High-throughput sequencing of human plasma RNA by using thermostable group II intron reverse transcriptases

    PubMed Central

    Qin, Yidan; Yao, Jun; Wu, Douglas C.; Nottingham, Ryan M.; Mohr, Sabine; Hunicke-Smith, Scott; Lambowitz, Alan M.

    2016-01-01

    Next-generation RNA-sequencing (RNA-seq) has revolutionized transcriptome profiling, gene expression analysis, and RNA-based diagnostics. Here, we developed a new RNA-seq method that exploits thermostable group II intron reverse transcriptases (TGIRTs) and used it to profile human plasma RNAs. TGIRTs have higher thermostability, processivity, and fidelity than conventional reverse transcriptases, plus a novel template-switching activity that can efficiently attach RNA-seq adapters to target RNA sequences without RNA ligation. The new TGIRT-seq method enabled construction of RNA-seq libraries from <1 ng of plasma RNA in <5 h. TGIRT-seq of RNA in 1-mL plasma samples from a healthy individual revealed RNA fragments mapping to a diverse population of protein-coding gene and long ncRNAs, which are enriched in intron and antisense sequences, as well as nearly all known classes of small ncRNAs, some of which have never before been seen in plasma. Surprisingly, many of the small ncRNA species were present as full-length transcripts, suggesting that they are protected from plasma RNases in ribonucleoprotein (RNP) complexes and/or exosomes. This TGIRT-seq method is readily adaptable for profiling of whole-cell, exosomal, and miRNAs, and for related procedures, such as HITS-CLIP and ribosome profiling. PMID:26554030

  14. Identification of Dirofilaria immitis miRNA using illumina deep sequencing

    PubMed Central

    2013-01-01

    The heartworm Dirofilaria immitis is the causative agent of cardiopulmonary dirofilariosis in dogs and cats, which also infects a wide range of wild mammals and humans. The complex life cycle of D. immitis with several developmental stages in its invertebrate mosquito vectors and its vertebrate hosts indicates the importance of miRNA in growth and development, and their ability to regulate infection of mammalian hosts. This study identified the miRNA profiles of D. immitis of zoonotic significance by deep sequencing. A total of 1063 conserved miRNA candidates, including 68 anti-sense miRNA (miRNA*) sequences, were predicted by computational methods and could be grouped into 808 miRNA families. A significant bias towards family members, family abundance and sequence nucleotides was observed. Thirteen novel miRNA candidates were predicted by alignment with the Brugia malayi genome. Eleven out of 13 predicted miRNA candidates were verified by using a PCR-based method. Target genes of the novel miRNA candidates were predicted by using the heartworm transcriptome dataset. To our knowledge, this is the first report of miRNA profiles in D. immitis, which will contribute to a better understanding of the complex biology of this zoonotic filarial nematode and the molecular regulation roles of miRNA involved. Our findings may also become a useful resource for small RNA studies in other filarial parasitic nematodes. PMID:23331513

  15. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system.

    PubMed

    Schloss, Patrick D; Jenior, Matthew L; Koumpouras, Charles C; Westcott, Sarah L; Highlander, Sarah K

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.

  16. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

    PubMed Central

    Jenior, Matthew L.; Koumpouras, Charles C.; Westcott, Sarah L.; Highlander, Sarah K.

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3–V5, V1–V3, V1–V5, V1–V6, and V1–V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1–V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina’s MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting. PMID:27069806

  17. Deep Sequencing of RNA from Ancient Maize Kernels

    PubMed Central

    Rasmussen, Morten; Cappellini, Enrico; Romero-Navarro, J. Alberto; Wales, Nathan; Alquezar-Planas, David E.; Penfield, Steven; Brown, Terence A.; Vielle-Calzada, Jean-Philippe; Montiel, Rafael; Jørgensen, Tina; Odegaard, Nancy; Jacobs, Michael; Arriaza, Bernardo; Higham, Thomas F. G.; Ramsey, Christopher Bronk; Willerslev, Eske; Gilbert, M. Thomas P.

    2013-01-01

    The characterization of biomolecules from ancient samples can shed otherwise unobtainable insights into the past. Despite the fundamental role of transcriptomal change in evolution, the potential of ancient RNA remains unexploited – perhaps due to dogma associated with the fragility of RNA. We hypothesize that seeds offer a plausible refuge for long-term RNA survival, due to the fundamental role of RNA during seed germination. Using RNA-Seq on cDNA synthesized from nucleic acid extracts, we validate this hypothesis through demonstration of partial transcriptomal recovery from two sources of ancient maize kernels. The results suggest that ancient seed transcriptomics may offer a powerful new tool with which to study plant domestication. PMID:23326310

  18. New perspectives on the diversification of the RNA interference system: insights from comparative genomics and small RNA sequencing.

    PubMed

    Burroughs, Alexander Maxwell; Ando, Yoshinari; Aravind, L

    2014-01-01

    Our understanding of the pervasive involvement of small RNAs in regulating diverse biological processes has been greatly augmented by recent application of deep-sequencing technologies to small RNA across diverse eukaryotes. We review the currently known small RNA classes and place them in context of the reconstructed evolutionary history of the RNA interference (RNAi) protein machinery. This synthesis indicates that the earliest versions of eukaryotic RNAi systems likely utilized small RNA processed from three types of precursors: (1) sense-antisense transcriptional products, (2) genome-encoded, imperfectly complementary hairpin sequences, and (3) larger noncoding RNA precursor sequences. Structural dissection of PIWI proteins along with recent discovery of novel families (including Med13 of the Mediator complex) suggest that emergence of a distinct architecture with the N-terminal domains (also occurring separately fused to endoDNases in prokaryotes) formed via duplication of an ancestral unit was key to their recruitment as primary RNAi effectors and use of small RNAs of certain preferred lengths. Prokaryotic PIWI proteins are typically components of several RNA-directed DNA restriction or CRISPR/Cas systems. However, eukaryotic versions appear to have emerged from a subset that evolved RNA-directed RNAi. They were recruited alongside RNaseIII domains and RNA-dependent RNA polymerase (RdRP) domains, also from prokaryotic systems, to form the core eukaryotic RNAi system. Like certain regulatory systems, RNAi diversified into two distinct but linked arms concomitant with eukaryotic nucleocytoplasmic compartmentalization. Subsequent elaboration of RNAi proceeded via diversification of the core protein machinery through lineage-specific expansions and recruitment of new components from prokaryotes (nucleases and small RNA-modifying enzymes), allowing for diversification of associating small RNAs. © 2013 John Wiley & Sons, Ltd.

  19. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

    PubMed

    Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

    2016-10-07

    RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of the RNA-binding properties of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. When the protein is associated with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential

  20. The impact of RNA secondary structure on read start locations on the Illumina sequencing platform.

    PubMed

    Price, Adam; Garhyan, Jaishree; Gibas, Cynthia

    2017-01-01

    High-throughput sequencing is subject to sequence-dependent bias, which must be accounted for if researchers are to make precise measurements and draw accurate conclusions from their data. A widely studied source of bias in sequencing is the GC content bias, in which levels of GC content in a genomic region affect the number of reads produced during sequencing. Although some research has been performed on methods to correct for GC bias, there has been little effort to understand the underlying mechanism. The availability of sequencing protocols that target the specific location of structure in nucleic acid molecules enables us to investigate the underlying molecular origin of observed GC bias in sequencing. By applying a parallel analysis of RNA structure (PARS) protocol to bacterial genomes of varying GC content, we are able to observe the relationship between local RNA secondary structure and sequencing outcome, and to establish RNA secondary structure as the significant contributing factor to observed GC bias.

  1. The impact of RNA secondary structure on read start locations on the Illumina sequencing platform

    PubMed Central

    Price, Adam; Garhyan, Jaishree

    2017-01-01

    High-throughput sequencing is subject to sequence-dependent bias, which must be accounted for if researchers are to make precise measurements and draw accurate conclusions from their data. A widely studied source of bias in sequencing is the GC content bias, in which levels of GC content in a genomic region affect the number of reads produced during sequencing. Although some research has been performed on methods to correct for GC bias, there has been little effort to understand the underlying mechanism. The availability of sequencing protocols that target the specific location of structure in nucleic acid molecules enables us to investigate the underlying molecular origin of observed GC bias in sequencing. By applying a parallel analysis of RNA structure (PARS) protocol to bacterial genomes of varying GC content, we are able to observe the relationship between local RNA secondary structure and sequencing outcome, and to establish RNA secondary structure as the significant contributing factor to observed GC bias. PMID:28245230

  2. Sequence specific detection of bacterial 23S ribosomal RNA by TLR13

    PubMed Central

    Li, Xiao-Dong; Chen, Zhijian J

    2012-01-01

    Toll-like receptors (TLRs) detect microbial infections and trigger innate immune responses. Among vertebrate TLRs, the role of TLR13 and its ligand are unknown. Here we show that TLR13 detects the 23S ribosomal RNA of both gram-positive and gram-negative bacteria. A sequence containing 13 nucleotides near the active site of 23S rRNA ribozyme, which catalyzes peptide bond synthesis, was both necessary and sufficient to trigger TLR13-dependent interleukin-1β production. Single point mutations within this sequence destroyed the ability of the 23S rRNA to stimulate the TLR13 pathway. Knockout of TLR13 in mice abolished the induction of interleukin-1β and other cytokines by the 23S rRNA sequence. Thus, TLR13 detects bacterial RNA with exquisite sequence specificity. DOI: http://dx.doi.org/10.7554/eLife.00102.001 PMID:23110254

  3. Sequence-specific RNA Photocleavage by Single-stranded DNA in Presence of Riboflavin

    NASA Astrophysics Data System (ADS)

    Zhao, Yongyun; Chen, Gangyi; Yuan, Yi; Li, Na; Dong, Juan; Huang, Xin; Cui, Xin; Tang, Zhuo

    2015-10-01

    Constant efforts have been made to develop new methods for sequence-specific RNA degradation, which could inhibit the expression of a targeted gene. Herein, by using an unmodified short DNA oligonucleotide for sequence recognition and an endogenous small molecule, vitamin B2 (riboflavin), as photosensitizer, we report a simple strategy to realize the sequence-specific photocleavage of targeted RNA. The DNA strand is complementary to the target sequence and forms a DNA/RNA duplex containing a G•U wobble in the middle. The cleavage reaction proceeds through an oxidative elimination mechanism at the nucleoside downstream of the U of the G•U wobble in the duplex to yield an unnatural RNA terminus, and the whole process is under tight control by using light as a switch, which means the cleavage can be carried out according to specific spatial and temporal requirements. The biocompatibility of this method makes the DNA strand in combination with riboflavin a promising molecular tool for RNA manipulation.

  4. Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

    PubMed

    Reid-Bayliss, Kate S; Loeb, Lawrence A

    2017-08-29

    Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
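
    The barcode-and-consensus idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' ARC-seq pipeline: the function name, the `min_copies` and `min_agreement` thresholds, and the toy reads are all hypothetical, and the reads in a barcode family are assumed to be already aligned and of equal length. Positions where the copies disagree too much are masked with "N" rather than called.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def consensus_by_barcode(
    reads: Iterable[Tuple[str, str]],
    min_copies: int = 3,          # hypothetical threshold, not from the source
    min_agreement: float = 0.9,   # hypothetical threshold, not from the source
) -> Dict[str, str]:
    """Build one consensus sequence per barcode family.

    Each read is a (barcode, sequence) pair. Families with fewer than
    `min_copies` reads, or with reads of unequal length, are skipped.
    """
    families: Dict[str, List[str]] = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus: Dict[str, str] = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_copies or len({len(s) for s in seqs}) != 1:
            continue
        bases = []
        # Walk the family column by column and take a majority vote.
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus


if __name__ == "__main__":
    toy_reads = [
        ("AAC", "ACGU"),
        ("AAC", "ACGU"),
        ("AAC", "ACGA"),  # one copy carries an error at the last base
        ("GGT", "ACGU"),  # single-copy family, dropped
    ]
    print(consensus_by_barcode(toy_reads, min_agreement=0.6))
    print(consensus_by_barcode(toy_reads))
```

    Raising `min_copies` or `min_agreement` makes the call more stringent at the cost of discarding more families, which is one way to read the abstract's remark that stringency can be scaled to input quality.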

  5. Genome-Wide Probing of RNA Structures In Vitro Using Nucleases and Deep Sequencing.

    PubMed

    Wan, Yue; Qu, Kun; Ouyang, Zhengqing; Chang, Howard Y

    2016-01-01

    RNA structure probing is an important technique that studies the secondary and tertiary conformations of an RNA. While it was traditionally performed on one RNA at a time, recent advances in deep sequencing have enabled the secondary structure mapping of thousands of RNAs simultaneously. Here, we describe the method Parallel Analysis for RNA Structures (PARS), which couples double- and single-strand-specific nuclease probing to high-throughput sequencing. Upon cloning of the cleavage sites into a cDNA library, deep sequencing and mapping of reads to the transcriptome, the position of paired and unpaired bases along cellular RNAs can be identified. PARS can be performed under diverse solution conditions and on different organismal RNAs to provide genome-wide RNA structural information. This information can also be further used to constrain computational predictions to provide better RNA structure models under different conditions.

  6. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues.

    PubMed

    Lee, Je Hyuk; Daugharthy, Evan R; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C; Terry, Richard; Turczyk, Brian M; Yang, Joyce L; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M

    2015-03-01

    RNA-sequencing (RNA-seq) measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. In contrast, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq, our method enriches for context-specific transcripts over housekeeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d.

  7. Common 5S rRNA variants are likely to be accepted in many sequence contexts

    NASA Technical Reports Server (NTRS)

    Zhang, Zhengdong; D'Souza, Lisa M.; Lee, Youn-Hyung; Fox, George E.

    2003-01-01

    Over evolutionary time RNA sequences which are successfully fixed in a population are selected from among those that satisfy the structural and chemical requirements imposed by the function of the RNA. These sequences together comprise the structure space of the RNA. In principle, a comprehensive understanding of RNA structure and function would make it possible to enumerate which specific RNA sequences belong to a particular structure space and which do not. We are using bacterial 5S rRNA as a model system to attempt to identify principles that can be used to predict which sequences do or do not belong to the 5S rRNA structure space. One promising idea is the very intuitive notion that frequently seen sequence changes in an aligned data set of naturally occurring 5S rRNAs would be widely accepted in many other 5S rRNA sequence contexts. To test this hypothesis, we first developed well-defined operational definitions for a Vibrio region of the 5S rRNA structure space and what is meant by a highly variable position. Fourteen sequence variants (10 point changes and 4 base-pair changes) were identified in this way, which, by the hypothesis, would be expected to incorporate successfully in any of the known sequences in the Vibrio region. All 14 of these changes were constructed and separately introduced into the Vibrio proteolyticus 5S rRNA sequence where they are not normally found. Each variant was evaluated for its ability to function as a valid 5S rRNA in an E. coli cellular context. It was found that 93% (13/14) of the variants tested are likely valid 5S rRNAs in this context. In addition, seven variants were constructed that, although present in the Vibrio region, did not meet the stringent criteria for a highly variable position. In this case, 86% (6/7) are likely valid. As a control we also examined seven variants that are seldom or never seen in the Vibrio region of 5S rRNA sequence space. In this case only two of seven were found to be potentially valid. 

  9. Nucleotide sequence from the coding region of rabbit β-globin messenger RNA

    PubMed Central

    Proudfoot, N.J.

    1976-01-01

    A sequence of 89 nucleotides from rabbit β-globin mRNA has been determined and is shown to code for residues 107 to 137 of the β-globin protein. In addition, a sequence heterogeneity has been identified within this 89 nucleotide long sequence which corresponds to a known polymorphic variant of rabbit β-globin. PMID:61580

  10. Multilign: an algorithm to predict secondary structures conserved in multiple RNA sequences

    PubMed Central

    Xu, Zhenjiang; Mathews, David H.

    2011-01-01

    Motivation: With recent advances in sequencing, structural and functional studies of RNA lag behind the discovery of sequences. Computational analysis of RNA is increasingly important to reveal structure–function relationships with low cost and speed. The purpose of this study is to use multiple homologous sequences to infer a conserved RNA structure. Results: A new algorithm, called Multilign, is presented to find the lowest free energy RNA secondary structure common to multiple sequences. Multilign is based on Dynalign, which is a program that simultaneously aligns and folds two sequences to find the lowest free energy conserved structure. For Multilign, Dynalign is used to progressively construct a conserved structure from multiple pairwise calculations, with one sequence used in all pairwise calculations. A base pair is predicted only if it is contained in the set of low free energy structures predicted by all Dynalign calculations. In this way, Multilign improves prediction accuracy by keeping the genuine base pairs and excluding competing false base pairs. Multilign has computational complexity that scales linearly in the number of sequences. Multilign was tested on extensive datasets of sequences with known structure and its prediction accuracy is among the best of available algorithms. Multilign can run on long sequences (> 1500 nt) and an arbitrarily large number of sequences. Availability: The algorithm is implemented in ANSI C++ and can be downloaded as part of the RNAstructure package at: http://rna.urmc.rochester.edu Contact: david_mathews@urmc.rochester.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21193521
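
    The base-pair filtering rule described in the abstract above (a pair is predicted only if every pairwise calculation contains it) can be illustrated with a minimal sketch. It assumes each pairwise Dynalign calculation has already been reduced to a set of (i, j) base pairs for the sequence shared by all calculations; the function name and the toy data are hypothetical, and this is not the RNAstructure implementation.

```python
from typing import Iterable, Set, Tuple

BasePair = Tuple[int, int]


def conserved_pairs(pairwise_results: Iterable[Set[BasePair]]) -> Set[BasePair]:
    """Return the base pairs that appear in every pairwise result."""
    results = list(pairwise_results)
    if not results:
        return set()
    kept = set(results[0])
    for pairs in results[1:]:
        kept &= pairs  # drop any pair missing from this calculation
    return kept


if __name__ == "__main__":
    # Toy base-pair sets for the shared sequence, one per pairwise run.
    runs = [
        {(1, 20), (2, 19), (3, 18), (5, 12)},
        {(1, 20), (2, 19), (3, 18), (6, 11)},
        {(1, 20), (2, 19), (4, 17)},
    ]
    print(sorted(conserved_pairs(runs)))
```

    Because each additional sequence contributes one more set to intersect, the work in this sketch grows linearly with the number of sequences, in line with the scaling the abstract reports.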

  11. DNApi: A De Novo Adapter Prediction Algorithm for Small RNA Sequencing Data.

    PubMed

    Tsuji, Junko; Weng, Zhiping

    2016-01-01

    With the rapid accumulation of publicly available small RNA sequencing datasets, third-party meta-analysis across many datasets is becoming increasingly powerful. Although removing the 3′ adapter is an essential step for small RNA sequencing analysis, the adapter sequence information is not always available in the metadata. The information can also be erroneous even when it is available. In this study, we developed DNApi, a lightweight Python software package that predicts the 3′ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for downstream analysis. Tested on 539 publicly available small RNA libraries accompanied by 3′ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average). In addition to 3′ adapter prediction, it is also important to classify whether the input small RNA libraries were already processed, i.e. whether the 3′ adapters were removed. Given another batch of datasets, DNApi correctly judged that all 192 publicly available processed libraries were "ready-to-map" small RNA sequences. DNApi is compatible with Python 2 and 3, and is available at https://github.com/jnktsj/DNApi. The 731 small RNA libraries used for DNApi evaluation were from human tissues and were carefully and manually collected. This study also provides readers with the curated datasets that can be integrated into their studies.

  12. DNApi: A De Novo Adapter Prediction Algorithm for Small RNA Sequencing Data

    PubMed Central

    Tsuji, Junko; Weng, Zhiping

    2016-01-01

    With the rapid accumulation of publicly available small RNA sequencing datasets, third-party meta-analysis across many datasets is becoming increasingly powerful. Although removing the 3′ adapter is an essential step for small RNA sequencing analysis, the adapter sequence information is not always available in the metadata. The information can also be erroneous even when it is available. In this study, we developed DNApi, a lightweight Python software package that predicts the 3′ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for downstream analysis. Tested on 539 publicly available small RNA libraries accompanied by 3′ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average). In addition to 3′ adapter prediction, it is also important to classify whether the input small RNA libraries were already processed, i.e. whether the 3′ adapters were removed. Given another batch of datasets, DNApi correctly judged that all 192 publicly available processed libraries were “ready-to-map” small RNA sequences. DNApi is compatible with Python 2 and 3, and is available at https://github.com/jnktsj/DNApi. The 731 small RNA libraries used for DNApi evaluation were from human tissues and were carefully and manually collected. This study also provides readers with the curated datasets that can be integrated into their studies. PMID:27736901

  13. Differences in microRNA detection levels are technology and sequence dependent.

    PubMed

    Leshkowitz, Dena; Horn-Saban, Shirley; Parmet, Yisrael; Feldmesser, Ester

    2013-04-01

    Identification and quantification of small RNAs are challenging because of their short length, high sequence similarities within microRNA (miRNA) families, and the existence of miRNA isoforms and O-methyl 3' modifications. In this study, the detection performance of three high-throughput commercial platforms, Agilent and Affymetrix microarrays and Illumina next-generation sequencing, was systematically and comprehensively compared. The ability to detect miRNAs was shown to depend strongly on the platform and on miRNA modifications and sequence. Using synthetic transcripts, including mature, precursor, and O-methyl-modified miRNAs spiked into human RNA, a large intensity variation in all spiked-in miRNAs and a reduced capacity in detecting O-methyl-modified miRNAs were observed between the tested platforms. In addition, endogenous human miRNA expression levels were assessed across the platforms. Detected miRNA expression levels were not consistent between platforms. Although biases in miRNA detection were previously described, here the end-point result, i.e., detection intensity, of these biases was investigated on multiple platforms in a controlled fashion. A detailed exploration of a large number of attributes, including base composition, sequence structure, and isoform miRNA attributes, suggests their impact on miRNA expression detection level. This study provides a basis for understanding the attributes that should be considered to adjust platform-dependent detection biases.

  14. RNA metabolism in isolated nuclei: processing and transport of immunoglobulin light chain sequences.

    PubMed Central

    Otegui, C; Patterson, R J

    1981-01-01

    Transport of prelabeled RNA from isolated myeloma nuclei is studied using conditions that permit RNA synthesis. Cytosol and spermidine are not required to maintain nuclear stability and inhibited RNA release. Omission of ATP or GTP decreased release 25 to 40%. The stimulatory effect of ATP or GTP is not due to hydrolysis of the triphosphates by the nuclear envelope NTPase, since addition of quercetin (an inhibitor of this NTPase) has no effect on the quantity of RNA released. The size distribution and percentage of poly A-containing species released from nuclei incubated with or without ATP or the other rNTPs are identical. Hybridization analysis of nuclear RNA before the transport assay revealed mature and precursor κ light chain mRNA sequences. Following the transport assay, a significant fraction of κ mRNA precursors is chased into mature κ mRNA which is found both in nuclear-retained and released RNA. PMID:6795596

  15. Nucleotide sequence and genetic organization of Hungarian grapevine chrome mosaic nepovirus RNA2.

    PubMed Central

    Brault, V; Hibrand, L; Candresse, T; Le Gall, O; Dunez, J

    1989-01-01

    The complete nucleotide sequence of Hungarian grapevine chrome mosaic nepovirus (GCMV) RNA2 has been determined. The RNA sequence is 4441 nucleotides in length, excluding the poly(A) tail. A polyprotein of 1324 amino acids with a calculated molecular weight of 146 kDa is encoded in a single long open reading frame extending from nucleotides 218 to 4190. This polyprotein is homologous with the protein encoded by the S strain of tomato black ring virus (TBRV) RNA2, the only other nepovirus sequenced so far. Direct sequencing of the viral coat protein and in vitro translation of transcripts derived from cDNA sequences demonstrate that, as for comoviruses, the coat protein is located at the carboxy terminus of the polyprotein. A model for the expression of GCMV RNA2 is presented. PMID:2798129

  16. A Simple Symmetric Algorithm Using a Likeness with Introns Behavior in RNA Sequences

    NASA Astrophysics Data System (ADS)

    Regoli, Massimo

    2009-02-01

    The RNA-Crypto System (RCS for short) is a symmetric-key algorithm for enciphering data. The idea for this new algorithm comes from the observation of nature, in particular of RNA behavior and some of its properties. RNA sequences have some sections called introns. Introns, a term derived from "intragenic regions", are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well. The nature and role of introns in pre-mRNA are not clear and are under active research by biologists but, in our case, we use the presence of introns in the RNA-Crypto System output as a strong method to add chaotic non-coding information and an unnecessary behaviour in the access to the secret key used to code the messages. In the RNA-Crypto System algorithm, the introns are sections of the ciphered message that carry non-coding information, as in the precursor mRNA.

  17. Nucleotide sequences of three tRNA(Ser) from Drosophila melanogaster reading the six serine codons.

    PubMed

    Cribbs, D L; Gillam, I C; Tener, G M

    1987-10-05

    The nucleotide sequences of three serine tRNAs from Drosophila melanogaster, together capable of decoding the six serine codons, were determined. tRNA(Ser)2b has the anticodon GCU, tRNA(Ser)4 has CGA and tRNA(Ser)7 has IGA. tRNA(Ser)2b differs from the last two by about 25%. However, tRNA(Ser)4 and tRNA(Ser)7 are 96% homologous, differing only at the first position of the anticodon and two other sites. This unusual sequence relationship suggests, together with similar pairs in the yeasts Schizosaccharomyces pombe and Saccharomyces cerevisiae, that eukaryotic tRNA(Ser)UCN may be undergoing concerted evolution.

  18. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses.

    PubMed

    Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro

    2016-03-07

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as "DECS-C," is a powerful method for detecting novel plant viruses.

  19. Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

    PubMed Central

    Yanagisawa, Hironobu; Tomita, Reiko; Katsu, Koji; Uehara, Takuya; Atsumi, Go; Tateda, Chika; Kobayashi, Kappei; Sekine, Ken-Taro

    2016-01-01

    The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses. PMID:27072419

  20. Mapping the miRNA interactome by cross-linking ligation and sequencing of hybrids (CLASH).

    PubMed

    Helwak, Aleksandra; Tollervey, David

    2014-03-01

    RNA-RNA interactions have critical roles in many cellular processes, but studying them is difficult and laborious. Here we describe an experimental procedure, termed cross-linking ligation and sequencing of hybrids (CLASH), which allows high-throughput identification of sites of RNA-RNA interaction. During CLASH, a tagged bait protein is UV-cross-linked in cell cultures to stabilize RNA interactions, and it is purified under denaturing conditions. RNAs associated with the bait protein are partially truncated, and the ends of RNA duplexes are ligated together. After linker addition, cDNA library preparation and high-throughput sequencing, the ligated duplexes give rise to chimeric cDNAs, which unambiguously identify RNA-RNA interaction sites independent of bioinformatic predictions. This protocol is optimized for studying miRNA targets bound by Argonaute (AGO) proteins, but it should be easily adapted for other RNA-binding proteins and classes of RNA. The protocol requires ∼5 d to complete, excluding the time required for high-throughput sequencing and bioinformatic analyses.

  1. Mapping the miRNA interactome by crosslinking ligation and sequencing of hybrids (CLASH)

    PubMed Central

    Helwak, Aleksandra; Tollervey, David

    2014-01-01

    RNA-RNA interactions play critical roles in many cellular processes but studying them is difficult and laborious. Here, we describe an experimental procedure, termed crosslinking ligation and sequencing of hybrids (CLASH), which allows high-throughput identification of sites of RNA-RNA interaction. During CLASH, a tagged bait protein is UV crosslinked in vivo to stabilise RNA interactions and purified under denaturing conditions. RNAs associated with the bait protein are partially truncated, and the ends of RNA-duplexes are ligated together. Following linker addition, cDNA library preparation and high-throughput sequencing, the ligated duplexes give rise to chimeric cDNAs, which unambiguously identify RNA-RNA interaction sites independent of bioinformatic predictions. This protocol is optimized for studying miRNA targets bound by Argonaute proteins, but should be easily adapted for other RNA-binding proteins and classes of RNA. The protocol requires around 5 days to complete, excluding the time required for high-throughput sequencing and bioinformatic analyses. PMID:24577361

  2. Alterations of microRNA and microRNA-regulated messenger RNA expression in germinal center B-cell lymphomas determined by integrative sequencing analysis.

    PubMed

    Hezaveh, Kebria; Kloetgen, Andreas; Bernhart, Stephan H; Mahapatra, Kunal Das; Lenze, Dido; Richter, Julia; Haake, Andrea; Bergmann, Anke K; Brors, Benedikt; Burkhardt, Birgit; Claviez, Alexander; Drexler, Hans G; Eils, Roland; Haas, Siegfried; Hoffmann, Steve; Karsch, Dennis; Klapper, Wolfram; Kleinheinz, Kortine; Korbel, Jan; Kretzmer, Helene; Kreuz, Markus; Küppers, Ralf; Lawerenz, Chris; Leich, Ellen; Loeffler, Markus; Mantovani-Loeffler, Luisa; López, Cristina; McHardy, Alice C; Möller, Peter; Rohde, Marius; Rosenstiel, Philip; Rosenwald, Andreas; Schilhabel, Markus; Schlesner, Matthias; Scholz, Ingrid; Stadler, Peter F; Stilgenbauer, Stephan; Sungalee, Stéphanie; Szczepanowski, Monika; Trümper, Lorenz; Weniger, Marc A; Siebert, Reiner; Borkhardt, Arndt; Hummel, Michael; Hoell, Jessica I

    2016-11-01

    MicroRNA are well-established players in post-transcriptional gene regulation. However, information on the effects of microRNA deregulation mainly relies on bioinformatic prediction of potential targets, whereas proof of the direct physical microRNA/target messenger RNA interaction is mostly lacking. Within the International Cancer Genome Consortium Project "Determining Molecular Mechanisms in Malignant Lymphoma by Sequencing", we performed miRnome sequencing from 16 Burkitt lymphomas, 19 diffuse large B-cell lymphomas, and 21 follicular lymphomas. Twenty-two miRNA separated Burkitt lymphomas from diffuse large B-cell lymphomas/follicular lymphomas, of which 13 have shown regulation by MYC. Moreover, we found expression of three hitherto unreported microRNA. Additionally, we detected recurrent mutations of hsa-miR-142 in diffuse large B-cell lymphomas and follicular lymphomas, and editing of the hsa-miR-376 cluster, providing evidence for microRNA editing in lymphomagenesis. To interrogate the direct physical interactions of microRNA with messenger RNA, we performed Argonaute-2 photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation experiments. MicroRNA directly targeted 208 messenger RNA in the Burkitt lymphomas and 328 messenger RNA in the non-Burkitt lymphoma models. This integrative analysis discovered several regulatory pathways of relevance in lymphomagenesis including Ras, PI3K-Akt and MAPK signaling pathways, also recurrently deregulated in lymphomas by mutations. Our dataset reveals that messenger RNA deregulation through microRNA is a highly relevant mechanism in lymphomagenesis.

  3. The nucleotide sequence of 5S ribosomal RNA from slime mold Physarum polycephalum.

    PubMed

    Komiya, H; Takemura, S

    1981-12-01

    The nucleotide sequence of 5S ribosomal RNA from plasmodia of the slime mold Physarum polycephalum was determined as pppGGAUGCGGC CAUACUAAGG 20 AGAAAGCACC 30 UCAUCCCGUC 40 CGAUCUGAGA 50 AGUUAAGCUC 60 CUUCAGGCGU 70 GGUUAGUACU 80 GGGGUGGGGG 90 ACCACCUGGG 100 AAUCCCACGU 110 GCUGCAUUCU 120 Uoh by chemical and enzymatic gel sequencing techniques using 3' and 5' end-labeled RNA. This RNA is very different from 5S rRNA of the cellular slime mold Dictyostelium discoideum (36 nucleotides are different), and shows greater similarity to 5S rRNAs from Protozoa and Metazoa than to those from fungi.

  4. Quantifying RNA allelic ratios by microfluidics-based multiplex PCR and deep sequencing

    PubMed Central

    Zhang, Rui; Li, Xin; Ramaswami, Gokul; Smith, Kevin S; Turecki, Gustavo; Montgomery, Stephen B; Li, Jin Billy

    2013-01-01

    We developed a targeted RNA sequencing method that couples microfluidics-based multiplex PCR and deep sequencing (mmPCR-seq) to uniformly and simultaneously amplify up to 960 loci in 48 samples independently of their gene expression levels, and accurately and cost-effectively measure allelic ratios even for low-quantity or low-quality RNA samples. We applied mmPCR-seq to RNA editing and allele-specific expression studies. mmPCR-seq complements RNA-seq and provides a highly desirable solution for future applications. PMID:24270603

  5. RNA editing generates cellular subsets with diverse sequence within populations

    PubMed Central

    Harjanto, Dewi; Papamarkou, Theodore; Oates, Chris J.; Rayon-Estrada, Violeta; Papavasiliou, F. Nina; Papavasiliou, Anastasia

    2016-01-01

    RNA editing is a mutational mechanism that specifically alters the nucleotide content in transcribed RNA. However, editing rates vary widely, and could result from equivalent editing amongst individual cells, or represent an average of variable editing within a population. Here we present a hierarchical Bayesian model that quantifies the variance of editing rates at specific sites using RNA-seq data from both single cells, and a cognate bulk sample to distinguish between these two possibilities. The model predicts high variance for specific edited sites in murine macrophages and dendritic cells, findings that we validated experimentally by using targeted amplification of specific editable transcripts from single cells. The model also predicts changes in variance in editing rates for specific sites in dendritic cells during the course of LPS stimulation. Our data demonstrate substantial variance in editing signatures amongst single cells, supporting the notion that RNA editing generates diversity within cellular populations. PMID:27418407

  6. Knowledge discovery in RNA sequence families of HIV using scalable computers

    SciTech Connect

    Hofacker, I.L.; Huynen, M.A.; Stadler, P.F.; Stolorz, P.E.

    1996-12-31

    The prediction of RNA secondary structure on the basis of sequence information is an important tool in biosequence analysis. However, it has typically been restricted to molecules containing no more than 4000 nucleotides due to the computational complexity of the underlying dynamic programming algorithm used. We describe here an approach to RNA sequence analysis based upon scalable computers, which enables molecules containing up to 20,000 nucleotides to be analyzed. We apply the approach to investigation of the entire HIV genome, illustrating the power of these methods to perform knowledge discovery by identification of important secondary structure motifs within RNA sequence families.
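    The computational cost mentioned above comes from the cubic-time recursion at the heart of dynamic-programming structure prediction. The abstract does not spell out the authors' algorithm, so the sketch below swaps in the simpler Nussinov-style base-pair maximization as a stand-in; it is not the method used in the paper. The function name `nussinov_max_pairs` and the `min_loop` parameter are illustrative assumptions.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov-style recursion).

    Assumption: this is a simplified stand-in, not an energy model.
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    Runs in O(n^3) time and O(n^2) memory for a sequence of length n.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}
    n = len(seq)
    best = [[0] * n for _ in range(n)]      # best[i][j]: max pairs in seq[i..j]
    for span in range(min_loop + 1, n):     # fill short intervals first
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]          # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


# A small hairpin: three stacked pairs closing an AAA loop.
print(nussinov_max_pairs("GGGAAAUCC"))
```

    The three nested loops over `span`, `i`, and `k` are what make the running time grow with the cube of the sequence length, and the `best` table is what makes memory grow with its square.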

  7. Small RNA Library Cloning Procedure for Deep Sequencing of Specific Endogenous siRNA Classes in Caenorhabditis elegans

    PubMed Central

    Ow, Maria C.; Lau, Nelson C.; Hall, Sarah E.

    2017-01-01

    In recent years, distinct classes of small RNAs ranging in size from ~21 to 26 nucleotides have been discovered and shown to play important roles in a wide array of cellular functions. Because of the abundance of these small RNAs, library preparation from an RNA sample followed by deep sequencing provides the identity and quantity of a particular class of small RNAs. In this chapter we describe a detailed protocol for preparing small RNA libraries for deep sequencing on the Illumina platform from the nematode C. elegans. PMID:24920360

  8. Using small RNA deep sequencing data to detect siRNA duplexes induced by plant viruses

    USDA-ARS's Scientific Manuscript database

    Small interfering RNA (siRNA) duplexes are produced in plants during virus infection, which are short (usually 21 to 24-base pair) double-stranded RNAs (dsRNAs) with several overhanging nucleotides on the 5' end and 3' end. The investigation of the siRNA duplexes is useful to better understand the R...

  9. Bacterial toxin HigB associates with ribosomes and mediates translation-dependent mRNA cleavage at A-rich sites.

    PubMed

    Hurley, Jennifer M; Woychik, Nancy A

    2009-07-10

    Most pathogenic Proteus species are primarily associated with urinary tract infections, especially in persons with indwelling catheters or functional/anatomic abnormalities of the urinary tract. Urinary tract infections caused by Proteus vulgaris typically form biofilms and are resistant to commonly used antibiotics. The Rts1 conjugative plasmid from a clinical isolate of P. vulgaris carries over 300 predicted open reading frames, including antibiotic resistance genes. The maintenance of the Rts1 plasmid is ensured in part by the HigBA toxin-antitoxin system. We determined the precise mechanism of action of the HigB toxin in vivo, which is distinct from other known toxins. We demonstrate that HigB is an endoribonuclease whose enzymatic activity is dependent on association with ribosomes through the 50 S subunit. Using primer extension analysis of several test mRNAs, we showed that HigB cleaved extensively across the entire length of coding regions only at specific recognition sequences. HigB mediated cleavage of 100% of both in-frame and out-of-frame AAA sequences. In addition, HigB cleaved approximately 20% of AA sequences in coding regions and occasionally cut single As. Remarkably, the cleavage specificity of HigB coincided with one of the most frequently used codons in the AT-rich Proteus spp., AAA (lysine). Therefore, the HigB-mediated plasmid maintenance system for the Rts1 plasmid highlights the intimate relationship between host cells and extrachromosomal DNA that enables the dynamic acquisition of genes that impart a spectrum of survival advantages, including those encoding multidrug resistance and virulence factors.
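    The cleavage specificity described above can be illustrated with a short sketch that scans a coding sequence for every AAA triplet, overlapping occurrences included, and labels each one as in-frame or out-of-frame relative to the first nucleotide. The function name `find_aaa_sites` and the example mRNA are hypothetical and not taken from the study; the sketch only locates candidate motifs and does not model ribosome association or cleavage efficiency.

```python
def find_aaa_sites(cds):
    """Return (position, frame) for every AAA triplet in a coding sequence.

    position is the 0-based index of the first A; frame is 0 when the
    triplet coincides with a codon (in-frame) and 1 or 2 otherwise.
    Overlapping triplets are all reported.
    """
    cds = cds.upper().replace("T", "U")   # accept DNA or RNA letters
    sites = []
    for i in range(len(cds) - 2):
        if cds[i:i + 3] == "AAA":
            sites.append((i, i % 3))
    return sites


# Hypothetical test mRNA, codons: AUG AAA GCA AAA UAA
example = "AUGAAAGCAAAAUAA"
for pos, frame in find_aaa_sites(example):
    label = "in-frame" if frame == 0 else "out-of-frame"
    print(pos, label)
```

    In this made-up sequence the two lysine codons are reported as in-frame sites, and the AAA that straddles the GCA and AAA codons is reported as out-of-frame.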

  10. 3′ terminal diversity of MRP RNA and other human noncoding RNAs revealed by deep sequencing

    PubMed Central

    2013-01-01

    Background: Post-transcriptional 3′ end processing is a key component of RNA regulation. The abundant and essential RNA subunit of RNase MRP has been proposed to function in three distinct cellular compartments and therefore may utilize this mode of regulation. Here we employ 3′ RACE coupled with high-throughput sequencing to characterize the 3′ terminal sequences of human MRP RNA and other noncoding RNAs that form RNP complexes. Results: The 3′ terminal sequence of MRP RNA from HEK293T cells has a distinctive distribution of genomically encoded termini (including an assortment of U residues) with a portion of these selectively tagged by oligo(A) tails. This profile contrasts with the relatively homogenous 3′ terminus of an in vitro transcribed MRP RNA control and the differing 3′ terminal profiles of U3 snoRNA, RNase P RNA, and telomerase RNA (hTR). Conclusions: 3′ RACE coupled with deep sequencing provides a valuable framework for the functional characterization of 3′ terminal sequences of noncoding RNAs. PMID:24053768

  11. miRBase: integrating microRNA annotation and deep-sequencing data.

    PubMed

    Kozomara, Ana; Griffiths-Jones, Sam

    2011-01-01

    miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15,000 microRNA gene loci in over 140 species, and over 17,000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.

  12. Evaluation of 7SL RNA gene sequences for the identification of Leishmania spp.

    PubMed

    Zelazny, Adrian M; Fedorko, Daniel P; Li, Li; Neva, Franklin A; Fischer, Steven H

    2005-04-01

    We evaluated the use of 7SL RNA gene sequences for the identification of Leishmania spp. A fragment (approximately 137 basepairs) of the 7SL RNA gene from 13 reference strains and 18 clinical isolates of 11 different Leishmania species was amplified and sequenced using conserved primers. Reference strains from each Leishmania spp. complex showed unique sequences. The nucleotide sequences were compared pairwise and a range of 81.0-99.3% intercomplex similarity was observed. Clinical isolates of the same species had sequences identical to the corresponding reference strains; thus, the intraspecies similarity was 100%. A phylogenetic tree derived from the 7SL RNA gene partial sequences was constructed and is in agreement with accepted phylogenetic schemes.

  13. Identification of Symmetrical RNA Editing Events in the Mitochondria of Salvia miltiorrhiza by Strand-specific RNA Sequencing

    PubMed Central

    Wu, Bin; Chen, Haimei; Shao, Junjie; Zhang, Hui; Wu, Kai; Liu, Chang

    2017-01-01

    Salvia miltiorrhiza is one of the most widely-used medicinal plants. Here, we systematically analyzed the RNA editing events in its mitochondria. We developed a pipeline using REDItools to predict RNA editing events from strand-specific RNA-Seq data. The predictions were validated using reverse transcription, RT-PCR amplification and Sanger sequencing experiments. Putative sequence motifs were characterized. Comparative analyses were carried out between S. miltiorrhiza, Arabidopsis thaliana and Oryza sativa. We discovered 1123 editing sites, including 225 “C to U” sites in the protein-coding regions. Fourteen of sixteen (87.5%) sites were validated. Three putative DNA motifs were identified around the predicted sites. The nucleotides on both strands at 115 of the 225 sites had undergone RNA editing, which we called symmetrical RNA editing (SRE). Four of six of these SRE sites (66.7%) were experimentally confirmed. Re-examination of strand-specific RNA-Seq data from A. thaliana and O. sativa identified 327 and 369 SRE sites respectively. 78, 20 and 13 SRE sites were found to be conserved among A. thaliana, O. sativa and S. miltiorrhiza respectively. This study provides a comprehensive picture of RNA editing events in the mitochondrial genome of S. miltiorrhiza. We identified SREs for the first time, which may represent a universal phenomenon. PMID:28186130

  14. RNA sequencing of Sleeping Beauty transposon-induced tumors detects transposon-RNA fusions in forward genetic cancer screens

    PubMed Central

    Temiz, Nuri A.; Moriarity, Branden S.; Wolf, Natalie K.; Riordan, Jesse D.; Dupuy, Adam J.; Largaespada, David A.; Sarver, Aaron L.

    2016-01-01

    Forward genetic screens using Sleeping Beauty (SB)-mobilized T2/Onc transposons have been used to identify common insertion sites (CISs) associated with tumor formation. Recurrent sites of transposon insertion are commonly identified using ligation-mediated PCR (LM-PCR). Here, we use RNA sequencing (RNA-seq) data to directly identify transcriptional events mediated by T2/Onc. Surprisingly, the majority (∼80%) of LM-PCR identified junction fragments do not lead to observable changes in RNA transcripts. However, in CIS regions, direct transcriptional effects of transposon insertions are observed. We developed an automated method to systematically identify T2/Onc-genome RNA fusion sequences in RNA-seq data. RNA fusion-based CISs were identified corresponding to both DNA-based CISs (Cdkn2a, Mycl1, Nf2, Pten, Sema6d, and Rere) and additional regions strongly associated with cancer that were not observed by LM-PCR (Myc, Akt1, Pth, Csf1r, Fgfr2, Wisp1, Map3k5, and Map4k3). In addition to calculating recurrent CISs, we also present complementary methods to identify potential driver events via determination of strongly supported fusions and fusions with large transcript level changes in the absence of multitumor recurrence. These methods independently identify CIS regions and also point to cancer-associated genes like Braf. We anticipate RNA-seq analyses of tumors from forward genetic screens will become an efficient tool to identify causal events. PMID:26553456

  15. Identification of Symmetrical RNA Editing Events in the Mitochondria of Salvia miltiorrhiza by Strand-specific RNA Sequencing.

    PubMed

    Wu, Bin; Chen, Haimei; Shao, Junjie; Zhang, Hui; Wu, Kai; Liu, Chang

    2017-02-10

    Salvia miltiorrhiza is one of the most widely-used medicinal plants. Here, we systematically analyzed the RNA editing events in its mitochondria. We developed a pipeline using REDItools to predict RNA editing events from strand-specific RNA-Seq data. The predictions were validated using reverse transcription, RT-PCR amplification and Sanger sequencing experiments. Putative sequence motifs were characterized. Comparative analyses were carried out between S. miltiorrhiza, Arabidopsis thaliana and Oryza sativa. We discovered 1123 editing sites, including 225 "C to U" sites in the protein-coding regions. Fourteen of sixteen (87.5%) sites were validated. Three putative DNA motifs were identified around the predicted sites. The nucleotides on both strands at 115 of the 225 sites had undergone RNA editing, which we called symmetrical RNA editing (SRE). Four of six of these SRE sites (66.7%) were experimentally confirmed. Re-examination of strand-specific RNA-Seq data from A. thaliana and O. sativa identified 327 and 369 SRE sites respectively. 78, 20 and 13 SRE sites were found to be conserved among A. thaliana, O. sativa and S. miltiorrhiza respectively. This study provides a comprehensive picture of RNA editing events in the mitochondrial genome of S. miltiorrhiza. We identified SREs for the first time, which may represent a universal phenomenon.

  16. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing.

    PubMed

    Ferreira, Pedro G; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R; Rivas, Manuel A; Esteve-Codina, Anna; Rosenstiel, Philip; Strom, Tim M; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-09-12

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing (alternative splice sites, introns, and cleavage sites), which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.

  17. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    PubMed Central

    Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A.C.T; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-01-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts. PMID:27617755

  18. Sequence variation between 462 human individuals fine-tunes functional sites of RNA processing

    NASA Astrophysics Data System (ADS)

    Ferreira, Pedro G.; Oti, Martin; Barann, Matthias; Wieland, Thomas; Ezquina, Suzana; Friedländer, Marc R.; Rivas, Manuel A.; Esteve-Codina, Anna; Estivill, Xavier; Guigó, Roderic; Dermitzakis, Emmanouil; Antonarakis, Stylianos; Meitinger, Thomas; Strom, Tim M.; Palotie, Aarno; François Deleuze, Jean; Sudbrak, Ralf; Lerach, Hans; Gut, Ivo; Syvänen, Ann-Christine; Gyllensten, Ulf; Schreiber, Stefan; Rosenstiel, Philip; Brunner, Han; Veltman, Joris; Hoen, Peter A. C. T.; Jan van Ommen, Gert; Carracedo, Angel; Brazma, Alvis; Flicek, Paul; Cambon-Thomsen, Anne; Mangion, Jonathan; Bentley, David; Hamosh, Ada; Rosenstiel, Philip; Strom, Tim M.; Lappalainen, Tuuli; Guigó, Roderic; Sammeth, Michael

    2016-09-01

    Recent advances in the cost-efficiency of sequencing technologies enabled the combined DNA- and RNA-sequencing of human individuals at the population-scale, making genome-wide investigations of the inter-individual genetic impact on gene expression viable. Employing mRNA-sequencing data from the Geuvadis Project and genome sequencing data from the 1000 Genomes Project we show that the computational analysis of DNA sequences around splice sites and poly-A signals is able to explain several observations in the phenotype data. In contrast to widespread assessments of statistically significant associations between DNA polymorphisms and quantitative traits, we developed a computational tool to pinpoint the molecular mechanisms by which genetic markers drive variation in RNA-processing, cataloguing and classifying alleles that change the affinity of core RNA elements to their recognizing factors. The in silico models we employ further suggest RNA editing can moonlight as a splicing-modulator, albeit less frequently than genomic sequence diversity. Beyond existing annotations, we demonstrate that the ultra-high resolution of RNA-Seq combined from 462 individuals also provides evidence for thousands of bona fide novel elements of RNA processing—alternative splice sites, introns, and cleavage sites—which are often rare and lowly expressed but in other characteristics similar to their annotated counterparts.

  19. Biclustering as a method for RNA local multiple sequence alignment.

    PubMed

    Wang, Shu; Gutell, Robin R; Miranker, Daniel P

    2007-12-15

    Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering is intended to address. We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension to that suite to larger problem sizes. Alignments of two large datasets of current biological interest, T box sequences and Group IC1 Introns, were also evaluated. The results were compared with alignments computed by the ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using Sum of Pairs (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or more sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/
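    The abstract does not define the Sum of Pairs score it reports. A minimal sketch, assuming the commonly used definition (the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment), might look as follows. The function names and the toy alignments are hypothetical and are not taken from BlockMSA or the BRAliBase suite.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Collect the residue pairs placed in the same column of an alignment.

    alignment is a list of equal-length gapped strings ('-' marks a gap).
    Each pair is ((seq_index, residue_index), (seq_index, residue_index)).
    """
    counters = [0] * len(alignment)   # next ungapped residue index per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference-aligned residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


# Toy alignments of the same three sequences (ACGU, AGU, ACG).
reference = ["ACGU", "A-GU", "ACG-"]
test = ["ACGU", "AG-U", "ACG-"]
print(sum_of_pairs_score(test, reference))
```

    Because residues are identified by their ungapped index within each sequence, the score depends only on which residues share a column, not on where the gaps happen to be written.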

  20. MicroRNA Expression Profile in Penile Cancer Revealed by Next-Generation Small RNA Sequencing

    PubMed Central

    Zhang, Yuanwei; Xu, Bo; Zhou, Jun; Fan, Song; Hao, Zongyao; Shi, Haoqiang; Zhang, Xiansheng; Kong, Rui; Xu, Lingfan; Gao, Jingjing; Zou, Duohong; Liang, Chaozhao

    2015-01-01

    Penile cancer (PeCa) is a relatively rare tumor entity but has higher morbidity and mortality rates, especially in developing countries. To date, the concrete pathogenic signaling pathways and core machineries involved in tumorigenesis and progression of PeCa remain to be elucidated. Several studies have suggested that miRNAs, which modulate gene expression at the posttranscriptional level, are frequently mis-regulated and aberrantly expressed in human cancers. However, the miRNA profile in human PeCa has not been reported before. In the present study, the miRNA profile was obtained from 10 fresh penile cancerous tissues and matched adjacent non-cancerous tissues via next-generation sequencing. As a result, a total of 751 and 806 annotated miRNAs were identified in normal and cancerous penile tissues, respectively. Among these, 56 miRNAs with significantly different expression levels between paired tissues were identified. Subsequently, several annotated miRNAs were selected randomly and validated using quantitative real-time PCR. Compared with previous publications on altered miRNA expression in various cancers, especially genitourinary (prostate, bladder, kidney, testis) cancers, the majority of deregulated miRNAs showed a similar expression pattern in penile cancer. Moreover, the bioinformatics analyses suggested that the putative target genes of differentially expressed miRNAs between cancerous and matched normal penile tissues were tightly associated with cell junction, proliferation, growth, and genomic instability, by modulating the Wnt, MAPK, p53, PI3K-Akt, Notch and TGF-β signaling pathways, which are all well established to participate in cancer initiation and progression. Our work presents a global view of the differentially expressed miRNAs and potentially regulatory networks of their target genes for clarifying the pathogenic transformation of normal penis to PeCa, which research resource also provides new insights

  1. MicroRNA Expression Profile in Penile Cancer Revealed by Next-Generation Small RNA Sequencing.

    PubMed

    Zhang, Li; Wei, Pengfei; Shen, Xudong; Zhang, Yuanwei; Xu, Bo; Zhou, Jun; Fan, Song; Hao, Zongyao; Shi, Haoqiang; Zhang, Xiansheng; Kong, Rui; Xu, Lingfan; Gao, Jingjing; Zou, Duohong; Liang, Chaozhao

    2015-01-01

    Penile cancer (PeCa) is a relatively rare tumor entity but has higher morbidity and mortality rates, especially in developing countries. To date, the concrete pathogenic signaling pathways and core machineries involved in tumorigenesis and progression of PeCa remain to be elucidated. Several studies have suggested that miRNAs, which modulate gene expression at the posttranscriptional level, are frequently mis-regulated and aberrantly expressed in human cancers. However, the miRNA profile in human PeCa has not been reported before. In the present study, the miRNA profile was obtained from 10 fresh penile cancerous tissues and matched adjacent non-cancerous tissues via next-generation sequencing. As a result, a total of 751 and 806 annotated miRNAs were identified in normal and cancerous penile tissues, respectively. Among these, 56 miRNAs with significantly different expression levels between paired tissues were identified. Subsequently, several annotated miRNAs were selected randomly and validated using quantitative real-time PCR. Compared with previous publications on altered miRNA expression in various cancers, especially genitourinary (prostate, bladder, kidney, testis) cancers, the majority of deregulated miRNAs showed a similar expression pattern in penile cancer. Moreover, the bioinformatics analyses suggested that the putative target genes of differentially expressed miRNAs between cancerous and matched normal penile tissues were tightly associated with cell junction, proliferation, growth, and genomic instability, by modulating the Wnt, MAPK, p53, PI3K-Akt, Notch and TGF-β signaling pathways, which are all well established to participate in cancer initiation and progression. Our work presents a global view of the differentially expressed miRNAs and potentially regulatory networks of their target genes for clarifying the pathogenic transformation of normal penis to PeCa, which research resource also provides new insights

  2. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing.

    PubMed Central

    Schmidt, T M; DeLong, E F; Pace, N R

    1991-01-01

    The phylogenetic diversity of an oligotrophic marine picoplankton community was examined by analyzing the sequences of cloned ribosomal genes. This strategy does not rely on cultivation of the resident microorganisms. Bulk genomic DNA was isolated from picoplankton collected in the north central Pacific Ocean by tangential flow filtration. The mixed-population DNA was fragmented, size fractionated, and cloned into bacteriophage lambda. Thirty-eight clones containing 16S rRNA genes were identified in a screen of 3.2 × 10^4 recombinant phage, and portions of the rRNA gene were amplified by polymerase chain reaction and sequenced. The resulting sequences were used to establish the identities of the picoplankton by comparison with an established data base of rRNA sequences. Fifteen unique eubacterial sequences were obtained, including four from cyanobacteria and eleven from proteobacteria. A single eucaryote related to dinoflagellates was identified; no archaebacterial sequences were detected. The cyanobacterial sequences are all closely related to sequences from cultivated marine Synechococcus strains and with cyanobacterial sequences obtained from the Atlantic Ocean (Sargasso Sea). Several sequences were related to common marine isolates of the gamma subdivision of proteobacteria. In addition to sequences closely related to those of described bacteria, sequences were obtained from two phylogenetic groups of organisms that are not closely related to any known rRNA sequences from cultivated organisms. Both of these novel phylogenetic clusters are proteobacteria, one group within the alpha subdivision and the other distinct from known proteobacterial subdivisions. The rRNA sequences of the alpha-related group are nearly identical to those of some Sargasso Sea picoplankton, suggesting a global distribution of these organisms. PMID:2066334

  3. Artificial small RNA for sequence specific cleavage of target RNA through RNase III endonuclease Dicer

    PubMed Central

    Liu, Yali; Liu, Li; Zhan, Yonghao; Zhuang, Chengle; Lin, Junhao; Chen, Mingwei; Li, Jianfa; Cai, Zhiming; Huang, Weiren; Zhang, Yong

    2016-01-01

    The CRISPR-Cas9 system uses a guide RNA which functions in conjunction with Cas9 proteins to target DNA and cleave double-stranded DNA. This phenomenon raises the question of whether an artificial small RNA (asRNA), composed of a Dicer-binding RNA element and an antisense RNA, could also be used to induce Dicer to process and degrade a specific RNA. If so, we could develop a new method, named DICERi, for gene silencing or RNA editing. To prove the feasibility of asRNA, we selected MALAT-1 as the target and used HeLa and MDA-MB-231 cells as experimental models. The results of qRT-PCR showed that the introduction of asRNA significantly decreased the relative expression level of the target gene. Next, we analyzed cell proliferation using CCK-8 and EdU staining assays, and then cell migration using wound scratch and Transwell invasion assays. We found that cell proliferation and cell migration were both suppressed remarkably after asRNA was expressed in HeLa and MDA-MB-231 cells. Cell apoptosis was also detected through Hoechst staining and ELISA assays, and the data indicated that the numbers of apoptotic cells in experimental groups significantly increased compared with negative controls. In order to prove that the gene silencing effects were caused by Dicer, we co-transfected shRNA silencing Dicer and asRNA. The relative expression levels of Dicer and MALAT-1 were both detected, and the results indicated that when the cleavage role of Dicer was silenced, the relative expression level of MALAT-1 was not affected after the introduction of asRNA. All the above results demonstrated that these devices directed by Dicer effectively excised target RNA and repressed the target genes, thus causing phenotypic changes. Our work adds a new dimension to gene regulating technologies and may have broad applications in construction of gene circuits. PMID:27231846

  4. DISCOVERY OF A RICH CLUSTER AT z = 1.63 USING THE REST-FRAME 1.6 μm 'STELLAR BUMP SEQUENCE' METHOD

    SciTech Connect

    Muzzin, Adam; Hoekstra, Henk; Wilson, Gillian; Demarco, Ricardo; Nantais, Julie; Lidman, Chris; Yee, H. K. C.; Rettura, Alessandro

    2013-04-10

    We present a new two-color algorithm, the 'Stellar Bump Sequence' (SBS), that is optimized for robustly identifying candidate high-redshift galaxy clusters in combined wide-field optical and mid-infrared (MIR) data. The SBS algorithm is a fusion of the well-tested cluster red-sequence method of Gladders and Yee with the MIR 3.6 μm-4.5 μm cluster detection method developed by Papovich. As with the cluster red-sequence method, the SBS identifies candidate overdensities within 3.6 μm-4.5 μm color slices, which are the equivalent of a rest-frame 1.6 μm stellar bump 'red-sequence'. In addition to employing the MIR colors of galaxies, the SBS algorithm incorporates an optical/MIR (z'-3.6 μm) color cut. This cut effectively eliminates foreground 0.2 1.0 galaxies and add noise when searching for high-redshift galaxy overdensities. We demonstrate using the z ≈ 1 GCLASS cluster sample that similar to the red sequence, the stellar bump sequence appears to be a ubiquitous feature of high-redshift clusters, and that within that sample the color of the stellar bump sequence increases monotonically with redshift and provides photometric redshifts accurate to Δz = 0.05. We apply the SBS method in the XMM-LSS SWIRE field and show that it robustly recovers the majority of confirmed optical, MIR, and X-ray-selected clusters at z > 1.0 in that field. Lastly, we present confirmation of SpARCS J022427-032354 at z = 1.63, a new cluster detected with the method and confirmed with 12 high-confidence spectroscopic redshifts obtained using FORS2 on the Very Large Telescope. We conclude with a discussion of future prospects for using the algorithm.
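
    The two selection steps described in this abstract (an optical/MIR color cut followed by a 3.6 μm-4.5 μm color slice) can be illustrated with a small filter. This is only a sketch of the selection logic, not the authors' pipeline: the `Galaxy` fields, the function names, the direction of the cut (keeping objects redder than a threshold in z'-3.6 μm) and every numeric value are assumptions made for illustration, and the overdensity search within each slice is not shown.

```python
from dataclasses import dataclass


@dataclass
class Galaxy:
    zprime: float  # z'-band magnitude
    m36: float     # 3.6 micron magnitude
    m45: float     # 4.5 micron magnitude


def in_color_slice(g, slice_lo, slice_hi, optical_mir_cut):
    """True if the galaxy passes the optical/MIR cut and lies in the MIR slice.

    Assumption: foreground objects are removed by requiring a z'-3.6 color
    redder than `optical_mir_cut` (a hypothetical threshold).
    """
    passes_cut = (g.zprime - g.m36) > optical_mir_cut
    mir_color = g.m36 - g.m45
    return passes_cut and slice_lo <= mir_color < slice_hi


def select_slice(catalog, slice_lo, slice_hi, optical_mir_cut):
    """Return the galaxies that would be searched for overdensities in one slice."""
    return [g for g in catalog
            if in_color_slice(g, slice_lo, slice_hi, optical_mir_cut)]


if __name__ == "__main__":
    catalog = [
        Galaxy(zprime=24.0, m36=21.0, m45=20.9),  # red in z'-3.6, inside the slice
        Galaxy(zprime=21.5, m36=21.0, m45=20.9),  # fails the optical/MIR cut
        Galaxy(zprime=24.0, m36=21.0, m45=21.4),  # outside the MIR color slice
    ]
    selected = select_slice(catalog, slice_lo=0.0, slice_hi=0.2, optical_mir_cut=1.7)
    print(len(selected))
```

    Repeating the selection over a sequence of adjacent slices would give one candidate list per color, which is the sense in which the slice color tracks redshift in the abstract.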

  5. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud

    PubMed Central

    Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.

    2015-01-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053

  6. Phylogenetic analysis of oryx species using partial sequences of mitochondrial rRNA genes.

    PubMed

    Khan, H A; Arif, I A; Al Farhan, A H; Al Homaidan, A A

    2008-10-28

    We conducted a comparative evaluation of 12S rRNA and 16S rRNA genes of the mitochondrial genome for molecular differentiation among three oryx species (Oryx leucoryx, Oryx dammah and Oryx gazella) with respect to two closely related outgroups, addax and roan. Our findings showed the failure of 12S rRNA gene to differentiate between the genus Oryx and addax, whereas a 342-bp partial sequence of 16S rRNA accurately grouped all five taxa studied, suggesting the utility of 16S rRNA segment for molecular phylogeny of oryx at the genus and possibly species levels.

  7. The zinc fingers of YY1 bind single-stranded RNA with low sequence specificity.

    PubMed

    Wai, Dorothy C C; Shihab, Manar; Low, Jason K K; Mackay, Joel P

    2016-11-02

    Classical zinc fingers (ZFs) are traditionally considered to act as sequence-specific DNA-binding domains. More recently, classical ZFs have been recognised as potential RNA-binding modules, raising the intriguing possibility that classical-ZF transcription factors are involved in post-transcriptional gene regulation via direct RNA binding. To date, however, only one classical ZF-RNA complex, that involving TFIIIA, has been structurally characterised. Yin Yang-1 (YY1) is a multi-functional transcription factor involved in many regulatory processes, and binds DNA via four classical ZFs. Recent evidence suggests that YY1 also interacts with RNA, but the molecular nature of the interaction remains unknown. In the present work, we directly assess the ability of YY1 to bind RNA using in vitro assays. Systematic Evolution of Ligands by EXponential enrichment (SELEX) was used to identify preferred RNA sequences bound by the YY1 ZFs from a randomised library over multiple rounds of selection. However, a strong motif was not consistently recovered, suggesting that the RNA sequence selectivity of these domains is modest. YY1 ZF residues involved in binding to single-stranded RNA were identified by NMR spectroscopy and found to be largely distinct from the set of residues involved in DNA binding, suggesting that interactions between YY1 and ssRNA constitute a separate mode of nucleic acid binding. Our data are consistent with recent reports that YY1 can bind to RNA in a low-specificity, yet physiologically relevant manner.

  8. The zinc fingers of YY1 bind single-stranded RNA with low sequence specificity

    PubMed Central

    Wai, Dorothy C.C.; Shihab, Manar; Low, Jason K.K.; Mackay, Joel P.

    2016-01-01

    Classical zinc fingers (ZFs) are traditionally considered to act as sequence-specific DNA-binding domains. More recently, classical ZFs have been recognised as potential RNA-binding modules, raising the intriguing possibility that classical-ZF transcription factors are involved in post-transcriptional gene regulation via direct RNA binding. To date, however, only one classical ZF-RNA complex, that involving TFIIIA, has been structurally characterised. Yin Yang-1 (YY1) is a multi-functional transcription factor involved in many regulatory processes, and binds DNA via four classical ZFs. Recent evidence suggests that YY1 also interacts with RNA, but the molecular nature of the interaction remains unknown. In the present work, we directly assess the ability of YY1 to bind RNA using in vitro assays. Systematic Evolution of Ligands by EXponential enrichment (SELEX) was used to identify preferred RNA sequences bound by the YY1 ZFs from a randomised library over multiple rounds of selection. However, a strong motif was not consistently recovered, suggesting that the RNA sequence selectivity of these domains is modest. YY1 ZF residues involved in binding to single-stranded RNA were identified by NMR spectroscopy and found to be largely distinct from the set of residues involved in DNA binding, suggesting that interactions between YY1 and ssRNA constitute a separate mode of nucleic acid binding. Our data are consistent with recent reports that YY1 can bind to RNA in a low-specificity, yet physiologically relevant manner. PMID:27369384

  9. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues

    PubMed Central

    Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.

    2014-01-01

    RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209

  10. New perspectives on the diversification of the RNA interference system: insights from comparative genomics and small RNA sequencing

    PubMed Central

    Burroughs, Alexander Maxwell; Ando, Yoshinari; Aravind, L

    2014-01-01

    Our understanding of the pervasive involvement of small RNAs in regulating diverse biological processes has been greatly augmented by recent application of deep-sequencing technologies to small RNA across diverse eukaryotes. We review the currently-known small RNA classes and place them in context of the reconstructed evolutionary history of the RNAi protein machinery. This synthesis indicates the earliest versions of eukaryotic RNAi systems likely utilized small RNA processed from three types of precursors: 1) sense-antisense transcriptional products, 2) genome-encoded, imperfectly-complementary hairpin sequences, and 3) larger non-coding RNA precursor sequences. Structural dissection of PIWI proteins along with recent discovery of novel families (including Med13 of the Mediator complex) suggest that emergence of a distinct architecture with the N-terminal domains (also occurring separately fused to endoDNases in prokaryotes) formed via duplication of an ancestral unit was key to their recruitment as primary RNAi effectors and use of small RNAs of certain preferred lengths. Prokaryotic PIWI proteins are typically components of several RNA-directed DNA restriction or CRISPR/Cas systems. However, eukaryotic versions appear to have emerged from a subset that evolved RNA-directed RNA interference. They were recruited alongside RNaseIII domains and RdRP domains, also from prokaryotic systems, to form the core eukaryotic RNAi system. Like certain regulatory systems, RNAi diversified into two distinct but linked arms concomitant with eukaryotic nucleo-cytoplasmic compartmentalization. Subsequent elaboration of RNAi proceeded via diversification of the core protein machinery through lineage-specific expansions and recruitment of new components from prokaryotes (nucleases and small RNA-modifying enzymes), allowing for diversification of associating small RNAs. PMID:24311560

  11. RNA sequencing using fluorescent-labeled dideoxynucleotides and automated fluorescence detection.

    PubMed Central

    Bauer, G J

    1990-01-01

    Although dideoxy-terminated sequencing of RNA, using reverse transcriptase and oligodeoxynucleotide primers, is now a well-established method, its accuracy is limited by sequence ambiguities due to nonspecific chain termination events. A protocol is described which circumvents these ambiguities by using fluorescence labels attached to dideoxynucleotides. Only chain terminations caused by dideoxynucleotides are detected, while prematurely terminated cDNAs remain undetectable. In addition, the remaining multiple signals at nucleotide positions can be assigned to sequence heterogeneities within the RNA sequence to be determined. PMID:1690393

  12. RNA sequence and transcriptional properties of the 3' end of the Newcastle disease virus genome

    SciTech Connect

    Kurilla, M.G.; Stone, H.O.; Keene, J.D.

    1985-09-01

    The 3' end of the genomic RNA of Newcastle disease virus (NDV) has been sequenced and the leader RNA defined. Using hybridization to a 3'-end-labeled genome, leader RNA species from in vitro transcription reactions and from infected cell extracts were found to be 47 and 53 nucleotides long. In addition, the start site of the 3'-proximal mRNA was determined by sequence analysis of in vitro (beta-32P)GTP-labeled transcription products. The genomic sequence extending beyond the leader region demonstrated an open reading frame for at least 42 amino acids and probably represents the amino terminus of the nucleocapsid protein (NP). The terminal 8 nucleotides of the NDV genome were identical to those of measles virus and Sendai virus while the sequence of the distal half of the leader region was more similar to that of vesicular stomatitis virus. These data argue for strong evolutionary relatedness between the paramyxovirus and rhabdovirus groups.

  13. Species Identification and Profiling of Complex Microbial Communities Using Shotgun Illumina Sequencing of 16S rRNA Amplicon Sequences

    PubMed Central

    Lay, Christophe; Ho, Eliza Xin Pei; Low, Louie; Hibberd, Martin Lloyd; Nagarajan, Niranjan

    2013-01-01

    The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and with the ability to distinguish nearly twice as many species-level OTUs compared to existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification thereby opening up potential application of this approach for clinical microbial characterization. PMID:23579286

  14. Species identification and profiling of complex microbial communities using shotgun Illumina sequencing of 16S rRNA amplicon sequences.

    PubMed

    Ong, Swee Hoe; Kukkillaya, Vinutha Uppoor; Wilm, Andreas; Lay, Christophe; Ho, Eliza Xin Pei; Low, Louie; Hibberd, Martin Lloyd; Nagarajan, Niranjan

    2013-01-01

    The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing Illumina sequencing protocols are, however, limited by the fraction of the 16S rRNA gene that is interrogated and therefore limit the resolution and quality of the profiling. To address this, we present the design of a novel protocol for shotgun Illumina sequencing of the bacterial 16S rRNA gene, optimized to amplify more than 90% of sequences in the Greengenes database and with the ability to distinguish nearly twice as many species-level OTUs compared to existing protocols. Using several in silico and experimental datasets, we demonstrate that despite the presence of multiple variable and conserved regions, the resulting shotgun sequences can be used to accurately quantify the constituents of complex microbial communities. The reconstruction of a significant fraction of the 16S rRNA gene also enabled high precision (>90%) in species-level identification thereby opening up potential application of this approach for clinical microbial characterization.

  15. RNA Sequencing Identifies New RNase III Cleavage Sites in Escherichia coli and Reveals Increased Regulation of mRNA

    DOE PAGES

    Gordon, Gina C.; Cameron, Jeffrey C.; Pfleger, Brian F.

    2017-03-28

    Ribonucleases facilitate rapid turnover of RNA, providing cells with another mechanism to adjust transcript and protein levels in response to environmental conditions. While many examples have been documented, a comprehensive list of RNase targets is not available. To address this knowledge gap, we compared levels of RNA sequencing coverage of Escherichia coli and a corresponding RNase III mutant to expand the list of known RNase III targets. RNase III is a widespread endoribonuclease that binds and cleaves double-stranded RNA in many critical transcripts. RNase III cleavage at novel sites found in aceEF, proP, tnaC, dctA, pheM, sdhC, yhhQ, glpT, aceK, and gluQ accelerated RNA decay, consistent with previously described targets wherein RNase III cleavage initiates rapid degradation of secondary messages by other RNases. In contrast, cleavage at three novel sites in the ahpF, pflB, and yajQ transcripts led to stabilized secondary transcripts. Two other novel sites in hisL and pheM overlapped with transcriptional attenuators that likely serve to ensure turnover of these highly structured RNAs. Many of the new RNase III target sites are located on transcripts encoding metabolic enzymes. For instance, two novel RNase III sites are located within transcripts encoding enzymes near a key metabolic node connecting glycolysis and the tricarboxylic acid (TCA) cycle. Pyruvate dehydrogenase activity was increased in an rnc deletion mutant compared to the wild-type (WT) strain in early stationary phase, confirming the novel link between RNA turnover and regulation of pathway activity. Identification of these novel sites suggests that mRNA turnover may be an underappreciated mode of regulating metabolism. IMPORTANCE: The concerted action and overlapping functions of endoribonucleases, exoribonucleases, and RNA processing enzymes complicate the study of global RNA turnover and recycling of specific transcripts. More information about RNase specificity and activity is

  16. RNA Sequencing Identifies New RNase III Cleavage Sites in Escherichia coli and Reveals Increased Regulation of mRNA.

    PubMed

    Gordon, Gina C; Cameron, Jeffrey C; Pfleger, Brian F

    2017-03-28

    Ribonucleases facilitate rapid turnover of RNA, providing cells with another mechanism to adjust transcript and protein levels in response to environmental conditions. While many examples have been documented, a comprehensive list of RNase targets is not available. To address this knowledge gap, we compared levels of RNA sequencing coverage of Escherichia coli and a corresponding RNase III mutant to expand the list of known RNase III targets. RNase III is a widespread endoribonuclease that binds and cleaves double-stranded RNA in many critical transcripts. RNase III cleavage at novel sites found in aceEF, proP, tnaC, dctA, pheM, sdhC, yhhQ, glpT, aceK, and gluQ accelerated RNA decay, consistent with previously described targets wherein RNase III cleavage initiates rapid degradation of secondary messages by other RNases. In contrast, cleavage at three novel sites in the ahpF, pflB, and yajQ transcripts led to stabilized secondary transcripts. Two other novel sites in hisL and pheM overlapped with transcriptional attenuators that likely serve to ensure turnover of these highly structured RNAs. Many of the new RNase III target sites are located on transcripts encoding metabolic enzymes. For instance, two novel RNase III sites are located within transcripts encoding enzymes near a key metabolic node connecting glycolysis and the tricarboxylic acid (TCA) cycle. Pyruvate dehydrogenase activity was increased in an rnc deletion mutant compared to the wild-type (WT) strain in early stationary phase, confirming the novel link between RNA turnover and regulation of pathway activity. Identification of these novel sites suggests that mRNA turnover may be an underappreciated mode of regulating metabolism. IMPORTANCE: The concerted action and overlapping functions of endoribonucleases, exoribonucleases, and RNA processing enzymes complicate the study of global RNA turnover and recycling of specific transcripts. More information about RNase specificity and activity is needed to make

  17. Method for rapid base sequencing in DNA and RNA with two base labeling

    DOEpatents

    Jett, James H.; Keller, Richard A.; Martin, John C.; Posner, Richard G.; Marrone, Babetta L.; Hammond, Mark L.; Simpson, Daniel J.

    1995-01-01

    Method for rapid-base sequencing in DNA and RNA with two-base labeling and employing fluorescent detection of single molecules at two wavelengths. Bases modified to accept fluorescent labels are used to replicate a single DNA or RNA strand to be sequenced. The bases are then sequentially cleaved from the replicated strand, excited with a chosen spectrum of electromagnetic radiation, and the fluorescence from individual, tagged bases detected in the order of cleavage from the strand.

  18. Complete nucleotide sequence of the 23S rRNA gene of the Cyanobacterium, Anacystis nidulans.

    PubMed Central

    Douglas, S E; Doolittle, W F

    1984-01-01

    The nucleotide sequence of the Anacystis nidulans 23S rRNA gene, including the 5'- and 3'-flanking regions has been determined. The gene is 2876 nucleotides long and shows higher primary sequence homology to the 23S rRNAs of plastids (84.5%) than to that of E. coli (79%). The predicted rRNA transcript also shares many secondary structural features with those of plastids, reinforcing the endosymbiont hypothesis for the origin of these organelles. PMID:6326060
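
    Similarity figures like the 84.5% and 79% quoted in this abstract are typically derived from pairwise alignments. The abstract does not state how its values were computed, so the following is only a sketch of one common definition, percent identity over columns where neither sequence has a gap; the function name and the toy sequences are illustrative assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns in which the two residues match.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    Columns with a gap in either row are ignored, which is only one of
    several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from the 23S rRNA gene.
    print(percent_identity("ACGU-A", "ACGAUA"))
```

    Other conventions divide by the full alignment length or by the length of the shorter sequence, so values are comparable only when the same denominator is used.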

  19. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    DTIC Science & Technology

    2016-09-01

    AWARD NUMBER: W81XWH-14-1-0080 TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. PRINCIPAL INVESTIGATOR...SUBTITLE Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. 5a. CONTRACT NUMBER 5b. GRANT NUMBER W81XWH-14-1-0080 GRANT11489...funded study of genetic and epigenetic alterations of pre-invasive DCIS that did or did not progress to invasive breast cancer, with an in-depth

  20. Cloud-scale RNA-sequencing differential expression analysis with Myrna

    PubMed Central

    2010-01-01

    As sequencing throughput approaches dozens of gigabases per day, there is a growing need for efficient software for analysis of transcriptome sequencing (RNA-Seq) data. Myrna is a cloud-computing pipeline for calculating differential gene expression in large RNA-Seq datasets. We apply Myrna to the analysis of publicly available data sets and assess the goodness of fit of standard statistical models. Myrna is available from http://bowtie-bio.sf.net/myrna. PMID:20701754

  1. Method for rapid base sequencing in DNA and RNA with two base labeling

    DOEpatents

    Jett, J.H.; Keller, R.A.; Martin, J.C.; Posner, R.G.; Marrone, B.L.; Hammond, M.L.; Simpson, D.J.

    1995-04-11

    A method is described for rapid-base sequencing in DNA and RNA with two-base labeling and employing fluorescent detection of single molecules at two wavelengths. Bases modified to accept fluorescent labels are used to replicate a single DNA or RNA strand to be sequenced. The bases are then sequentially cleaved from the replicated strand, excited with a chosen spectrum of electromagnetic radiation, and the fluorescence from individual, tagged bases detected in the order of cleavage from the strand. 4 figures.

  2. Structural Requirement in Clostridium perfringens Collagenase mRNA 5′ Leader Sequence for Translational Induction through Small RNA-mRNA Base Pairing

    PubMed Central

    Nomura, Nobuhiko; Nakamura, Kouji

    2013-01-01

    The Gram-positive anaerobic bacterium Clostridium perfringens is pathogenic to humans and animals, and the production of its toxins is strictly regulated during the exponential phase. We recently found that the 5′ leader sequence of the colA transcript encoding collagenase, which is a major toxin of this organism, is processed and stabilized in the presence of the small RNA VR-RNA. The primary colA 5′-untranslated region (5′UTR) forms a long stem-loop structure containing an internal bulge and masks its own ribosomal binding site. Here we found that VR-RNA directly regulates colA expression through base pairing with colA mRNA in vivo. However, when the internal bulge structure was closed by point mutations in colA mRNA, translation ceased despite the presence of VR-RNA. In addition, a mutation disrupting the colA stem-loop structure induced mRNA processing and ColA-FLAG translational activation in the absence of VR-RNA, indicating that the stem-loop and internal bulge structure of the colA 5′ leader sequence is important for regulation by VR-RNA. On the other hand, processing was required for maximal ColA expression but was not essential for VR-RNA-dependent colA regulation. Finally, colA processing and translational activation were induced at a high temperature without VR-RNA. These results suggest that inhibition of the colA 5′ leader structure through base pairing is the primary role of VR-RNA in colA regulation and that the colA 5′ leader structure is a possible thermosensor. PMID:23585542

  3. Small RNA and RNA-IP Sequencing Identifies and Validates Novel MicroRNAs in Human Mesenchymal Stem Cells.

    PubMed

    Tsai, Chin-Han; Liao, Ko-Hsun; Shih, Chuan-Chi; Chan, Chia-Hao; Hsieh, Jui-Yu; Tsai, Cheng-Fong; Wang, Hsei-Wei; Chang, Shing-Jyh

    2016-03-01

    Organ regeneration therapies using multipotent mesenchymal stem cells (MSCs) are currently being investigated for a variety of common complex diseases. Understanding the molecular regulation of MSC biology will benefit regenerative medicine. MicroRNAs (miRNAs) act as regulators in MSC stemness. There are approximately 2500 currently known human miRNAs that have been recorded in the miRBase v21 database. In the present study, we identified novel microRNAs involved in MSC stemness and differentiation by obtaining the global microRNA expression profiles (miRNomes) of MSCs from two anatomical locations, bone marrow (BM-MSCs) and umbilical cord Wharton's jelly (WJ-MSCs), and from osteogenically and adipogenically differentiated progenies of BM-MSCs. Small RNA sequencing (smRNA-seq) and bioinformatics analyses predicted that 49 uncharacterized miRNA candidates had high cellular expression values in MSCs. Another independent batch of Ago1/2-based RNA immunoprecipitation (RNA-IP) sequencing datasets validated the existence of 40 unreported miRNAs in cells and their associations with the RNA-induced silencing complex (RISC). Nine of these 40 new miRNAs were universally overexpressed in both MSC types; nine others were overexpressed in differentiated cells. A novel miRNA (UNI-118-3p) was specifically expressed in BM-MSCs, as verified using RT-qPCR. Taken together, this report offers comprehensive miRNome profiles for two MSC types, as well as cells differentiated from BM-MSCs. MSC transplantation has the potential to ameliorate degenerative disorders and repair damaged tissues. Interventions involving the above 40 new microRNA members in transplanted MSCs may potentially guide future clinical applications.

  4. Protein-mediated RNA folding governs sequence-specific interactions between rotavirus genome segments.

    PubMed

    Borodavka, Alexander; Dykeman, Eric C; Schrimpf, Waldemar; Lamb, Don C

    2017-09-18

    Segmented RNA viruses are ubiquitous pathogens, which include influenza viruses and rotaviruses. A major challenge in understanding their assembly is the combinatorial problem of a non-random selection of a full genomic set of distinct RNAs. This process involves complex RNA-RNA and protein-RNA interactions, which are often obscured by non-specific binding at concentrations approaching in vivo assembly conditions. Here, we present direct experimental evidence of sequence-specific inter-segment interactions between rotavirus RNAs, taking place in a complex RNA- and protein-rich milieu. We show that binding of the rotavirus-encoded non-structural protein NSP2 to viral ssRNAs results in the remodeling of RNA, which is conducive to formation of stable inter-segment contacts. To identify the sites of these interactions, we have developed an RNA-RNA SELEX approach for mapping the sequences involved in inter-segment base-pairing. Our findings elucidate the molecular basis underlying inter-segment interactions in rotaviruses, paving the way for delineating similar RNA-RNA interactions that govern assembly of other segmented RNA viruses.

  5. Production of Viral mRNA in Adenovirus-Transformed Cells by the Post-Transcriptional Processing of Heterogeneous Nuclear RNA Containing Viral and Cell Sequences

    PubMed Central

    Wall, R.; Weber, J.; Gage, Z.; Darnell, J. E.

    1973-01-01

    Adenovirus 2-transformed cells contain virus-specific sequences which are covalently linked to cell-specific RNA sequences in heterogeneous nuclear RNA (HnRNA) molecules larger than 45S. Virus sequences are identified by hybridization to viral DNA, and the cell sequences are detected by hybridization to cellular DNA under conditions where hybridization only occurs to reiterated sites in cell DNA. Such large composite viral-cell HnRNA molecules presumably arise through the uninterrupted transcription of host sequences and integrated viral DNA. Adenovirus-specific polysomal RNA from these cells sediments as three discrete species at 16, 20, and 26S. These specific classes of viral mRNA do not contain rapidly hybridizing host-specific RNA sequences. Both virus-specific HnRNA and mRNA contain polyadenylic acid sequences since they bind to polyU columns at levels characteristic of other polyA-terminated HnRNA and mRNA. Thus, the discrete species of virus-specific mRNA in adenovirus 2-transformed cells appear to be derived from high-molecular-weight virus-specific HnRNA through a series of post-transcriptional modifications involving polyA addition. Subsequently the HnRNA is cleaved so that the cell-specific RNA sequences that originate from the reiterated sites in cell DNA do not accompany the adenovirus mRNA to the cytoplasm. These events for the adenovirus-specific mRNA appear, therefore, to be similar to the stages in the biogenesis of the majority of mRNA in eukaryotic cells. PMID:4736534

  6. Quantitative Assessment of RNA-Protein Interactions with High Throughput Sequencing - RNA Affinity Profiling (HiTS-RAP)

    PubMed Central

    Ozer, Abdullah; Tome, Jacob M.; Friedman, Robin C.; Gheba, Dan; Schroth, Gary P.; Lis, John T.

    2016-01-01

    Because RNA-protein interactions play a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the High Throughput Sequencing-RNA Affinity Profiling (HiTS-RAP) assay, which couples sequencing on an Illumina GAIIx with the quantitative assessment of one or several proteins’ interactions with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of EGFP and NELF-E proteins with their corresponding canonical and mutant RNA aptamers. Here, we provide a detailed protocol for HiTS-RAP, which can be completed in about a month (8 days hands-on time) including the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, high-throughput sequencing and protein binding with GAIIx, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, RNA-MaP and RBNS. A successful HiTS-RAP experiment provides the sequence and binding curves for approximately 200 million RNAs in a single experiment. PMID:26182240

  7. Human cellular CYBA UTR sequences increase mRNA translation without affecting the half-life of recombinant RNA transcripts.

    PubMed

    Ferizi, Mehrije; Aneja, Manish K; Balmayor, Elizabeth R; Badieyan, Zohreh Sadat; Mykhaylyk, Olga; Rudolph, Carsten; Plank, Christian

    2016-12-15

    Modified nucleotide chemistries that increase the half-life (T1/2) of transfected recombinant mRNA and the use of non-native 5'- and 3'-untranslated region (UTR) sequences that enhance protein translation are advancing the prospects of transcript therapy. To this end, a set of UTR sequences that are present in mRNAs with long cellular T1/2 were synthesized and cloned as five different recombinant sequence set combinations as upstream 5'-UTR and/or downstream 3'-UTR regions flanking a reporter gene. Initial screening in two different cell systems in vitro revealed that cytochrome b-245 alpha chain (CYBA) combinations performed the best among all other UTR combinations and were characterized in detail. The presence or absence of CYBA UTRs had no impact on the mRNA stability of transfected mRNAs, but appeared to enhance the productivity of transfected transcripts based on the measurement of mRNA and protein levels in cells. When CYBA UTRs were fused to human bone morphogenetic protein 2 (hBMP2) coding sequence, the recombinant mRNA transcripts upon transfection produced higher levels of protein as compared to control transcripts. Moreover, transfection of human adipose mesenchymal stem cells with recombinant hBMP2-CYBA UTR transcripts induced bone differentiation demonstrating the osteogenic and therapeutic potential for transcript therapy based on hybrid UTR designs.

  8. Human cellular CYBA UTR sequences increase mRNA translation without affecting the half-life of recombinant RNA transcripts

    PubMed Central

    Ferizi, Mehrije; Aneja, Manish K.; Balmayor, Elizabeth R.; Badieyan, Zohreh Sadat; Mykhaylyk, Olga; Rudolph, Carsten; Plank, Christian

    2016-01-01

    Modified nucleotide chemistries that increase the half-life (T1/2) of transfected recombinant mRNA and the use of non-native 5′- and 3′-untranslated region (UTR) sequences that enhance protein translation are advancing the prospects of transcript therapy. To this end, a set of UTR sequences that are present in mRNAs with long cellular T1/2 were synthesized and cloned as five different recombinant sequence set combinations as upstream 5′-UTR and/or downstream 3′-UTR regions flanking a reporter gene. Initial screening in two different cell systems in vitro revealed that cytochrome b-245 alpha chain (CYBA) combinations performed the best among all other UTR combinations and were characterized in detail. The presence or absence of CYBA UTRs had no impact on the mRNA stability of transfected mRNAs, but appeared to enhance the productivity of transfected transcripts based on the measurement of mRNA and protein levels in cells. When CYBA UTRs were fused to human bone morphogenetic protein 2 (hBMP2) coding sequence, the recombinant mRNA transcripts upon transfection produced higher levels of protein as compared to control transcripts. Moreover, transfection of human adipose mesenchymal stem cells with recombinant hBMP2-CYBA UTR transcripts induced bone differentiation demonstrating the osteogenic and therapeutic potential for transcript therapy based on hybrid UTR designs. PMID:27974853

  9. Oasis: online analysis of small RNA deep sequencing data.

    PubMed

    Capece, Vincenzo; Garcia Vizcaino, Julio C; Vidal, Ramon; Rahman, Raza-Ur; Pena Centeno, Tonatiuh; Shomroni, Orr; Suberviola, Irantzu; Fischer, Andre; Bonn, Stefan

    2015-07-01

    Oasis is a web application that allows for the fast and flexible online analysis of small-RNA-seq (sRNA-seq) data. It was designed for the end user in the lab, providing an easy-to-use web frontend including video tutorials, demo data and best practice step-by-step guidelines on how to analyze sRNA-seq data. Oasis' exclusive selling points are a differential expression module that allows for the multivariate analysis of samples, a classification module for robust biomarker detection and an advanced programming interface that supports the batch submission of jobs. Both modules include the analysis of novel miRNAs, miRNA targets and functional analyses including GO and pathway enrichment. Oasis generates downloadable interactive web reports for easy visualization, exploration and analysis of data on a local system. Finally, Oasis' modular workflow enables the rapid (re-)analysis of data. Oasis is implemented in Python, R, Java, PHP, C++ and JavaScript. It is freely available at http://oasis.dzne.de. Contact: stefan.bonn@dzne.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  10. Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers

    PubMed Central

    Liu, Zongzhi; DeSantis, Todd Z.; Andersen, Gary L.; Knight, Rob

    2008-01-01

    The recent introduction of massively parallel pyrosequencers allows rapid, inexpensive analysis of microbial community composition using 16S ribosomal RNA (rRNA) sequences. However, a major challenge is to design a workflow so that taxonomic information can be accurately and rapidly assigned to each read, so that the composition of each community can be linked back to likely ecological roles played by members of each species, genus, family or phylum. Here, we use three large 16S rRNA datasets to test whether taxonomic information based on the full-length sequences can be recaptured by short reads that simulate the pyrosequencer outputs. We find that different taxonomic assignment methods vary radically in their ability to recapture the taxonomic information in full-length 16S rRNA sequences: most methods are sensitive to the region of the 16S rRNA gene that is targeted for sequencing, but many combinations of methods and rRNA regions produce consistent and accurate results. To process large datasets of partial 16S rRNA sequences obtained from surveys of various microbial communities, including those from human body habitats, we recommend the use of Greengenes or RDP classifier with fragments of at least 250 bases, starting from one of the primers R357, R534, R798, F343 or F517. PMID:18723574

  11. Processing of Escherichia coli 16S rRNA with bacteriophage lambda leader sequences.

    PubMed Central

    Krych, M; Sirdeshmukh, R; Gourse, R; Schlessinger, D

    1987-01-01

    To test whether any specific 5' precursor sequences are required for the processing of pre-16S rRNA, constructs were studied in which large parts of the 5' leader sequence were replaced by the coliphage lambda pL promoter and adjacent sequences. Unexpectedly, few full-length transcripts of the rRNA were detected after the pL promoter was induced, implying that either transcription was poor or most of the rRNA chains with lambda leader sequences were unstable. Nevertheless, sufficient transcription occurred to permit the detection of processing by S1 nuclease analysis. RNA transcripts in which 2/3 of the normal rRNA leader was deleted (from the promoter up to the normal RNase III cleavage site) were processed to form the normal 5' terminus. Thus, most of the double-stranded stem that forms from sequences bracketing wild-type 16S pre-rRNA is apparently not required for proper processing; the expression of such modified transcripts, however, must be increased before the efficiency of processing of the 16S rRNA formed can be assessed. PMID:2445728

  12. Accurate identification of A-to-I RNA editing in human by transcriptome sequencing.

    PubMed

    Bahn, Jae Hoon; Lee, Jae-Hyung; Li, Gang; Greer, Christopher; Peng, Guangdun; Xiao, Xinshu

    2012-01-01

    RNA editing enhances the diversity of gene products at the post-transcriptional level. Approaches for genome-wide identification of RNA editing face two main challenges: separating true editing sites from false discoveries and accurate estimation of editing levels. We developed an approach to analyze transcriptome sequencing data (RNA-seq) for global identification of RNA editing in cells for which whole-genome sequencing data are available. We applied the method to analyze RNA-seq data of a human glioblastoma cell line, U87MG. Around 10,000 DNA-RNA differences were identified, the majority being putative A-to-I editing sites. These predicted A-to-I events were associated with a low false-discovery rate (∼5%). Moreover, the estimated editing levels from RNA-seq correlated well with those based on traditional clonal sequencing. Our results further facilitated unbiased characterization of the sequence and evolutionary features flanking predicted A-to-I editing sites and discovery of a conserved RNA structural motif that may be functionally relevant to editing. Genes with predicted A-to-I editing were significantly enriched with those known to be involved in cancer, supporting the potential importance of cancer-specific RNA editing. A similar profile of DNA-RNA differences as in U87MG was predicted for another RNA-seq data set obtained from primary breast cancer samples. Remarkably, significant overlap exists between the putative editing sites of the two transcriptomes despite their difference in cell type, cancer type, and genomic backgrounds. Our approach enabled de novo identification of the RNA editome, which sets the stage for further mechanistic studies of this important step of post-transcriptional regulation.
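    The comparison of RNA-seq base calls against the genomic sequence described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the input layout, the function name `call_differences`, and the coverage and level thresholds are all hypothetical. It relies only on the fact that inosine is read as guanosine during sequencing, so an A-to-I event appears as an A>G difference, and it assumes bases are already reported relative to the transcribed strand.

```python
def call_differences(sites, min_cov=10, min_level=0.05):
    """List DNA-RNA differences and flag A>G ones as putative A-to-I editing.

    sites: iterable of (site_name, dna_base, rna_counts), where rna_counts
    maps each RNA-seq base call to its read count at that site.
    min_cov and min_level are illustrative thresholds, not published values.
    """
    calls = []
    for name, dna_base, rna_counts in sites:
        coverage = sum(rna_counts.values())
        if coverage < min_cov:
            continue  # too few reads to estimate an editing level
        for rna_base, n_reads in rna_counts.items():
            if rna_base == dna_base or n_reads == 0:
                continue
            level = n_reads / coverage  # fraction of reads that differ from DNA
            if level < min_level:
                continue
            kind = "putative A-to-I" if (dna_base, rna_base) == ("A", "G") else "other"
            calls.append((name, f"{dna_base}>{rna_base}", kind, round(level, 3)))
    return calls


if __name__ == "__main__":
    # Made-up example sites: (name, genomic base, RNA read counts per base).
    example_sites = [
        ("s1", "A", {"A": 30, "G": 10}),
        ("s2", "C", {"C": 40}),
        ("s3", "A", {"A": 3, "G": 2}),
        ("s4", "G", {"G": 18, "T": 2}),
    ]
    for call in call_differences(example_sites):
        print(call)
```

    A real analysis would also need to exclude genomic SNPs, mapping artifacts and strand ambiguity, which is where the false-discovery control mentioned in the abstract comes in.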

  13. ARM-Seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments

    PubMed Central

    Cozen, Aaron E.; Quartley, Erin; Holmes, Andrew D.; Robinson, Eva H.; Phizicky, Eric M.; Lowe, Todd M.

    2015-01-01

    High throughput RNA sequencing has accelerated discovery of the complex regulatory roles of small RNAs, but RNAs containing modified nucleosides may escape detection when those modifications interfere with reverse transcription during RNA-seq library preparation. Here we describe AlkB-facilitated RNA Methylation sequencing (ARM-Seq) which uses pre-treatment with Escherichia coli AlkB to demethylate 1-methyladenosine, 3-methylcytidine, and 1-methylguanosine, all commonly found in transfer RNAs. Comparative methylation analysis using ARM-Seq provides the first detailed, transcriptome-scale map of these modifications, and reveals an abundance of previously undetected, methylated small RNAs derived from tRNAs. ARM-Seq demonstrates that tRNA-derived small RNAs accurately recapitulate the m1A modification state for well-characterized yeast tRNAs, and generates new predictions for a large number of human tRNAs, including tRNA precursors and mitochondrial tRNAs. Thus, ARM-Seq provides broad utility for identifying previously overlooked methyl-modified RNAs, can efficiently monitor methylation state, and may reveal new roles for tRNA-derived RNAs as biomarkers or signaling molecules. PMID:26237225

  14. Alterations of microRNA and microRNA-regulated messenger RNA expression in germinal center B-cell lymphomas determined by integrative sequencing analysis

    PubMed Central

    Hezaveh, Kebria; Kloetgen, Andreas; Bernhart, Stephan H; Mahapatra, Kunal Das; Lenze, Dido; Richter, Julia; Haake, Andrea; Bergmann, Anke K; Brors, Benedikt; Burkhardt, Birgit; Claviez, Alexander; Drexler, Hans G; Eils, Roland; Haas, Siegfried; Hoffmann, Steve; Karsch, Dennis; Klapper, Wolfram; Kleinheinz, Kortine; Korbel, Jan; Kretzmer, Helene; Kreuz, Markus; Küppers, Ralf; Lawerenz, Chris; Leich, Ellen; Loeffler, Markus; Mantovani-Loeffler, Luisa; López, Cristina; McHardy, Alice C; Möller, Peter; Rohde, Marius; Rosenstiel, Philip; Rosenwald, Andreas; Schilhabel, Markus; Schlesner, Matthias; Scholz, Ingrid; Stadler, Peter F; Stilgenbauer, Stephan; Sungalee, Stéphanie; Szczepanowski, Monika; Trümper, Lorenz; Weniger, Marc A; Siebert, Reiner; Borkhardt, Arndt; Hummel, Michael; Hoell, Jessica I.

    2016-01-01

    MicroRNA are well-established players in post-transcriptional gene regulation. However, information on the effects of microRNA deregulation mainly relies on bioinformatic prediction of potential targets, whereas proof of the direct physical microRNA/target messenger RNA interaction is mostly lacking. Within the International Cancer Genome Consortium Project “Determining Molecular Mechanisms in Malignant Lymphoma by Sequencing”, we performed miRnome sequencing from 16 Burkitt lymphomas, 19 diffuse large B-cell lymphomas, and 21 follicular lymphomas. Twenty-two miRNA separated Burkitt lymphomas from diffuse large B-cell lymphomas/follicular lymphomas, of which 13 have shown regulation by MYC. Moreover, we found expression of three hitherto unreported microRNA. Additionally, we detected recurrent mutations of hsa-miR-142 in diffuse large B-cell lymphomas and follicular lymphomas, and editing of the hsa-miR-376 cluster, providing evidence for microRNA editing in lymphomagenesis. To interrogate the direct physical interactions of microRNA with messenger RNA, we performed Argonaute-2 photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation experiments. MicroRNA directly targeted 208 messenger RNA in the Burkitt lymphomas and 328 messenger RNA in the non-Burkitt lymphoma models. This integrative analysis discovered several regulatory pathways of relevance in lymphomagenesis including Ras, PI3K-Akt and MAPK signaling pathways, also recurrently deregulated in lymphomas by mutations. Our dataset reveals that messenger RNA deregulation through microRNA is a highly relevant mechanism in lymphomagenesis. PMID:27390358

  15. High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

    PubMed

    Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M

    2016-09-07

    Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits.

  16. Research Techniques Made Simple: Bacterial 16S Ribosomal RNA Gene Sequencing in Cutaneous Research.

    PubMed

    Jo, Jay-Hyun; Kennedy, Elizabeth A; Kong, Heidi H

    2016-03-01

    Skin serves as a protective barrier and also harbors numerous microorganisms collectively comprising the skin microbiome. As a result of recent advances in sequencing (next-generation sequencing), our understanding of microbial communities on skin has advanced substantially. In particular, the 16S ribosomal RNA gene sequencing technique has played an important role in efforts to identify the global communities of bacteria in healthy individuals and patients with various disorders in multiple topographical regions over the skin surface. Here, we describe basic principles, study design, and a workflow of 16S ribosomal RNA gene sequencing methodology, primarily for investigators who are not familiar with this approach. This article will also discuss some applications and challenges of 16S ribosomal RNA sequencing as well as directions for future development.

  17. Short RNA duplexes guide sequence-dependent cleavage by human Dicer.

    PubMed

    Bergeron, Lucien; Perreault, Jean-Pierre; Abou Elela, Sherif

    2010-12-01

    Dicer is a member of the double-stranded (ds) RNA-specific ribonuclease III (RNase III) family that is required for RNA processing and degradation. Like most members of the RNase III family, Dicer possesses a dsRNA binding domain and cleaves long RNA duplexes in vitro. In this study, Dicer substrate selectivity was examined using bipartite substrates. These experiments revealed that an RNA helix possessing a 2-nucleotide (nt) 3'-overhang may bind and direct sequence-specific Dicer-mediated cleavage in trans at a fixed distance from the 3'-end overhang. Chemical modifications of the substrate indicate that the presence of the ribose 2'-hydroxyl group is not required for Dicer binding, but some located near the scissile bonds are needed for RNA cleavage. This suggests a flexible mechanism for substrate selectivity that recognizes the overall shape of an RNA helix. Examination of the structure of natural pre-microRNAs (pre-miRNAs) suggests that they may form bipartite substrates with complementary mRNA sequences, and thus induce seed-independent Dicer cleavage. Indeed, in vitro, natural pre-miRNA directed sequence-specific Dicer-mediated cleavage in trans by supporting the formation of a substrate mimic.

  18. Short RNA duplexes guide sequence-dependent cleavage by human Dicer

    PubMed Central

    Bergeron, Lucien Junior; Perreault, Jean-Pierre; Elela, Sherif Abou

    2010-01-01

    Dicer is a member of the double-stranded (ds) RNA-specific ribonuclease III (RNase III) family that is required for RNA processing and degradation. Like most members of the RNase III family, Dicer possesses a dsRNA binding domain and cleaves long RNA duplexes in vitro. In this study, Dicer substrate selectivity was examined using bipartite substrates. These experiments revealed that an RNA helix possessing a 2-nucleotide (nt) 3′-overhang may bind and direct sequence-specific Dicer-mediated cleavage in trans at a fixed distance from the 3′-end overhang. Chemical modifications of the substrate indicate that the presence of the ribose 2′-hydroxyl group is not required for Dicer binding, but some located near the scissile bonds are needed for RNA cleavage. This suggests a flexible mechanism for substrate selectivity that recognizes the overall shape of an RNA helix. Examination of the structure of natural pre-microRNAs (pre-miRNAs) suggests that they may form bipartite substrates with complementary mRNA sequences, and thus induce seed-independent Dicer cleavage. Indeed, in vitro, natural pre-miRNA directed sequence-specific Dicer-mediated cleavage in trans by supporting the formation of a substrate mimic. PMID:20974746

  19. Factor-independent transcription pausing caused by recognition of the RNA-DNA hybrid sequence.

    PubMed

    Bochkareva, Aleksandra; Yuzenkova, Yulia; Tadigotla, Vasisht R; Zenkin, Nikolay

    2012-02-01

    Pausing of transcription is an important step of regulation of gene expression in bacteria and eukaryotes. Here we uncover a factor-independent mechanism of transcription pausing, which is determined by the ability of the elongating RNA polymerase to recognize the sequence of the RNA-DNA hybrid. We show that, independently of thermodynamic stability of the elongation complex, RNA polymerase directly 'senses' the shape and/or identity of base pairs of the RNA-DNA hybrid. Recognition of the RNA-DNA hybrid sequence delays translocation by RNA polymerase, and thus slows down the nucleotide addition cycle through an 'in pathway' mechanism. We show that this phenomenon is conserved among bacterial and eukaryotic RNA polymerases, and is involved in regulatory pauses, such as a pause regulating the production of virulence factors in some bacteria and a pause regulating transcription/replication of HIV-1. The results indicate that recognition of RNA-DNA hybrid sequence by multi-subunit RNA polymerases is involved in transcription regulation and may determine the overall rate of transcription elongation.

  20. RNA internal standard synthesis by nucleic acid sequence-based amplification for competitive quantitative amplification reactions.

    PubMed

    Lo, Wan-Yu; Baeumner, Antje J

    2007-02-15

    Nucleic acid sequence-based amplification (NASBA) reactions have been demonstrated to successfully synthesize new sequences based on deletion and insertion reactions. Two RNA internal standards were synthesized for use in competitive amplification reactions in which quantitative analysis can be achieved by coamplifying the internal standard with the wild type sample. The sequences were created in two consecutive NASBA reactions using the E. coli clpB mRNA sequence as model analyte. The primer sequences of the wild type sequence were maintained, and a 20-nt-long segment inside the amplicon region was exchanged for a new segment of similar GC content and melting temperature. The new RNA sequence was thus amplifiable using the wild type primers and detectable via a new inserted sequence. In the first reaction, the forwarding primer and an additional 20-nt-long sequence was deleted and replaced by a new 20-nt-long sequence. In the second reaction, a forwarding primer containing as 5' overhang sequence the wild type primer sequence was used. The presence of pure internal standard was verified using electrochemiluminescence and RNA lateral-flow biosensor analysis. Additional sequence deletion in order to shorten the internal standard amplicons and thus generate higher detection signals was found not to be required. Finally, a competitive NASBA reaction between one internal standard and the wild type sequence was carried out proving its functionality. This new rapid construction method via NASBA provides advantages over the traditional techniques since it requires no traditional cloning procedures, no thermocyclers, and can be completed in less than 4 h.
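    The requirement that the exchanged 20-nt segment have "similar GC content and melting temperature" can be checked with a short sketch. The abstract does not say how melting temperature was estimated; the Wallace rule (2 °C per A/T, 4 °C per G/C), a common rough estimate for short oligonucleotides, is used here as an assumption. The function names, tolerances and example sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C); U is counted as T so RNA input works.
    """
    seq = seq.upper().replace("U", "T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def is_comparable(original, replacement, gc_tol=0.05, tm_tol=2):
    """True if the replacement matches the original in length, GC content and Tm.

    gc_tol and tm_tol are illustrative tolerances, not values from the abstract.
    """
    return (
        len(original) == len(replacement)
        and abs(gc_fraction(original) - gc_fraction(replacement)) <= gc_tol
        and abs(wallace_tm(original) - wallace_tm(replacement)) <= tm_tol
    )


if __name__ == "__main__":
    wild_type_segment = "ATGCGTACCGTTAGCCATGA"   # made-up 20-nt segment
    candidate_segment = "GACTTGCAGCATCGATACGT"   # made-up replacement
    print(gc_fraction(wild_type_segment), wallace_tm(wild_type_segment))
    print(is_comparable(wild_type_segment, candidate_segment))
```

    Nearest-neighbor thermodynamic models give better estimates than the Wallace rule and would be the usual choice when designing a real internal standard.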

  1. The role of upstream sequences in selecting the reading frame on tmRNA

    PubMed Central

    Miller, Mickey R; Healey, David W; Robison, Stephen G; Dewey, Jonathan D; Buskirk, Allen R

    2008-01-01

    Background: tmRNA acts first as a tRNA and then as an mRNA to rescue stalled ribosomes in eubacteria. Two unanswered questions about tmRNA function remain: how does tmRNA, lacking an anticodon, bypass the decoding machinery and enter the ribosome? Secondly, how does the ribosome choose the proper codon to resume translation on tmRNA? According to the -1 triplet hypothesis, the answer to both questions lies in the unique properties of the three nucleotides upstream of the first tmRNA codon. These nucleotides assume an A-form conformation that mimics the codon-anticodon interaction, leading to recognition by the decoding center and choice of the reading frame. The -1 triplet hypothesis is important because it is the most credible model in which direct binding and recognition by the ribosome sets the reading frame on tmRNA. Results: Conformational analysis predicts that 18 triplets cannot form the correct structure to function as the -1 triplet of tmRNA. We tested the tmRNA activity of all possible -1 triplet mutants using a genetic assay in Escherichia coli. While many mutants displayed reduced activity, our findings do not match the predictions of this model. Additional mutagenesis identified sequences further upstream that are required for tmRNA function. An immunoblot assay for translation of the tmRNA tag revealed that certain mutations in U85, A86, and the -1 triplet sequence result in improper selection of the first codon and translation in the wrong frame (-1 or +1) in vivo. Conclusion: Our findings disprove the -1 triplet hypothesis. The -1 triplet is not required for accommodation of tmRNA into the ribosome, although it plays a minor role in frame selection. Our results strongly disfavor direct ribosomal recognition of the upstream sequence, instead supporting a model in which the binding of a separate ligand to A86 is primarily responsible for frame selection. PMID:18590561

  2. Complete genome of Hainan papaya ringspot virus using small RNA deep sequencing.

    PubMed

    Zhang, Yuliang; Yu, Naitong; Huang, Qixing; Yin, Guohua; Guo, Anping; Wang, Xiangfeng; Xiong, Zhongguo; Liu, Zhixin

    2014-06-01

    Small RNA deep sequencing allows for virus identification, virus genome assembly, and strain differentiation. In this study, papaya plants with virus-like symptoms collected in Hainan province were used for deep sequencing and small RNA library construction. After in silico subtraction of the papaya sRNAs, small RNA reads were used in the viral genome assembly using a reference-guided, iterative assembly approach. A nearly complete genome was assembled for a Hainan isolate of papaya ringspot virus (PRSV-HN-2). The complete PRSV-HN-2 genome (accession no.: KF734962) was obtained after a 15-nucleotide gap was filled by direct sequencing of the amplified genomic region. Direct sequencing of several random genomic regions of the PRSV isolate did not find any sequence discrepancy with the sRNA-assembled genome. The newly sequenced PRSV-HN-2 genome shared a nucleotide identity of 96% and 94% with that of the PRSV-HN (EF183499) and PRSV-HN-1 (HQ424465) isolates, respectively, and together with these two isolates formed a new PRSV clade. These data demonstrate that the small RNA deep sequencing technology provides a viable and rapid means to assemble complete viral genomes in plants.

  3. Finding sRNA generative locales from high-throughput sequencing data with NiBLS

    PubMed Central

    2010-01-01

    Background: Next-generation sequencing technologies allow researchers to obtain millions of sequence reads in a single experiment. One important use of the technology is the sequencing of small non-coding regulatory RNAs and the identification of the genomic locales from which they originate. Currently, there is a paucity of methods for finding small RNA generative locales. Results: We describe and implement an algorithm that can determine small RNA generative locales from high-throughput sequencing data. The algorithm creates a network, or graph, of the small RNAs by creating links between them depending on their proximity on the target genome. For each of the sub-networks in the resulting graph the clustering coefficient, a measure of the interconnectedness of the subnetwork, is used to identify the generative locales. We test the algorithm over a wide range of parameters using RFAM sequences as positive controls and demonstrate that the algorithm has good sensitivity and specificity in a range of Arabidopsis and mouse small RNA sequence sets and that the locales it generates are robust to differences in the choice of parameters. Conclusions: NiBLS is a fast, reliable and sensitive method for determining small RNA locales in high-throughput sequence data that is generally applicable to all classes of small RNA. PMID:20167070
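
    The graph construction described in this abstract can be illustrated with a small sketch. It is not the NiBLS implementation: it assumes each small RNA is reduced to a single start coordinate on one strand, links two reads when their starts lie within a hypothetical threshold `max_gap`, and scores each connected sub-network by the mean local clustering coefficient of its nodes. All function names and the threshold are illustrative assumptions.

```python
from itertools import combinations

def build_graph(starts, max_gap):
    """Link reads whose start coordinates differ by at most max_gap (assumed rule)."""
    adj = {i: set() for i in range(len(starts))}
    for i, j in combinations(range(len(starts)), 2):
        if abs(starts[i] - starts[j]) <= max_gap:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def components(adj):
    """Return the connected sub-networks as sets of node indices."""
    seen, out = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        out.append(comp)
    return out

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def component_scores(adj):
    """Mean local clustering coefficient for each sub-network."""
    return [
        (sorted(comp), sum(local_clustering(adj, n) for n in comp) / len(comp))
        for comp in components(adj)
    ]

# Toy data: a tight cluster of three reads and a loose chain of three reads.
starts = [100, 102, 105, 500, 530, 560]
scores = component_scores(build_graph(starts, max_gap=30))
```

    In this toy input the tight cluster forms a fully connected triangle and the loose chain has no closed triangles, so a cutoff on the score separates them. How NiBLS actually sets its linking distance and cutoff is not stated in the abstract.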

  4. The complete nucleotide sequence of bean yellow mosaic potyvirus RNA.

    PubMed

    Guyatt, K J; Proll, D F; Menssen, A; Davidson, A D

    1996-01-01

    The complete nucleotide sequence of an Australian strain of bean yellow mosaic virus (BYMV-S) has been determined from cloned viral cDNAs. The BYMV-S genome is 9547 nucleotides in length excluding a poly(A) tail. Computer analysis of the sequence revealed a single long open reading frame (ORF) of 9168 nucleotides, commencing at position 206 and terminating with UAG at position 9374-6. The ORF potentially encodes a polyprotein of 3056 amino acids with a deduced Mr of 347,409. The 5' and 3' untranslated regions are 205 and 174 nucleotides in length respectively. Alignment of the amino acid sequence of the BYMV-S polyprotein with those of other potyviruses identified nine putative proteolytic cleavage sites. The predicted consensus cleavage site of the BYMV NIa protease was found to differ from that described for other potyviruses. Processing of the BYMV polyprotein at the designated proteolytic cleavage sites would result in a typical potyviral genome arrangement. The amino acid sequences of the putative BYMV encoded proteins were compared to the homologous gene products of twelve individual potyviruses to identify overall and specific regions of amino acid sequence homology.
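
    The genome coordinates quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the reported numbers; the variable names are illustrative. One inference is needed: the reported 174-nucleotide 3' figure matches the other numbers only if the UAG stop codon is counted with the 3' region, which the abstract does not state explicitly.

```python
# Values as reported in the abstract (1-based genome coordinates).
GENOME_LENGTH = 9547      # excluding the poly(A) tail
ORF_START = 206           # first nucleotide of the ORF
ORF_LENGTH = 9168         # coding nucleotides, stop codon not included
STOP_CODON_END = 9376     # UAG reported at positions 9374-6

five_prime_utr = ORF_START - 1                    # nucleotides before the ORF
coding_end = ORF_START + ORF_LENGTH - 1           # last coding nucleotide
stop_codon_start = coding_end + 1                 # first nucleotide of UAG
amino_acids = ORF_LENGTH // 3                     # one residue per codon
# Assumption: the 3' figure counts the stop codon plus everything after it.
three_prime_with_stop = GENOME_LENGTH - coding_end
three_prime_after_stop = GENOME_LENGTH - STOP_CODON_END
```

    Under that reading the parts add up to the full genome length: 205 + 9168 + 174 = 9547.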

  5. Diversity of thermophiles in a Malaysian hot spring determined using 16S rRNA and shotgun metagenome sequencing

    PubMed Central

    Chan, Chia Sing; Chan, Kok-Gan; Tay, Yea-Ling; Chua, Yi-Heng; Goh, Kian Mau

    2015-01-01

    The Sungai Klah (SK) hot spring is the second hottest geothermal spring in Malaysia. This hot spring is a shallow, 150-m-long, fast-flowing stream, with temperatures varying from 50 to 110°C and a pH range of 7.0–9.0. Hidden within a wooded area, the SK hot spring is continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). In this study, a sample taken from the middle of the stream was analyzed at the 16S rRNA V3-V4 region by amplicon metagenome sequencing. Over 35 phyla were detected by analyzing the 16S rRNA data. Firmicutes and Proteobacteria represented approximately 57% of the microbiome. Approximately 70% of the detected thermophiles were strict anaerobes; however, Hydrogenobacter spp., obligate chemolithotrophic thermophiles, represented one of the major taxa. Several thermophilic photosynthetic microorganisms and acidothermophiles were also detected. Most of the phyla identified by 16S rRNA were also found using the shotgun metagenome approaches. The carbon, sulfur, and nitrogen metabolism within the SK hot spring community were evaluated by shotgun metagenome sequencing, and the data revealed diversity in terms of metabolic activity and dynamics. This hot spring has a rich diversified phylogenetic community partly due to its natural environment (plant litter, high TOC, and a shallow stream) and geochemical parameters (broad temperature and pH range). It is speculated that symbiotic relationships occur between the members of the community. PMID:25798135

  6. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures.

    PubMed

    Taly, Jean-Francois; Magis, Cedrik; Bussotti, Giovanni; Chang, Jia-Ming; Di Tommaso, Paolo; Erb, Ionas; Espinosa-Carrasco, Jose; Kemena, Carsten; Notredame, Cedric

    2011-11-01

    T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here show how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode, M-Coffee, a meta version able to combine several third party aligners into one, PSI (position-specific iterated)-Coffee, the homology extended mode suitable for remote homologs and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.

  7. Complete Genome Sequence of a Double-Stranded RNA Virus from Avocado

    PubMed Central

    Villanueva, Francisco; Sabanadzovic, Sead; Valverde, Rodrigo A.

    2012-01-01

    A number of avocado (Persea americana) cultivars are known to contain high-molecular-weight double-stranded RNA (dsRNA) molecules for which a viral nature has been suggested, although sequence data are not available. Here we report the cloning and complete sequencing of a 13.5-kbp dsRNA virus isolated from avocado and show that it corresponds to the genome of a new species of the genus Endornavirus (family Endornaviridae), tentatively named Persea americana endornavirus (PaEV). PMID:22205720

  8. Nucleotide Sequence Analysis of RNA Synthesized from Rabbit Globin Complementary DNA

    PubMed Central

    Poon, Raymond; Paddock, Gary V.; Heindell, Howard; Whitcome, Philip; Salser, Winston; Kacian, Dan; Bank, Arthur; Gambino, Roberto; Ramirez, Francesco

    1974-01-01

    Rabbit globin complementary DNA made with RNA-dependent DNA polymerase (reverse transcriptase) was used as template for in vitro synthesis of 32P-labeled RNA. The sequences of the nucleotides in most of the fragments resulting from combined ribonuclease T1 and alkaline phosphatase digestion have been determined. Several fragments were long enough to fit uniquely with the α or β globin amino-acid sequences. These data demonstrate that the cDNA was copied from globin mRNA and contained no detectable contaminants. PMID:4139714

  9. Modeling RNA Secondary Structure with Sequence Comparison and Experimental Mapping Data.

    PubMed

    Tan, Zhen; Sharma, Gaurav; Mathews, David H

    2017-07-25

    Secondary structure prediction is an important problem in RNA bioinformatics because knowledge of structure is critical to understanding the functions of RNA sequences. Significant improvements in prediction accuracy have recently been demonstrated through the incorporation of experimentally obtained structural information, for instance using selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) mapping. However, such mapping data is currently available only for a limited number of RNA sequences. In this article, we present a method for extending the benefit of experimental mapping data in secondary structure prediction to homologous sequences. Specifically, we propose a method for integrating experimental mapping data into a comparative sequence analysis algorithm for secondary structure prediction of multiple homologs, whereby the mapping data benefits not only the prediction for the specific sequence that was mapped but also other homologs. The proposed method is realized by modifying the TurboFold II algorithm for prediction of RNA secondary structures to utilize basepairing probabilities guided by SHAPE experimental data when such data are available. The SHAPE-mapping-guided basepairing probabilities are obtained using the RSample method. Results demonstrate that the SHAPE mapping data for a sequence improves structure prediction accuracy of other homologous sequences beyond the accuracy obtained by sequence comparison alone (TurboFold II). The updated version of TurboFold II is freely available as part of the RNAstructure software package. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  10. Molecular Diagnosis of Actinomadura madurae Infection by 16S rRNA Deep Sequencing

    PubMed Central

    SenGupta, Dhruba J.; Hoogestraat, Daniel R.; Cummings, Lisa A.; Bryant, Bronwyn H.; Natividad, Catherine; Thielges, Stephanie; Monsaas, Peter W.; Chau, Mimosa; Barbee, Lindley A.; Rosenthal, Christopher; Cookson, Brad T.; Hoffman, Noah G.

    2013-01-01

    Next-generation DNA sequencing can be used to catalog individual organisms within complex, polymicrobial specimens. Here, we utilized deep sequencing of 16S rRNA to implicate Actinomadura madurae as the cause of mycetoma in a diabetic patient when culture and conventional molecular methods were overwhelmed by overgrowth of other organisms. PMID:24108607

  11. Molecular diagnosis of Actinomadura madurae infection by 16S rRNA deep sequencing.

    PubMed

    Salipante, Stephen J; Sengupta, Dhruba J; Hoogestraat, Daniel R; Cummings, Lisa A; Bryant, Bronwyn H; Natividad, Catherine; Thielges, Stephanie; Monsaas, Peter W; Chau, Mimosa; Barbee, Lindley A; Rosenthal, Christopher; Cookson, Brad T; Hoffman, Noah G

    2013-12-01

    Next-generation DNA sequencing can be used to catalog individual organisms within complex, polymicrobial specimens. Here, we utilized deep sequencing of 16S rRNA to implicate Actinomadura madurae as the cause of mycetoma in a diabetic patient when culture and conventional molecular methods were overwhelmed by overgrowth of other organisms.

  12. RNA sequencing of the human milk fat layer transcriptome reveals distinct gene expression profiles at three stages of lactation.

    PubMed

    Lemay, Danielle G; Ballard, Olivia A; Hughes, Maria A; Morrow, Ardythe L; Horseman, Nelson D; Nommsen-Rivers, Laurie A

    2013-01-01

    Aware of the important benefits of human milk, most U.S. women initiate breastfeeding but difficulties with milk supply lead some to quit earlier than intended. Yet, the contribution of maternal physiology to lactation difficulties remains poorly understood. Human milk fat globules, by enveloping cell contents during their secretion into milk, are a rich source of mammary cell RNA. Here, we pair this non-invasive mRNA source with RNA-sequencing to probe the milk fat layer transcriptome during three stages of lactation: colostral, transitional, and mature milk production. The resulting transcriptomes paint an exquisite portrait of human lactation. The resulting transcriptional profiles cluster not by postpartum day, but by milk Na:K ratio, indicating that women sampled during similar postpartum time frames could be at markedly different stages of gene expression. Each stage of lactation is characterized by a dynamic range (10^5-fold) in transcript abundances not previously observed with microarray technology. We discovered that transcripts for isoferritins and cathepsins are strikingly abundant during colostrum production, highlighting the potential importance of these proteins for neonatal health. Two transcripts, encoding β-casein (CSN2) and α-lactalbumin (LALBA), make up 45% of the total pool of mRNA in mature lactation. Genes significantly expressed across all stages of lactation are associated with making, modifying, transporting, and packaging milk proteins. Stage-specific transcripts are associated with immune defense during the colostral stage, up-regulation of the machinery needed for milk protein synthesis during the transitional stage, and the production of lipids during mature lactation. We observed strong modulation of key genes involved in lactose synthesis and insulin signaling. In particular, protein tyrosine phosphatase, receptor type, F (PTPRF) may serve as a biomarker linking insulin resistance with insufficient milk supply. This study provides

  13. RNA Sequencing of the Human Milk Fat Layer Transcriptome Reveals Distinct Gene Expression Profiles at Three Stages of Lactation

    PubMed Central

    Lemay, Danielle G.; Ballard, Olivia A.; Hughes, Maria A.; Morrow, Ardythe L.; Horseman, Nelson D.; Nommsen-Rivers, Laurie A.

    2013-01-01

    Aware of the important benefits of human milk, most U.S. women initiate breastfeeding but difficulties with milk supply lead some to quit earlier than intended. Yet, the contribution of maternal physiology to lactation difficulties remains poorly understood. Human milk fat globules, by enveloping cell contents during their secretion into milk, are a rich source of mammary cell RNA. Here, we pair this non-invasive mRNA source with RNA-sequencing to probe the milk fat layer transcriptome during three stages of lactation: colostral, transitional, and mature milk production. The resulting transcriptomes paint an exquisite portrait of human lactation. The resulting transcriptional profiles cluster not by postpartum day, but by milk Na:K ratio, indicating that women sampled during similar postpartum time frames could be at markedly different stages of gene expression. Each stage of lactation is characterized by a dynamic range (10^5-fold) in transcript abundances not previously observed with microarray technology. We discovered that transcripts for isoferritins and cathepsins are strikingly abundant during colostrum production, highlighting the potential importance of these proteins for neonatal health. Two transcripts, encoding β-casein (CSN2) and α-lactalbumin (LALBA), make up 45% of the total pool of mRNA in mature lactation. Genes significantly expressed across all stages of lactation are associated with making, modifying, transporting, and packaging milk proteins. Stage-specific transcripts are associated with immune defense during the colostral stage, up-regulation of the machinery needed for milk protein synthesis during the transitional stage, and the production of lipids during mature lactation. We observed strong modulation of key genes involved in lactose synthesis and insulin signaling. In particular, protein tyrosine phosphatase, receptor type, F (PTPRF) may serve as a biomarker linking insulin resistance with insufficient milk supply. This study provides

  14. Taxonomic Assessment of Rumen Microbiota Using Total RNA and Targeted Amplicon Sequencing Approaches

    PubMed Central

    Li, Fuyong; Henderson, Gemma; Sun, Xu; Cox, Faith; Janssen, Peter H.; Guan, Le Luo

    2016-01-01

    Taxonomic characterization of active gastrointestinal microbiota is essential to detect shifts in microbial communities and functions under various conditions. This study aimed to identify and quantify potentially active rumen microbiota using total RNA sequencing and to compare the outcomes of this approach with the widely used targeted RNA/DNA amplicon sequencing technique. Total RNA isolated from rumen digesta samples from five beef steers was subjected to Illumina paired-end sequencing (RNA-seq), and bacterial and archaeal amplicons of partial 16S rRNA/rDNA were subjected to 454 pyrosequencing (RNA/DNA Amplicon-seq). Taxonomic assessments of the RNA-seq, RNA Amplicon-seq, and DNA Amplicon-seq datasets were performed using a pipeline developed in house. The detected major microbial phylotypes were common among the three datasets, with seven bacterial phyla, fifteen bacterial families, and five archaeal taxa commonly identified across all datasets. There were also unique microbial taxa detected in each dataset. Elusimicrobia and Verrucomicrobia phyla; Desulfovibrionaceae, Elusimicrobiaceae, and Sphaerochaetaceae families; and Methanobrevibacter woesei were only detected in the RNA-Seq and RNA Amplicon-seq datasets, whereas Streptococcaceae was only detected in the DNA Amplicon-seq dataset. In addition, the relative abundances of four bacterial phyla, eight bacterial families and one archaeal taxon were different among the three datasets. This is the first study to compare the outcomes of rumen microbiota profiling between RNA-seq and RNA/DNA Amplicon-seq datasets. Our results illustrate the differences between these methods in characterizing microbiota both qualitatively and quantitatively for the same sample, and so caution must be exercised when comparing data. PMID:27446027

  15. A conditional random fields method for RNA sequence-structure relationship modeling and conformation sampling.

    PubMed

    Wang, Zhiyong; Xu, Jinbo

    2011-07-01

    Accurate tertiary structures are very important for the functional study of non-coding RNA molecules. However, predicting RNA tertiary structures is extremely challenging, because of a large conformation space to be explored and lack of an accurate scoring function differentiating the native structure from decoys. The fragment-based conformation sampling method (e.g. FARNA) has the shortcoming that the limited size of a fragment library makes it infeasible to represent all possible conformations well. A recent dynamic Bayesian network method, BARNACLE, overcomes the issue of fragment assembly. In addition, neither of these methods makes use of sequence information in sampling conformations. Here, we present a new probabilistic graphical model, conditional random fields (CRFs), to model RNA sequence-structure relationship, which enables us to accurately estimate the probability of an RNA conformation from sequence. Coupled with a novel tree-guided sampling scheme, our CRF model is then applied to RNA conformation sampling. Experimental results show that our CRF method can model RNA sequence-structure relationship well and sequence information is important for conformation sampling. Our method, named TreeFolder, generates a much higher percentage of native-like decoys than FARNA and BARNACLE, although we use the same simple energy function as BARNACLE. Contact: zywang@ttic.edu; j3xu@ttic.edu. Supplementary data are available at Bioinformatics online.

  16. Organization and nucleotide sequence analysis of a ribosomal RNA gene cluster from Streptomyces ambofaciens.

    PubMed

    Pernodet, J L; Boccard, F; Alegre, M T; Gagnat, J; Guérineau, M

    1989-06-30

    The Streptomyces ambofaciens genome contains four rRNA gene clusters. These copies are called rrnA, B, C and D. The complete nucleotide (nt) sequence of rrnD has been determined. These genes possess striking similarity with other eubacterial rRNA genes. Comparison with other rRNA sequences allowed the putative localization of the sequences encoding mature rRNAs. The structural genes are arranged in the order 16S-23S-5S and are tightly linked. The mature rRNAs are predicted to contain 1528, 3120 and 120 nt, for the 16S, 23S and 5S rRNAs, respectively. The 23S rRNA is, to our knowledge, the longest of all sequenced prokaryotic 23S rRNAs. When compared to other large rRNAs it shows insertions at positions where they are also present in archaebacterial and in eukaryotic large rRNAs. Secondary structure models of S. ambofaciens rRNAs are proposed, based upon those existing for other bacterial rRNAs. Positions of putative transcription start points and of a termination signal are suggested. The corresponding putative primary transcript, containing the 16S, 23S and 5S rRNAs plus flanking regions, was folded into a secondary structure, and sequences possibly involved in rRNA maturation are described. The G + C content of the rRNA gene cluster is low (57%) compared with the overall G + C content of Streptomyces DNA (73%).

  17. The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus.

    PubMed Central

    Gustafson, G; Armour, S L

    1986-01-01

    The complete nucleotide sequence of RNA beta from the type strain of barley stripe mosaic virus (BSMV) has been determined. The sequence is 3289 nucleotides in length and contains four open reading frames (ORFs) which code for proteins of Mr 22,147 (ORF1), Mr 58,098 (ORF2), Mr 17,378 (ORF3), and Mr 14,119 (ORF4). The predicted N-terminal amino acid sequence of the polypeptide encoded by the ORF nearest the 5'-end of the RNA (ORF1) is identical (after the initiator methionine) to the published N-terminal amino acid sequence of BSMV coat protein for 29 of the first 30 amino acids. ORF2 occupies the central portion of the coding region of RNA beta and ORF3 is located at the 3'-end. The ORF4 sequence overlaps the 3'-region of ORF2 and the 5'-region of ORF3 and differs in codon usage from the other three RNA beta ORFs. The coding region of RNA beta is followed by a poly(A) tract and a 238 nucleotide tRNA-like structure which are common to all three BSMV genomic RNAs. PMID:3754962

  18. 16S ribosomal RNA sequencing and molecular serotyping of Avibacterium paragallinarum isolated from Indian field conditions.

    PubMed

    Patil, Vihang Vithalrao; Mishra, Debendranath; Mane, Dilip Vithalrao

    2017-08-01

    This study aimed to identify, at both the molecular and serological levels, Indian field isolates of Avibacterium paragallinarum, which causes infectious coryza in chickens. Species-specific polymerase chain reaction (HPG-2 PCR) and 16S ribosomal RNA (rRNA) sequencing were employed for molecular identification, whereas a multiplex PCR technique was used for serological identification of the isolates. All three field isolates were identified as A. paragallinarum using HPG-2 PCR. The species-specific PCR results were validated using 16S rRNA sequencing. The partial 16S rRNA sequences obtained from all three isolates showed 96-99% homology with the NCBI database reference strains of A. paragallinarum. The aligned partial sequences of 16S rRNA were submitted to GenBank, and accession numbers were obtained. Multiplex PCR-based molecular serotyping showed that the field isolates of A. paragallinarum represent three serotypes: strain IND101 is serovar A, strain IND102 is serovar B, and strain IND103 is serovar C. HPG-2 PCR, 16S rRNA sequencing, and multiplex PCR proved to be more accurate, sensitive, and reliable diagnostic tools for molecular and serological identification of A. paragallinarum field isolates. These diagnostic methods can substitute for conventional cultural characterization and would be valuable for formulating quick and correct prevention and control measures against this detrimental poultry pathogen.

  19. Nucleotide sequence of an exceptionally long 5.8S ribosomal RNA from Crithidia fasciculata.

    PubMed Central

    Schnare, M N; Gray, M W

    1982-01-01

    In Crithidia fasciculata, a trypanosomatid protozoan, the large ribosomal subunit contains five small RNA species (e, f, g, i, j) in addition to 5S rRNA [Gray, M.W. (1981) Mol. Cell. Biol. 1, 347-357]. The complete primary sequence of species i is shown here to be pAACGUGUmCGCGAUGGAUGACUUGGCUUCCUAUCUCGUUGA ... AGAmACGCAGUAAAGUGCGAUAAGUGGUApsiCAAUUGmCAGAAUCAUUCAAUUACCGAAUCUUUGAACGAAACGG ... CGCAUGGGAGAAGCUCUUUUGAGUCAUCCCCGUGCAUGCCAUAUUCUCCAmGUGUCGAA(C)OH. This sequence establishes that species i is a 5.8S rRNA, despite its exceptional length (171-172 nucleotides). The extra nucleotides in C. fasciculata 5.8S rRNA are located in a region whose primary sequence and length are highly variable among 5.8S rRNAs, but which is capable of forming a stable hairpin loop structure (the "G+C-rich hairpin"). The sequence of C. fasciculata 5.8S rRNA is no more closely related to that of another protozoan, Acanthamoeba castellanii, than it is to representative 5.8S rRNA sequences from the other eukaryotic kingdoms, emphasizing the deep phylogenetic divisions that seem to exist within the Kingdom Protista. PMID:7079176

  20. Sequence characterization of 5S ribosomal RNA from eight gram positive procaryotes

    NASA Technical Reports Server (NTRS)

    Woese, C. R.; Luehrsen, K. R.; Pribula, C. D.; Fox, G. E.

    1976-01-01

    Complete nucleotide sequences are presented for 5S rRNA from Bacillus subtilis, B. firmus, B. pasteurii, B. brevis, Lactobacillus brevis, and Streptococcus faecalis, and 5S rRNA oligonucleotide catalogs and partial sequence data are given for B. cereus and Sporosarcina ureae. These data demonstrate a striking consistency of 5S rRNA primary and secondary structure within a given bacterial grouping. An exception is B. brevis, in which the 5S rRNA sequence varies significantly from that of other bacilli in the tuned helix and the procaryotic loop. The localization of these variations suggests that B. brevis occupies an ecological niche that selects such changes. It is noted that this organism produces antibiotics which affect ribosome function.

  2. Towards next-generation sequencing analytics for foodborne RNA viruses: Examining the effect of RNA input quantity and viral RNA purity.

    PubMed

    Yang, Zhihui; Leonard, Susan R; Mammel, Mark K; Elkins, Christopher A; Kulka, Michael

    2016-10-01

    Detection and identification of viruses in food samples are technically challenging due largely to the low viral copy number in contaminated food items, and the lack of effective culture enrichment methods that are amenable to regulatory applications for many of the common foodborne viruses. Using an Illumina MiSeq platform and two hepatitis A virus (HAV) cell-culture adapted strains as a representative enteric virus species, this study examined the limits of single-stranded RNA (ssRNA) viral detection following next-generation sequencing without pre-amplification of the viral genome. Complete viral genome sequences were obtained from HAV samples of varying purities and with an input as low as 2 ng total RNA containing 1.4×10^5 copies of viral RNA. In addition, single nucleotide variations were reproducibly detected over the range of concentrations examined, and their identity confirmed by alternate sequencing technology. In summary, next-generation sequencing technology has the potential for sensitive detection/identification of a viral genome at a low copy number. This study provides a benchmark for metagenomic sequencing application as is required for virus detection in complex food matrices using a culture-independent diagnostic approach.

  3. From Sequences to Shapes and Back: A Case Study in RNA Secondary Structures

    NASA Astrophysics Data System (ADS)

    Schuster, Peter; Fontana, Walter; Stadler, Peter F.; Hofacker, Ivo L.

    1994-03-01

    RNA folding is viewed here as a map assigning secondary structures to sequences. At fixed chain length the number of sequences far exceeds the number of structures. Frequencies of structures are highly non-uniform and follow a generalized form of Zipf's law: we find relatively few common and many rare ones. By using an algorithm for inverse folding, we show that sequences sharing the same structure are distributed randomly over sequence space. All common structures can be accessed from an arbitrary sequence by a number of mutations much smaller than the chain length. The sequence space is percolated by extensive neutral networks connecting nearest neighbours folding into identical structures. Implications for evolutionary adaptation and for applied molecular evolution are evident: finding a particular structure by mutation and selection is much simpler than expected and, even if catalytic activity should turn out to be sparse in the space of RNA structures, it can hardly be missed by evolutionary processes.
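
    The statement that sequences far outnumber structures at fixed chain length can be illustrated with a simple count. The sketch below is not the authors' method. It uses a standard combinatorial recurrence that counts every secondary structure on n positions while ignoring which bases can actually pair, with a hypothetical parameter `min_loop` for the minimum number of unpaired bases enclosed by a pair. Because pairing rules are ignored, the count is an upper bound on the structures any real sequence alphabet can form, which only strengthens the comparison with the 4^n possible sequences.

```python
def count_structures(n, min_loop=3):
    """Count secondary structures on n positions (pairing rules ignored).

    Position k is either unpaired, or paired with an earlier position j
    such that at least min_loop positions lie strictly between them.
    """
    s = [1] * (n + 1)  # s[0] = 1: the empty chain has one (empty) structure
    for length in range(1, n + 1):
        total = s[length - 1]                      # last position unpaired
        for j in range(1, length - min_loop):      # last position paired with j
            total += s[j - 1] * s[length - j - 1]  # outside part * enclosed part
        s[length] = total
    return s[n]

def sequences_per_structure(n, min_loop=3):
    """Average number of four-letter sequences per countable structure."""
    return 4 ** n / count_structures(n, min_loop)
```

    Calling `sequences_per_structure` for increasing n shows the ratio growing with chain length, consistent with the many-to-one map described in the abstract.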

  4. Small RNA Profiling by Next-Generation Sequencing Using High-Definition Adapters.

    PubMed

    Billmeier, Martina; Xu, Ping

    2017-01-01

    Small RNAs (sRNAs) as key regulators of gene expression play fundamental roles in many biological processes. Next-generation sequencing (NGS) has become an important tool for sRNA discovery and profiling. However, NGS data often show bias for or against certain sequences which is mainly caused by adapter oligonucleotides that are ligated to sRNAs more or less efficiently by RNA ligases. In order to reduce ligation bias, High-definition (HD) adapters for the Illumina sequencing platform were developed. However, a large amount of direct 5' and 3' adapter ligation products are often produced when the current commercially available kits are used for cloning with HD adapters. In this chapter we describe a protocol for sRNA library construction using HD adapters with drastically reduced direct 5' adapter-3' adapter ligation product. The protocol can be used for sRNA library preparation from total RNA or sRNA of various plant, animal, insect, or fungal samples. The protocol includes total RNA extraction from plant leaf tissue and cultured mammalian cells and sRNA library construction using HD adapters.

  5. StarScan: a web server for scanning small RNA targets from degradome sequencing data.

    PubMed

    Liu, Shun; Li, Jun-Hao; Wu, Jie; Zhou, Ke-Ren; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2015-07-01

    Endogenous small non-coding RNAs (sRNAs), including microRNAs, PIWI-interacting RNAs and small interfering RNAs, play important gene regulatory roles in animals and plants by pairing to the protein-coding and non-coding transcripts. However, computationally assigning these various sRNAs to their regulatory target genes remains technically challenging. Recently, a high-throughput degradome sequencing method was applied to identify biologically relevant sRNA cleavage sites. In this study, an integrated web-based tool, StarScan (sRNA target Scan), was developed for scanning sRNA targets using degradome sequencing data from 20 species. Given a sRNA sequence from plants or animals, our web server performs an ultrafast and exhaustive search for potential sRNA-target interactions in annotated and unannotated genomic regions. The interactions between small RNAs and target transcripts were further evaluated using a novel tool, alignScore. A novel tool, degradomeBinomTest, was developed to quantify the abundance of degradome fragments located at the 9-11th nucleotide from the sRNA 5' end. This is the first web server for discovering potential sRNA-mediated RNA cleavage events in plants and animals, which affords mechanistic insights into the regulatory roles of sRNAs. The StarScan web server is available at http://mirlab.sysu.edu.cn/starscan/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
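
    The idea of testing whether degradome fragments pile up at nucleotides 9-11 from the sRNA 5' end can be sketched with a one-sided binomial test. This is not the degradomeBinomTest implementation; the uniform null model, the function names, and the input layout (a mapping from position to read count) are all assumptions made for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cleavage_site_pvalue(reads_by_position, window=(9, 11)):
    """One-sided p-value for read enrichment inside `window`.

    Assumed null: each degradome read falls on any covered position
    with equal probability.
    """
    total = sum(reads_by_position.values())
    in_window = sum(count for pos, count in reads_by_position.items()
                    if window[0] <= pos <= window[1])
    p_null = (window[1] - window[0] + 1) / len(reads_by_position)
    return binomial_tail(in_window, total, p_null)


# Hypothetical counts: one read at every position, plus a peak at position 10.
reads = {pos: 1 for pos in range(1, 22)}
reads[10] = 30
print(cleavage_site_pvalue(reads))
```

    A small p-value here only says the reads are not uniformly spread; a real analysis would also need a background model for transcript degradation.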

  6. A plant viral coat protein RNA binding consensus sequence contains a crucial arginine.

    PubMed Central

    Ansel-McKinney, P; Scott, S W; Swanson, M; Ge, X; Gehrke, L

    1996-01-01

    A defining feature of alfalfa mosaic virus (AMV) and ilarviruses [type virus: tobacco streak virus (TSV)] is that, in addition to genomic RNAs, viral coat protein is required to establish infection in plants. AMV and TSV coat proteins, which share little primary amino acid sequence identity, are functionally interchangeable in RNA binding and initiation of infection. The lysine-rich amino-terminal RNA binding domain of the AMV coat protein lacks previously identified RNA binding motifs. Here, the AMV coat protein RNA binding domain is shown to contain a single arginine whose specific side chain and position are crucial for RNA binding. In addition, the putative RNA binding domain of two ilarvirus coat proteins, TSV and citrus variegation virus, is identified and also shown to contain a crucial arginine. AMV and ilarvirus coat protein sequence alignment centering on the key arginine revealed a new RNA binding consensus sequence. This consensus may explain in part why heterologous viral RNA-coat protein mixtures are infectious. PMID:8890181

  7. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    PubMed

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules in plant tissue from crops such as lettuce, tomato, corn, soy and rice with complementarity to human genes that have a long history of safe consumption supports a conclusion that long dsRNAs do not present a significant dietary risk.

  8. Sequence organization of the Acanthamoeba rRNA intergenic spacer: identification of transcriptional enhancers.

    PubMed Central

    Yang, Q; Zwick, M G; Paule, M R

    1994-01-01

    The primary sequence of the entire 2330 bp intergenic spacer of the A. castellanii ribosomal RNA gene was determined. Repeated sequence elements averaging 140 bp were identified and found to bind a protein required for optimum initiation at the core promoter. These repeated elements were shown to stimulate rRNA transcription by RNA polymerase I in vitro. The repeats inhibited transcription when placed in trans, and stimulated transcription when in cis, in either orientation, but only when upstream of the core promoter. Thus, these repeated elements have characteristics similar to polymerase I enhancers found in higher eukaryotes. The number of rRNA repeats in Acanthamoeba cells was determined to be 24 per haploid genome, the lowest number so far identified in any eukaryote. However, because Acanthamoeba is polyploid, each cell contains approximately 600 rRNA genes. PMID:7984432

  9. Small RNA Deep Sequencing and the Effects of microRNA408 on Root Gravitropic Bending in Arabidopsis

    NASA Astrophysics Data System (ADS)

    Li, Huasheng; Lu, Jinying; Sun, Qiao; Chen, Yu; He, Dacheng; Liu, Min

    2015-11-01

    MicroRNA (miRNA) is a non-coding small RNA composed of 20 to 24 nucleotides that influences plant root development. This study analyzed the miRNA expression in Arabidopsis root tip cells using Illumina sequencing and real-time PCR before (sample 0) and 15 min after (sample 15) a 3-D clinostat rotational treatment was administered. After stimulation was performed, the expression levels of seven miRNA genes, including Arabidopsis miR160, miR161, miR394, miR402, miR403, miR408, and miR823, were significantly upregulated. Illumina sequencing results also revealed two novel miRNAs that have not been previously reported. The target genes of these miRNAs included pentatricopeptide repeat-containing protein and diadenosine tetraphosphate hydrolase. An overexpression vector of Arabidopsis miR408 was constructed and transferred to Arabidopsis plants. The roots of plants overexpressing miR408 exhibited a slower reorientation upon gravistimulation in comparison with those of the wild type. This result indicates that miR408 could play a role in the root gravitropic response.

  10. Prediction of RNA-binding proteins from primary sequence by a support vector machine approach

    PubMed Central

    HAN, LIAN YI; CAI, CONG ZHONG; LO, SIEW LIN; CHUNG, MAXEY C.M.; CHEN, YU ZONG

    2004-01-01

    Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein–protein interactions. But insufficient attention has been paid to the prediction of protein–RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins was used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show a prediction accuracy of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, indicating a need for a sufficient number of proteins to train SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein–RNA interactions. PMID:14970381

  11. The discrepancy among single nucleotide variants detected by DNA and RNA high throughput sequencing data.

    PubMed

    Guo, Yan; Zhao, Shilin; Sheng, Quanhu; Samuels, David C; Shyr, Yu

    2017-10-03

    High throughput sequencing technology enables both the human genome and transcriptome to be screened at single-nucleotide resolution. Tools have been developed to infer single nucleotide variants (SNVs) from both DNA and RNA sequencing data. To evaluate how much difference can be expected between DNA and RNA sequencing data, and among tissue sources, we designed a study to examine the single nucleotide difference among five sources of high throughput sequencing data generated from the same individual, including exome sequencing from blood, tumor and adjacent normal tissue, and RNAseq from tumor and adjacent normal tissue. Through careful quality control and analysis of the SNVs, we found little difference between DNA-DNA pairs (1%-2%). However, between DNA-RNA pairs, SNV differences ranged anywhere from 10% to 20%. Only a small portion of these differences can be explained by RNA editing. Instead, the majority of the DNA-RNA differences should be attributed to technical errors from sequencing and post-processing of RNAseq data. Our analysis results suggest that SNV detection using RNAseq is subject to high false positive rates.
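
    A pairwise SNV difference of the kind reported above can be sketched as a set comparison. The abstract does not state the exact metric, so the definition below (calls found in only one source, divided by calls found in either) is an assumption, as are the function name and the toy variant tuples.

```python
def snv_discordance(calls_a, calls_b):
    """Fraction of SNV calls seen in exactly one of the two call sets.

    Each call is a hashable record such as (chrom, pos, ref, alt).
    Assumed definition: symmetric difference divided by union.
    """
    union = calls_a | calls_b
    if not union:
        return 0.0
    return len(calls_a ^ calls_b) / len(union)


# Hypothetical calls from an exome sample and an RNAseq sample of one person.
exome_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
               ("chr2", 75, "G", "A"), ("chr3", 900, "T", "C")}
rnaseq_calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
                ("chr2", 75, "G", "A"), ("chr5", 42, "A", "C")}

print(snv_discordance(exome_calls, rnaseq_calls))
```

    In practice the comparison would be restricted to positions with adequate coverage in both data sets, since RNAseq only covers expressed regions.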

  12. Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods

    PubMed Central

    Dal Molin, Alessandra; Baruzzo, Giacomo; Di Camillo, Barbara

    2017-01-01

    The sequencing of the transcriptomes of single-cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differential expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data, but largely applied to single-cell data. We discuss results obtained on two real and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods' performance in four different scenarios, mimicking different unimodal or bimodal distributions of the data, as characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes and the overall performance. Globally, the results obtained in our study suggest that it is difficult to identify a best-performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and gain better accuracy of results. PMID:28588607

  13. Recombinant human MDM2 oncoprotein shows sequence composition selectivity for binding to both RNA and DNA.

    PubMed

    Challen, Christine; Anderson, John J; Chrzanowska-Lightowlers, Zofia M A; Lightowlers, Robert N; Lunec, John

    2012-03-01

    MDM2 is a 90 kDa nucleo-phosphoprotein that binds p53 and other proteins contributing to its oncogenic properties. Its structure includes an amino proximal p53 binding site, a central acidic domain and a carboxy region which incorporates Zinc and Ring Finger domains suggestive of nucleic acid binding or transcription factor function. It has previously been reported that a baculovirus-expressed MDM2 protein binds RNA in a sequence-specific manner through the Ring Finger domain; however, its ability to bind DNA has yet to be examined. We report here that a bacterially expressed human MDM2 protein binds both DNA and the previously defined RNA consensus sequence. DNA binding appears selective and involves the carboxy-terminal domain of the molecule. RNA binding is inhibited by an MDM2 specific antibody, which recognises an epitope within the carboxy region of the protein. Selection cloning and sequence analysis of MDM2 DNA binding sequences, unlike RNA binding sequences, revealed no obvious DNA binding consensus sequence, but preferential binding to oligopurine:pyrimidine-rich stretches. Our results suggest that the observed preferential DNA binding may occur through the Zinc Finger or in a charge-charge interaction through the Ring Finger, thereby implying potentially different mechanisms for DNA and RNA MDM2 binding.

  14. The RNA sequence context defines the mechanistic routes by which yeast arginyl-tRNA synthetase charges tRNA.

    PubMed Central

    Sissler, M; Giegé, R; Florentz, C

    1998-01-01

    Arginylation of tRNA transcripts by yeast arginyl-tRNA synthetase can be triggered by two alternate recognition sets in anticodon loops: C35 and U36 or G36 in tRNA(Arg) and C36 and G37 in tRNA(Asp) (Sissler M, Giegé R, Florentz C, 1996, EMBO J 15:5069-5076). Kinetic studies on tRNA variants were done to explore the mechanisms by which these sets are expressed. Although the synthetase interacts in a similar manner with tRNA(Arg) and tRNA(Asp), the details of the interaction patterns are idiosyncratic, especially in anticodon loops (Sissler M, Eriani G, Martin F, Giegé R, Florentz C, 1997, Nucleic Acids Res 25:4899-4906). Exchange of individual recognition elements between arginine and aspartate tRNA frameworks strongly blocks arginylation of the mutated tRNAs, whereas full exchange of the recognition sets leads to efficient arginine acceptance of the transplanted tRNAs. Unpredictably, the similar catalytic efficiencies of native and transplanted tRNAs originate from different k(cat) and Km combinations. A closer analysis reveals that efficient arginylation results from strong anticooperative effects between individual recognition elements. Nonrecognition nucleotides as well as the tRNA architecture are additional factors that tune efficiency. Altogether, arginyl-tRNA synthetase is able to utilize different context-dependent mechanistic routes to be activated. This confers biological advantages to the arginine aminoacylation system and sheds light on its evolutionary relationship with the aspartate system. PMID:9622124

  15. Identification of characteristic oligonucleotides in the bacterial 16S ribosomal RNA sequence dataset

    NASA Technical Reports Server (NTRS)

    Zhang, Zhengdong; Willson, Richard C.; Fox, George E.

    2002-01-01

    MOTIVATION: The phylogenetic structure of the bacterial world has been intensively studied by comparing sequences of 16S ribosomal RNA (16S rRNA). This database of sequences is now widely used to design probes for the detection of specific bacteria or groups of bacteria one at a time. The success of such methods reflects the fact that there are local sequence segments that are highly characteristic of particular organisms or groups of organisms. It is not clear, however, the extent to which such signature sequences exist in the 16S rRNA dataset. A better understanding of the numbers and distribution of highly informative oligonucleotide sequences may facilitate the design of hybridization arrays that can characterize the phylogenetic position of an unknown organism or serve as the basis for the development of novel approaches for use in bacterial identification. RESULTS: A computer-based algorithm that characterizes the extent to which any individual oligonucleotide sequence in 16S rRNA is characteristic of any particular bacterial grouping was developed. A measure of signature quality, Q(s), was formulated and subsequently calculated for every individual oligonucleotide sequence in the size range of 5-11 nucleotides and for 15mers with reference to each cluster and subcluster in a 929 organism representative phylogenetic tree. Subsequently, the perfect signature sequences were compared to the full set of 7322 sequences to see how common false positives were. The work completed here establishes beyond any doubt that highly characteristic oligonucleotides exist in the bacterial 16S rRNA sequence dataset in large numbers. Over 16,000 15mers were identified that might be useful as signatures. Signature oligonucleotides are available for over 80% of the nodes in the representative tree.
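
    The notion of an oligonucleotide being characteristic of a bacterial grouping can be sketched with a toy k-mer score. The abstract does not define Q(s), so the score below (fraction of in-group sequences containing the k-mer, multiplied by the fraction of out-group sequences lacking it) is a hypothetical stand-in and not the authors' measure; the function names and toy sequences are likewise assumptions.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_scores(group, others, k):
    """Score each k-mer found in `group` between 0 and 1.

    A score of 1.0 means the k-mer occurs in every in-group sequence
    and in no out-group sequence (a "perfect" signature in this toy sense).
    """
    in_sets = [kmers(s, k) for s in group]
    out_sets = [kmers(s, k) for s in others]
    scores = {}
    for kmer in set().union(*in_sets):
        sensitivity = sum(kmer in s for s in in_sets) / len(in_sets)
        false_pos = (sum(kmer in s for s in out_sets) / len(out_sets)
                     if out_sets else 0.0)
        scores[kmer] = sensitivity * (1.0 - false_pos)
    return scores


# Toy rRNA-like fragments; real input would be aligned 16S rRNA sequences.
cluster = ["ACGUACGGA", "UUACGGAUC"]
rest = ["ACGUAUUUU", "GGGGCCCCA"]
scores = signature_scores(cluster, rest, k=5)
print(sorted(k for k, v in scores.items() if v == 1.0))
```

    Applying such a score to every node of a phylogenetic tree, with the node's descendants as the group, mirrors the per-cluster evaluation described in the abstract.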

  17. Identification of conserved and novel microRNAs in Aquilaria sinensis based on small RNA sequencing and transcriptome sequence data.

    PubMed

    Gao, Zhi-Hui; Wei, Jian-He; Yang, Yun; Zhang, Zheng; Xiong, Huan-Ying; Zhao, Wen-Ting

    2012-08-15

    Agarwood is in great demand for its high value in medicine, incense, and perfume across Asia, the Middle East, and Europe. As agarwood is formed only when the Aquilaria trees are wounded or infected by some microbes, overharvesting and habitat loss are threatening some populations of agarwood-producing species. Aquilaria sinensis is such a significant economic tree species. To promote the production efficiency and protect the resource of A. sinensis, it would be critical to reveal the regulation mechanisms of stress-induced agarwood formation. MicroRNAs (miRNAs), a key gene expression regulator involved in various plant stress response and metabolic processes, might function in agarwood formation, but no report concerning miRNAs in Aquilaria is available. In this study, the small RNA high-throughput sequencing and 454 transcriptome data were adopted to identify both conserved and novel miRNAs in A. sinensis. Deep sequencing showed that the small RNA (sRNA) population of A. sinensis was complex and the length of sRNAs varied. By in silico analysis of the small RNA deep sequencing data and transcriptome data, we discovered 27 novel miRNAs in A. sinensis. Based on the mature miRNA sequence conservation, we identified 74 putative conserved miRNAs from A. sinensis and 10 of them were confirmed with hairpin forming precursor. Interestingly, a novel miRNA sequence was determined to be the miRNA* of asi-miR408, but with accumulation much higher than asi-miR408. The expression levels of ten stress-responsive miRNAs were examined during the time-course after wound treatment. Eight were shown to be wound-responsive. This not only shows the existence of miRNAs in this economically significant Asian tree species but also indicates their critical role in stress-induced agarwood formation. The highly accumulated miRNA* of asi-miR408 implies that miRNA* species, like miRNAs, could be functional in plants.

  18. Sequences locating the 5' ends of the major simian virus 40 late mRNA forms.

    PubMed Central

    Piatak, M; Ghosh, P K; Norkin, L C; Weissman, S M

    1983-01-01

    The 5' sequences of late mRNA specified by several constructed or naturally occurring deletion or duplication mutants of simian virus 40 were examined. The mutants included viruses with various small deletions centered about 25 nucleotides upstream from the major transcription initiation site, as well as viruses containing tandem duplications of a sequence of 50 nucleotides or less embedding the major transcription initiation site. The results show that the sequences 25 to 30 nucleotides upstream from the major initiation site in the position of the TATA box of other polymerase II promoters are not essential for the precise localization of the initiation site of late mRNA. Rather, we deduce that the major late mRNA start site is determined primarily by sequences located very close to the initiation site, and that the relative abundance of the 5' ends with this initiation site is modulated by nearby downstream sequences. Modification of six nucleotides adjacent upstream to the initiation site almost completely prevents the utilization of this site. Various deletions and substitutions of sequences 21 nucleotides or more downstream from the major initiation site cause upstream shifts in the localization of the most abundantly utilized 5' ends. The sequences immediately downstream from the major simian virus 40 initiation sites contain inverted symmetries that could give rise to secondary structures in either single-stranded DNA or RNA; the possibility that these inverted symmetries function in transcription initiation at the level of DNA structure rather than in RNA stabilization is discussed. Finally, we present additional evidence that precursor species with certain 5' termini are selectively spliced to form 19S RNA, whereas other 5' termini are preferred for forming the 16S RNA splice. We discuss the possibility that this is a consequence of the influence of leader structure on downstream splicing events. PMID:6194314

  19. RNA sequencing of Sleeping Beauty transposon-induced tumors detects transposon-RNA fusions in forward genetic cancer screens.

    PubMed

    Temiz, Nuri A; Moriarity, Branden S; Wolf, Natalie K; Riordan, Jesse D; Dupuy, Adam J; Largaespada, David A; Sarver, Aaron L

    2016-01-01

    Forward genetic screens using Sleeping Beauty (SB)-mobilized T2/Onc transposons have been used to identify common insertion sites (CISs) associated with tumor formation. Recurrent sites of transposon insertion are commonly identified using ligation-mediated PCR (LM-PCR). Here, we use RNA sequencing (RNA-seq) data to directly identify transcriptional events mediated by T2/Onc. Surprisingly, the majority (∼80%) of LM-PCR identified junction fragments do not lead to observable changes in RNA transcripts. However, in CIS regions, direct transcriptional effects of transposon insertions are observed. We developed an automated method to systematically identify T2/Onc-genome RNA fusion sequences in RNA-seq data. RNA fusion-based CISs were identified corresponding to both DNA-based CISs (Cdkn2a, Mycl1, Nf2, Pten, Sema6d, and Rere) and additional regions strongly associated with cancer that were not observed by LM-PCR (Myc, Akt1, Pth, Csf1r, Fgfr2, Wisp1, Map3k5, and Map4k3). In addition to calculating recurrent CISs, we also present complementary methods to identify potential driver events via determination of strongly supported fusions and fusions with large transcript level changes in the absence of multitumor recurrence. These methods independently identify CIS regions and also point to cancer-associated genes like Braf. We anticipate RNA-seq analyses of tumors from forward genetic screens will become an efficient tool to identify causal events. © 2016 Temiz et al.; Published by Cold Spring Harbor Laboratory Press.

  20. Accuracy of RNA-Seq and its dependence on sequencing depth

    PubMed Central

    2012-01-01

    Background: The cost of DNA sequencing has undergone a dramatic reduction in the past decade. As a result, sequencing technologies have been increasingly applied to genomic research. RNA-Seq is becoming a common technique for surveying gene expression based on DNA sequencing. As it is not clear how increased sequencing capacity has affected measurement accuracy of mRNA, we sought to investigate that relationship. Result: We empirically evaluate the accuracy of repeated gene expression measurements using RNA-Seq. We identify library preparation steps prior to DNA sequencing as the main source of error in this process. Studying three datasets, we show that the accuracy indeed improves with the sequencing depth. However, the rate of improvement as a function of sequence reads is generally slower than predicted by the binomial distribution. We therefore used the beta-binomial distribution to model the overdispersion. The overdispersion parameters we introduced depend explicitly on the number of reads so that the resulting statistical uncertainty is consistent with the empirical data that measurement accuracy increases with the sequencing depth. The overdispersion parameters were determined by maximizing the likelihood. We showed that our modified beta-binomial model had a lower false discovery rate than the binomial or the pure beta-binomial models. Conclusion: We proposed a novel form of overdispersion guaranteeing that the accuracy improves with sequencing depth. We demonstrated that the new form provides a better fit to the data. PMID:23320920
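
    Why a fixed-parameter ("pure") beta-binomial cannot describe accuracy that keeps improving with depth can be seen from the standard variance formulas. The sketch below compares the standard deviation of an estimated read fraction under the binomial and under a beta-binomial with constant parameters; it does not reproduce the depth-dependent overdispersion form proposed in the paper, which the abstract does not specify. The function names and the example parameters are assumptions.

```python
def proportion_sd_binomial(n, p):
    """SD of X/n when X ~ Binomial(n, p): shrinks like 1/sqrt(n)."""
    return (p * (1 - p) / n) ** 0.5


def proportion_sd_beta_binomial(n, alpha, beta):
    """SD of X/n when X ~ BetaBinomial(n, alpha, beta).

    Var(X) = n p (1-p) (1 + (n-1) rho), with p = alpha/(alpha+beta)
    and rho = 1/(alpha+beta+1), so the SD of X/n levels off at
    sqrt(p (1-p) rho) instead of going to zero.
    """
    p = alpha / (alpha + beta)
    rho = 1.0 / (alpha + beta + 1.0)
    return (p * (1 - p) * (1 + (n - 1) * rho) / n) ** 0.5


# Hypothetical gene taking about 2% of reads, at increasing depth.
for depth in (10 ** 3, 10 ** 5, 10 ** 7):
    print(depth,
          proportion_sd_binomial(depth, 0.02),
          proportion_sd_beta_binomial(depth, 20.0, 980.0))
```

    With constant alpha and beta the uncertainty stops improving at large depth, which is the behavior a depth-dependent overdispersion parameter is meant to avoid.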

  1. The phylogenetic utility and functional constraint of microRNA flanking sequences

    PubMed Central

    Kenny, Nathan J.; Sin, Yung Wa; Hayward, Alexander; Paps, Jordi; Chu, Ka Hou; Hui, Jerome H. L.

    2015-01-01

    MicroRNAs (miRNAs) have recently risen to prominence as novel factors responsible for post-transcriptional regulation of gene expression. miRNA genes have been posited as highly conserved in the clades in which they exist. Consequently, miRNAs have been used as rare genome change characters to estimate phylogeny by tracking their gain and loss. However, their short length (21–23 bp) has limited their perceived utility in sequence-based phylogenetic inference. Here, using reference taxa with established phylogenetic relationships, we demonstrate that miRNA sequences are of high utility in quantitative, rather than in qualitative, phylogenetic analysis. The clear orthology among miRNA genes from different species makes it straightforward to identify and align these sequences from even fragmentary datasets. We also identify significant sequence conservation in the regions directly flanking miRNA genes, and show that this too is of utility in phylogenetic analysis, as well as highlighting conserved regions that will be of interest to other fields. Employing miRNA sequences from 12 sequenced drosophilid genomes, together with a Tribolium castaneum outgroup, we demonstrate that this approach is robust using Bayesian and maximum-likelihood methods. The utility of these characters is further demonstrated in the rhabditid nematodes and primates. As next-generation sequencing makes it more cost-effective to sequence genomes and small RNA libraries, this methodology provides an alternative data source for phylogenetic analysis. The approach allows rapid resolution of relationships between both closely related and rapidly evolving species, and provides an additional tool for investigation of relationships within the tree of life. PMID:25694624

  2. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

    PubMed

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-07-15

    In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.
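
    The k-mer searching step mentioned above can be sketched as ranking reference sequences by the number of k-mers they share with the query. This is not SINA's code and it omits the partial order alignment stage entirely; the function names, the `k=6` setting, and the toy sequences are assumptions made for illustration.

```python
def kmer_set(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nearest_references(query, references, k=6, top=2):
    """Rank reference names by shared k-mer count with the query.

    Ties are broken alphabetically so the result is deterministic.
    """
    query_kmers = kmer_set(query, k)
    shared = {name: len(query_kmers & kmer_set(seq, k))
              for name, seq in references.items()}
    return sorted(shared, key=lambda name: (-shared[name], name))[:top]


# Toy reference set; a real run would use an aligned rRNA reference database.
references = {
    "ref_a": "ACGUACGUAGCUAGCUAGGA",
    "ref_b": "UUUUGGGGCCCCAAAAUUUU",
    "ref_c": "ACGUACGUAGCUAGCAAAAA",
}
query = "GUACGUAGCUAGCUAG"
print(nearest_references(query, references))
```

    The selected references would then serve as the template against which the query is aligned; precomputing the reference k-mer sets is the obvious optimization for large databases.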

  3. Nodavirus Coat Protein Imposes Dodecahedral RNA Structure Independent of Nucleotide Sequence and Length

    PubMed Central

    Tihova, Mariana; Dryden, Kelly A.; Le, Thuc-vy L.; Harvey, Stephen C.; Johnson, John E.; Yeager, Mark; Schneemann, Anette

    2004-01-01

    The nodavirus Flock house virus (FHV) has a bipartite, positive-sense RNA genome that is packaged into an icosahedral particle displaying T=3 symmetry. The high-resolution X-ray structure of FHV has shown that 10 bp of well-ordered, double-stranded RNA are located at each of the 30 twofold axes of the virion, but it is not known which portions of the genome form these duplex regions. The regular distribution of double-stranded RNA in the interior of the virus particle indicates that large regions of the encapsidated genome are engaged in secondary structure interactions. Moreover, the RNA is restricted to a topology that is unlikely to exist during translation or replication. We used electron cryomicroscopy and image reconstruction to determine the structure of four types of FHV particles that differed in RNA and protein content. RNA-capsid interactions were primarily mediated via the N and C termini, which are essential for RNA recognition and particle assembly. A substantial fraction of the packaged nucleic acid, either viral or heterologous, was organized as a dodecahedral cage of duplex RNA. The similarity in tertiary structure suggests that RNA folding is independent of sequence and length. Computational modeling indicated that RNA duplex formation involves both short-range and long-range interactions. We propose that the capsid protein is able to exploit the plasticity of the RNA secondary structures, capturing those that are compatible with the geometry of the dodecahedral cage. PMID:14990708

  4. Integrative analyses of RNA editing, alternative splicing, and expression of young genes in human brain transcriptome by deep RNA sequencing.

    PubMed

    Wu, Dong-Dong; Ye, Ling-Qun; Li, Yan; Sun, Yan-Bo; Shao, Yi; Chen, Chunyan; Zhu, Zhu; Zhong, Li; Wang, Lu; Irwin, David M; Zhang, Yong E; Zhang, Ya-Ping

    2015-08-01

    Next-generation RNA sequencing has been successfully used for identification of transcript assembly, evaluation of gene expression levels, and detection of post-transcriptional modifications. Despite these large-scale studies, additional comprehensive RNA-seq data from different subregions of the human brain are required to fully evaluate the evolutionary patterns experienced by the human brain transcriptome. Here, we provide a total of 6.5 billion RNA-seq reads from different subregions of the human brain. A significant correlation was observed between the levels of alternative splicing and RNA editing, which might be explained by a competition between the molecular machineries responsible for the splicing and editing of RNA. Young human protein-coding genes demonstrate biased expression to the neocortical and non-neocortical regions during evolution on the lineage leading to humans. We also found that a significantly greater number of young human protein-coding genes are expressed in the putamen, a tissue that was also observed to have the highest level of RNA-editing activity. The putamen, which previously received little attention, plays an important role in cognitive ability, and our data suggest a potential contribution of the putamen to human evolution. © The Author (2015). Published by Oxford University Press on behalf of Journal of Molecular Cell Biology, IBCB, SIBS, CAS. All rights reserved.

  5. The 3' sequences required for incorporation of an engineered ssRNA into the Reovirus genome

    PubMed Central

    Roner, Michael R; Roehr, Joanne

    2006-01-01

    Background: Understanding how an organism replicates and assembles a multi-segmented genome with fidelity previously measured at 100% presents a model system for exploring questions involving genome assortment and RNA/protein interactions in general. The virus family Reoviridae, containing nine genera and more than 200 members, is unique in that its members possess a segmented double-stranded (ds) RNA genome. Using reovirus as a model member of this family, we have developed the only functional reverse genetics system for a member of this family with ten or more genome segments. Using this system, we have previously identified the flanking 5' sequences required by an engineered s2 ssRNA for efficient incorporation into the genome of reovirus. The minimum 5' sequence retains 96 nucleotides and contains a predicted sequence/structure element. Within these 96 nucleotides, we have identified three nucleotides A-U-U at positions 79–81 that are essential for the incorporation of in vitro generated ssRNAs into new reovirus progeny viral particles. The work presented here builds on these findings and presents the results of an analysis of the required 3' flanking sequences of the s2 ssRNA. Results: The minimum 3' sequence we localized retains 98 nucleotides of the wild type s2 ssRNA. These sequences do not interact with the 5' sequences and modifications of the 5' sequences do not result in a change in the sequences required at the 3' end of the engineered s2 ssRNA. Within the 3' sequence we discovered three regions that when mutated prevent the ssRNA from being replicated to dsRNA and subsequently incorporated into progeny virions. Using a series of substitutions we were able to obtain additional information about the sequences in these regions. We demonstrate that the individual nucleotides from 98 to 84, 68 to 59, and 28 to 1, are required in addition to the total length of 98 nucleotides to direct an engineered reovirus ssRNA to be replicated to dsRNA and incorporated

  6. Sequence walkers: a graphical method to display how binding proteins interact with DNA or RNA sequences.

    PubMed Central

    Schneider, T D

    1997-01-01

    A graphical method is presented for displaying how binding proteins and other macromolecules interact with individual bases of nucleotide sequences. Characters representing the sequence are either oriented normally and placed above a line indicating favorable contact, or upside-down and placed below the line indicating unfavorable contact. The positive or negative height of each letter shows the contribution of that base to the average sequence conservation of the binding site, as represented by a sequence logo. These sequence 'walkers' can be stepped along raw sequence data to visually search for binding sites. Many walkers, for the same or different proteins, can be simultaneously placed next to a sequence to create a quantitative map of a complex genetic region. One can alter the sequence to quantitatively engineer binding sites. Database anomalies can be visualized by placing a walker at the recorded positions of a binding molecule and by comparing this to locations found by scanning the nearby sequences. The sequence can also be altered to predict whether a change is a polymorphism or a mutation for the recognizer being modeled. PMID:9336476
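
    The letter heights described above can be sketched numerically. The example below assumes one common information-theoretic formulation, in which a base with frequency f at a given position of the aligned binding sites gets a weight of 2 + log2(f) bits, so frequent bases score positive and rare bases negative. The small-sample correction is omitted and a pseudocount is added to avoid log2(0); the toy sites and function names are hypothetical and are not taken from the paper.

```python
import math


def weight_matrix(sites, pseudocount=1.0):
    """Per-position weights in bits: 2 + log2(frequency of the base there)."""
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({
            base: 2.0 + math.log2((column.count(base) + pseudocount) / total)
            for base in "ACGT"
        })
    return matrix


def walker(matrix, sequence, start):
    """Letter heights for a walker placed at `start`; negative means unfavourable."""
    return [(sequence[start + i], matrix[i][sequence[start + i]])
            for i in range(len(matrix))]


# Toy aligned binding sites (made up for illustration).
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
matrix = weight_matrix(sites)
heights = walker(matrix, "GGTATAATCC", start=2)
total_bits = sum(height for _, height in heights)
print(heights, total_bits)
```

    Stepping `start` along the sequence and plotting the heights at each position gives the scanning behaviour the abstract describes; the sum of the heights is the score of that placement in bits.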

  7. CPSS: a computational platform for the analysis of small RNA deep sequencing data.

    PubMed

    Zhang, Yuanwei; Xu, Bo; Yang, Yifan; Ban, Rongjun; Zhang, Huan; Jiang, Xiaohua; Cooke, Howard J; Xue, Yu; Shi, Qinghua

    2012-07-15

    Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.

  8. StarScan: a web server for scanning small RNA targets from degradome sequencing data

    PubMed Central

    Liu, Shun; Li, Jun-Hao; Wu, Jie; Zhou, Ke-Ren; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

    2015-01-01

    Endogenous small non-coding RNAs (sRNAs), including microRNAs, PIWI-interacting RNAs and small interfering RNAs, play important gene regulatory roles in animals and plants by pairing to the protein-coding and non-coding transcripts. However, computationally assigning these various sRNAs to their regulatory target genes remains technically challenging. Recently, a high-throughput degradome sequencing method was applied to identify biologically relevant sRNA cleavage sites. In this study, an integrated web-based tool, StarScan (sRNA target Scan), was developed for scanning sRNA targets using degradome sequencing data from 20 species. Given a sRNA sequence from plants or animals, our web server performs an ultrafast and exhaustive search for potential sRNA–target interactions in annotated and unannotated genomic regions. The interactions between small RNAs and target transcripts were further evaluated using a novel tool, alignScore. A novel tool, degradomeBinomTest, was developed to quantify the abundance of degradome fragments located at the 9–11th nucleotide from the sRNA 5′ end. This is the first web server for discovering potential sRNA-mediated RNA cleavage events in plants and animals, which affords mechanistic insights into the regulatory roles of sRNAs. The StarScan web server is available at http://mirlab.sysu.edu.cn/starscan/. PMID:25990732

  9. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server.

    PubMed

    Cannone, Jamie J; Sweeney, Blake A; Petrov, Anton I; Gutell, Robin R; Zirbel, Craig L; Leontis, Neocles

    2015-07-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa.
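
    The query limits quoted above (up to five ranges, 50 nucleotide positions per range) can be checked on the client side before a request is built. The sketch below does not call the R3D-2-MSA service and does not reflect its actual API; the function name is hypothetical, and it assumes ranges are inclusive (start, end) pairs of nucleotide numbers.

```python
MAX_RANGES = 5                 # "up to five ranges of nucleotides"
MAX_POSITIONS_PER_RANGE = 50   # "50 nucleotide positions per range"


def validate_ranges(ranges):
    """Check inclusive (start, end) nucleotide ranges against the quoted limits."""
    if not 1 <= len(ranges) <= MAX_RANGES:
        return False
    for start, end in ranges:
        # Assumption: a range covers end - start + 1 positions.
        if start > end or end - start + 1 > MAX_POSITIONS_PER_RANGE:
            return False
    return True


ok = validate_ranges([(1, 50), (100, 120)])
too_long = validate_ranges([(1, 51)])
too_many = validate_ranges([(i, i) for i in range(1, 7)])
print(ok, too_long, too_many)
```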

  10. [16S rRNA gene sequence analysis for bacterial identification in the clinical laboratory].

    PubMed

    Matsumoto, Takehisa; Sugano, Mitsutoshi

    2013-12-01

    The traditional identification of bacteria on the basis of phenotypic characteristics is generally not as accurate as identification based on genotypic methods. For many years, sequencing of the 16S ribosomal RNA (rRNA) gene has served as an important tool for determining phylogenetic relationships between bacteria. The features of this molecular target that make it a useful phylogenetic tool also make it useful for bacterial detection and identification in the clinical laboratory. 16S rRNA gene sequence analysis can better identify poorly described, rarely isolated, or phenotypically aberrant strains, and can lead to the recognition of novel pathogens and noncultured bacteria. In clinical microbiology, molecular identification based on 16S rDNA sequencing is applied fundamentally to bacteria whose identification by means of other types of techniques is impossible or difficult. However, there are some cases in which 16S rRNA gene sequence analysis cannot differentiate closely related bacteria such as Shigella spp. and Escherichia coli at the species level. Thus, it is important to understand the advantages and disadvantages of 16S rRNA gene sequence analysis.

  11. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server

    PubMed Central

    Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles

    2015-01-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960

  12. Uncultivated microbial eukaryotic diversity: a method to link ssu rRNA gene sequences with morphology.

    PubMed

    Hirst, Marissa B; Kita, Kelley N; Dawson, Scott C

    2011-01-01

    Protists have traditionally been identified by cultivation and classified taxonomically based on their cellular morphologies and behavior. In the past decade, however, many novel protist taxa have been identified using cultivation independent ssu rRNA sequence surveys. New rRNA "phylotypes" from uncultivated eukaryotes have no connection to the wealth of prior morphological descriptions of protists. To link phylogenetically informative sequences with taxonomically informative morphological descriptions, we demonstrate several methods for combining whole cell rRNA-targeted fluorescent in situ hybridization (FISH) with cytoskeletal or organellar immunostaining. Either eukaryote or ciliate-specific ssu rRNA probes were combined with an anti-α-tubulin antibody or phalloidin, a common actin stain, to define cytoskeletal features of uncultivated protists in several environmental samples. The eukaryote ssu rRNA probe was also combined with Mitotracker® or a hydrogenosomal-specific anti-Hsp70 antibody to localize mitochondria and hydrogenosomes, respectively, in uncultivated protists from different environments. Using rRNA probes in combination with immunostaining, we linked ssu rRNA phylotypes with microtubule structure to describe flagellate and ciliate morphology in three diverse environments, and linked Naegleria spp. to their amoeboid morphology using actin staining in hay infusion samples. We also linked uncultivated ciliates to morphologically similar Colpoda-like ciliates using tubulin immunostaining with a ciliate-specific rRNA probe. Combining rRNA-targeted FISH with cytoskeletal immunostaining or stains targeting specific organelles provides a fast, efficient, high throughput method for linking genetic sequences with morphological features in uncultivated protists. When linked to phylotype, morphological descriptions of protists can both complement and vet the increasing number of sequences from uncultivated protists, including those of novel lineages

  13. A Model for Viral Assembly around an Explicit RNA Sequence Generates an Implicit Fitness Landscape.

    PubMed

    Dykeman, Eric Charles

    2017-08-08

    Previously, a stochastic model of single-stranded RNA virus assembly was created to model the cooperative effects between capsid proteins and genomic RNA that would occur in a packaging signal-mediated assembly process. In such an assembly scenario, multiple secondary structural elements from within the RNA, termed "packaging signals" (PS), contact coat proteins and facilitate efficient capsid assembly. In this work, the assembly model is extended to incorporate explicit nucleotide sequence information as well as simple aspects of RNA folding that would be occurring during the RNA/capsid coassembly process. Applying this paradigm to a dodecahedral viral capsid, a computer-derived nucleotide sequence is evolved de novo that is optimal for packaging the RNA into capsids, while also containing capacity for coding for a viral protein. Analysis of the effects of mutations on the ability of the RNA sequence to successfully package into a viral capsid reveals a complex fitness landscape where the majority of mutations are neutral with respect to packaging efficiency with a small number of mutations resulting in a near-complete loss of RNA packaging. Moreover, the model shows how attempts to ablate PSs in the viral RNA sequence may result in redundant PSs already present in the genome fulfilling their packaging role. This explains why recent experiments that attempt to ablate putative PSs may not see an effect on packaging. This modeling framework presents an example of how an implicit mapping can be made from genotype to a fitness parameter important for viral biology, i.e., viral capsid yield, with potential applications to theoretical models of viral evolution. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.

  14. Template sequence near the initiation nucleotide can modulate brome mosaic virus RNA accumulation in plant protoplasts.

    PubMed

    Hema, M; Kao, C Cheng

    2004-02-01

    Bromoviral templates for plus-strand RNA synthesis are rich in A or U nucleotides in comparison to templates for minus-strand RNA synthesis. Previous studies demonstrated that plus-strand RNA synthesis by the brome mosaic virus (BMV) RNA replicase is more efficient if the template contains an A/U-rich template sequence near the initiation site (K. Sivakumaran and C. C. Kao, J. Virol. 73:6415-6423, 1999). These observations led us to examine the effects of nucleotide changes near the template's initiation site on the accumulation of BMV RNA3 genomic minus-strand, genomic plus-strand, and subgenomic RNAs in barley protoplasts transfected with wild-type and mutant BMV transcripts. Mutations in the template for minus-strand synthesis had only modest effects on BMV replication in barley protoplasts. Mutants with changes to the +3, +5, and +7 template nucleotides accumulated minus-strand RNA at levels similar to the wild-type level. However, mutations at positions adjacent to the initiation cytidylate in the templates for genomic and subgenomic plus-strand RNA synthesis significantly decreased RNA accumulation. For example, changes at the third template nucleotide for plus-strand RNA3 synthesis resulted in RNA accumulation at between 18 and 24% of the wild-type level, and mutations in the third template nucleotide for subgenomic RNA4 resulted in accumulations at between 7 and 14% of the wild-type level. The effects of the mutations generally decreased as the mutations occurred further from the initiation nucleotide. These findings demonstrate that there are different requirements of the template sequence near the initiation nucleotide for BMV RNA accumulation in plant cells.

  15. De novo prediction of RNA-protein interactions from sequence information.

    PubMed

    Wang, Ying; Chen, Xiaowei; Liu, Zhi-Ping; Huang, Qiang; Wang, Yong; Xu, Derong; Zhang, Xiang-Sun; Chen, Runsheng; Chen, Luonan

    2013-01-27

    Protein-RNA interactions are fundamentally important in understanding cellular processes. In particular, non-coding RNA-protein interactions play an important role to facilitate biological functions in signalling, transcriptional regulation, and even the progression of complex diseases. However, experimental determination of protein-RNA interactions remains time-consuming and labour-intensive. Here, we develop a novel extended naïve-Bayes-classifier for de novo prediction of protein-RNA interactions, only using protein and RNA sequence information. Specifically, we first collect a set of known protein-RNA interactions as gold-standard positives and extract sequence-based features to represent each protein-RNA pair. To fill the gap between high dimensional features and scarcity of gold-standard positives, we select effective features by cutting a likelihood ratio score, which not only reduces the computational complexity but also allows transparent feature integration during prediction. An extended naïve Bayes classifier is then constructed using these effective features to train a protein-RNA interaction prediction model. Numerical experiments show that our method can achieve the prediction accuracy of 0.77 even though only a small number of protein-RNA interaction data are available. In particular, we demonstrate that the extended naïve-Bayes-classifier is superior to the naïve-Bayes-classifier by fully considering the dependences among features. Importantly, we conduct ncRNA pull-down experiments to validate the predicted novel protein-RNA interactions and identify the interacting proteins of sbRNA CeN72 in C. elegans, which further demonstrates the effectiveness of our method.
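
    The feature-selection idea above (keeping only features whose likelihood ratio passes a cutoff, then combining them in a Bayes classifier) can be sketched with a plain naive Bayes model. This is not the extended classifier from the paper, which additionally models dependences among features. The set of negative examples, the feature names, the pseudocount and the cutoff are all assumptions made for illustration.

```python
import math


def log_likelihood_ratios(positives, negatives, features, pseudocount=1.0):
    """log[P(feature | interacting) / P(feature | non-interacting)] per feature."""
    ratios = {}
    for f in features:
        p_pos = (sum(f in ex for ex in positives) + pseudocount) / (
            len(positives) + 2 * pseudocount)
        p_neg = (sum(f in ex for ex in negatives) + pseudocount) / (
            len(negatives) + 2 * pseudocount)
        ratios[f] = math.log(p_pos / p_neg)
    return ratios


def select_features(ratios, cutoff):
    """Keep features whose absolute log-likelihood ratio reaches the cutoff."""
    return {f: r for f, r in ratios.items() if abs(r) >= cutoff}


def score(example, selected):
    """Simplified naive Bayes log-odds: sum over selected features present."""
    return sum(r for f, r in selected.items() if f in example)


# Each protein-RNA pair is a set of binary sequence features (hypothetical names).
positives = [{"KR_rich", "AU_rich"}, {"KR_rich"}, {"KR_rich", "GC_rich"}]
negatives = [{"GC_rich"}, {"AU_rich", "GC_rich"}, {"GC_rich"}]
ratios = log_likelihood_ratios(positives, negatives,
                               ["KR_rich", "AU_rich", "GC_rich"])
selected = select_features(ratios, cutoff=0.5)
print(selected, score({"KR_rich", "AU_rich"}, selected))
```

    In this toy data the uninformative feature is dropped by the cutoff, and a positive score means the pair looks more like the known interacting pairs than the non-interacting ones.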

  16. Next-generation sequencing of the porcine skeletal muscle transcriptome for computational prediction of microRNA gene targets

    USDA-ARS's Scientific Manuscript database

    MicroRNA are a class of small RNAs that regulate gene expression by inhibiting translation of protein encoding transcripts. Inhibition is exerted through targeting of a microRNA-protein complex by base-pairing of the microRNA sequence to cognate recognition sequences in the 3’ untranslated region (...

  17. RNA editing in plant mitochondria—connecting RNA target sequences and acting proteins.

    PubMed

    Takenaka, Mizuki; Verbitskiy, Daniil; Zehrmann, Anja; Härtel, Barbara; Bayer-Császár, Eszter; Glass, Franziska; Brennicke, Axel

    2014-11-01

    RNA editing changes several hundred cytidines to uridines in the mRNAs of mitochondria in flowering plants. The target cytidines are identified by a subtype of PPR proteins characterized by tandem modules, each of which binds a specific upstream nucleotide. Recent progress in correlating repeat structures with nucleotide identities makes it possible to predict and identify target sites in mitochondrial RNAs. Additional proteins have been found to play a role in RNA editing; their precise function still needs to be elucidated. The enzymatic activity performing the C to U reaction may reside in the C-terminal DYW extensions of the PPR proteins; however, this still needs to be proven. Here we update recent progress in understanding RNA editing in flowering plant mitochondria.

  18. A DNA sequence obtained by replacement of the dopamine RNA aptamer bases is not an aptamer.

    PubMed

    Álvarez-Martos, Isabel; Ferapontova, Elena E

    2017-08-05

    A unique specificity of the aptamer-ligand biorecognition and binding facilitates bioanalysis and biosensor development, contributing to discrimination of structurally related molecules, such as dopamine and other catecholamine neurotransmitters. The aptamer sequence capable of specific binding of dopamine is a 57-nucleotide-long RNA sequence reported in 1997 (Biochemistry, 1997, 36, 9726). Later, it was suggested that the DNA homologue of the RNA aptamer retains the specificity of dopamine binding (Biochem. Biophys. Res. Commun., 2009, 388, 732). Here, we show that the DNA sequence obtained by replacing the RNA aptamer bases with their DNA analogues is not capable of specific biorecognition of dopamine, in contrast to the original RNA aptamer sequence. This DNA sequence binds dopamine and structurally related catecholamine neurotransmitters non-specifically, as any DNA sequence does, and thus is not an aptamer and cannot be used for either in vivo or in situ analysis of dopamine in the presence of structurally related neurotransmitters. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. High-quality RNA extraction from copepods for Next Generation Sequencing: A comparative study.

    PubMed

    Asai, Sneha; Ianora, Adrianna; Lauritano, Chiara; Lindeque, Penelope K; Carotenuto, Ylenia

    2015-12-01

    Despite the ecological importance of copepods, few Next Generation Sequencing (NGS) studies have been performed on small crustaceans, and a standard method for RNA extraction is lacking. In this study, we compared three commonly used methods, TRIzol®, Aurum Total RNA Mini Kit and Qiagen RNeasy Micro Kit, in combination with the preservation reagents TRIzol® or RNAlater®, to obtain RNA of high quality and quantity from copepods for NGS. Total RNA was extracted from the copepods Calanus helgolandicus, Centropages typicus and Temora stylifera and its quantity and quality were evaluated using NanoDrop, agarose gel electrophoresis and Agilent Bioanalyzer. Our results demonstrate that preservation of copepods in RNAlater® followed by extraction with the Qiagen RNeasy Micro Kit was the optimal isolation method for obtaining RNA of high quality and quantity for NGS studies of C. helgolandicus. Intriguingly, C. helgolandicus 28S rRNA is formed by two subunits that separate after heat-denaturation and migrate along with 18S rRNA. This unique property of protostome RNA has never been reported in copepods. Overall, our comparative study on RNA extraction protocols will help increase gene expression studies on copepods using high-throughput applications, such as RNA-Seq and microarrays. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. The landscape of fusion transcripts in spitzoid melanoma and biologically indeterminate spitzoid tumors by RNA sequencing

    PubMed Central

    Wu, Gang; Barnhill, Raymond L.; Lee, Seungjae; Li, Yongjin; Shao, Ying; Easton, John; Dalton, James; Zhang, Jinghui; Pappo, Alberto; Bahrami, Armita

    2016-01-01

    Kinase activation by chromosomal translocations is a common mechanism that drives tumorigenesis in spitzoid neoplasms. To explore the landscape of fusion transcripts in these tumors, we performed whole-transcriptome sequencing using formalin-fixed paraffin-embedded tissues in malignant or biologically indeterminate spitzoid tumors from 7 patients (age 2–14 years). RNA sequence libraries enriched for coding regions were prepared and the sequencing was analyzed by a novel assembly-based algorithm designed for detecting complex fusions. In addition, tumor samples were screened for hotspot TERT promoter mutations, and telomerase expression was assessed by TERT mRNA in situ hybridization (ISH). Two patients had widespread metastasis and subsequently died of disease, and 5 patients had a benign clinical course on limited follow-up (mean: 30 months). RNA sequencing and TERT mRNA ISH were successful in 6 tumors and unsuccessful in 1 disseminating tumor due to low RNA quality. RNA sequencing identified a kinase fusion in 5 of the 6 sequenced tumors: TPM3–NTRK1 (2 tumors), complex rearrangements involving TPM3, ALK, and IL6R (1 tumor), BAIAP2L1–BRAF (1 tumor), and EML4–BRAF (1 disseminating tumor). All predicted chimeric transcripts were expressed at high levels and contained the intact kinase domain. In addition, 2 tumors each contained a second fusion gene, ARID1B-SNX9 or PTPRZ1-NFAM1. The detected chimeric genes were validated by home-brew break-apart or fusion fluorescence in situ hybridization. The 2 disseminating tumors each harbored the TERT promoter −124C>T (Chr 5:1,295,228 hg19 coordinate) mutation whereas the remaining 5 tumors retained the wild-type gene. The presence of the −124C>T mutation correlated with telomerase expression by TERT mRNA ISH. In summary, we demonstrated complex fusion transcripts and novel partner genes for BRAF by RNA sequencing of FFPE samples. The diversity of gene fusions demonstrated by RNA sequencing defines the molecular

  1. Investigation of molluscan phylogeny on the basis of 18S rRNA sequences.

    PubMed

    Winnepenninckx, B; Backeljau, T; De Wachter, R

    1996-12-01

    The 18S rRNA sequences of 12 molluscs, representing the extant classes Gastropoda, Bivalvia, Polyplacophora, Scaphopoda, and Caudofoveata, were determined and compared with selected known 18S rRNA sequences of Metazoa, including other Mollusca. These data do not provide support for a close relationship between Platyhelminthes (Turbellaria) and Mollusca, but rather suggest that the latter group belongs to a clade of eutrochozoan coelomates. The 18S rRNA data fail to recover molluscan, bivalve, or gastropod monophyly. However, the branching pattern of the eutrochozoan phyla and classes is unstable, probably due to the explosive Cambrian radiation during which these groups arose. Similarly, the 18S rRNA data do not provide a reliable signal for the molluscan interclass relationships. Nevertheless, we obtained strong preliminary support for phylogenetic inferences at more restricted taxonomic levels, such as the monophyly of Polyplacophora, Caenogastropoda, Euthyneura, Heterodonta, and Arcoida.

  2. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters.

    PubMed

    Core, Leighton J; Waterfall, Joshua J; Lis, John T

    2008-12-19

    RNA polymerases are highly regulated molecular machines. We present a method (global run-on sequencing, GRO-seq) that maps the position, amount, and orientation of transcriptionally engaged RNA polymerases genome-wide. In this method, nuclear run-on RNA molecules are subjected to large-scale parallel sequencing and mapped to the genome. We show that peaks of promoter-proximal polymerase reside on approximately 30% of human genes, transcription extends beyond pre-messenger RNA 3' cleavage, and antisense transcription is prevalent. Additionally, most promoters have an engaged polymerase upstream and in an orientation opposite to the annotated gene. This divergent polymerase is associated with active genes but does not elongate effectively beyond the promoter. These results imply that the interplay between polymerases and regulators over broad promoter regions dictates the orientation and efficiency of productive transcription.

  3. Sequence- and structure-specific RNA processing by a CRISPR endonuclease.

    PubMed

    Haurwitz, Rachel E; Jinek, Martin; Wiedenheft, Blake; Zhou, Kaihong; Doudna, Jennifer A

    2010-09-10

    Many bacteria and archaea contain clustered regularly interspaced short palindromic repeats (CRISPRs) that confer resistance to invasive genetic elements. Central to this immune system is the production of CRISPR-derived RNAs (crRNAs) after transcription of the CRISPR locus. Here, we identify the endoribonuclease (Csy4) responsible for CRISPR transcript (pre-crRNA) processing in Pseudomonas aeruginosa. A 1.8 angstrom crystal structure of Csy4 bound to its cognate RNA reveals that Csy4 makes sequence-specific interactions in the major groove of the crRNA repeat stem-loop. Together with electrostatic contacts to the phosphate backbone, these enable Csy4 to bind selectively and cleave pre-crRNAs using phylogenetically conserved serine and histidine residues in the active site. The RNA recognition mechanism identified here explains sequence- and structure-specific processing by a large family of CRISPR-specific endoribonucleases.

  4. Genome Sequence of a Novel Iflavirus from mRNA Sequencing of the Butterfly Heliconius erato

    PubMed Central

    Macias-Muñoz, Aide; Briscoe, Adriana D.

    2014-01-01

    Here, we report the genome sequence of a novel iflavirus strain recovered from the neotropical butterfly Heliconius erato. The coding DNA sequence (CDS) of the iflavirus genome was 8,895 nucleotides in length, encoding a polyprotein that was 2,965 amino acids long. PMID:24831145

  5. Determining mutant spectra of three RNA viral samples using ultra-deep sequencing

    SciTech Connect

    Chen, H

    2012-06-06

    RNA viruses have extremely high mutation rates that enable the virus to adapt to new host environments and even jump from one species to another. As part of a viral transmission study, three viral samples collected from naturally infected animals were sequenced using Illumina paired-end technology at ultra-deep coverage. In order to determine the mutant spectra within the viral quasispecies, it is critical to understand the sequencing error rates and control for false positive calls of viral variants (point mutations). I will estimate the sequencing error rate from two control sequences and characterize the mutant spectra in the natural samples with this error rate.
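
    The error-rate control described in this abstract can be illustrated with a minimal sketch. It assumes a simple binomial noise model, which the abstract does not specify; the function names, the significance threshold `alpha`, and the read counts in the usage note are all hypothetical and are not taken from the study.

```python
from math import exp, lgamma, log, log1p


def error_rate(mismatches: int, total_bases: int) -> float:
    """Per-base error rate observed on a control of known sequence."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return mismatches / total_bases


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that deep coverage (large n)
    does not overflow.
    """
    if k <= 0:
        return 1.0
    log_p, log_q = log(p), log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += exp(log_pmf)
    return min(total, 1.0)


def is_variant(alt_reads: int, depth: int, p_error: float,
               alpha: float = 1e-6) -> bool:
    """Call a variant when the alternate-base count is unlikely to be noise.

    alpha is an illustrative threshold (assumption), not a value from the study.
    """
    return binomial_tail(alt_reads, depth, p_error) < alpha
```

    With an error rate of 0.001 estimated from a control, 50 alternate reads at a depth of 1,000 would be called a variant under this model, whereas a single alternate read would not.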

  6. RNA Sequencing and Bioinformatics Analysis Implicate the Regulatory Role of a Long Noncoding RNA-mRNA Network in Hepatic Stellate Cell Activation.

    PubMed

    Guo, Can-Jie; Xiao, Xiao; Sheng, Li; Chen, Lili; Zhong, Wei; Li, Hai; Hua, Jing; Ma, Xiong

    2017-08-11

    To analyze the long noncoding RNA (lncRNA)-mRNA expression network and potential roles in rat hepatic stellate cells (HSCs) during activation. LncRNA expression was analyzed in quiescent and culture-activated HSCs by RNA sequencing, and differentially expressed lncRNAs verified by quantitative reverse transcription polymerase chain reaction (qRT-PCR) were subjected to bioinformatics analysis. In vivo analyses of differential lncRNA-mRNA expression were performed on a rat model of liver fibrosis. We identified upregulation of 12 lncRNAs and 155 mRNAs and downregulation of 12 lncRNAs and 374 mRNAs in activated HSCs. Additionally, we identified the differential expression of upregulated lncRNAs (NONRATT012636.2, NONRATT016788.2, and NONRATT021402.2) and downregulated lncRNAs (NONRATT007863.2, NONRATT019720.2, and NONRATT024061.2) in activated HSCs relative to levels observed in quiescent HSCs, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses showed that changes in lncRNAs associated with HSC activation revealed 11 significantly enriched pathways according to their predicted targets. Moreover, based on the predicted co-expression network, the relative dynamic levels of NONRATT013819.2 and lysyl oxidase (Lox) were compared during HSC activation both in vitro and in vivo. Our results confirmed the upregulation of lncRNA NONRATT013819.2 and Lox mRNA associated with the extracellular matrix (ECM)-related signaling pathway in HSCs and fibrotic livers. Our results detailing a dysregulated lncRNA-mRNA network might provide new treatment strategies for hepatic fibrosis based on findings indicating potentially critical roles for NONRATT013819.2 and Lox in ECM remodeling during HSC activation. © 2017 The Author(s). Published by S. Karger AG, Basel.

  7. Cell-SELEX Identifies a “Sticky” RNA Aptamer Sequence

    PubMed Central

    2017-01-01

    Cell-SELEX is performed to select for cell binding aptamers. We employed an additional selection pressure by using RNAse to remove surface-binding aptamers and select for cell-internalizing aptamers. A common RNA sequence was identified from independent cell-SELEX procedures against two different pancreatic cancer cell lines, indicating a strong selection pressure towards this sequence from the large pool of other available sequences present in the aptamer library. The aptamer is not specific for the pancreatic cancer cell lines, and a similar sequence motif is present in previously published internalizing aptamers. The identified sequence forms a structural motif that binds to a surface protein, which either is highly abundant or has strong affinity for the selected aptamer sequence. Deselecting (removing) this sequence during cell-SELEX may increase the probability of identifying aptamers against cell type-specific targets on the cell surface. PMID:28194280

  8. Preparation of Single-Cell RNA-Seq Libraries for Next Generation Sequencing

    PubMed Central

    Trombetta, John J.; Gennert, David; Lu, Diana; Satija, Rahul; Shalek, Alex K.; Regev, Aviv

    2014-01-01

    For the past several decades, due to technical limitations, the field of transcriptomics has focused on population-level measurements that can mask significant differences between individual cells. With the advent of single-cell RNA-Seq, it is now possible to profile the responses of individual cells at unprecedented depth and thereby uncover, transcriptome-wide, the heterogeneity that exists within these populations. Here, we describe a method that merges several important technologies to produce, in high-throughput, single-cell RNA-Seq libraries. Complementary DNA (cDNA) is made from full-length mRNA transcripts using a reverse transcriptase that has terminal transferase activity. This, when combined with a second “template-switch” primer, allows for cDNAs to be constructed that have two universal priming sequences. Following preamplification from these common sequences, Nextera XT is used to prepare a pool of 96 uniquely indexed samples ready for Illumina sequencing. PMID:24984854

  9. Telomerase RNA stem terminus element affects template boundary element function, telomere sequence, and shelterin binding.

    PubMed

    Webb, Christopher J; Zakian, Virginia A

    2015-09-08

    The stem terminus element (STE), which was discovered 13 y ago in human telomerase RNA, is required for telomerase activity, yet its mode of action is unknown. We report that the Schizosaccharomyces pombe telomerase RNA, TER1 (telomerase RNA 1), also contains a STE, which is essential for telomere maintenance. Cells expressing a partial loss-of-function TER1 STE allele maintained short stable telomeres by a recombination-independent mechanism. Remarkably, the mutant telomere sequence was different from that of wild-type cells. Generation of the altered sequence is explained by reverse transcription into the template boundary element, demonstrating that the STE helps maintain template boundary element function. The altered telomeres bound less Pot1 (protection of telomeres 1) and Taz1 (telomere-associated in Schizosaccharomyces pombe 1) in vivo. Thus, the S. pombe STE, although distant from the template, ensures proper telomere sequence, which in turn promotes proper assembly of the shelterin complex.

  10. Massive microRNA sequence conservation and prevalence in human and chimpanzee introns.

    PubMed

    Hill, Aubrey E; Sorscher, Eric J

    2013-06-01

    Human and chimpanzee introns contain numerous sequences strongly related to known microRNA hairpin structures. The relative frequency is precisely maintained across all chromosomes, suggesting the possible co-evolution of gene networks dependent upon microRNA regulation and with origins corresponding to the advent of primate transposable elements (TEs). While the motifs are known to be derived from transposable elements, the most common are far more numerous than expected from the number of TEs and their paralogous sequences, and exhibit striking conservation in comparison to the surrounding TE sequence context. Several of these motifs also exhibit structural complementarity to each other, suggesting a pairing function at the level of DNA or RNA. These "pseudomicroRNAs," in semblance to pseudogenes, include hundreds of thousands of vestigial paralogs of primate microRNAs, many of which may have functioned historically or remain active today.

  11. Nucleotide sequence of a satellite RNA associated with carrot motley dwarf in parsley and carrot.

    PubMed

    Menzel, Wulf; Maiss, Edgar; Vetten, H Josef

    2009-02-01

    Carrot motley dwarf (CMD) is known to result from a mixed infection by two viruses, the polerovirus Carrot red leaf virus and one of the umbraviruses Carrot mottle mimic virus or Carrot mottle virus. Some umbraviruses have been shown to be associated with small satellite (sat) RNAs, but none have been reported for the latter two. A CMD-affected parsley plant was used for sap transmission to test plants that were used for dsRNA isolation. The presence of a 0.8-kbp dsRNA indicated the occurrence of a hitherto unrecognized satRNA associated with CMD. The satRNAs of the CMD isolate from parsley and an isolate from carrot have been sequenced and showed 94% sequence identity. Nucleotide sequences and putative translation products had no significant similarities to GenBank entries. To our knowledge, this is the first report of satRNAs associated with CMD.

  12. Yersinia spp. Identification Using Copy Diversity in the Chromosomal 16S rRNA Gene Sequence.

    PubMed

    Hao, Huijing; Liang, Junrong; Duan, Ran; Chen, Yuhuang; Liu, Chang; Xiao, Yuchun; Li, Xu; Su, Mingming; Jing, Huaiqi; Wang, Xin

    2016-01-01

    API 20E strip test, the standard for Enterobacteriaceae identification, is not sufficient to discriminate some Yersinia species for some unstable biochemical reactions and the same biochemical profile presented in some species, e.g. Yersinia frederiksenii and Yersinia intermedia, which need a variety of molecular biology methods as auxiliaries for identification. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, for each strain we randomly selected five 16S rRNA gene clones from 768 Yersinia strains, and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain where the phylogenetic classifications are consistent with biochemical tests; and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, while not affecting the overall efficiency of this species classification. In conclusion, through analysis derived from integrated information from multiple 16S rRNA gene sequences, the discrimination ability of Yersinia species is improved using our method.

  13. Deep sequencing of RNA from immune cell-derived vesicles uncovers the selective incorporation of small non-coding RNA biotypes with potential regulatory functions

    PubMed Central

    Nolte-’t Hoen, Esther N. M.; Buermans, Henk P. J.; Waasdorp, Maaike; Stoorvogel, Willem; Wauben, Marca H. M.; ’t Hoen, Peter A. C.

    2012-01-01

    Cells release RNA-carrying vesicles and membrane-free RNA/protein complexes into the extracellular milieu. Horizontal vesicle-mediated transfer of such shuttle RNA between cells allows dissemination of genetically encoded messages, which may modify the function of target cells. Other studies used array analysis to establish the presence of microRNAs and mRNA in cell-derived vesicles from many sources. Here, we used an unbiased approach by deep sequencing of small RNA released by immune cells. We found a large variety of small non-coding RNA species representing pervasive transcripts or RNA cleavage products overlapping with protein coding regions, repeat sequences or structural RNAs. Many of these RNAs were enriched relative to cellular RNA, indicating that cells destine specific RNAs for extracellular release. Among the most abundant small RNAs in shuttle RNA were sequences derived from vault RNA, Y-RNA and specific tRNAs. Many of the highly abundant small non-coding transcripts in shuttle RNA are evolutionarily well-conserved and have previously been associated with gene regulatory functions. These findings allude to a wider range of biological effects that could be mediated by shuttle RNA than previously expected. Moreover, the data present leads for unraveling how cells modify the function of other cells via transfer of specific non-coding RNA species. PMID:22821563

  14. Analysis options for high-throughput sequencing in miRNA expression profiling

    PubMed Central

    2014-01-01

    Background: Recently, high-throughput sequencing (HTS) using next generation sequencing techniques became useful in digital gene expression profiling. Our study introduces analysis options for HTS data based on mapping to miRBase or counting and grouping of identical sequence reads. Those approaches allow a hypothesis-free detection of miRNA differential expression. Methods: We compare our results to microarray and qPCR data from one set of RNA samples. We use Illumina platforms for microarray analysis and miRNA sequencing of 20 samples from benign follicular thyroid adenoma and malignant follicular thyroid carcinoma. Furthermore, we use three strategies for HTS data analysis to evaluate miRNA biomarkers for malignant versus benign follicular thyroid tumors. Results: High correlation of qPCR and HTS data was observed for the proposed analysis methods. However, qPCR is limited in the differential detection of miRNA isoforms. Moreover, we illustrate a much broader dynamic range of HTS compared to microarrays for small RNA studies. Finally, our data confirm hsa-miR-197-3p, hsa-miR-221-3p, hsa-miR-222-3p and both hsa-miR-144-3p and hsa-miR-144-5p as potential follicular thyroid cancer biomarkers. Conclusions: Compared to microarrays, HTS provides a global profile of miRNA expression with higher specificity and in more detail. Summarizing HTS reads as isoform groups (analysis pipeline B) or according to functional criteria (seed analysis pipeline C), which correlate better with qPCR results, are promising new options for HTS analysis. Finally, the data open future miRNA research perspectives for HTS and indicate that qPCR might be limited in validating HTS data in detail. PMID:24625073
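
    The "counting and grouping of identical sequence reads" mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the seed is assumed to be nucleotides 2-8 of the read, a common convention that the abstract itself does not state.

```python
from collections import Counter, defaultdict


def collapse_reads(reads):
    """Count identical sequence reads (one entry per distinct sequence)."""
    return Counter(reads)


def group_by_seed(read_counts, start=1, end=8):
    """Sum read counts over sequences sharing the same seed.

    The seed is taken as read[start:end], i.e. nucleotides 2-8 in
    1-based numbering (assumption, see lead-in). Reads shorter than
    `end` nucleotides are skipped.
    """
    seed_counts = defaultdict(int)
    for seq, n in read_counts.items():
        if len(seq) >= end:
            seed_counts[seq[start:end]] += n
    return dict(seed_counts)


# Toy reads, invented for illustration only.
reads = (["UGAGGUAGUAGGUUGUAUAGUU"] * 3
         + ["UGAGGUAGUAGGUUGUGUGGUU"] * 2
         + ["UAGCAGCACGUAAAUAUUGGCG"])
counts = collapse_reads(reads)
seeds = group_by_seed(counts)
```

    In the toy data, the first two sequences differ only near the 3' end, so they are counted separately by `collapse_reads` but fall into the same seed group.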

  15. Sequencing and Characterisation of an Extensive Atlantic Salmon (Salmo salar L.) MicroRNA Repertoire

    PubMed Central

    Bekaert, Michaël; Lowe, Natalie R.; Bishop, Stephen C.; Bron, James E.; Taggart, John B.; Houston, Ross D.

    2013-01-01

    Atlantic salmon (Salmo salar L.), a member of the family Salmonidae, is a totemic species of ecological and cultural significance that is also economically important in terms of both sports fisheries and aquaculture. These factors have promoted the continuous development of genomic resources for this species, furthering both fundamental and applied research. MicroRNAs (miRNA) are small endogenous non-coding RNA molecules that control spatial and temporal expression of targeted genes through post-transcriptional regulation. While miRNA have been characterised in detail for many other species, this is not yet the case for Atlantic salmon. To identify miRNAs from Atlantic salmon, we constructed whole fish miRNA libraries for 18 individual juveniles (fry, four months post hatch) and characterised them by Illumina high-throughput sequencing (total of 354,505,167 paired-ended reads). We report an extensive and partly novel repertoire of miRNA sequences, comprising 888 miRNA genes (547 unique mature miRNA sequences), quantify their expression levels in basal conditions, examine their homology to miRNAs from other species and identify their predicted target genes. We also identify the location and putative copy number of the miRNA genes in the draft Atlantic salmon reference genome sequence. The Atlantic salmon miRNAs experimentally identified in this study provide a robust large-scale resource for functional genome research in salmonids. There is an opportunity to explore the evolution of salmonid miRNAs following the relatively recent whole genome duplication event in salmonid species and to investigate the role of miRNAs in the regulation of gene expression in particular their contribution to variation in economically and ecologically important traits. PMID:23922936

  16. Prediction of effective RNA interference targets and pathway-related genes in lepidopteran insects by RNA sequencing analysis.

    PubMed

    Guan, Ruo-Bing; Li, Hai-Chao; Miao, Xue-Xia

    2017-01-06

    When using RNA interference (RNAi) to study gene functions in Lepidoptera insects, we discovered that some genes could not be suppressed; instead, their expression levels could be up-regulated by double-stranded RNA (dsRNA). To predict which genes could be easily silenced, we treated the Asian corn borer (Ostrinia furnacalis) with dsGFP (green fluorescent protein) and dsMLP (muscle lim protein). A transcriptome sequence analysis was conducted using the cDNAs 6 h after treatment with dsRNA. The results indicated that 160 genes were up-regulated and 44 genes were down-regulated by the two dsRNAs. Then, 50 co-up-regulated, 25 co-down-regulated and 43 unaffected genes were selected to determine their RNAi responses. All the 25 down-regulated genes were knocked down by their corresponding dsRNA. However, several of the up-regulated and unaffected genes were up-regulated when treated with their corresponding dsRNAs instead of being knocked down. The genes up-regulated by the dsGFP treatment may be involved in insect immune responses or the RNAi pathway. When the immune-related genes were excluded, only seven genes were induced by dsGFP, including ago-2 and dicer-2. These results not only provide a reference for efficient RNAi target predictions, but also provide some potential RNAi pathway-related genes for further study.

  17. YM500v2: a small RNA sequencing (smRNA-seq) database for human cancer miRNome research.

    PubMed

    Cheng, Wei-Chung; Chung, I-Fang; Tsai, Cheng-Fong; Huang, Tse-Shun; Chen, Chen-Yang; Wang, Shao-Chuan; Chang, Ting-Yu; Sun, Hsing-Jen; Chao, Jeffrey Yung-Chuan; Cheng, Cheng-Chung; Wu, Cheng-Wen; Wang, Hsei-Wei

    2015-01-01

    We previously presented YM500, which is an integrated database for miRNA quantification, isomiR identification, arm switching discovery and novel miRNA prediction from 468 human smRNA-seq datasets. Here in this updated YM500v2 database (http://ngs.ym.edu.tw/ym500/), we focus on the cancer miRNome to make the database more disease-orientated. New miRNA-related algorithms developed after YM500 were included in YM500v2, and, more significantly, more than 8000 cancer-related smRNA-seq datasets (including those of primary tumors, paired normal tissues, PBMC, recurrent tumors, and metastatic tumors) were incorporated into YM500v2. Novel miRNAs (miRNAs not included in the miRBase R21) were not only predicted by three independent algorithms but also cleaned by a new in silico filtration strategy and validated by wetlab data such as Cross-Linked ImmunoPrecipitation sequencing (CLIP-seq) to reduce the false-positive rate. A new function 'Meta-analysis' is additionally provided for allowing users to identify real-time differentially expressed miRNAs and arm-switching events according to customer-defined sample groups and dozens of clinical criteria tidied up by proficient clinicians. Cancer miRNAs identified hold the potential for both basic research and biotech applications.

  18. Prediction of Immunomodulatory potential of an RNA sequence for designing non-toxic siRNAs and RNA-based vaccine adjuvants

    PubMed Central

    Chaudhary, Kumardeep; Nagpal, Gandharva; Dhanda, Sandeep Kumar; Raghava, Gajendra P. S.

    2016-01-01

    Our innate immune system recognizes a foreign RNA sequence of a pathogen and activates the immune system to eliminate the pathogen from our body. This immunomodulatory potential of RNA can be used to design RNA-based immunotherapy and vaccine adjuvants. In case of siRNA-based therapy, the immunomodulatory effect of an RNA sequence is unwanted as it may cause immunotoxicity. Thus, we developed a method for designing a single-stranded RNA (ssRNA) sequence with desired immunomodulatory potentials, for designing RNA-based therapeutics, immunotherapy and vaccine adjuvants. The dataset used for training and testing our models consists of 602 experimentally verified immunomodulatory oligoribonucleotides (IMORNs) that are ssRNA sequences of length 17 to 27 nucleotides and 520 circulating miRNAs as non-immunomodulatory sequences. We developed prediction models using various features that include composition-based features, binary profile, selected features, and hybrid features. All models were evaluated using five-fold cross-validation and external validation techniques; achieving a maximum mean Matthews Correlation Coefficient (MCC) of 0.86 with 93% accuracy. We identified motifs using MERCI software and observed the abundance of adenine (A) in motifs. Based on the above study, we developed a web server, imRNA, comprising of various modules important for designing RNA-based therapeutics (http://crdd.osdd.net/raghava/imrna/). PMID:26861761
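
    For readers unfamiliar with the evaluation metrics named in this abstract, the Matthews Correlation Coefficient (MCC) and accuracy can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the counts in the usage note are illustrative assumptions and do not reproduce the study's data.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)
```

    MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than accuracy alone when the two classes differ in size, as with the 602 positive and 520 negative sequences described above. For example, `mcc(50, 50, 0, 0)` returns 1.0 and `mcc(25, 25, 25, 25)` returns 0.0.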

  19. Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

    USDA-ARS's Scientific Manuscript database

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  20. Sequence and analysis of the gene for bacteriophage T3 RNA polymerase.

    PubMed Central

    McGraw, N J; Bailey, J N; Cleaves, G R; Dembinski, D R; Gocke, C R; Joliffe, L K; MacWright, R S; McAllister, W T

    1985-01-01

    The RNA polymerases encoded by bacteriophages T3 and T7 have similar structures, but exhibit nearly exclusive template specificities. We have determined the nucleotide sequence of the region of T3 DNA that encodes the T3 RNA polymerase (the gene 1.0 region), and have compared this sequence with the corresponding region of T7 DNA. The predicted amino acid sequence of the T3 RNA polymerase exhibits very few changes when compared to the T7 enzyme (82% of the residues are identical). Significant differences appear to cluster in three distinct regions in the amino-terminal half of the protein. Analysis of the data from both enzymes suggests features that may be important for polymerase function. In particular, a region that differs between the T3 and T7 enzymes exhibits significant homology to the bi-helical domain that is common to many sequence-specific DNA binding proteins. The region that flanks the structural gene contains a number of regulatory elements including: a promoter for the E. coli RNA polymerase, a potential processing site for RNase III and a promoter for the T3 polymerase. The promoter for the T3 RNA polymerase is located only 12 base pairs distal to the stop codon for the structural gene. PMID:3903658

  1. Total chemical synthesis of a 77-nucleotide-long RNA sequence having methionine-acceptance activity.

    PubMed Central

    Ogilvie, K K; Usman, N; Nicoghosian, K; Cedergren, R J

    1988-01-01

    Chemical synthesis is described of a 77-nucleotide-long RNA molecule that has the sequence of an Escherichia coli Ado-47-containing tRNA(fMet) species in which the modified nucleosides have been substituted by their unmodified parent nucleosides. The sequence was assembled on a solid-phase, controlled-pore glass support in a stepwise manner with an automated DNA synthesizer. The ribonucleotide building blocks used were fully protected 5'-monomethoxytrityl-2'-silyl-3'-N,N-diisopropylaminophosphoramidites. p-Nitro-phenylethyl groups were used to protect the O6 of guanine residues. The fully deprotected tRNA analogue was characterized by polyacrylamide gel electrophoresis (sizing), terminal nucleotide analysis, sequencing, and total enzyme degradation, all of which indicated that the sequence was correct and contained only 3'-5' linkages. The 77-mer was then assayed for amino acid acceptor activity by using E. coli methionyl-tRNA synthetase. The results indicated that the synthetic product, lacking modified bases, is a substrate for the enzyme and has an amino acid acceptance 11% of that of the major native species, tRNA(fMet) containing 7-methylguanosine at position 47. PMID:3413059

  2. Dual RNA-Sequencing to Elucidate the Plant-Pathogen Duel.

    PubMed

    Naidoo, Sanushka; Visser, Erik Andrei; Zwart, Lizahn; Toit, Yves du; Bhadauria, Vijai; Shuey, Louise Simone

    2017-09-08

    RNA-sequencing technology has been widely adopted to investigate host responses during infection with pathogens. Dual RNA-sequencing (RNA-seq) allows the simultaneous capture of pathogen specific transcripts during infection, providing a more complete view of the interaction. In this review, we focus on the design of dual RNA-seq experiments and the application of downstream data analysis to gain biological insight into both sides of the interaction. Recent literature in this area demonstrates the power of the dual RNA-seq approach and shows that it is not limited to model systems where genomic resources are available. A reduction in sequencing cost and single cell transcriptomics coupled with protein and metabolite level dual approaches are set to enhance our understanding of plant-pathogen interactions. Sequencing costs continue to decrease and single cell transcriptomics is becoming more feasible. In combination with proteomics and metabolomics studies, these technological advances are likely to contribute to our understanding of the temporal and spatial aspects of dynamic plant-pathogen interactions.

  3. incaRNAfbinv: a web server for the fragment-based design of RNA sequences.

    PubMed

    Drory Retwitzer, Matan; Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme; Barash, Danny

    2016-07-08

    In recent years, new methods for computational RNA design have been developed and applied to various problems in synthetic biology and nanotechnology. Lately, there is considerable interest in incorporating essential biological information when solving the inverse RNA folding problem. Correspondingly, RNAfbinv aims at including biologically meaningful constraints and is the only program to date that performs a fragment-based design of RNA sequences. In doing so it allows the design of sequences that do not necessarily exactly fold into the target, as long as the overall coarse-grained tree graph shape is preserved. Augmented by the weighted sampling algorithm of incaRNAtion, our web server called incaRNAfbinv implements the method devised in RNAfbinv and offers an interactive environment for the inverse folding of RNA using a fragment-based design approach. It takes as input: a target RNA secondary structure; optional sequence and motif constraints; optional target minimum free energy, neutrality and GC content. In addition to the design of synthetic regulatory sequences, it can be used as a pre-processing step for the detection of novel naturally occurring RNAs. The two complementary methodologies RNAfbinv and incaRNAtion are merged together and fully implemented in our web server incaRNAfbinv, available at http://www.cs.bgu.ac.il/incaRNAfbinv. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Phenotype classification of single cells using SRS microscopy, RNA sequencing, and microfluidics (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Streets, Aaron M.; Cao, Chen; Zhang, Xiannian; Huang, Yanyi

    2016-03-01

    Phenotype classification of single cells reveals biological variation that is masked in ensemble measurement. This heterogeneity is found in gene and protein expression as well as in cell morphology. Many techniques are available to probe phenotypic heterogeneity at the single cell level, for example quantitative imaging and single-cell RNA sequencing, but it is difficult to perform multiple assays on the same single cell. In order to directly track correlation between morphology and gene expression at the single cell level, we developed a microfluidic platform for quantitative coherent Raman imaging and immediate RNA sequencing (RNA-Seq) of single cells. With this device we actively sort and trap cells for analysis with stimulated Raman scattering microscopy (SRS). The cells are then processed in parallel pipelines for lysis, and preparation of cDNA for high-throughput transcriptome sequencing. SRS microscopy offers three-dimensional imaging with chemical specificity for quantitative analysis of protein and lipid distribution in single cells. Meanwhile, the microfluidic platform facilitates single-cell manipulation, minimizes contamination, and furthermore, provides improved RNA-Seq detection sensitivity and measurement precision, which is necessary for differentiating biological variability from technical noise. By combining coherent Raman microscopy with RNA sequencing, we can better understand the relationship between cellular morphology and gene expression at the single-cell level.

  5. A Census of rRNA Genes and Linked Genomic Sequences within a Soil Metagenomic Library

    PubMed Central

    Liles, Mark R.; Manske, Brian F.; Bintrim, Scott B.; Handelsman, Jo; Goodman, Robert M.

    2003-01-01

    We have analyzed the diversity of microbial genomes represented in a library of metagenomic DNA from soil. A total of 24,400 bacterial artificial chromosome (BAC) clones were screened for 16S rRNA genes. The sequences obtained from BAC clones were compared with a collection generated by direct PCR amplification and cloning of 16S rRNA genes from the same soil. The results indicated that the BAC library had substantially lower representation of bacteria among the Bacillus, α-Proteobacteria, and CFB groups; greater representation among the β- and γ-Proteobacteria, and OP10 divisions; and no rRNA genes from the domains Eukaryota and Archaea. In addition to rRNA genes recovered from the bacterial divisions Proteobacteria, Verrucomicrobia, Firmicutes, Cytophagales, and OP11, we identified many rRNA genes from the BAC library affiliated with the bacterial division Acidobacterium; all of these sequences were affiliated with subdivisions that lack cultured representatives. The complete sequence of one BAC clone derived from a member of the Acidobacterium division revealed a complete rRNA operon and 20 other open reading frames, including predicted gene products involved in cell division, cell cycling, folic acid biosynthesis, substrate metabolism, amino acid uptake, DNA repair, and transcriptional regulation. This study is the first step in using genomics to reveal the physiology of as-yet-uncultured members of the Acidobacterium division. PMID:12732537

  6. Sequence-specific cleavage of RNA by Type II restriction enzymes

    PubMed Central

    Murray, Iain A.; Stickel, Shawn K.; Roberts, Richard J.

    2010-01-01

    The ability of 223 Type II restriction endonucleases to hydrolyze RNA–DNA heteroduplex oligonucleotide substrates was assessed. Despite the significant topological and sequence asymmetry introduced when one strand of a DNA duplex is substituted by RNA we find that six restriction enzymes (AvaII, AvrII, BanI, HaeIII, HinfI and TaqI), exclusively of the Type IIP class that recognize palindromic or interrupted-palindromic DNA sequences, catalyze robust and specific cleavage of both RNA and DNA strands of such a substrate. Time-course analyses indicate that some endonucleases hydrolyze phosphodiester bonds in both strands simultaneously whereas others appear to catalyze sequential reactions in which either the DNA or RNA product accumulates more rapidly. Such strand-specific variation in cleavage susceptibility is both significant (up to orders of magnitude difference) and somewhat sequence dependent, notably in relation to the presence or absence of uracil residues in the RNA strand. Hybridization to DNA oligonucleotides that contain endonuclease recognition sites can be used to achieve targeted hydrolysis of extended RNA substrates produced by in vitro transcription. The ability to ‘restrict’ an RNA–DNA hybrid, albeit with a limited number of restriction endonucleases, provides a method whereby individual RNA molecules can be targeted for site-specific cleavage in vitro. PMID:20702422

  7. Phylogeny of protostome worms derived from 18S rRNA sequences.

    PubMed

    Winnepenninckx, B; Backeljau, T; De Wachter, R

    1995-07-01

    The phylogenetic relationships of protostome worms were studied by comparing new complete 18S rRNA sequences of Vestimentifera, Pogonophora, Sipuncula, Echiura, Nemertea, and Annelida with existing 18S rRNA sequences of Mollusca, Arthropoda, Chordata, and Platyhelminthes. Phylogenetic trees were inferred via neighbor-joining and maximum parsimony analyses. These suggest that (1) Sipuncula and Echiura are not sister groups; (2) Nemertea are protostomes; (3) Vestimentifera and Pogonophora are protostomes that have a common ancestor with Echiura; and (4) Vestimentifera and Pogonophora are a monophyletic clade.

  8. The impact of RNA sequence library construction protocols on transcriptomic profiling of leukemia.

    PubMed

    Kumar, Ashwini; Kankainen, Matti; Parsons, Alun; Kallioniemi, Olli; Mattila, Pirkko; Heckman, Caroline A

    2017-08-17

    RNA sequencing (RNA-seq) has become an indispensable tool to identify disease associated transcriptional profiles and determine the molecular underpinnings of diseases. However, the broad adaptation of the methodology into the clinic is still hampered by inconsistent results from different RNA-seq protocols and involves further evaluation of its analytical reliability using patient samples. Here, we applied two commonly used RNA-seq library preparation protocols to samples from acute leukemia patients to understand how poly-A-tailed mRNA selection (PA) and ribo-depletion (RD) based RNA-seq library preparation protocols affect gene fusion detection, variant calling, and gene expression profiling. Overall, the protocols produced similar results with consistent outcomes. Nevertheless, the PA protocol was more efficient in quantifying expression of leukemia marker genes and showed better performance in the expression-based classification of leukemia. Independent qRT-PCR experiments verified that the PA protocol better represented total RNA compared to the RD protocol. In contrast, the RD protocol detected a higher number of non-coding RNA features and had better alignment efficiency. The RD protocol also recovered more known fusion-gene events, although variability was seen in fusion gene predictions. The overall findings provide a framework for the use of RNA-seq in a precision medicine setting with limited number of samples and suggest that selection of the library preparation protocol should be based on the objectives of the analysis.

  9. High-Throughput Sequencing Reveals Circular Substrates for an Archaeal RNA ligase.

    PubMed

    Becker, Hubert F; Heliou, Alice; Djaout, Kamel; Lestini, Roxane; Regnier, Mireille; Myllykallio, Hannu

    2017-03-09

    It is only recently that the abundant presence of circular RNAs (circRNAs) in all kingdoms of Life, including the hyperthermophilic archaeon Pyrococcus abyssi, has emerged. This led us to investigate the physiological significance of a previously observed weak intramolecular ligation activity of Pab1020 RNA ligase. Here we demonstrate that this enzyme, despite sharing significant sequence similarity with DNA ligases, is indeed an RNA-specific polynucleotide ligase efficiently acting on physiologically significant substrates. Using a combination of RNA immunoprecipitation assays and RNA-seq, our genome-wide studies revealed 133 individual circRNA loci in P. abyssi. The large majority of these loci interacted with Pab1020 in cells and circularization of selected C/D Box and 5S rRNA transcripts was confirmed biochemically. Altogether these studies revealed that Pab1020 is required for RNA circularization. Our results further suggest the functional speciation of an ancestral NTase domain and/or DNA ligase towards RNA ligase activity and prompt for further characterization of the widespread functions of circular RNAs in prokaryotes. Detailed insight into the cellular substrates of Pab1020 may facilitate the development of new biotechnological applications e.g. in ligation of preadenylated adaptors to RNA molecules.

  10. Sequence selective recognition of double-stranded RNA using triple helix-forming peptide nucleic acids.

    PubMed

    Zengeya, Thomas; Gupta, Pankaj; Rozners, Eriks

    2014-01-01

    Noncoding RNAs are attractive targets for molecular recognition because of the central role they play in gene expression. Since most noncoding RNAs are in a double-helical conformation, recognition of such structures is a formidable problem. Herein, we describe a method for sequence-selective recognition of biologically relevant double-helical RNA (illustrated on ribosomal A-site RNA) using peptide nucleic acids (PNA) that form a triple helix in the major groove of RNA under physiologically relevant conditions. Protocols for PNA preparation and binding studies using isothermal titration calorimetry are described in detail.

  11. An improved and validated RNA HLA class I SBT approach for obtaining full length coding sequences.

    PubMed

    Gerritsen, K E H; Olieslagers, T I; Groeneweg, M; Voorter, C E M; Tilanus, M G J

    2014-11-01

    The functional relevance of human leukocyte antigen (HLA) class I allele polymorphism beyond exons 2 and 3 is difficult to address because more than 70% of the HLA class I alleles are defined by exons 2 and 3 sequences only. For routine application on clinical samples we improved and validated the HLA sequence-based typing (SBT) approach based on RNA templates, using either a single locus-specific or two overlapping group-specific polymerase chain reaction (PCR) amplifications, with three forward and three reverse sequencing reactions for full length sequencing. Locus-specific HLA typing with RNA SBT of a reference panel, representing the major antigen groups, showed identical results compared to DNA SBT typing. Alleles encountered with unknown exons in the IMGT/HLA database and three samples, two with Null and one with a Low expressed allele, have been addressed by the group-specific RNA SBT approach to obtain full length coding sequences. This RNA SBT approach has proven its value in our routine full length definition of alleles.

  12. Nucleotide sequence of the rrnG ribosomal RNA promoter region of Escherichia coli.

    PubMed Central

    Shen, W F; Squires, C; Squires, C L

    1982-01-01

    The primary structure of the promoter region for a ribosomal RNA transcription unit (rrnG) of Escherichia coli K12 has been determined. The sequence was obtained from a 1.5 kbp EcoRI fragment derived from the hybrid plasmid pLC23-30. This fragment contains 455 bp preceding P1 of the rrnG promoter region and 674 bp of the rrnG 16S RNA gene. The sequence before the rrnG promoter region contains an open reading frame (ORF-BG) followed by a possible hairpin structure that resembles other known transcription terminators. The sequence of the rrnG promoter region is similar but not identical to that of rrnA and rrnB. Several minor differences between the sequences of the 16S RNA genes of rrnG and rrnB were also noted. In addition, sequences were found that could generate special structures involving the promoter regions of rrn loci. Such structures are described and their possible involvement in the regulation of ribosomal RNA synthesis is discussed. PMID:6285294

  13. Novel transcription factor variants through RNA-sequencing: the importance of being "alternative".

    PubMed

    Scarpato, Margherita; Federico, Antonio; Ciccodicola, Alfredo; Costa, Valerio

    2015-01-13

    Alternative splicing is a pervasive mechanism of RNA maturation in higher eukaryotes, which increases proteomic diversity and biological complexity. It has a key regulatory role in several physiological and pathological states. The diffusion of Next Generation Sequencing, particularly of RNA-Sequencing, has exponentially empowered the identification of novel transcripts revealing that more than 95% of human genes undergo alternative splicing. The highest rate of alternative splicing occurs in transcription factors encoding genes, mostly in Krüppel-associated box domains of zinc finger proteins. Since these molecules are responsible for gene expression, alternative splicing is a crucial mechanism to "regulate the regulators". Indeed, different transcription factors isoforms may have different or even opposite functions. In this work, through a targeted re-analysis of our previously published RNA-Sequencing datasets, we identified nine novel transcripts in seven transcription factors genes. In silico analysis, combined with RT-PCR, cloning and Sanger sequencing, allowed us to experimentally validate these new variants. Through computational approaches we also predicted their novel structural and functional properties. Our findings indicate that alternative splicing is a major determinant of transcription factor diversity, confirming that accurate analysis of RNA-Sequencing data can reliably lead to the identification of novel transcripts, with potentially new functions.

  14. Novel Transcription Factor Variants through RNA-Sequencing: The Importance of Being “Alternative”

    PubMed Central

    Scarpato, Margherita; Federico, Antonio; Ciccodicola, Alfredo; Costa, Valerio

    2015-01-01

    Alternative splicing is a pervasive mechanism of RNA maturation in higher eukaryotes, which increases proteomic diversity and biological complexity. It has a key regulatory role in several physiological and pathological states. The diffusion of Next Generation Sequencing, particularly of RNA-Sequencing, has exponentially empowered the identification of novel transcripts revealing that more than 95% of human genes undergo alternative splicing. The highest rate of alternative splicing occurs in transcription factors encoding genes, mostly in Krüppel-associated box domains of zinc finger proteins. Since these molecules are responsible for gene expression, alternative splicing is a crucial mechanism to “regulate the regulators”. Indeed, different transcription factors isoforms may have different or even opposite functions. In this work, through a targeted re-analysis of our previously published RNA-Sequencing datasets, we identified nine novel transcripts in seven transcription factors genes. In silico analysis, combined with RT-PCR, cloning and Sanger sequencing, allowed us to experimentally validate these new variants. Through computational approaches we also predicted their novel structural and functional properties. Our findings indicate that alternative splicing is a major determinant of transcription factor diversity, confirming that accurate analysis of RNA-Sequencing data can reliably lead to the identification of novel transcripts, with potentially new functions. PMID:25590302

  15. TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences

    PubMed Central

    2011-01-01

    Background: The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results: TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based probabilities for co-incidence with each of the other sequences, along with estimated base pairing probabilities, from the previous iteration, for the other sequences. The extrinsic information is introduced as free energy modifications for base pairing in a partition function computation based on the nearest neighbor thermodynamic model. This process yields updated estimates of base pairing probability. The updated base pairing probabilities in turn are used to recompute extrinsic information, resulting in the overall iterative estimation procedure that defines TurboFold. TurboFold is benchmarked on a number of ncRNA datasets and compared against alternative secondary structure prediction methods. The iterative procedure in TurboFold is shown to improve estimates of base pairing probability with each iteration, though only small gains are obtained beyond three iterations. Secondary structures composed of base pairs with estimated probabilities higher than a significance threshold are

  16. Revolution of nephrology research by deep sequencing: ChIP-seq and RNA-seq.

    PubMed

    Mimura, Imari; Kanki, Yasuharu; Kodama, Tatsuhiko; Nangaku, Masaomi

    2014-01-01

    The recent and rapid advent of next-generation sequencing (NGS) has made this technology broadly available not only to researchers in various molecular and cellular biology fields but also to those in kidney disease. In this paper, we describe the usage of ChIP-seq (chromatin immunoprecipitation with sequencing) and RNA-seq for sample preparation and interpretation of raw data in the investigation of biological phenomenon in renal diseases. ChIP-seq identifies genome-wide transcriptional DNA-binding sites as well as histone modifications, which are known to regulate gene expression, in the intragenic as well as in the intergenic regions. With regard to RNA-seq, this process analyzes not only the expression level of mRNA but also splicing variants, non-coding RNA, and microRNA on a genome-wide scale. The combination of ChIP-seq and RNA-seq allows the clarification of novel transcriptional mechanisms, which have important roles in various kinds of diseases, including chronic kidney disease. The rapid development of these techniques requires an update on the latest information and methods of NGS. In this review, we highlight the merits and characteristics of ChIP-seq and RNA-seq and discuss the use of the genome-wide analysis in kidney disease.

  17. Research progress on mechanisms of male sterility in plants based on high-throughput RNA sequencing.

    PubMed

    Yongming, Liu; Ling, Zhang; Tao, Qiu; Zhuofan, Zhao; Moju, Cao

    2016-08-01

    Male sterility is defined as failing to produce functional pollen during stamen development in plants, and it plays a crucial role in plant reproductive research and hybrid seed production in utilization of crop heterosis. High throughput RNA sequencing (RNA-seq) has been used widely in the study of different fields of life science, as it readily detects all the mRNA and non-coding RNA in cells. Recently, RNA-seq has been reported to be applied in different species and kinds of pollen abortion types in plants, which has contributed to the understanding of the molecular mechanism and metabolic networks of male sterility at the transcription level. In this review, we summarize research progress on the mechanisms of male sterility in plants, focusing on RNA-seq analysis encompassing strategies of RNA library construction, differentially expressed genes and functional characteristics of noncoding RNAs involved in stamen abortion. Furthermore, we also discuss application of transcriptome sequencing technology to elucidate pollen abortion mechanisms and map fertility-related genes. We hope to provide references to the study of male sterility in plants.

  18. Combined sequencing of mRNA and DNA from human embryonic stem cells.

    PubMed

    Mertes, Florian; Kuhl, Heiner; Wruck, Wasco; Lehrach, Hans; Adjaye, James

    2016-06-01

    Combined transcriptome and whole genome sequencing of the same ultra-low input sample down to single cells is a rapidly evolving approach for the analysis of rare cells. Besides stem cells, rare cells originating from tissues like tumor or biopsies, circulating tumor cells and cells from early embryonic development are under investigation. Herein we describe a universal method applicable for the analysis of minute amounts of sample material (150 to 200 cells) derived from sub-colony structures from human embryonic stem cells. The protocol comprises the combined isolation and separate amplification of poly(A) mRNA and whole genome DNA followed by next generation sequencing. Here we present a detailed description of the method developed and an overview of the results obtained for RNA and whole genome sequencing of human embryonic stem cells. Sequencing data are available in the Gene Expression Omnibus (GEO) database under accession number GSE69471.

  19. Identification of extracellular miRNA in archived serum samples by next-generation sequencing from RNA extracted using multiple methods.

    PubMed

    Gautam, Aarti; Kumar, Raina; Dimitrov, George; Hoke, Allison; Hammamieh, Rasha; Jett, Marti

    2016-10-01

    miRNAs act as important regulators of gene expression by promoting mRNA degradation or by attenuating protein translation. Since miRNAs are stably expressed in bodily fluids, there is growing interest in profiling these miRNAs, as it is minimally invasive and cost-effective as a diagnostic matrix. A technical hurdle in studying miRNA dynamics is the ability to reliably extract miRNA as small sample volumes and low RNA abundance create challenges for extraction and downstream applications. The purpose of this study was to develop a pipeline for the recovery of miRNA using small volumes of archived serum samples. The RNA was extracted employing several widely utilized RNA isolation kits/methods with and without addition of a carrier. The small RNA library preparation was carried out using Illumina TruSeq small RNA kit and sequencing was carried out using Illumina platform. A fraction of five microliters of total RNA was used for library preparation as quantification is below the detection limit. We were able to profile miRNA levels in serum from all the methods tested. We found that addition of nucleic acid-based carrier molecules yielded higher numbers of processed reads but did not enhance the mapping of any miRBase annotated sequences. However, some of the extraction procedures offer certain advantages: RNA extracted by TRIzol seemed to align to the miRBase best; extractions using TRIzol with carrier yielded higher miRNA-to-small RNA ratios. Nuclease-free glycogen can be the carrier of choice for miRNA sequencing. Our findings illustrate that miRNA extraction and quantification is influenced by the choice of methodologies. Addition of nucleic acid-based carrier molecules during the extraction procedure is not a good choice when assaying miRNA using sequencing. The careful selection of an extraction method permits the archived serum samples to become valuable resources for high-throughput applications.

  20. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

    PubMed Central

    2013-01-01

    Background: It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results: We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to factor 45. Our best new index-based algorithm achieves a speedup of factor 560. Conclusions: The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected

  1. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns.

    PubMed

    Meyer, Fernando; Kurtz, Stefan; Beckstette, Michael

    2013-07-17

    It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to factor 45. Our best new index-based algorithm achieves a speedup of factor 560. The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected sublinear running time of the presented
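
    The idea of a semi-global alignment under an edit distance threshold can be illustrated with a much simpler, sequence-only sketch. The code below is the classic column-wise dynamic program for approximate string matching, not the authors' sequence-structure algorithm: it ignores base pairs, uses unit costs, and does no index-based filtering. The function name `approx_matches` and its parameters are assumptions made for this example.

```python
def approx_matches(pattern, text, k):
    """Return (end_index, distance) pairs, end_index exclusive, for every text
    position where some substring ending there is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] holds the best edit distance between pattern[:i] and any
    # substring of text that ends at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # semi-global: a match may start anywhere in the text at no cost
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character missing from the text
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits

# Hypothetical pattern and target, for illustration only.
print(approx_matches("GCAU", "AAGGAUAA", 1))
```

    This baseline takes time proportional to the pattern length times the text length, which is the kind of cost that index-based methods such as those in the abstract aim to avoid.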

  2. Targeted RNA-Sequencing with Competitive Multiplex-PCR Amplicon Libraries

    PubMed Central

    Blomquist, Thomas M.; Crawford, Erin L.; Lovett, Jennie L.; Yeo, Jiyoun; Stanoszek, Lauren M.; Levin, Albert; Li, Jia; Lu, Mei; Shi, Leming; Muldrew, Kenneth; Willey, James C.

    2013-01-01

    Whole transcriptome RNA-sequencing is a powerful tool, but is costly and yields complex data sets that limit its utility in molecular diagnostic testing. A targeted quantitative RNA-sequencing method that is reproducible and reduces the number of sequencing reads required to measure transcripts over the full range of expression would be better suited to diagnostic testing. Toward this goal, we developed a competitive multiplex PCR-based amplicon sequencing library preparation method that a) targets only the sequences of interest and b) controls for inter-target variation in PCR amplification during library preparation by measuring each transcript native template relative to a known number of synthetic competitive template internal standard copies. To determine the utility of this method, we intentionally selected PCR conditions that would cause transcript amplification products (amplicons) to converge toward equimolar concentrations (normalization) during library preparation. We then tested whether this approach would enable accurate and reproducible quantification of each transcript across multiple library preparations, and at the same time reduce (through normalization) total sequencing reads required for quantification of transcript targets across a large range of expression. We demonstrate excellent reproducibility (R² = 0.997) with 97% accuracy to detect 2-fold change using External RNA Controls Consortium (ERCC) reference materials; high inter-day, inter-site and inter-library concordance (R² = 0.97–0.99) using FDA Sequencing Quality Control (SEQC) reference materials; and cross-platform concordance with both TaqMan qPCR (R² = 0.96) and whole transcriptome RNA-sequencing following “traditional” library preparation using Illumina NGS kits (R² = 0.94). Using this method, sequencing reads required to accurately quantify more than 100 targeted transcripts expressed over a 10⁷-fold range was reduced more than 10,000-fold, from 2.3×10⁹ to 1
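
    The internal-standard principle in this abstract reduces to a ratio: because the native template and its synthetic competitor are amplified together, the native copy number can be estimated from the native-to-standard read ratio multiplied by the known number of standard copies added. The sketch below shows only that arithmetic; the function name and the read counts are hypothetical, and the published method includes further steps not modeled here.

```python
def native_copies(native_reads, standard_reads, standard_copies):
    """Estimate native template copies from the native:standard read ratio."""
    if standard_reads <= 0:
        raise ValueError("internal standard must have at least one read")
    return native_reads / standard_reads * standard_copies

# Hypothetical counts: 300 native reads, 100 internal-standard reads,
# 1000 internal-standard copies spiked into the reaction.
print(native_copies(300, 100, 1000))

# The estimate depends only on the ratio, so a tenfold shallower
# sequencing run with the same ratio gives the same answer.
print(native_copies(30, 10, 1000))
```

    Because only the ratio matters, the estimate holds up when fewer reads are collected per target, which is consistent with the read savings the abstract reports.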

  3. High-Throughput Sequencing of RNA Silencing-Associated Small RNAs in Olive (Olea europaea L.)

    PubMed Central

    Donaire, Livia; Pedrola, Laia; de la Rosa, Raúl; Llave, César

    2011-01-01

    Small RNAs (sRNAs) of 20 to 25 nucleotides (nt) in length maintain genome integrity and control gene expression in a multitude of developmental and physiological processes. Although RNA silencing has been primarily studied in model plants, the advent of high-throughput sequencing technologies has enabled profiling of the sRNA component of more than 40 plant species. Here, we used deep sequencing and molecular methods to report the first inventory of sRNAs in olive (Olea europaea L.). sRNA libraries prepared from juvenile and adult shoots revealed that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome. A total of 18 known miRNA families were identified in the libraries. Also, 5 other sRNAs derived from potential hairpin-like precursors remain as plausible miRNA candidates. RNA blots confirmed miRNA expression and suggested tissue- and/or developmental-specific expression patterns. Target mRNAs of conserved miRNAs were computationally predicted among the olive cDNA collection and experimentally validated through endonucleolytic cleavage assays. Finally, we use expression data to uncover genetic components of the miR156, miR172 and miR390/TAS3-derived trans-acting small interfering RNA (tasiRNA) regulatory nodes, suggesting that these interactive networks controlling developmental transitions are fully operational in olive. PMID:22140484

  4. Different organisms associated with heartwater as shown by analysis of 16S ribosomal RNA gene sequences.

    PubMed

    Allsopp, M; Visser, E S; du Plessis, J L; Vogel, S W; Allsopp, B A

    1997-08-01

    Cowdria ruminantium is a rickettsial parasite which causes heartwater, an economically important disease of domestic and wild ruminants in tropical and subtropical Africa and parts of the Caribbean. Because existing diagnostic methods are unreliable, we investigated the small-subunit ribosomal RNA (srRNA) gene from heartwater-infected material to characterise the organisms present and to develop specific oligonucleotide probes for polymerase chain reaction (PCR) based diagnosis. DNA was obtained from ticks and ruminants from heartwater-free and heartwater-endemic areas and from Cowdria in tissue culture. PCR was carried out using primers designed to amplify only rickettsial srRNA genes, the target region being the highly variable V1 loop. Amplicons were cloned and sequenced; 51% were C. ruminantium sequences corresponding to four genotypes, two of which were identical to previously reported C. ruminantium sequences while the other two were new. The four different Cowdria genotypes can be correlated with different phenotypes. Tissue-culture samples yielded only Cowdria genotype sequences, but an extraordinary heterogeneity of 16S sequences was obtained from field samples. In addition to Cowdria genotypes we found sequences from previously unknown Ehrlichia spp., sequences showing homology to other Rickettsiales and a variety of Pseudomonadaceae. One Ehrlichia sequence was phylogenetically closely related to Ehrlichia platys (Group II Ehrlichia) and one to Ehrlichia canis (Group III Ehrlichia). This latter sequence was from an isolate (Germishuys) made from a naturally infected sheep which, from brain smear examination and pathology, appeared to be suffering from heartwater; nevertheless no Cowdria genotype sequences were found in this isolate. In addition no Cowdria sequences were obtained from uninfected ticks. Complete 16S rRNA gene sequences were determined for two C. ruminantium genotypes and for two previously uncharacterised heartwater-associated Ehrlichia spp

  5. Identification of genetic variation between obligate plant pathogens Pseudoperonospora cubensis and P. humuli using RNA sequencing and genotyping-by-sequencing

    USDA-ARS's Scientific Manuscript database

    RNA sequencing (RNA-seq) and genotyping-by-sequencing (GBS) were used for single nucleotide polymorphism (SNP) identification from two economically important obligate plant pathogens, Pseudoperonospora cubensis and P. humuli. Twenty isolates of P. cubensis and 19 isolates of P. humuli were genotyped...

  6. Identification and characterization of novel microRNA candidates from deep sequencing.

    PubMed

    Wu, Qian; Wang, Chao; Guo, Li; Ge, Qinyu; Lu, Zuhong

    2013-01-16

    In our previous study, we screened a candidate new microRNA (miRNA) based on deep sequencing and bioinformatics analysis. In this paper, we evaluated the novel miRNA in the following experiments: 1) the secondary structure of the precursor of novel-miR has the characteristic stem-loop hairpin structure, and the mature miRNA is far from loops and bulges. 2) we used BLAST (Basic Local Alignment Search Tool) to compare the novel-miR sequence to those found in GenBank. The novel-miR sequence existed in Mus musculus, Drosophila grimshawi, Rattus norvegicus, Xenopus laevis, Spodoptera frugiperda, Papio anubis, Salmo salar and so on. Multiple sequence alignment (MSA) then showed that the sequence from 5 to 11 bp and 13 to 17 bp exhibited 100% similarity, where there is significant sequence conservation. Novel-miR showed similarity in the seed region with the known miR-3675-3p, indicating that these miRNAs are likely to belong to the same family and thus may share common biology. 3) novel-miR from MCF-7 and MDA-MB-231 was validated by Northern blot and detected in the serum and tissue samples of BC patients, respectively, by real-time PCR. The data showed that novel-miR was downregulated in the BC cancerous tissues and serum of breast cancer patients (P<0.05). 4) transfection of novel-miR mimics into MCF-7 cells significantly inhibited cell growth detected by CCK-8 assay (P<0.05). 5) to identify the mRNA targets of novel-miR, we performed a computational screen for genes with novel-miR complementary sites in their 3'-UTR using several open access databases. In addition, we used the CapitalBio® Molecule Annotation System V3.0 to perform gene ontology (GO) analysis on the target genes of novel-miR and specific biological process categories were enriched. 7 genes (CUL3, KRAS, ETS1, MNT, CNTN3, CCNK and FOXO3) which have a high prediction score and are associated with cell proliferation, apoptosis and cell cycle were chosen. 3'-UTR luciferase reporter assay suggested that miR-BS1

  7. Evolution of the plastid ribosomal RNA operon in a nongreen parasitic plant: accelerated sequence evolution, altered promoter structure, and tRNA pseudogenes.

    PubMed

    Wolfe, K H; Katz-Downie, D S; Morden, C W; Palmer, J D

    1992-04-01

    The nucleotide sequence of a 7.4 kb region containing the entire plastid ribosomal RNA operon of the nongreen parasitic plant Epifagus virginiana has been determined. Analysis of the sequence indicates that all four rRNA genes are intact and almost certainly functional. In contrast, the split genes for tRNA(Ile) and tRNA(Ala) present in the 16S-23S rRNA spacer region have become pseudogenes, and deletion upstream of the 16S rRNA gene has removed a tRNA(Val) gene and most of the promoter region for the rRNA operon. The rate of nucleotide substitution in 16S and 23S rRNAs is several times higher in Epifagus than in tobacco, a related photosynthetic plant. Possible reasons for this, including relaxed translational constraints, are discussed.

  8. Evaluating the impact of sequencing error correction for RNA-seq data with ERCC RNA spike-in controls.

    PubMed

    Tong, Li; Yang, Cheng; Wu, Po-Yen; Wang, May D

    2016-02-01

    Sequencing errors are a major issue for several next-generation sequencing-based applications such as de novo assembly and single nucleotide polymorphism detection. Several error-correction methods have been developed to improve raw data quality. However, error-correction performance is hard to evaluate because of the lack of a ground truth. In this study, we propose a novel approach which uses ERCC RNA spike-in controls as the ground truth to facilitate error-correction performance evaluation. After aligning raw and corrected RNA-seq data, we characterized the quality of reads by three metrics: mismatch patterns (e.g., the substitution rate of A to C) of reads aligned with one mismatch, mismatch patterns of reads aligned with two mismatches, and the percentage increase of reads aligned to the reference. We observed that the mismatch patterns for reads aligned with one mismatch are significantly correlated between ERCC spike-ins and real RNA samples. Based on such observations, we conclude that ERCC spike-ins can serve as ground truths for error correction beyond their previous applications for validation of dynamic range and fold-change response. Also, the mismatch patterns for ERCC reads aligned with one mismatch can serve as a novel and reliable metric to evaluate the performance of error-correction tools.
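
    The first metric in the abstract above, the mismatch pattern of reads aligned with exactly one mismatch, can be illustrated with a minimal sketch. It assumes ungapped alignments given as equal-length (reference, read) string pairs; the function name and toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter

def single_mismatch_pattern(pairs):
    """Return substitution rates for reads aligned with exactly one mismatch.

    `pairs` is an iterable of (reference, read) strings of equal length
    (ungapped alignments, an assumption made for this sketch).
    """
    counts = Counter()
    for ref, read in pairs:
        if len(ref) != len(read):
            continue  # skip anything that is not an ungapped alignment
        diffs = [(r, q) for r, q in zip(ref, read) if r != q]
        if len(diffs) == 1:  # keep only single-mismatch alignments
            counts[diffs[0]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {f"{r}>{q}": n / total for (r, q), n in counts.items()}

# Toy alignments, invented for illustration only.
toy_pairs = [
    ("ACGTACGT", "ACGTCCGT"),  # one mismatch: A>C
    ("ACGTACGT", "CCGTACGT"),  # one mismatch: A>C
    ("ACGTACGT", "ACGTACGA"),  # one mismatch: T>A
    ("ACGTACGT", "ACGTACGT"),  # perfect match, ignored
    ("ACGTACGT", "TCGTACGA"),  # two mismatches, ignored
]
pattern = single_mismatch_pattern(toy_pairs)
print(pattern)
```

    Computing the same table for spike-in reads and for sample reads would give two vectors whose correlation can then be compared, which is the kind of comparison the abstract describes.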

  9. Improved identification of Gordonia, Rhodococcus and Tsukamurella species by 5'-end 16S rRNA gene sequencing.

    PubMed

    Wang, Tao; Kong, Fanrong; Chen, Sharon; Xiao, Meng; Sorrell, Tania; Wang, Xiaoyan; Wang, Shuo; Sintchenko, Vitali

    2011-01-01

    The identification of fastidious aerobic Actinomycetes such as Gordonia, Rhodococcus, and Tsukamurella has remained a challenge, leading to clinically significant misclassifications. This study examined the feasibility of partial 5'-end 16S rRNA gene sequencing for the identification of Gordonia, Rhodococcus, and Tsukamurella, and defined potential reference sequences for species from each of these genera. The 16S rRNA gene sequence based identification algorithm for species identification was used and enhanced by aligning test sequences with reference sequences from the List of Prokaryotic Names with Standing in Nomenclature. Conventional PCR based 16S rRNA gene sequencing and the alignment of the isolate 16S rRNA gene sequence with reference sequences accurately identified 100% of clinical strains of aerobic Actinomycetes. While partial 16S rRNA gene sequences of reference type strains matched with the 16S rRNA gene sequences of 19 isolates in our data set, another 13 strains demonstrated a degree of polymorphism with a 1-4 bp difference in the regions of difference. 5'-end 606 bp 16S rRNA gene sequencing, coupled with the assignment of well defined reference sequences to clinically relevant species of bacteria, can be a useful strategy for improving the identification of clinically relevant aerobic Actinomycetes.

  10. Phylogenetic analysis of Mexican Babesia bovis isolates using msa and ssrRNA gene sequences.

    PubMed

    Genis, Alma D; Mosqueda, Juan J; Borgonio, Verónica M; Falcón, Alfonso; Alvarez, Antonio; Camacho, Minerva; de Lourdes Muñoz, Maria; Figueroa, Julio V

    2008-12-01

    Variable merozoite surface antigens of Babesia bovis are exposed glycoproteins having a role in erythrocyte invasion. Members of this gene family include msa-1 and msa-2 (msa-2c, msa-2a(1), msa-2a(2), and msa-2b). Small subunit ribosomal (ssr)RNA gene is subject to evolutive pressure and has been used in phylogenetic studies. To determine the phylogenetic relationship among B. bovis Mexican isolates using different genetic markers, PCR amplicons, corresponding to msa-1, msa-2c, msa-2b, and ssrRNA genes, were cloned and plasmids carrying the corresponding inserts were sequenced. Comparative analysis of nucleotide and deduced amino acid sequences revealed distinct degrees of variability and identity among the coding gene sequences obtained from 12 geographically different B. bovis isolates and a reference strain. Overall sequence identities of 47.7%, 72.3%, 87.7%, and 94% were determined for msa-1, msa-2b, msa-2c, and ssrRNA, respectively. A robust phylogenetic tree was obtained with msa-2b sequences. The phylogenetic analysis suggests that Mexican B. bovis isolates group in clades not concordant with the Mexican geography. However, the Mexican isolates group together in an American clade separated from the Australian clade. Sequence heterogeneity in msa-1, msa-2b, and msa-2c coding regions of Mexican B. bovis isolates present in different geographical regions can be a result of either differential evolutive pressure or cattle movement from commercial trade.

  11. mirTools: microRNA profiling and discovery based on high-throughput sequencing.

    PubMed

    Zhu, Erle; Zhao, Fangqing; Xu, Gang; Hou, Huabin; Zhou, Linglin; Li, Xiaokun; Sun, Zhongsheng; Wu, Jinyu

    2010-07-01

    miRNAs are small, non-coding RNAs that negatively regulate gene expression at the post-transcriptional level and play crucial roles in various physiological and pathological processes, such as development and tumorigenesis. Although deep sequencing technologies have been applied to investigate various small RNA transcriptomes, the associated computational methods are far less mature than microarray-based approaches. In this study, a comprehensive web server, mirTools, was developed to allow researchers to characterize the small RNA transcriptome. With the aid of mirTools, users can: (i) filter low-quality reads and 3'/5' adapters from raw sequenced data; (ii) align large-scale short reads to the reference genome and explore their length distribution; (iii) classify small RNA candidates into known categories, such as known miRNAs, non-coding RNA, genomic repeats and coding sequences; (iv) obtain detailed annotation information for known miRNAs, such as miRNA/miRNA*, absolute/relative reads count and the most abundant tag; (v) predict novel miRNAs that have not been characterized before; and (vi) identify differentially expressed miRNAs between samples based on two different counting strategies: total read tag counts and the most abundant tag counts. We believe that the integration of multiple computational approaches in mirTools will greatly facilitate current microRNA research. mirTools can be accessed at http://centre.bioinformatics.zj.cn/mirtools/ and http://59.79.168.90/mirtools.
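
    Steps (i), (ii) and (vi) of the list above can be sketched in a few lines. This is an illustration only and does not reproduce mirTools itself: the adapter string, the size window, the function names and the toy reads are all assumptions, and adapter matching is exact rather than error-tolerant.

```python
from collections import Counter

ADAPTER = "TGGAATTC"  # hypothetical 3' adapter prefix, for illustration only

def trim_3p_adapter(read, adapter=ADAPTER):
    """Cut the read at the first exact adapter occurrence; None if absent."""
    pos = read.find(adapter)
    return read[:pos] if pos != -1 else None

def collapse_tags(reads, min_len=18, max_len=30):
    """Trim reads, keep inserts in an assumed small-RNA size window,
    and count how often each distinct tag occurs."""
    tags = Counter()
    for read in reads:
        insert = trim_3p_adapter(read)
        if insert is not None and min_len <= len(insert) <= max_len:
            tags[insert] += 1
    return tags

def length_distribution(tags):
    """Number of reads per insert length."""
    dist = Counter()
    for seq, n in tags.items():
        dist[len(seq)] += n
    return dict(dist)

def count_strategies(tags, members):
    """Return (total read tag count, most abundant tag count) for the
    tags assigned to one miRNA."""
    counts = [tags[s] for s in members if s in tags]
    return (sum(counts), max(counts)) if counts else (0, 0)

# Toy data, invented for illustration.
mature = "TGAGGTAGTAGGTTGTATAGTT"   # 22-nt toy mature sequence
isoform = mature[:-1]               # 21-nt variant of the same miRNA
reads = (
    [mature + ADAPTER + "AA"] * 3   # three reads of the main tag
    + [isoform + ADAPTER]           # one read of the shorter variant
    + ["ACGT" + ADAPTER]            # insert too short, discarded
    + [mature]                      # no adapter found, discarded
)
tags = collapse_tags(reads)
lengths = length_distribution(tags)
total_count, top_tag_count = count_strategies(tags, [mature, isoform])
print(lengths, total_count, top_tag_count)
```

    The two values returned by `count_strategies` correspond to the two counting strategies named in step (vi); which one is preferable for differential expression depends on how much isoform variation the library contains.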

  12. RNA Sequencing Identifies New RNase III Cleavage Sites in Escherichia coli and Reveals Increased Regulation of mRNA

    PubMed Central

    Gordon, Gina C.; Cameron, Jeffrey C.

    2017-01-01

    Ribonucleases facilitate rapid turnover of RNA, providing cells with another mechanism to adjust transcript and protein levels in response to environmental conditions. While many examples have been documented, a comprehensive list of RNase targets is not available. To address this knowledge gap, we compared levels of RNA sequencing coverage of Escherichia coli and a corresponding RNase III mutant to expand the list of known RNase III targets. RNase III is a widespread endoribonuclease that binds and cleaves double-stranded RNA in many critical transcripts. RNase III cleavage at novel sites found in aceEF, proP, tnaC, dctA, pheM, sdhC, yhhQ, glpT, aceK, and gluQ accelerated RNA decay, consistent with previously described targets wherein RNase III cleavage initiates rapid degradation of secondary messages by other RNases. In contrast, cleavage at three novel sites in the ahpF, pflB, and yajQ transcripts led to stabilized secondary transcripts. Two other novel sites in hisL and pheM overlapped with transcriptional attenuators that likely serve to ensure turnover of these highly structured RNAs. Many of the new RNase III target sites are located on transcripts encoding metabolic enzymes. For instance, two novel RNase III sites are located within transcripts encoding enzymes near a key metabolic node connecting glycolysis and the tricarboxylic acid (TCA) cycle. Pyruvate dehydrogenase activity was increased in an rnc deletion mutant compared to the wild-type (WT) strain in early stationary phase, confirming the novel link between RNA turnover and regulation of pathway activity. Identification of these novel sites suggests that mRNA turnover may be an underappreciated mode of regulating metabolism. PMID:28351917

  13. RNA deep sequencing reveals differential microRNA expression during development of sea urchin and sea star.

    PubMed

    Kadri, Sabah; Hinman, Veronica F; Benos, Panayiotis V

    2011-01-01

    microRNAs (miRNAs) are small (20-23 nt), non-coding single stranded RNA molecules that act as post-transcriptional regulators of mRNA gene expression. They have been implicated in regulation of developmental processes in diverse organisms. The echinoderms, Strongylocentrotus purpuratus (sea urchin) and Patiria miniata (sea star) are excellent model organisms for studying development with well-characterized transcriptional networks. However, to date, nothing is known about the role of miRNAs during development in these organisms, except that the genes that are involved in the miRNA biogenesis pathway are expressed during their developmental stages. In this paper, we used Illumina Genome Analyzer (Illumina, Inc.) to sequence small RNA libraries in mixed stage population of embryos from one to three days after fertilization of sea urchin and sea star (total of 22,670,000 reads). Analysis of these data revealed the miRNA populations in these two species. We found that 47 and 38 known miRNAs are expressed in sea urchin and sea star, respectively, during early development (32 in common). We also found 13 potentially novel miRNAs in the sea urchin embryonic library. miRNA expression is generally conserved between the two species during development, but 7 miRNAs are highly expressed in only one species. We expect that our two datasets will be a valuable resource for everyone working in the field of developmental biology and the regulatory networks that affect it. The computational pipeline to analyze Illumina reads is available at http://www.benoslab.pitt.edu/services.html.

  14. RNA Deep Sequencing Reveals Differential MicroRNA Expression during Development of Sea Urchin and Sea Star

    PubMed Central

    Kadri, Sabah; Hinman, Veronica F.; Benos, Panayiotis V.

    2011-01-01

    microRNAs (miRNAs) are small (20–23 nt), non-coding single stranded RNA molecules that act as post-transcriptional regulators of mRNA gene expression. They have been implicated in regulation of developmental processes in diverse organisms. The echinoderms, Strongylocentrotus purpuratus (sea urchin) and Patiria miniata (sea star) are excellent model organisms for studying development with well-characterized transcriptional networks. However, to date, nothing is known about the role of miRNAs during development in these organisms, except that the genes that are involved in the miRNA biogenesis pathway are expressed during their developmental stages. In this paper, we used Illumina Genome Analyzer (Illumina, Inc.) to sequence small RNA libraries in mixed stage population of embryos from one to three days after fertilization of sea urchin and sea star (total of 22,670,000 reads). Analysis of these data revealed the miRNA populations in these two species. We found that 47 and 38 known miRNAs are expressed in sea urchin and sea star, respectively, during early development (32 in common). We also found 13 potentially novel miRNAs in the sea urchin embryonic library. miRNA expression is generally conserved between the two species during development, but 7 miRNAs are highly expressed in only one species. We expect that our two datasets will be a valuable resource for everyone working in the field of developmental biology and the regulatory networks that affect it. The computational pipeline to analyze Illumina reads is available at http://www.benoslab.pitt.edu/services.html. PMID:22216218

  15. Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence

    NASA Technical Reports Server (NTRS)

    Ortega, Maya

    2010-01-01

    My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and whatever black is, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran Polymerase Chain Reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacteria. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene can be used for bacterial identification of different species based on the sequence of their 16S rRNA gene. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference to and classification of other Dichothrix sp.

  16. Structural annotation of equine protein-coding genes determined by mRNA sequencing.

    PubMed

    Coleman, S J; Zeng, Z; Wang, K; Luo, S; Khrebtukova, I; Mienaltowski, M J; Schroth, G P; Liu, J; MacLeod, J N

    2010-12-01

    The horse, like the majority of animal species, has a limited amount of species-specific expressed sequence data available in public databases. As a result, structural models for the majority of genes defined in the equine genome are predictions based on ab initio sequence analysis or the projection of gene structures from other mammalian species. The current study used Illumina-based sequencing of messenger RNA (RNA-seq) to help refine structural annotation of equine protein-coding genes and for a preliminary assessment of gene expression patterns. Sequencing of mRNA from eight equine tissues generated 293,758,105 sequence tags of 35 bases each, equalling 10.28 Gbp of total sequence data. The tag alignments represent approximately 207× coverage of the equine mRNA transcriptome and confirmed transcriptional activity for roughly 90% of the protein-coding gene structures predicted by Ensembl and NCBI. Tag coverage was sufficient to refine the structural annotation for 11,356 of these predicted genes, while also identifying an additional 456 transcripts with exon/intron features that are not listed by either Ensembl or NCBI. Genomic locus data and intervals for the protein-coding genes predicted by the Ensembl and NCBI annotation pipelines were combined with 75,116 RNA-seq-derived transcriptional units to generate a consensus equine protein-coding gene set of 20,302 defined loci. Gene ontology annotation was used to compare the functional and structural categories of genes expressed in either a tissue-restricted pattern or broadly across all tissue samples. © 2010 The Authors, Journal compilation © 2010 Stichting International Foundation for Animal Genetics.
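
    The sequencing yield quoted in the abstract above can be checked with one multiplication. The sketch below reads the tag count as 293,758,105, an assumption about the digit grouping that makes the product agree with the stated 10.28 Gbp; the variable names are illustrative.

```python
# Total sequence yield = number of tags x tag length.
tags = 293_758_105   # tag count, read with the assumed digit grouping
tag_length = 35      # bases per tag, as stated in the abstract

total_bases = tags * tag_length
total_gbp = total_bases / 1e9  # gigabase pairs

print(f"{total_bases:,} bases = {total_gbp:.2f} Gbp")
```

    The result rounds to 10.28 Gbp, matching the figure given in the abstract.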

  17. Sequence-specific bias correction for RNA-seq data using recurrent neural networks.

    PubMed

    Zhang, Yao-Zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru

    2017-01-25

    The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatical problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance. We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set. RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.

  18. Identification of microRNAs by small RNA deep sequencing for synthetic microRNA mimics to control Spodoptera exigua.

    PubMed

    Zhang, Yu Liang; Huang, Qi Xing; Yin, Guo Hua; Lee, Samantha; Jia, Rui Zong; Liu, Zhi Xin; Yu, Nai Tong; Pennerman, Kayla K; Chen, Xin; Guo, An Ping

    2015-02-25

    Beet armyworm, Spodoptera exigua, is a major pest of cotton around the world. With the increase of resistance to Bacillus thuringiensis (Bt) toxin in transgenic cotton plants, there is a need to develop an alternative control approach that can be used in combination with Bt transgenic crops as part of resistance management strategies. MicroRNAs (miRNAs), a non-coding small RNA family (18-25 nt), play crucial roles in various biological processes and over-expression of miRNAs has been shown to interfere with the normal development of insects. In this study, we identified 127 conserved miRNAs in S. exigua by using small RNA deep sequencing technology. From this, we tested the effects of 11 miRNAs on larval development. We found three miRNAs, Sex-miR-10-1a, Sex-miR-4924, and Sex-miR-9, to be differentially expressed during larval stages of S. exigua. Oral feeding experiments using synthetic miRNA mimics of Sex-miR-10-1a, Sex-miR-4924, and Sex-miR-9 resulted in suppressed growth of S. exigua and mortality. Over-expression of Sex-miR-4924 caused a significant reduction in the expression level of chitinase 1 and caused abortive molting in the insects. Therefore, we demonstrated a novel approach of using miRNA mimics to control S. exigua development.

  19. YM500v3: a database for small RNA sequencing in human cancer research

    PubMed Central

    Chung, I-Fang; Chang, Shing-Jyh; Chen, Chen-Yang; Liu, Shu-Hsuan; Li, Chia-Yang; Chan, Chia-Hao; Shih, Chuan-Chi; Cheng, Wei-Chung

    2017-01-01

    We previously presented the YM500 database, which contains >8000 small RNA sequencing (smRNA-seq) data sets and integrated analysis results for various cancer miRNome studies. In the updated YM500v3 database (http://ngs.ym.edu.tw/ym500/) presented herein, we not only focus on miRNAs but also on other functional small non-coding RNAs (sncRNAs), such as PIWI-interacting RNAs (piRNAs), tRNA-derived fragments (tRFs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs). There is growing knowledge of the role of sncRNAs in gene regulation and tumorigenesis. We have also incorporated >10 000 cancer-related RNA-seq and >3000 more smRNA-seq data sets into the YM500v3 database. Furthermore, there are two main new sections, ‘Survival’ and ‘Cancer’, in this updated version. The ‘Survival’ section provides the survival analysis results in all cancer types or in a user-defined group of samples for a specific sncRNA. The ‘Cancer’ section provides the results of differential expression analyses, miRNA–gene interactions and cancer miRNA-related pathways. In the ‘Expression’ section, sncRNA expression profiles across cancer and sample types are newly provided. Cancer-related sncRNAs hold potential for both biotech applications and basic research. PMID:27899625

  20. NS3 helicase actively separates RNA strands and senses sequence barriers ahead of the opening fork

    PubMed Central

    Cheng, Wei; Dumont, Sophie; Tinoco, Ignacio; Bustamante, Carlos

    2007-01-01

    RNA helicases regulate virtually all RNA-dependent cellular processes. Although much is known about helicase structures, very little is known about how they deal with barriers in RNA and the factors that affect their processivity. The hepatitis C virus encodes NS3, an RNA helicase that is essential for viral RNA replication. We have used optical tweezers to determine at the single-molecule level how the local stability of the RNA substrate affects the enzyme rate of strand separation, whether separation occurs by an active or a passive mechanism, and whether processivity is affected. We show that sequence barriers in RNA modulate NS3 activity. NS3 processivity depends on barriers ahead of the opening fork. Our results rule out a model where NS3 passively waits for the thermal fraying of double-stranded RNA. Instead, we find that NS3 destabilizes the duplex before separating the strands. Failure to do so before a strong barrier leads to helicase dissociation and limits the processivity of the enzyme. PMID:17709749

  1. The impact of CRISPR repeat sequence on structures of a Cas6 protein-RNA complex

    SciTech Connect

    Wang, Ruiying; Zheng, Han; Preamplume, Gan; Shao, Yaming; Li, Hong

    2012-03-15

    The repeat-associated mysterious proteins (RAMPs) comprise the most abundant family of proteins involved in prokaryotic immunity against invading genetic elements conferred by the clustered regularly interspaced short palindromic repeat (CRISPR) system. Cas6 is one of the first characterized RAMP proteins and is a key enzyme required for CRISPR RNA maturation. Despite a strong structural homology with other RAMP proteins that bind hairpin RNA, Cas6 distinctly recognizes single-stranded RNA. Previous structural and biochemical studies show that Cas6 captures the 5' end while cleaving the 3' end of the CRISPR RNA. Here, we describe three structures and complementary biochemical analysis of a noncatalytic Cas6 homolog from Pyrococcus horikoshii bound to CRISPR repeat RNA of different sequences. Our study confirms the specificity of the Cas6 protein for single-stranded RNA and further reveals the importance of the bases at Positions 5-7 in Cas6-RNA interactions. Substitutions of these bases result in structural changes in the protein-RNA complex including its oligomerization state.

  2. Messenger RNA sequence and the translation process --a particle transport perspective

    NASA Astrophysics Data System (ADS)

    Dong, Jiajia; Schmittmann, Beate; Zia, Royce K. P.

    2008-03-01

    The translation process in bacteria has been under intensive study. A key question concerns the quantitative effect of different elongation rates, associated with different codons, on the overall translation efficiency. Starting with a simple particle transport model, the totally asymmetric simple exclusion process (TASEP), we incorporate the essential components of the translation process: ribosomes, cognate tRNA concentrations, and messenger RNA (mRNA) templates correspond to particles, hopping rates, and the underlying lattice, respectively. Using simulations and mean-field approximations to obtain the stationary currents (the protein production rates) associated with different mRNA sequences, we are especially interested in the effect of slow codons, i.e., codons which are associated with rare tRNAs and are therefore translated very slowly. As the first step, we look at a “designed sequence” with one and two slow codons and quantify the marked impact of their spatial distribution on the currents. Extending the results to several mRNA sequences taken from real genes, we argue that an effective translation rate including the information from the vicinity of each codon needs to be taken into consideration when seeking an efficient strategy to optimize the protein production.
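
    The mapping described above (ribosomes as particles, codon-dependent elongation rates as hopping rates, the mRNA as a lattice) can be sketched as a small Monte Carlo simulation. This is not the authors' code: it assumes a random-sequential update, particles that cover a single site (real ribosomes cover several codons), and arbitrary illustrative rates; the function name and parameters are hypothetical.

```python
import random

def tasep_current(rates, alpha=1.0, sweeps=2000, burn_in=500, seed=0):
    """Estimate the stationary current of an open-boundary TASEP.

    rates[i] is the hopping rate out of site i (the elongation rate of
    "codon" i); the last entry acts as the exit rate. alpha is the entry
    (initiation) rate. All rates are probabilities per attempt in (0, 1].
    """
    rng = random.Random(seed)
    n = len(rates)
    lattice = [0] * n
    exits = 0
    for sweep in range(sweeps):
        for _ in range(n + 1):            # one sweep = n + 1 random attempts
            i = rng.randrange(-1, n)      # -1 stands for the entry move
            if i == -1:
                if lattice[0] == 0 and rng.random() < alpha:
                    lattice[0] = 1        # a new particle enters
            elif lattice[i] == 1 and rng.random() < rates[i]:
                if i == n - 1:
                    lattice[i] = 0        # particle leaves the lattice
                    if sweep >= burn_in:
                        exits += 1        # count exits after burn-in only
                elif lattice[i + 1] == 0:
                    lattice[i], lattice[i + 1] = 0, 1  # hop to the right
    return exits / (sweeps - burn_in)     # exits per sweep

n_sites = 50
uniform = [1.0] * n_sites
one_slow = list(uniform)
one_slow[25] = 0.1                        # a single slow codon mid-sequence

j_uniform = tasep_current(uniform)
j_one_slow = tasep_current(one_slow)
print(j_uniform, j_one_slow)
```

    Placing a second slow site at different distances from the first and rerunning `tasep_current` is one way to explore the dependence on spatial distribution that the abstract mentions.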

  3. Small RNA Sequencing Based Identification of MiRNAs in Daphnia magna.

    PubMed

    Ünlü, Ercan Selçuk; Gordon, Donna M; Telli, Murat

    2015-01-01

    Small RNA molecules are short, non-coding RNAs identified for their crucial role in post-transcriptional regulation. A well-studied example is the miRNAs (microRNAs), which have been identified in several model organisms including the freshwater flea and planktonic crustacean Daphnia. Although Daphnia is a model for epigenetic-based studies with an available genome database, the identification of miRNAs and their potential role in regulating Daphnia gene expression has only recently garnered interest. Computational work using Daphnia pulex has indicated the existence of 45 miRNAs, 14 of which have been experimentally verified. To extend this study, we took a sequencing approach towards identifying miRNAs present in a small RNA library isolated from Daphnia magna. Using Perl scripts designed for comparative genomic analysis, 815,699 reads were obtained from 4 million raw reads and run against a database file of known miRNA sequences. Using this approach, we have identified 205 putative mature miRNA sequences belonging to 188 distinct miRNA families. Data from this study provide critical information necessary to begin an investigation into a role for these transcripts in the epigenetic regulation of Daphnia magna.

  4. RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data.

    PubMed

    Sun, Wen-Ju; Li, Jun-Hao; Liu, Shun; Wu, Jie; Zhou, Hui; Qu, Liang-Hu; Yang, Jian-Hua

    2016-01-04

    Although more than 100 different types of RNA modifications have been characterized across all living organisms, surprisingly little is known about the modified positions and their functions. Recently, various high-throughput modification sequencing methods have been developed to identify diverse post-transcriptional modifications of RNA molecules. In this study, we developed a novel resource, RMBase (RNA Modification Base, http://mirlab.sysu.edu.cn/rmbase/), to decode the genome-wide landscape of RNA modifications identified from high-throughput modification data generated by 18 independent studies. The current release of RMBase includes ∼ 9500 pseudouridine (Ψ) modifications generated from Pseudo-seq and CeU-seq sequencing data, ∼ 1000 5-methylcytosines (m(5)C) predicted from Aza-IP data, ∼ 124 200 N6-Methyladenosine (m(6)A) modifications discovered from m(6)A-seq and ∼ 1210 2'-O-methylations (2'-O-Me) identified from RiboMeth-seq data and public resources. Moreover, RMBase provides a comprehensive listing of other experimentally supported types of RNA modifications by integrating various resources. It provides web interfaces to show thousands of relationships between RNA modification sites and microRNA target sites. It can also be used to illustrate the disease-related SNPs residing in the modification sites/regions. RMBase provides a genome browser and a web-based modTool to query, annotate and visualize various RNA modifications. This database will help expand our understanding of potential functions of RNA modifications. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. LNCipedia: a database for annotated human lncRNA transcript sequences and structures

    PubMed Central

    Volders, Pieter-Jan; Helsens, Kenny; Wang, Xiaowei; Menten, Björn; Martens, Lennart; Gevaert, Kris; Vandesompele, Jo; Mestdagh, Pieter

    2013-01-01

    Here, we present LNCipedia (http://www.lncipedia.org), a novel database for human long non-coding RNA (lncRNA) transcripts and genes. LncRNAs constitute a large and diverse class of non-coding RNA genes. Although several lncRNAs have been functionally annotated, the majority remains to be characterized. Different high-throughput methods to identify new lncRNAs (including RNA sequencing and annotation of chromatin-state maps) have been applied in various studies resulting in multiple unrelated lncRNA data sets. LNCipedia offers 21 488 annotated human lncRNA transcripts obtained from different sources. In addition to basic transcript information and gene structure, several statistics are determined for each entry in the database, such as secondary structure information, protein coding potential and microRNA binding sites. Our analyses suggest that, much like microRNAs, many lncRNAs have a significant secondary structure, in-line with their presumed association with proteins or protein complexes. Available literature on specific lncRNAs is linked, and users or authors can submit articles through a web interface. Protein coding potential is assessed by two different prediction algorithms: Coding Potential Calculator and HMMER. In addition, a novel strategy has been integrated for detecting potentially coding lncRNAs by automatically re-analysing the large body of publicly available mass spectrometry data in the PRIDE database. LNCipedia is publicly available and allows users to query and download lncRNA sequences and structures based on different search criteria. The database may serve as a resource to initiate small- and large-scale lncRNA studies. As an example, the LNCipedia content was used to develop a custom microarray for expression profiling of all available lncRNAs. PMID:23042674

  6. RNA sequencing (RNA-Seq) of lymph node, spleen, and thymus transcriptome from wild Peninsular Malaysian cynomolgus macaque (Macaca fascicularis)

    PubMed Central

    Ee Uli, Joey; Yong, Christina Seok Yien; Yeap, Swee Keong; Rovie-Ryan, Jeffrine J.; Mat Isa, Nurulfiza; Tan, Soon Guan

    2017-01-01

    The cynomolgus macaque (Macaca fascicularis) is an extensively utilised nonhuman primate model for biomedical research due to its biological, behavioural, and genetic similarities to humans. Genomic information of cynomolgus macaque is vital for research in various fields; however, there is presently a shortage of genomic information on the Malaysian cynomolgus macaque. This study aimed to sequence, assemble, annotate, and profile the Peninsular Malaysian cynomolgus macaque transcriptome derived from three tissues (lymph node, spleen, and thymus) using RNA sequencing (RNA-Seq) technology. A total of 174,208,078 paired end 70 base pair sequencing reads were obtained from the Illumina Hi-Seq 2500 sequencer. The overall mapping percentage of the sequencing reads to the M. fascicularis reference genome ranged from 53–63%. Categorisation of expressed genes to Gene Ontology (GO) and KEGG pathway categories revealed that GO terms with the highest number of associated expressed genes include Cellular process, Catalytic activity, and Cell part, while for pathway categorisation, the majority of expressed genes in lymph node, spleen, and thymus fall under the Global overview and maps pathway category, while 266, 221, and 138 genes from lymph node, spleen, and thymus were respectively enriched in the Immune system category. Enriched Immune system pathways include Platelet activation pathway, Antigen processing and presentation, B cell receptor signalling pathway, and Intestinal immune network for IgA production. Differential gene expression analysis among the three tissues revealed 574 differentially expressed genes (DEG) between lymph and spleen, 5402 DEGs between lymph and thymus, and 7008 DEGs between spleen and thymus. Venn diagram analysis of expressed genes revealed a total of 2,630, 253, and 279 tissue-specific genes respectively for lymph node, spleen, and thymus tissues. This is the first time the lymph node, spleen, and thymus transcriptome of the Peninsular

  7. RNA sequencing (RNA-Seq) of lymph node, spleen, and thymus transcriptome from wild Peninsular Malaysian cynomolgus macaque (Macaca fascicularis).

    PubMed

    Ee Uli, Joey; Yong, Christina Seok Yien; Yeap, Swee Keong; Rovie-Ryan, Jeffrine J; Mat Isa, Nurulfiza; Tan, Soon Guan; Alitheen, Noorjahan Banu

    2017-01-01

    The cynomolgus macaque (Macaca fascicularis) is an extensively utilised nonhuman primate model for biomedical research due to its biological, behavioural, and genetic similarities to humans. Genomic information of cynomolgus macaque is vital for research in various fields; however, there is presently a shortage of genomic information on the Malaysian cynomolgus macaque. This study aimed to sequence, assemble, annotate, and profile the Peninsular Malaysian cynomolgus macaque transc