Sample records for massive transcriptome sequencing

  1. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    PubMed Central

    Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier

    2008-01-01

    Background "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several millions of tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to make a deep analysis of mouse hypothalamus transcriptome avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 millions of tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152

  2. Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease

    DTIC Science & Technology

    2016-10-01

    deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3

  3. De novo characterization of Larimichthys crocea transcriptome for growth-/immune-related gene identification and massive microsatellite (SSR) marker development

    NASA Astrophysics Data System (ADS)

    Han, Zhaofang; Xiao, Shijun; Liu, Xiande; Liu, Yang; Li, Jiakai; Xie, Yangjie; Wang, Zhiyong

    2017-03-01

    The large yellow croaker, Larimichthys crocea is an important marine fish in China with a high economic value. In the last decade, the stock conservation and aquaculture industry of this species have been facing severe challenges because of wild population collapse and degeneration of important economic traits. However, genes contributing to growth and immunity in L. crocea have not been thoroughly analyzed, and available molecular markers are still not sufficient for genetic resource management and molecular selection. In this work, we sequenced the transcriptome in L. crocea liver tissue with a Roche 454 sequencing platform and assembled the transcriptome into 93 801 transcripts. Of them, 38 856 transcripts were successfully annotated in nt, nr, Swiss-Prot, InterPro, COG, GO and KEGG databases. Based on the annotation information, 3 165 unigenes related to growth and immunity were identified. Additionally, a total of 6 391 simple sequence repeats (SSRs) were identified from the transcriptome, among which 4 498 SSRs had enough flanking regions to design primers for polymerase chain reactions (PCR). To access the polymorphism of these markers, 30 primer pairs were randomly selected for PCR amplification and validation in 30 individuals, and 12 primer pairs (40.0%) exhibited obvious length polymorphisms. This work applied RNA-Seq to assemble and analyze a live transcriptome in L. crocea. With gene annotation and sequence information, genes related to growth and immunity were identified and massive SSR markers were developed, providing valuable genetic resources for future gene functional analysis and selective breeding of L. crocea.

  4. Genomic Resources Notes Accepted 1 June 2015-31 July 2015.

    PubMed

    Álvarez, P; Arthofer, Wolfgang; Coelho, Maria M; Conklin, D; Estonba, A; Grosso, Ana R; Helyar, S J; Langa, J; Machado, Miguel P; Montes, I; Pinho, Joana; Rief, Alexander; Schartl, Manfred; Schlick-Steiner, Birgit C; Seeber, Julia; Steiner, Florian M; Vilas, C

    2015-11-01

    This article documents the public availability of (i) microbiomes in diet and gut of larvae from the dipteran Dilophus febrilis using massive parallel sequencing, (ii) SNP and SSR discovery and characterization in the transcriptome of the Atlantic mackerel (Scomber scombrus, L) and (iii) assembled transcriptome for an endangered, endemic Iberian cyprinid fish (Squalius pyrenaicus). © 2015 John Wiley & Sons Ltd.

  5. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    PubMed Central

    2011-01-01

    Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms. PMID:21791039

  6. ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data.

    PubMed

    Gonzalez, Sergio; Clavijo, Bernardo; Rivarola, Máximo; Moreno, Patricio; Fernandez, Paula; Dopazo, Joaquín; Paniego, Norma

    2017-02-22

    In the last years, applications based on massively parallelized RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species. We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with new generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results. ATGC transcriptomics provides access to non-expert computer users and small research groups to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of GNU public license at http://atgcinta.sourceforge.net .

  7. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing

    PubMed Central

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-01-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome. PMID:20392818

  8. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing.

    PubMed

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-08-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.

  9. Comparative transcriptomics of early dipteran development

    PubMed Central

    2013-01-01

    Background Modern sequencing technologies have massively increased the amount of data available for comparative genomics. Whole-transcriptome shotgun sequencing (RNA-seq) provides a powerful basis for comparative studies. In particular, this approach holds great promise for emerging model species in fields such as evolutionary developmental biology (evo-devo). Results We have sequenced early embryonic transcriptomes of two non-drosophilid dipteran species: the moth midge Clogmia albipunctata, and the scuttle fly Megaselia abdita. Our analysis includes a third, published, transcriptome for the hoverfly Episyrphus balteatus. These emerging models for comparative developmental studies close an important phylogenetic gap between Drosophila melanogaster and other insect model systems. In this paper, we provide a comparative analysis of early embryonic transcriptomes across species, and use our data for a phylogenomic re-evaluation of dipteran phylogenetic relationships. Conclusions We show how comparative transcriptomics can be used to create useful resources for evo-devo, and to investigate phylogenetic relationships. Our results demonstrate that de novo assembly of short (Illumina) reads yields high-quality, high-coverage transcriptomic data sets. We use these data to investigate deep dipteran phylogenetic relationships. Our results, based on a concatenation of 160 orthologous genes, provide support for the traditional view of Clogmia being the sister group of Brachycera (Megaselia, Episyrphus, Drosophila), rather than that of Culicomorpha (which includes mosquitoes and blackflies). PMID:23432914

  10. GigaTON: an extensive publicly searchable database providing a new reference transcriptome in the pacific oyster Crassostrea gigas.

    PubMed

    Riviere, Guillaume; Klopp, Christophe; Ibouniyamine, Nabihoudine; Huvet, Arnaud; Boudry, Pierre; Favrel, Pascal

    2015-12-02

    The Pacific oyster, Crassostrea gigas, is one of the most important aquaculture shellfish resources worldwide. Important efforts have been undertaken towards a better knowledge of its genome and transcriptome, which makes now C. gigas becoming a model organism among lophotrochozoans, the under-described sister clade of ecdysozoans within protostomes. These massive sequencing efforts offer the opportunity to assemble gene expression data and make such resource accessible and exploitable for the scientific community. Therefore, we undertook this assembly into an up-to-date publicly available transcriptome database: the GigaTON (Gigas TranscriptOme pipeliNe) database. We assembled 2204 million sequences obtained from 114 publicly available RNA-seq libraries that were realized using all embryo-larval development stages, adult organs, different environmental stressors including heavy metals, temperature, salinity and exposure to air, which were mostly performed as part of the Crassostrea gigas genome project. This data was analyzed in silico and resulted into 56621 newly assembled contigs that were deposited into a publicly available database, the GigaTON database. This database also provides powerful and user-friendly request tools to browse and retrieve information about annotation, expression level, UTRs, splice and polymorphism, and gene ontology associated to all the contigs into each, and between all libraries. The GigaTON database provides a convenient, potent and versatile interface to browse, retrieve, confront and compare massive transcriptomic information in an extensive range of conditions, tissues and developmental stages in Crassostrea gigas. To our knowledge, the GigaTON database constitutes the most extensive transcriptomic database to date in marine invertebrates, thereby a new reference transcriptome in the oyster, a highly valuable resource to physiologists and evolutionary biologists.

  11. De Novo Transcriptome of the Hemimetabolous German Cockroach (Blattella germanica)

    PubMed Central

    Zhou, Xiaojie; Qian, Kun; Tong, Ying; Zhu, Junwei Jerry; Qiu, Xinghui; Zeng, Xiaopeng

    2014-01-01

    Background The German cockroach, Blattella germanica, is an important insect pest that transmits various pathogens mechanically and causes severe allergic diseases. This insect has long served as a model system for studies of insect biology, physiology and ecology. However, the lack of genome or transcriptome information heavily hinder our further understanding about the German cockroach in every aspect at a molecular level and on a genome-wide scale. To explore the transcriptome and identify unique sequences of interest, we subjected the B. germanica transcriptome to massively parallel pyrosequencing and generated the first reference transcriptome for B. germanica. Methodology/Principal Findings A total of 1,365,609 raw reads with an average length of 529 bp were generated via pyrosequencing the mixed cDNA library from different life stages of German cockroach including maturing oothecae, nymphs, adult females and males. The raw reads were de novo assembled to 48,800 contigs and 3,961 singletons with high-quality unique sequences. These sequences were annotated and classified functionally in terms of BLAST, GO and KEGG, and the genes putatively coding detoxification enzyme systems, insecticide targets, key components in systematic RNA interference, immunity and chemoreception pathways were identified. A total of 3,601 SSRs (Simple Sequence Repeats) loci were also predicted. Conclusions/Significance The whole transcriptome pyrosequencing data from this study provides a usable genetic resource for future identification of potential functional genes involved in various biological processes. PMID:25265537

  12. RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations

    PubMed Central

    Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.

    2014-01-01

    Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449

  13. Transcriptome dynamics through alternative polyadenylation in developmental and environmental responses in plants revealed by deep sequencing

    PubMed Central

    Shen, Yingjia; Venu, R.C.; Nobuta, Kan; Wu, Xiaohui; Notibala, Varun; Demirci, Caghan; Meyers, Blake C.; Wang, Guo-Liang; Ji, Guoli; Li, Qingshun Q.

    2011-01-01

    Polyadenylation sites mark the ends of mRNA transcripts. Alternative polyadenylation (APA) may alter sequence elements and/or the coding capacity of transcripts, a mechanism that has been demonstrated to regulate gene expression and transcriptome diversity. To study the role of APA in transcriptome dynamics, we analyzed a large-scale data set of RNA “tags” that signify poly(A) sites and expression levels of mRNA. These tags were derived from a wide range of tissues and developmental stages that were mutated or exposed to environmental treatments, and generated using digital gene expression (DGE)–based protocols of the massively parallel signature sequencing (MPSS-DGE) and the Illumina sequencing-by-synthesis (SBS-DGE) sequencing platforms. The data offer a global view of APA and how it contributes to transcriptome dynamics. Upon analysis of these data, we found that ∼60% of Arabidopsis genes have multiple poly(A) sites. Likewise, ∼47% and 82% of rice genes use APA, supported by MPSS-DGE and SBS-DGE tags, respectively. In both species, ∼49%–66% of APA events were mapped upstream of annotated stop codons. Interestingly, 10% of the transcriptomes are made up of APA transcripts that are differentially distributed among developmental stages and in tissues responding to environmental stresses, providing an additional level of transcriptome dynamics. Examples of pollen-specific APA switching and salicylic acid treatment-specific APA clearly demonstrated such dynamics. The significance of these APAs is more evident in the 3034 genes that have conserved APA events between rice and Arabidopsis. PMID:21813626

  14. Pyrosequencing the Midgut Transcriptome of the Banana Weevil Cosmopolites sordidus (Germar) (Coleoptera: Curculionidae) Reveals Multiple Protease-Like Transcripts.

    PubMed

    Valencia, Arnubio; Wang, Haichuan; Soto, Alberto; Aristizabal, Manuel; Arboleda, Jorge W; Eyun, Seong-Il; Noriega, Daniel D; Siegfried, Blair

    2016-01-01

    The banana weevil Cosmopolites sordidus is an important and serious insect pest in most banana and plantain-growing areas of the world. In spite of the economic importance of this insect pest very little genomic and transcriptomic information exists for this species. In the present study, we characterized the midgut transcriptome of C. sordidus using massive 454-pyrosequencing. We generated over 590,000 sequencing reads that assembled into 30,840 contigs with more than 400 bp, representing a significant expansion of existing sequences available for this insect pest. Among them, 16,427 contigs contained one or more GO terms. In addition, 15,263 contigs were assigned an EC number. In-depth transcriptome analysis identified genes potentially involved in insecticide resistance, peritrophic membrane biosynthesis, immunity-related function and defense against pathogens, and Bacillus thuringiensis toxins binding proteins as well as multiple enzymes involved with protein digestion. This transcriptome will provide a valuable resource for understanding larval physiology and for identifying novel target sites and management approaches for this important insect pest.

  15. Pyrosequencing the Midgut Transcriptome of the Banana Weevil Cosmopolites sordidus (Germar) (Coleoptera: Curculionidae) Reveals Multiple Protease-Like Transcripts

    PubMed Central

    Valencia, Arnubio; Wang, Haichuan; Soto, Alberto; Aristizabal, Manuel; Arboleda, Jorge W.; Eyun, Seong-il; Noriega, Daniel D.; Siegfried, Blair

    2016-01-01

    The banana weevil Cosmopolites sordidus is an important and serious insect pest in most banana and plantain-growing areas of the world. In spite of the economic importance of this insect pest very little genomic and transcriptomic information exists for this species. In the present study, we characterized the midgut transcriptome of C. sordidus using massive 454-pyrosequencing. We generated over 590,000 sequencing reads that assembled into 30,840 contigs with more than 400 bp, representing a significant expansion of existing sequences available for this insect pest. Among them, 16,427 contigs contained one or more GO terms. In addition, 15,263 contigs were assigned an EC number. In-depth transcriptome analysis identified genes potentially involved in insecticide resistance, peritrophic membrane biosynthesis, immunity-related function and defense against pathogens, and Bacillus thuringiensis toxins binding proteins as well as multiple enzymes involved with protein digestion. This transcriptome will provide a valuable resource for understanding larval physiology and for identifying novel target sites and management approaches for this important insect pest. PMID:26949943

  16. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    PubMed

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economical and ecological importance of flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in response to B. ostreae through massively sequencing and has aided to improve our knowledge of the immune mechanisms of flat oyster. The validated oligo-microarray and the establishment of a reference transcriptome will be useful for large-scale gene expression studies in this species. Copyright © 2016 Elsevier Ltd. All rights reserved.

  17. Optimization of De Novo Short Read Assembly of Seabuckthorn (Hippophae rhamnoides L.) Transcriptome

    PubMed Central

    Ghangal, Rajesh; Chaudhary, Saurabh; Jain, Mukesh; Purty, Ram Singh; Chand Sharma, Prakash

    2013-01-01

    Seabuckthorn ( Hippophae rhamnoides L.) is known for its medicinal, nutritional and environmental importance since ancient times. However, very limited efforts have been made to characterize the genome and transcriptome of this wonder plant. Here, we report the use of next generation massive parallel sequencing technology (Illumina platform) and de novo assembly to gain a comprehensive view of the seabuckthorn transcriptome. We assembled 86,253,874 high quality short reads using six assembly tools. At our hand, assembly of non-redundant short reads following a two-step procedure was found to be the best considering various assembly quality parameters. Initially, ABySS tool was used following an additive k-mer approach. The assembled transcripts were subsequently subjected to TGICL suite. Finally, de novo short read assembly yielded 88,297 transcripts (> 100 bp), representing about 53 Mb of seabuckthorn transcriptome. The average length of transcripts was 610 bp, N50 length 1198 BP and 91% of the short reads uniquely mapped back to seabuckthorn transcriptome. A total of 41,340 (46.8%) transcripts showed significant similarity with sequences present in nr protein databases of NCBI (E-value < 1E-06). We also screened the assembled transcripts for the presence of transcription factors and simple sequence repeats. Our strategy involving the use of short read assembler (ABySS) followed by TGICL will be useful for the researchers working with a non-model organism’s transcriptome in terms of saving time and reducing complexity in data management. The seabuckthorn transcriptome data generated here provide a valuable resource for gene discovery and development of functional molecular markers. PMID:23991119

  18. Validation of two ribosomal RNA removal methods for microbial metatranscriptomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Shaomei; Wurtzel, Omri; Singh, Kanwar

    2010-10-01

    The predominance of rRNAs in the transcriptome is a major technical challenge in sequence-based analysis of cDNAs from microbial isolates and communities. Several approaches have been applied to deplete rRNAs from (meta)transcriptomes, but no systematic investigation of potential biases introduced by any of these approaches has been reported. Here we validated the effectiveness and fidelity of the two most commonly used approaches, subtractive hybridization and exonuclease digestion, as well as combinations of these treatments, on two synthetic five-microorganism metatranscriptomes using massively parallel sequencing. We found that the effectiveness of rRNA removal was a function of community composition and RNA integritymore » for these treatments. Subtractive hybridization alone introduced the least bias in relative transcript abundance, whereas exonuclease and in particular combined treatments greatly compromised mRNA abundance fidelity. Illumina sequencing itself also can compromise quantitative data analysis by introducing a G+C bias between runs.« less

  19. Transcriptome sequencing and microarray development for the woodrat (Neotoma spp.): custom genetic tools for exploring herbivore ecology.

    PubMed

    Malenke, J R; Milash, B; Miller, A W; Dearing, M D

    2013-07-01

    Massively parallel sequencing has enabled the creation of novel, in-depth genetic tools for nonmodel, ecologically important organisms. We present the de novo transcriptome sequencing, analysis and microarray development for a vertebrate herbivore, the woodrat (Neotoma spp.). This genus is of ecological and evolutionary interest, especially with respect to ingestion and hepatic metabolism of potentially toxic plant secondary compounds. We generated a liver transcriptome of the desert woodrat (Neotoma lepida) using the Roche 454 platform. The assembled contigs were well annotated using rodent references (99.7% annotation), and biotransformation function was reflected in the gene ontology. The transcriptome was used to develop a custom microarray (eArray, Agilent). We tested the microarray with three experiments: one across species with similar habitat (thus, dietary) niches, one across species with different habitat niches and one across populations within a species. The resulting one-colour arrays had high technical and biological quality. Probes designed from the woodrat transcriptome performed significantly better than functionally similar probes from the Norway rat (Rattus norvegicus). There were a multitude of expression differences across the woodrat treatments, many of which related to biotransformation processes and activities. The pattern and function of the differences indicate shared ecological pressures, and not merely phylogenetic distance, play an important role in shaping gene expression profiles of woodrat species and populations. The quality and functionality of the woodrat transcriptome and custom microarray suggest these tools will be valuable for expanding the scope of herbivore biology, as well as the exploration of conceptual topics in ecology. © 2013 John Wiley & Sons Ltd.

  20. Recovering complete mitochondrial genome sequences from RNA-Seq: A case study of Polytomella non-photosynthetic green algae.

    PubMed

    Tian, Yao; Smith, David Roy

    2016-05-01

    Thousands of mitochondrial genomes have been sequenced, but there are comparatively few available mitochondrial transcriptomes. This might soon be changing. High-throughput RNA sequencing (RNA-Seq) techniques have made it fast and cheap to generate massive amounts of mitochondrial transcriptomic data. Here, we explore the utility of RNA-Seq for assembling mitochondrial genomes and studying their expression patterns. Specifically, we investigate the mitochondrial transcriptomes from Polytomella non-photosynthetic green algae, which have among the smallest, most reduced mitochondrial genomes from the Archaeplastida as well as fragmented rRNA-coding regions, palindromic genes, and linear chromosomes with telomeres. Isolation of whole genomic RNA from the four known Polytomella species followed by Illumina paired-end sequencing generated enough mitochondrial-derived reads to easily recover almost-entire mitochondrial genome sequences. Read-mapping and coverage statistics also gave insights into Polytomella mitochondrial transcriptional architecture, revealing polycistronic transcripts and the expression of telomeres and palindromic genes. Ultimately, RNA-Seq is a promising, cost-effective technique for studying mitochondrial genetics, but it does have drawbacks, which are discussed. One of its greatest potentials, as shown here, is that it can be used to generate near-complete mitochondrial genome sequences, which could be particularly useful in situations where there is a lack of available mtDNA data. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  1. Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

    PubMed

    Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

    2015-11-18

    RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains a large room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species)., The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG, demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.

  2. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    PubMed

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  3. De Novo Assembly, Characterization and Functional Annotation of Pineapple Fruit Transcriptome through Massively Parallel Sequencing

    PubMed Central

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Background Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. Methodology/Principal Findings To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. Conclusions The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple. PMID:23091603

  4. The immune gene repertoire of an important viral reservoir, the Australian black flying fox.

    PubMed

    Papenfuss, Anthony T; Baker, Michelle L; Feng, Zhi-Ping; Tachedjian, Mary; Crameri, Gary; Cowled, Chris; Ng, Justin; Janardhana, Vijaya; Field, Hume E; Wang, Lin-Fa

    2012-06-20

    Bats are the natural reservoir host for a range of emerging and re-emerging viruses, including SARS-like coronaviruses, Ebola viruses, henipaviruses and Rabies viruses. However, the mechanisms responsible for the control of viral replication in bats are not understood and there is little information available on any aspect of antiviral immunity in bats. Massively parallel sequencing of the bat transcriptome provides the opportunity for rapid gene discovery. Although the genomes of one megabat and one microbat have now been sequenced to low coverage, no transcriptomic datasets have been reported from any bat species. In this study, we describe the immune transcriptome of the Australian flying fox, Pteropus alecto, providing an important resource for identification of genes involved in a range of activities including antiviral immunity. Towards understanding the adaptations that have allowed bats to coexist with viruses, we have de novo assembled transcriptome sequence from immune tissues and stimulated cells from P. alecto. We identified about 18,600 genes involved in a broad range of activities with the most highly expressed genes involved in cell growth and maintenance, enzyme activity, cellular components and metabolism and energy pathways. 3.5% of the bat transcribed genes corresponded to immune genes and a total of about 500 immune genes were identified, providing an overview of both innate and adaptive immunity. A small proportion of transcripts found no match with annotated sequences in any of the public databases and may represent bat-specific transcripts. This study represents the first reported bat transcriptome dataset and provides a survey of expressed bat genes that complement existing bat genomic data. In addition, these data provide insight into genes relevant to the antiviral responses of bats, and form a basis for examining the roles of these molecules in immune response to viral infection.

  5. dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments.

    PubMed

    Petukhov, Viktor; Guo, Jimin; Baryawno, Ninib; Severe, Nicolas; Scadden, David T; Samsonova, Maria G; Kharchenko, Peter V

    2018-06-19

    Recent single-cell RNA-seq protocols based on droplet microfluidics use massively multiplexed barcoding to enable simultaneous measurements of transcriptomes for thousands of individual cells. The increasing complexity of such data creates challenges for subsequent computational processing and troubleshooting of these experiments, with few software options currently available. Here, we describe a flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries. We introduce advanced methods for correcting composition bias and sequencing errors affecting cellular and molecular barcodes to provide more accurate estimates of molecular counts in individual cells.

  6. A Snapshot of a Coral “Holobiont”: A Transcriptome Assembly of the Scleractinian Coral, Porites, Captures a Wide Variety of Genes from Both the Host and Symbiotic Zooxanthellae

    PubMed Central

    Shinzato, Chuya; Inoue, Mayuri; Kusakabe, Makoto

    2014-01-01

    Massive scleractinian corals of the genus Porites are important reef builders in the Indo-Pacific, and they are more resistant to thermal stress than other stony corals, such as the genus Acropora. Because coral health and survival largely depend on the interaction between a coral host and its symbionts, it is important to understand the molecular interactions of an entire “coral holobiont”. We simultaneously sequenced transcriptomes of Porites australiensis and its symbionts using the Illumina Hiseq2000 platform. We obtained 14.3 Gbp of sequencing data and assembled it into 74,997 contigs (average: 1,263 bp, N50 size: 2,037 bp). We successfully distinguished contigs originating from the host (Porites) and the symbiont (Symbiodinium) by aligning nucleotide sequences with the decoded Acropora digitifera and Symbiodinium minutum genomes. In contrast to previous coral transcriptome studies, at least 35% of the sequences were found to have originated from the symbionts, indicating that it is possible to analyze both host and symbiont transcriptomes simultaneously. Conserved protein domain and KEGG analyses showed that the dataset contains broad gene repertoires of both Porites and Symbiodinium. Effective utilization of sequence reads revealed that the polymorphism rate in P. australiensis is 1.0% and identified the major symbiotic Symbiodinium as Type C15. Analyses of amino acid biosynthetic pathways suggested that this Porites holobiont is probably able to synthesize most of the common amino acids and that Symbiodinium is potentially able to provide essential amino acids to its host. We believe this to be the first molecular evidence of complementarity in amino acid metabolism between coral hosts and their symbionts. We successfully assembled genes originating from both the host coral and the symbiotic Symbiodinium to create a snapshot of the coral holobiont transcriptome. This dataset will facilitate a deeper understanding of molecular mechanisms of coral symbioses and stress responses. PMID:24454815

  7. A snapshot of a coral "holobiont": a transcriptome assembly of the scleractinian coral, porites, captures a wide variety of genes from both the host and symbiotic zooxanthellae.

    PubMed

    Shinzato, Chuya; Inoue, Mayuri; Kusakabe, Makoto

    2014-01-01

    Massive scleractinian corals of the genus Porites are important reef builders in the Indo-Pacific, and they are more resistant to thermal stress than other stony corals, such as the genus Acropora. Because coral health and survival largely depend on the interaction between a coral host and its symbionts, it is important to understand the molecular interactions of an entire "coral holobiont". We simultaneously sequenced transcriptomes of Porites australiensis and its symbionts using the Illumina Hiseq2000 platform. We obtained 14.3 Gbp of sequencing data and assembled it into 74,997 contigs (average: 1,263 bp, N50 size: 2,037 bp). We successfully distinguished contigs originating from the host (Porites) and the symbiont (Symbiodinium) by aligning nucleotide sequences with the decoded Acropora digitifera and Symbiodinium minutum genomes. In contrast to previous coral transcriptome studies, at least 35% of the sequences were found to have originated from the symbionts, indicating that it is possible to analyze both host and symbiont transcriptomes simultaneously. Conserved protein domain and KEGG analyses showed that the dataset contains broad gene repertoires of both Porites and Symbiodinium. Effective utilization of sequence reads revealed that the polymorphism rate in P. australiensis is 1.0% and identified the major symbiotic Symbiodinium as Type C15. Analyses of amino acid biosynthetic pathways suggested that this Porites holobiont is probably able to synthesize most of the common amino acids and that Symbiodinium is potentially able to provide essential amino acids to its host. We believe this to be the first molecular evidence of complementarity in amino acid metabolism between coral hosts and their symbionts. We successfully assembled genes originating from both the host coral and the symbiotic Symbiodinium to create a snapshot of the coral holobiont transcriptome. This dataset will facilitate a deeper understanding of molecular mechanisms of coral symbioses and stress responses.

  8. Venom Gland Transcriptomic and Proteomic Analyses of the Enigmatic Scorpion Superstitionia donensis (Scorpiones: Superstitioniidae), with Insights on the Evolution of Its Venom Components.

    PubMed

    Santibáñez-López, Carlos E; Cid-Uribe, Jimena I; Batista, Cesar V F; Ortiz, Ernesto; Possani, Lourival D

    2016-12-09

    Venom gland transcriptomic and proteomic analyses have improved our knowledge on the diversity of the heterogeneous components present in scorpion venoms. However, most of these studies have focused on species from the family Buthidae. To gain insights into the molecular diversity of the venom components of scorpions belonging to the family Superstitioniidae, one of the neglected scorpion families, we performed a transcriptomic and proteomic analyses for the species Superstitionia donensis . The total mRNA extracted from the venom glands of two specimens was subjected to massive sequencing by the Illumina protocol, and a total of 219,073 transcripts were generated. We annotated 135 transcripts putatively coding for peptides with identity to known venom components available from different protein databases. Fresh venom collected by electrostimulation was analyzed by LC-MS/MS allowing the identification of 26 distinct components with sequences matching counterparts from the transcriptomic analysis. In addition, the phylogenetic affinities of the found putative calcins, scorpines, La1-like peptides and potassium channel κ toxins were analyzed. The first three components are often reported as ubiquitous in the venom of different families of scorpions. Our results suggest that, at least calcins and scorpines, could be used as molecular markers in phylogenetic studies of scorpion venoms.

  9. Venom Gland Transcriptomic and Proteomic Analyses of the Enigmatic Scorpion Superstitionia donensis (Scorpiones: Superstitioniidae), with Insights on the Evolution of Its Venom Components

    PubMed Central

    Santibáñez-López, Carlos E.; Cid-Uribe, Jimena I.; Batista, Cesar V. F.; Ortiz, Ernesto; Possani, Lourival D.

    2016-01-01

    Venom gland transcriptomic and proteomic analyses have improved our knowledge on the diversity of the heterogeneous components present in scorpion venoms. However, most of these studies have focused on species from the family Buthidae. To gain insights into the molecular diversity of the venom components of scorpions belonging to the family Superstitioniidae, one of the neglected scorpion families, we performed a transcriptomic and proteomic analyses for the species Superstitionia donensis. The total mRNA extracted from the venom glands of two specimens was subjected to massive sequencing by the Illumina protocol, and a total of 219,073 transcripts were generated. We annotated 135 transcripts putatively coding for peptides with identity to known venom components available from different protein databases. Fresh venom collected by electrostimulation was analyzed by LC-MS/MS allowing the identification of 26 distinct components with sequences matching counterparts from the transcriptomic analysis. In addition, the phylogenetic affinities of the found putative calcins, scorpines, La1-like peptides and potassium channel κ toxins were analyzed. The first three components are often reported as ubiquitous in the venom of different families of scorpions. Our results suggest that, at least calcins and scorpines, could be used as molecular markers in phylogenetic studies of scorpion venoms. PMID:27941686

  10. A Portrait of the Transcriptome of the Neglected Trematode, Fasciola gigantica—Biological and Biotechnological Implications

    PubMed Central

    Young, Neil D.; Jex, Aaron R.; Cantacessi, Cinzia; Hall, Ross S.; Campbell, Bronwyn E.; Spithill, Terence W.; Tangkawattana, Sirikachorn; Tangkawattana, Prasarn; Laha, Thewarach; Gasser, Robin B.

    2011-01-01

    Fasciola gigantica (Digenea) is an important foodborne trematode that causes liver fluke disease (fascioliasis) in mammals, including ungulates and humans, mainly in tropical climatic zones of the world. Despite its socioeconomic impact, almost nothing is known about the molecular biology of this parasite, its interplay with its hosts, and the pathogenesis of fascioliasis. Modern genomic technologies now provide unique opportunities to rapidly tackle these exciting areas. The present study reports the first transcriptome representing the adult stage of F. gigantica (of bovid origin), defined using a massively parallel sequencing-coupled bioinformatic approach. From >20 million raw sequence reads, >30,000 contiguous sequences were assembled, of which most were novel. Relative levels of transcription were determined for individual molecules, which were also characterized (at the inferred amino acid level) based on homology, gene ontology, and/or pathway mapping. Comparisons of the transcriptome of F. gigantica with those of other trematodes, including F. hepatica, revealed similarities in transcription for molecules inferred to have key roles in parasite-host interactions. Overall, the present dataset should provide a solid foundation for future fundamental genomic, proteomic, and metabolomic explorations of F. gigantica, as well as a basis for applied outcomes such as the development of novel methods of intervention against this neglected parasite. PMID:21408104

  11. Transcriptome sequencing of the Antarctic vascular plant Deschampsia antarctica Desv. under abiotic stress.

    PubMed

    Lee, Jungeun; Noh, Eun Kyeung; Choi, Hyung-Seok; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok

    2013-03-01

    Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been studied as an extremophile that has successfully adapted to marginal land with the harshest environment for terrestrial plants. However, limited genetic research has focused on this species due to the lack of genomic resources. Here, we present the first de novo assembly of its transcriptome by massive parallel sequencing and its expression profile using D. antarctica grown under various stress conditions. Total sequence reads generated by pyrosequencing were assembled into 60,765 unigenes (28,177 contigs and 32,588 singletons). A total of 29,173 unique protein-coding genes were identified based on sequence similarities to known proteins. The combined results from all three stress conditions indicated differential expression of 3,110 genes. Quantitative reverse transcription polymerase chain reaction showed that several well-known stress-responsive genes encoding late embryogenesis abundant protein, dehydrin 1, and ice recrystallization inhibition protein were induced dramatically and that genes encoding U-box-domain-containing protein, electron transfer flavoprotein-ubiquinone, and F-box-containing protein were induced by abiotic stressors in a manner conserved with other plant species. We identified more than 2,000 simple sequence repeats that can be developed as functional molecular markers. This dataset is the most comprehensive transcriptome resource currently available for D. antarctica and is therefore expected to be an important foundation for future genetic studies of grasses and extremophiles.

  12. Massively parallel digital transcriptional profiling of single cells

    PubMed Central

    Zheng, Grace X. Y.; Terry, Jessica M.; Belgrader, Phillip; Ryvkin, Paul; Bent, Zachary W.; Wilson, Ryan; Ziraldo, Solongo B.; Wheeler, Tobias D.; McDermott, Geoff P.; Zhu, Junjie; Gregory, Mark T.; Shuga, Joe; Montesclaros, Luz; Underwood, Jason G.; Masquelier, Donald A.; Nishimura, Stefanie Y.; Schnall-Levin, Michael; Wyatt, Paul W.; Hindson, Christopher M.; Bharadwaj, Rajiv; Wong, Alexander; Ness, Kevin D.; Beppu, Lan W.; Deeg, H. Joachim; McFarland, Christopher; Loeb, Keith R.; Valente, William J.; Ericson, Nolan G.; Stevens, Emily A.; Radich, Jerald P.; Mikkelsen, Tarjei S.; Hindson, Benjamin J.; Bielas, Jason H.

    2017-01-01

    Characterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. We describe a droplet-based system that enables 3′ mRNA counting of tens of thousands of single cells per sample. Cell encapsulation, of up to 8 samples at a time, takes place in ∼6 min, with ∼50% cell capture efficiency. To demonstrate the system's technical performance, we collected transcriptome data from ∼250k single cells across 29 samples. We validated the sensitivity of the system and its ability to detect rare populations using cell lines and synthetic RNAs. We profiled 68k peripheral blood mononuclear cells to demonstrate the system's ability to characterize large immune populations. Finally, we used sequence variation in the transcriptome data to determine host and donor chimerism at single-cell resolution from bone marrow mononuclear cells isolated from transplant patients. PMID:28091601

  13. Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis

    PubMed Central

    Hahn, Daniel A; Ragland, Gregory J; Shoemaker, D DeWayne; Denlinger, David L

    2009-01-01

    Background Flesh flies in the genus Sarcophaga are important models for investigating endocrinology, diapause, cold hardiness, reproduction, and immunity. Despite the prominence of Sarcophaga flesh flies as models for insect physiology and biochemistry, and in forensic studies, little genomic or transcriptomic data are available for members of this genus. We used massively parallel pyrosequencing on the Roche 454-FLX platform to produce a substantial EST dataset for the flesh fly Sarcophaga crassipalpis. To maximize sequence diversity, we pooled RNA extracted from whole bodies of all life stages and normalized the cDNA pool after reverse transcription. Results We obtained 207,110 ESTs with an average read length of 241 bp. These reads assembled into 20,995 contigs and 31,056 singletons. Using BLAST searches of the NR and NT databases we were able to identify 11,757 unique gene elements (E<0.0001) representing approximately 9,000 independent transcripts. Comparison of the distribution of S. crassipalpis unigenes among GO Biological Process functional groups with that of the Drosophila melanogaster transcriptome suggests that our ESTs are broadly representative of the flesh fly transcriptome. Insertion and deletion errors in 454 sequencing present a serious hurdle to comparative transcriptome analysis. Aided by a new approach to correcting for these errors, we performed a comparative analysis of genetic divergence across GO categories among S. crassipalpis, D. melanogaster, and Anopheles gambiae. The results suggest that non-synonymous substitutions occur at similar rates across categories, although genes related to response to stimuli may evolve slightly faster. In addition, we identified over 500 potential microsatellite loci and more than 12,000 SNPs among our ESTs. Conclusion Our data provides the first large-scale EST-project for flesh flies, a much-needed resource for exploring this model species. In addition, we identified a large number of potential microsatellite and SNP markers that could be used in population and systematic studies of S. crassipalpis and other flesh flies. PMID:19454017

  14. Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud

    PubMed Central

    Griffith, Malachi; Walker, Jason R.; Spies, Nicholas C.; Ainscough, Benjamin J.; Griffith, Obi L.

    2015-01-01

    Massively parallel RNA sequencing (RNA-seq) has rapidly become the assay of choice for interrogating RNA transcript abundance and diversity. This article provides a detailed introduction to fundamental RNA-seq molecular biology and informatics concepts. We make available open-access RNA-seq tutorials that cover cloud computing, tool installation, relevant file formats, reference genomes, transcriptome annotations, quality-control strategies, expression, differential expression, and alternative splicing analysis methods. These tutorials and additional training resources are accompanied by complete analysis pipelines and test datasets made available without encumbrance at www.rnaseq.wiki. PMID:26248053

  15. Transcriptome Exploration in Leymus chinensis under Saline-Alkaline Treatment Using 454 Pyrosequencing

    PubMed Central

    Sun, Yepeng; Wang, Fawei; Wang, Nan; Dong, Yuanyuan; Liu, Qi; Zhao, Lei; Chen, Huan; Liu, Weican; Yin, Hailong; Zhang, Xiaomei; Yuan, Yanxi; Li, Haiyan

    2013-01-01

    Background Leymus chinensis (Trin.) Tzvel. is a high saline-alkaline tolerant forage grass genus of the tribe Gramineae family, which also plays an important role in protection of natural environment. To date, little is known about the saline-alkaline tolerance of L. chinensis on the molecular level. To better understand the molecular mechanism of saline-alkaline tolerance in L. chinensis, 454 pyrosequencing was used for the transcriptome study. Results We used Roche-454 massive parallel pyrosequencing technology to sequence two different cDNA libraries that were built from the two samples of control and under saline-alkaline treatment (optimal stress concentration-Hoagland solution with 100 mM NaCl and 200 mM NaHCO3). A total of 363,734 reads in control group and 526,267 reads in treatment group with an average length of 489 bp and 493 bp were obtained, respectively. The reads were assembled into 104,105 unigenes with MIRA sequence assemable software, among which, 73,665 unigenes were in control group, 88,016 unigenes in treatment group and 57,576 unigenes in both groups. According to the comparative expression analysis between the two groups with the threshold of “log2 Ratio ≥1”, there were 36,497 up-regulated unegenes and 18,218 down-regulated unigenes predicted to be the differentially expressed genes. After gene annotation and pathway enrichment analysis, most of them were involved in stress and tolerant function, signal transduction, energy production and conversion, and inorganic ion transport. Furthermore, 16 of these differentially expressed genes were selected for real-time PCR validation, and they were successfully confirmed with the results of 454 pyrosequencing. Conclusions This work is the first time to study the transcriptome of L. chinensis under saline-alkaline treatment based on the 454-FLX massively parallel DNA sequencing platform. It also deepened studies on molecular mechanisms of saline-alkaline in L. chinensis, and constituted a database for future studies. PMID:23365637

  16. A-to-I RNA editing occurs at over a hundred million genomic sites, located in a majority of human genes.

    PubMed

    Bazak, Lily; Haviv, Ami; Barak, Michal; Jacob-Hirsch, Jasmine; Deng, Patricia; Zhang, Rui; Isaacs, Farren J; Rechavi, Gideon; Li, Jin Billy; Eisenberg, Eli; Levanon, Erez Y

    2014-03-01

    RNA molecules transmit the information encoded in the genome and generally reflect its content. Adenosine-to-inosine (A-to-I) RNA editing by ADAR proteins converts a genomically encoded adenosine into inosine. It is known that most RNA editing in human takes place in the primate-specific Alu sequences, but the extent of this phenomenon and its effect on transcriptome diversity are not yet clear. Here, we analyzed large-scale RNA-seq data and detected ∼1.6 million editing sites. As detection sensitivity increases with sequencing coverage, we performed ultradeep sequencing of selected Alu sequences and showed that the scope of editing is much larger than anticipated. We found that virtually all adenosines within Alu repeats that form double-stranded RNA undergo A-to-I editing, although most sites exhibit editing at only low levels (<1%). Moreover, using high coverage sequencing, we observed editing of transcripts resulting from residual antisense expression, doubling the number of edited sites in the human genome. Based on bioinformatic analyses and deep targeted sequencing, we estimate that there are over 100 million human Alu RNA editing sites, located in the majority of human genes. These findings set the stage for exploring how this primate-specific massive diversification of the transcriptome is utilized.

  17. The immune gene repertoire of an important viral reservoir, the Australian black flying fox

    PubMed Central

    2012-01-01

    Background Bats are the natural reservoir host for a range of emerging and re-emerging viruses, including SARS-like coronaviruses, Ebola viruses, henipaviruses and Rabies viruses. However, the mechanisms responsible for the control of viral replication in bats are not understood and there is little information available on any aspect of antiviral immunity in bats. Massively parallel sequencing of the bat transcriptome provides the opportunity for rapid gene discovery. Although the genomes of one megabat and one microbat have now been sequenced to low coverage, no transcriptomic datasets have been reported from any bat species. In this study, we describe the immune transcriptome of the Australian flying fox, Pteropus alecto, providing an important resource for identification of genes involved in a range of activities including antiviral immunity. Results Towards understanding the adaptations that have allowed bats to coexist with viruses, we have de novo assembled transcriptome sequence from immune tissues and stimulated cells from P. alecto. We identified about 18,600 genes involved in a broad range of activities with the most highly expressed genes involved in cell growth and maintenance, enzyme activity, cellular components and metabolism and energy pathways. 3.5% of the bat transcribed genes corresponded to immune genes and a total of about 500 immune genes were identified, providing an overview of both innate and adaptive immunity. A small proportion of transcripts found no match with annotated sequences in any of the public databases and may represent bat-specific transcripts. Conclusions This study represents the first reported bat transcriptome dataset and provides a survey of expressed bat genes that complement existing bat genomic data. In addition, these data provide insight into genes relevant to the antiviral responses of bats, and form a basis for examining the roles of these molecules in immune response to viral infection. PMID:22716473

  18. Comparative description of ten transcriptomes of newly sequenced invertebrates and efficiency estimation of genomic sampling in non-model taxa

    PubMed Central

    2012-01-01

    Introduction Traditionally, genomic or transcriptomic data have been restricted to a few model or emerging model organisms, and to a handful of species of medical and/or environmental importance. Next-generation sequencing techniques have the capability of yielding massive amounts of gene sequence data for virtually any species at a modest cost. Here we provide a comparative analysis of de novo assembled transcriptomic data for ten non-model species of previously understudied animal taxa. Results cDNA libraries of ten species belonging to five animal phyla (2 Annelida [including Sipuncula], 2 Arthropoda, 2 Mollusca, 2 Nemertea, and 2 Porifera) were sequenced in different batches with an Illumina Genome Analyzer II (read length 100 or 150 bp), rendering between ca. 25 and 52 million reads per species. Read thinning, trimming, and de novo assembly were performed under different parameters to optimize output. Between 67,423 and 207,559 contigs were obtained across the ten species, post-optimization. Of those, 9,069 to 25,681 contigs retrieved blast hits against the NCBI non-redundant database, and approximately 50% of these were assigned with Gene Ontology terms, covering all major categories, and with similar percentages in all species. Local blasts against our datasets, using selected genes from major signaling pathways and housekeeping genes, revealed high efficiency in gene recovery compared to available genomes of closely related species. Intriguingly, our transcriptomic datasets detected multiple paralogues in all phyla and in nearly all gene pathways, including housekeeping genes that are traditionally used in phylogenetic applications for their purported single-copy nature. Conclusions We generated the first study of comparative transcriptomics across multiple animal phyla (comparing two species per phylum in most cases), established the first Illumina-based transcriptomic datasets for sponge, nemertean, and sipunculan species, and generated a tractable catalogue of annotated genes (or gene fragments) and protein families for ten newly sequenced non-model organisms, some of commercial importance (i.e., Octopus vulgaris). These comprehensive sets of genes can be readily used for phylogenetic analysis, gene expression profiling, developmental analysis, and can also be a powerful resource for gene discovery. The characterization of the transcriptomes of such a diverse array of animal species permitted the comparison of sequencing depth, functional annotation, and efficiency of genomic sampling using the same pipelines, which proved to be similar for all considered species. In addition, the datasets revealed their potential as a resource for paralogue detection, a recurrent concern in various aspects of biological inquiry, including phylogenetics, molecular evolution, development, and cellular biochemistry. PMID:23190771

  19. Key metabolic pathways involved in xenobiotic biotransformation and stress responses revealed by transcriptomics of the mangrove oyster Crassostrea brasiliana.

    PubMed

    Lüchmann, Karim H; Clark, Melody S; Bainy, Afonso C D; Gilbert, Jack A; Craft, John A; Chipman, J Kevin; Thorne, Michael A S; Mattos, Jacó J; Siebert, Marília N; Schroeder, Declan C

    2015-09-01

    The Brazilian oyster Crassostrea brasiliana was challenged to three common environmental contaminants: phenanthrene, diesel fuel water-accommodated fraction (WAF) and domestic sewage. Total RNA was extracted from the gill and digestive gland, and cDNA libraries were sequenced using the 454 FLX platform. The assembled transcriptome resulted in ̃20,000 contigs, which were annotated to produce the first de novo transcriptome for C. brasiliana. Sequences were screened to identify genes potentially involved in the biotransformation of xenobiotics and associated antioxidant defence mechanisms. These gene families included those of the cytochrome P450 (CYP450), 70kDa heat shock, antioxidants, such as glutathione S-transferase, superoxide dismutase, catalase and also multi-drug resistance proteins. Analysis showed that the massive expansion of the CYP450 and HSP70 family due to gene duplication identified in the Crassostrea gigas genome also occurred in C. brasiliana, suggesting these processes form the base of the Crassostrea lineage. Preliminary expression analyses revealed several candidates biomarker genes that were up-regulated during each of the three treatments, suggesting the potential for environmental monitoring. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis.

    PubMed

    Smola, Matthew J; Rice, Greggory M; Busan, Steven; Siegfried, Nathan A; Weeks, Kevin M

    2015-11-01

    Selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) chemistries exploit small electrophilic reagents that react with 2'-hydroxyl groups to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by using reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as can be done for simple model RNAs. This protocol describes the experimental steps, implemented over 3 d, that are required to perform SHAPE probing and to construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots and provides useful troubleshooting information. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures and visualize probable and alternative helices, often in under 1 d. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles and entire transcriptomes.

  1. A Novel Computational Strategy to Identify A-to-I RNA Editing Sites by RNA-Seq Data: De Novo Detection in Human Spinal Cord Tissue

    PubMed Central

    Picardi, Ernesto; Gallo, Angela; Galeano, Federica; Tomaselli, Sara; Pesole, Graziano

    2012-01-01

    RNA editing is a post-transcriptional process occurring in a wide range of organisms. In human brain, the A-to-I RNA editing, in which individual adenosine (A) bases in pre-mRNA are modified to yield inosine (I), is the most frequent event. Modulating gene expression, RNA editing is essential for cellular homeostasis. Indeed, its deregulation has been linked to several neurological and neurodegenerative diseases. To date, many RNA editing sites have been identified by next generation sequencing technologies employing massive transcriptome sequencing together with whole genome or exome sequencing. While genome and transcriptome reads are not always available for single individuals, RNA-Seq data are widespread through public databases and represent a relevant source of yet unexplored RNA editing sites. In this context, we propose a simple computational strategy to identify genomic positions enriched in novel hypothetical RNA editing events by means of a new two-steps mapping procedure requiring only RNA-Seq data and no a priori knowledge of RNA editing characteristics and genomic reads. We assessed the suitability of our procedure by confirming A-to-I candidates using conventional Sanger sequencing and performing RNA-Seq as well as whole exome sequencing of human spinal cord tissue from a single individual. PMID:22957051

  2. Large-scale transcriptome characterization and mass discovery of SNPs in globe artichoke and its related taxa.

    PubMed

    Scaglione, Davide; Lanteri, Sergio; Acquadro, Alberto; Lai, Zhao; Knapp, Steven J; Rieseberg, Loren; Portis, Ezio

    2012-10-01

    Cynara cardunculus (2n = 2× = 34) is a member of the Asteraceae family that contributes significantly to the agricultural economy of the Mediterranean basin. The species includes two cultivated varieties, globe artichoke and cardoon, which are grown mainly for food. Cynara cardunculus is an orphan crop species whose genome/transcriptome has been relatively unexplored, especially in comparison to other Asteraceae crops. Hence, there is a significant need to improve its genomic resources through the identification of novel genes and sequence-based markers, to design new breeding schemes aimed at increasing quality and crop productivity. We report the outcome of cDNA sequencing and assembly for eleven accessions of C. cardunculus. Sequencing of three mapping parental genotypes using Roche 454-Titanium technology generated 1.7 × 10⁶ reads, which were assembled into 38,726 reference transcripts covering 32 Mbp. Putative enzyme-encoding genes were annotated using the KEGG-database. Transcription factors and candidate resistance genes were surveyed as well. Paired-end sequencing was done for cDNA libraries of eight other representative C. cardunculus accessions on an Illumina Genome Analyzer IIx, generating 46 × 10⁶ reads. Alignment of the IGA and 454 reads to reference transcripts led to the identification of 195,400 SNPs with a Bayesian probability exceeding 95%; a validation rate of 90% was obtained by Sanger-sequencing of a subset of contigs. These results demonstrate that the integration of data from different NGS platforms enables large-scale transcriptome characterization, along with massive SNP discovery. This information will contribute to the dissection of key agricultural traits in C. cardunculus and facilitate the implementation of marker-assisted selection programs. © 2012 The Authors. Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  3. A NGS approach to the encrusting Mediterranean sponge Crella elegans (Porifera, Demospongiae, Poecilosclerida): transcriptome sequencing, characterization and overview of the gene expression along three life cycle stages.

    PubMed

    Pérez-Porro, A R; Navarro-Gómez, D; Uriz, M J; Giribet, G

    2013-05-01

    Sponges can be dominant organisms in many marine and freshwater habitats where they play essential ecological roles. They also represent a key group to address important questions in early metazoan evolution. Recent approaches for improving knowledge on sponge biological and ecological functions as well as on animal evolution have focused on the genetic toolkits involved in ecological responses to environmental changes (biotic and abiotic), development and reproduction. These approaches are possible thanks to newly available, massive sequencing technologies-such as the Illumina platform, which facilitate genome and transcriptome sequencing in a cost-effective manner. Here we present the first NGS (next-generation sequencing) approach to understanding the life cycle of an encrusting marine sponge. For this we sequenced libraries of three different life cycle stages of the Mediterranean sponge Crella elegans and generated de novo transcriptome assemblies. Three assemblies were based on sponge tissue of a particular life cycle stage, including non-reproductive tissue, tissue with sperm cysts and tissue with larvae. The fourth assembly pooled the data from all three stages. By aggregating data from all the different life cycle stages we obtained a higher total number of contigs, contigs with blast hit and annotated contigs than from one stage-based assemblies. In that multi-stage assembly we obtained a larger number of the developmental regulatory genes known for metazoans than in any other assembly. We also advance the differential expression of selected genes in the three life cycle stages to explore the potential of RNA-seq for improving knowledge on functional processes along the sponge life cycle. © 2013 Blackwell Publishing Ltd.

  4. In vitro manipulation of gene expression in larval Schistosoma: a model for postgenomic approaches in Trematoda

    PubMed Central

    YOSHINO, TIMOTHY P.; DINGUIRARD, NATHALIE; DE MORAES MOURÃO, MARINA

    2013-01-01

    SUMMARY With rapid developments in DNA and protein sequencing technologies, combined with powerful bioinformatics tools, a continued acceleration of gene identification in parasitic helminths is predicted, potentially leading to discovery of new drug and vaccine targets, enhanced diagnostics and insights into the complex biology underlying host-parasite interactions. For the schistosome blood flukes, with the recent completion of genome sequencing and comprehensive transcriptomic datasets, there has accumulated massive amounts of gene sequence data, for which, in the vast majority of cases, little is known about actual functions within the intact organism. In this review we attempt to bring together traditional in vitro cultivation approaches and recent emergent technologies of molecular genomics, transcriptomics and genetic manipulation to illustrate the considerable progress made in our understanding of trematode gene expression and function during development of the intramolluscan larval stages. Using several prominent trematode families (Schistosomatidae, Fasciolidae, Echinostomatidae), we have focused on the current status of in vitro larval isolation/cultivation as a source of valuable raw material supporting gene discovery efforts in model digeneans that include whole genome sequencing, transcript and protein expression profiling during larval development, and progress made in the in vitro manipulation of genes and their expression in larval trematodes using transgenic and RNA interference (RNAi) approaches. PMID:19961646

  5. Molecular effects of doxycycline treatment on pterygium as revealed by massive transcriptome sequencing.

    PubMed

    Larráyoz, Ignacio M; de Luis, Alberto; Rúa, Oscar; Velilla, Sara; Cabello, Juan; Martínez, Alfredo

    2012-01-01

    Pterygium is a lesion of the eye surface which involves cell proliferation, migration, angiogenesis, fibrosis, and extracellular matrix remodelling. Surgery is the only approved method to treat this disorder, but high recurrence rates are common. Recently, it has been shown in a mouse model that treatment with doxycycline resulted in reduction of the pterygium lesions. Here we study the mechanism(s) of action by which doxycycline achieves these results, using massive sequencing techniques. Surgically removed pterygia from 10 consecutive patients were set in short term culture and exposed to 0 (control), 50, 200, and 500 µg/ml doxycycline for 24 h, their mRNA was purified, reverse transcribed and sequenced through Illumina's massive sequencing protocols. Acquired data were subjected to quantile normalization and analyzed using cytoscape plugin software to explore the pathways involved. False discovery rate (FDR) methods were used to identify 332 genes which modified their expression in a dose-dependent manner upon exposure to doxycycline. The more represented cellular pathways included all mitochondrial genes, the endoplasmic reticulum stress response, integrins and extracellular matrix components, and growth factors. A high correlation was obtained when comparing ultrasequencing data with qRT-PCR and ELISA results. Doxycycline significantly modified the expression of important cellular pathways in pterygium cells, in a way which is consistent with the observed efficacy of this antibiotic to reduce pterygium lesions in a mouse model. Clinical trials are under way to demonstrate whether there is a benefit for human patients.

  6. Agave: a biofuel feedstock for arid and semi-arid environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gross, Stephen; Martin, Jeffrey; Simpson, June

    2011-05-31

    Efficient production of plant-based, lignocellulosic biofuels relies upon continued improvement of existing biofuel feedstock species, as well as the introduction of newfeedstocks capable of growing on marginal lands to avoid conflicts with existing food production and minimize use of water and nitrogen resources. To this end, specieswithin the plant genus Agave have recently been proposed as new biofuel feedstocks. Many Agave species are adapted to hot and arid environments generally unsuitable forfood production, yet have biomass productivity rates comparable to other second-generation biofuel feedstocks such as switchgrass and Miscanthus. Agavesachieve remarkable heat tolerance and water use efficiency in part throughmore » a Crassulacean Acid Metabolism (CAM) mode of photosynthesis, but the genes andregulatory pathways enabling CAM and thermotolerance in agaves remain poorly understood. We seek to accelerate the development of agave as a new biofuelfeedstock through genomic approaches using massively-parallel sequencing technologies. First, we plan to sequence the transcriptome of A. tequilana to provide adatabase of protein-coding genes to the agave research community. Second, we will compare transcriptome-wide gene expression of agaves under different environmentalconditions in order to understand genetic pathways controlling CAM, water use efficiency, and thermotolerance. Finally, we aim to compare the transcriptome of A.tequilana with that of other Agave species to gain further insight into molecular mechanisms underlying traits desirable for biofuel feedstocks. These genomicapproaches will provide sequence and gene expression information critical to the breeding and domestication of Agave species suitable for biofuel production.« less

  7. Global characterization of Artemisia annua glandular trichome transcriptome using 454 pyrosequencing

    PubMed Central

    Wang, Wei; Wang, Yejun; Zhang, Qing; Qi, Yan; Guo, Dianjing

    2009-01-01

    Background Glandular trichomes produce a wide variety of commercially important secondary metabolites in many plant species. The most prominent anti-malarial drug artemisinin, a sesquiterpene lactone, is produced in glandular trichomes of Artemisia annua. However, only limited genomic information is currently available in this non-model plant species. Results We present a global characterization of A. annua glandular trichome transcriptome using 454 pyrosequencing. Sequencing runs using two normalized cDNA collections from glandular trichomes yielded 406,044 expressed sequence tags (average length = 210 nucleotides), which assembled into 42,678 contigs and 147,699 singletons. Performing a second sequencing run only increased the number of genes identified by ~30%, indicating that massively parallel pyrosequencing provides deep coverage of the A. annua trichome transcriptome. By BLAST search against the NCBI non-redundant protein database, putative functions were assigned to over 28,573 unigenes, including previously undescribed enzymes likely involved in sesquiterpene biosynthesis. Comparison with ESTs derived from trichome collections of other plant species revealed expressed genes in common functional categories across different plant species. RT-PCR analysis confirmed the expression of selected unigenes and novel transcripts in A. annua glandular trichomes. Conclusion The presence of contigs corresponding to enzymes for terpenoids and flavonoids biosynthesis suggests important metabolic activity in A. annua glandular trichomes. Our comprehensive survey of genes expressed in glandular trichome will facilitate new gene discovery and shed light on the regulatory mechanism of artemisinin metabolism and trichome function in A. annua. PMID:19818120

  8. Signatures from Tissue-specific MPSS Libraries Identify Transcripts Preferentially Expressed in the Mouse Inner Ear

    PubMed Central

    Peters, Linda M.; Belyantseva, Inna A.; Lagziel, Ayala; Battey, James F.; Friedman, Thomas B.; Morell, Robert J.

    2007-01-01

    Specialization in cell function and morphology is influenced by the differential expression of mRNAs, many of which are expressed at low abundance and restricted to certain cell types. Detecting such transcripts in cDNA libraries may require sequencing millions of clones. Massively parallel signature sequencing (MPSS) is well-suited for identifying transcripts that are expressed in discrete cell types and in low abundance. We have made MPSS libraries from microdissections of three inner ear tissues. By comparing these MPSS libraries to those of 87 other tissues included in the Mouse Reference Transcriptome (MRT) online resource, we have identified genes that are highly enriched in, or specific to, the inner ear. We show by RT-PCR and in situ hybridization that signatures unique to the inner ear libraries identify transcripts with highly specific cell-type localizations. These transcripts serve to illustrate the utility of a resource that is available to the research community. Utilization of these resources will increase the number of known transcription units and expand our knowledge of the tissue-specific regulation of the transcriptome. PMID:17049805

  9. Dissecting hematopoietic and renal cell heterogeneity in adult zebrafish at single-cell resolution using RNA sequencing.

    PubMed

    Tang, Qin; Iyer, Sowmya; Lobbardi, Riadh; Moore, John C; Chen, Huidong; Lareau, Caleb; Hebert, Christine; Shaw, McKenzie L; Neftel, Cyril; Suva, Mario L; Ceol, Craig J; Bernards, Andre; Aryee, Martin; Pinello, Luca; Drummond, Iain A; Langenau, David M

    2017-10-02

    Recent advances in single-cell, transcriptomic profiling have provided unprecedented access to investigate cell heterogeneity during tissue and organ development. In this study, we used massively parallel, single-cell RNA sequencing to define cell heterogeneity within the zebrafish kidney marrow, constructing a comprehensive molecular atlas of definitive hematopoiesis and functionally distinct renal cells found in adult zebrafish. Because our method analyzed blood and kidney cells in an unbiased manner, our approach was useful in characterizing immune-cell deficiencies within DNA-protein kinase catalytic subunit ( prkdc ), interleukin-2 receptor γ a ( il2rga ), and double-homozygous-mutant fish, identifying blood cell losses in T, B, and natural killer cells within specific genetic mutants. Our analysis also uncovered novel cell types, including two classes of natural killer immune cells, classically defined and erythroid-primed hematopoietic stem and progenitor cells, mucin-secreting kidney cells, and kidney stem/progenitor cells. In total, our work provides the first, comprehensive, single-cell, transcriptomic analysis of kidney and marrow cells in the adult zebrafish. © 2017 Tang et al.

  10. Dissecting hematopoietic and renal cell heterogeneity in adult zebrafish at single-cell resolution using RNA sequencing

    PubMed Central

    Iyer, Sowmya; Lobbardi, Riadh; Chen, Huidong; Hebert, Christine; Shaw, McKenzie L.; Neftel, Cyril; Suva, Mario L.; Bernards, Andre; Aryee, Martin; Drummond, Iain A.

    2017-01-01

    Recent advances in single-cell, transcriptomic profiling have provided unprecedented access to investigate cell heterogeneity during tissue and organ development. In this study, we used massively parallel, single-cell RNA sequencing to define cell heterogeneity within the zebrafish kidney marrow, constructing a comprehensive molecular atlas of definitive hematopoiesis and functionally distinct renal cells found in adult zebrafish. Because our method analyzed blood and kidney cells in an unbiased manner, our approach was useful in characterizing immune-cell deficiencies within DNA–protein kinase catalytic subunit (prkdc), interleukin-2 receptor γ a (il2rga), and double-homozygous–mutant fish, identifying blood cell losses in T, B, and natural killer cells within specific genetic mutants. Our analysis also uncovered novel cell types, including two classes of natural killer immune cells, classically defined and erythroid-primed hematopoietic stem and progenitor cells, mucin-secreting kidney cells, and kidney stem/progenitor cells. In total, our work provides the first, comprehensive, single-cell, transcriptomic analysis of kidney and marrow cells in the adult zebrafish. PMID:28878000

  11. Recurrent mutation of the ID3 gene in Burkitt lymphoma identified by integrated genome, exome and transcriptome sequencing.

    PubMed

    Richter, Julia; Schlesner, Matthias; Hoffmann, Steve; Kreuz, Markus; Leich, Ellen; Burkhardt, Birgit; Rosolowski, Maciej; Ammerpohl, Ole; Wagener, Rabea; Bernhart, Stephan H; Lenze, Dido; Szczepanowski, Monika; Paulsen, Maren; Lipinski, Simone; Russell, Robert B; Adam-Klages, Sabine; Apic, Gordana; Claviez, Alexander; Hasenclever, Dirk; Hovestadt, Volker; Hornig, Nadine; Korbel, Jan O; Kube, Dieter; Langenberger, David; Lawerenz, Chris; Lisfeld, Jasmin; Meyer, Katharina; Picelli, Simone; Pischimarov, Jordan; Radlwimmer, Bernhard; Rausch, Tobias; Rohde, Marius; Schilhabel, Markus; Scholtysik, René; Spang, Rainer; Trautmann, Heiko; Zenz, Thorsten; Borkhardt, Arndt; Drexler, Hans G; Möller, Peter; MacLeod, Roderick A F; Pott, Christiane; Schreiber, Stefan; Trümper, Lorenz; Loeffler, Markus; Stadler, Peter F; Lichter, Peter; Eils, Roland; Küppers, Ralf; Hummel, Michael; Klapper, Wolfram; Rosenstiel, Philip; Rosenwald, Andreas; Brors, Benedikt; Siebert, Reiner

    2012-12-01

    Burkitt lymphoma is a mature aggressive B-cell lymphoma derived from germinal center B cells. Its cytogenetic hallmark is the Burkitt translocation t(8;14)(q24;q32) and its variants, which juxtapose the MYC oncogene with one of the three immunoglobulin loci. Consequently, MYC is deregulated, resulting in massive perturbation of gene expression. Nevertheless, MYC deregulation alone seems not to be sufficient to drive Burkitt lymphomagenesis. By whole-genome, whole-exome and transcriptome sequencing of four prototypical Burkitt lymphomas with immunoglobulin gene (IG)-MYC translocation, we identified seven recurrently mutated genes. One of these genes, ID3, mapped to a region of focal homozygous loss in Burkitt lymphoma. In an extended cohort, 36 of 53 molecularly defined Burkitt lymphomas (68%) carried potentially damaging mutations of ID3. These were strongly enriched at somatic hypermutation motifs. Only 6 of 47 other B-cell lymphomas with the IG-MYC translocation (13%) carried ID3 mutations. These findings suggest that cooperation between ID3 inactivation and IG-MYC translocation is a hallmark of Burkitt lymphomagenesis.

  12. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  13. RISC RNA sequencing for context-specific identification of in vivo microRNA targets.

    PubMed

    Matkovich, Scot J; Van Booven, Derek J; Eschenbacher, William H; Dorn, Gerald W

    2011-01-07

    MicroRNAs (miRs) are expanding our understanding of cardiac disease and have the potential to transform cardiovascular therapeutics. One miR can target hundreds of individual mRNAs, but existing methodologies are not sufficient to accurately and comprehensively identify these mRNA targets in vivo. To develop methods permitting identification of in vivo miR targets in an unbiased manner, using massively parallel sequencing of mouse cardiac transcriptomes in combination with sequencing of mRNA associated with mouse cardiac RNA-induced silencing complexes (RISCs). We optimized techniques for expression profiling small amounts of RNA without introducing amplification bias and applied this to anti-Argonaute 2 immunoprecipitated RISCs (RISC-Seq) from mouse hearts. By comparing RNA-sequencing results of cardiac RISC and transcriptome from the same individual hearts, we defined 1645 mRNAs consistently targeted to mouse cardiac RISCs. We used this approach in hearts overexpressing miRs from Myh6 promoter-driven precursors (programmed RISC-Seq) to identify 209 in vivo targets of miR-133a and 81 in vivo targets of miR-499. Consistent with the fact that miR-133a and miR-499 have widely differing "seed" sequences and belong to different miR families, only 6 targets were common to miR-133a- and miR-499-programmed hearts. RISC-sequencing is a highly sensitive method for general RISC profiling and individual miR target identification in biological context and is applicable to any tissue and any disease state.

  14. A new approach for detecting adventitious viruses shows Sf-rhabdovirus-negative Sf-RVN cells are suitable for safe biologicals production.

    PubMed

    Geisler, Christoph

    2018-02-07

    Adventitious viral contamination in cell substrates used for biologicals production is a major safety concern. A powerful new approach that can be used to identify adventitious viruses is a combination of bioinformatics tools with massively parallel sequencing technology. Typically, this involves mapping or BLASTN searching individual reads against viral nucleotide databases. Although extremely sensitive for known viruses, this approach can easily miss viruses that are too dissimilar to viruses in the database. Moreover, it is computationally intensive and requires reference cell genome databases. To avoid these drawbacks, we set out to develop an alternative approach. We reasoned that searching genome and transcriptome assemblies for adventitious viral contaminants using TBLASTN with a compact viral protein database covering extant viral diversity as the query could be fast and sensitive without a requirement for high performance computing hardware. We tested our approach on Spodoptera frugiperda Sf-RVN, a recently isolated insect cell line, to determine if it was contaminated with one or more adventitious viruses. We used Illumina reads to assemble the Sf-RVN genome and transcriptome and searched them for adventitious viral contaminants using TBLASTN with our viral protein database. We found no evidence of viral contamination, which was substantiated by the fact that our searches otherwise identified diverse sequences encoding virus-like proteins. These sequences included Maverick, R1 LINE, and errantivirus transposons, all of which are common in insect genomes. We also identified previously described as well as novel endogenous viral elements similar to ORFs encoded by diverse insect viruses. Our results demonstrate TBLASTN searching massively parallel sequencing (MPS) assemblies with a compact, manually curated viral protein database is more sensitive for adventitious virus detection than BLASTN, as we identified various sequences that encoded virus-like proteins, but had no similarity to viral sequences at the nucleotide level. Moreover, searches were fast without requiring high performance computing hardware. Our study also documents the enhanced biosafety profile of Sf-RVN as compared to other Sf cell lines, and supports the notion that Sf-RVN is highly suitable for the production of safe biologicals.

  15. Venom gland transcriptomic and venom proteomic analyses of the scorpion Megacormus gertschi Díaz-Najera, 1966 (Scorpiones: Euscorpiidae: Megacorminae).

    PubMed

    Santibáñez-López, Carlos E; Cid-Uribe, Jimena I; Zamudio, Fernando Z; Batista, Cesar V F; Ortiz, Ernesto; Possani, Lourival D

    2017-07-01

    The soluble venom from the Mexican scorpion Megacormus gertschi of the family Euscorpiidae was obtained and its biological effects were tested in several animal models. This venom is not toxic to mice at doses of 100 μg per 20 g of mouse weight, while being lethal to arthropods (insects and crustaceans), at doses of 20 μg (for crickets) and 100 μg (for shrimps) per animal. Samples of the venom were separated by high performance liquid chromatography and circa 80 distinct chromatographic fractions were obtained from which 67 components have had their molecular weights determined by mass spectrometry analysis. The N-terminal amino acid sequence of seven protein/peptides were obtained by Edman degradation and are reported. Among the high molecular weight components there are enzymes with experimentally-confirmed phospholipase activity. A pair of telsons from this scorpion species was dissected, from which total RNA was extracted and used for cDNA library construction. Massive sequencing by the Illumina protocol, followed by de novo assembly, resulted in a total of 110,528 transcripts. From those, we were able to annotate 182, which putatively code for peptides/proteins with sequence similarity to previously-reported venom components available from different protein databases. Transcripts seemingly coding for enzymes showed the richest diversity, with 52 sequences putatively coding for proteases, 20 for phospholipases, 8 for lipases and 5 for hyaluronidases. The number of different transcripts potentially coding for peptides with sequence similarity to those that affect ion channels was 19, for putative antimicrobial peptides 19, and for protease inhibitor-like peptides, 18. Transcripts seemingly coding for other venom components were identified and described. The LC/MS analysis of a trypsin-digested venom aliquot resulted in 23 matches with the translated transcriptome database, which validates the transcriptome. The proteomic and transcriptomic analyses reported here constitute the first approach to study the venom components from a scorpion species belonging to the family Euscorpiidae. The data certainly show that this venom is different from all the ones described thus far in the literature. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. RISC RNA sequencing for context-specific identification of in vivo miR targets

    PubMed Central

    Matkovich, Scot J; Van Booven, Derek J; Eschenbacher, William H; Dorn, Gerald W

    2010-01-01

    Rationale MicroRNAs (miRs) are expanding our understanding of cardiac disease and have the potential to transform cardiovascular therapeutics. One miR can target hundreds of individual mRNAs, but existing methodologies are not sufficient to accurately and comprehensively identify these mRNA targets in vivo. Objective To develop methods permitting identification of in vivo miR targets in an unbiased manner, using massively parallel sequencing of mouse cardiac transcriptomes in combination with sequencing of mRNA associated with mouse cardiac RNA-induced silencing complexes (RISCs). Methods and Results We optimized techniques for expression profiling small amounts of RNA without introducing amplification bias, and applied this to anti-Argonaute 2 immunoprecipitated RISCs (RISC-Seq) from mouse hearts. By comparing RNA-sequencing results of cardiac RISC and transcriptome from the same individual hearts, we defined 1,645 mRNAs consistently targeted to mouse cardiac RISCs. We employed this approach in hearts overexpressing miRs from Myh6 promoter-driven precursors (programmed RISC-Seq) to identify 209 in vivo targets of miR-133a and 81 in vivo targets of miR-499. Consistent with the fact that miR-133a and miR-499 have widely differing ‘seed’ sequences and belong to different miR families, only 6 targets were common to miR-133a- and miR-499-programmed hearts. Conclusions RISC-sequencing is a highly sensitive method for general RISC profiling and individual miR target identification in biological context, and is applicable to any tissue and any disease state. Summary MicroRNAs (miRs) are key regulators of mRNA translation in health and disease. While bioinformatic predictions suggest that a single miR may target hundreds of mRNAs, the number of experimentally verified targets of miRs is low. To enable comprehensive, unbiased examination of miR targets, we have performed deep RNA sequencing of cardiac transcriptomes in parallel with cardiac RNA-induced silencing complex (RISC)-associated RNAs (the RISCome), called RISC sequencing. We developed methods that did not require cross-linking of RNAs to RISCs or amplification of mRNA prior to sequencing, making it possible to rapidly perform RISC sequencing from intact tissue while avoiding amplification bias. Comparison of RISCome with transcriptome expression defined the degree of RISC enrichment for each mRNA. The majority of the mRNAs enriched in wild-type cardiac RISComes compared to transcriptomes were bioinformatically predicted to be targets of at least 1 of 139 cardiac-expressed miRs. Programming cardiomyocyte RISCs via transgenic overexpression in adult hearts of miR-133a or miR-499, two miRs that contain entirely different ‘seed’ sequences, elicited differing profiles of RISC-targeted mRNAs. Thus, RISC sequencing represents a highly sensitive method for general RISC profiling and individual miR target identification in biological context. PMID:21030712

  17. Massively parallel nanowell-based single-cell gene expression profiling.

    PubMed

    Goldstein, Leonard D; Chen, Ying-Jiun Jasmine; Dunne, Jude; Mir, Alain; Hubschle, Hermann; Guillory, Joseph; Yuan, Wenlin; Zhang, Jingli; Stinson, Jeremy; Jaiswal, Bijay; Pahuja, Kanika Bajaj; Mann, Ishminder; Schaal, Thomas; Chan, Leo; Anandakrishnan, Sangeetha; Lin, Chun-Wah; Espinoza, Patricio; Husain, Syed; Shapiro, Harris; Swaminathan, Karthikeyan; Wei, Sherry; Srinivasan, Maithreyan; Seshagiri, Somasekar; Modrusan, Zora

    2017-07-07

    Technological advances have enabled transcriptome characterization of cell types at the single-cell level providing new biological insights. New methods that enable simple yet high-throughput single-cell expression profiling are highly desirable. Here we report a novel nanowell-based single-cell RNA sequencing system, ICELL8, which enables processing of thousands of cells per sample. The system employs a 5,184-nanowell-containing microchip to capture ~1,300 single cells and process them. Each nanowell contains preprinted oligonucleotides encoding poly-d(T), a unique well barcode, and a unique molecular identifier. The ICELL8 system uses imaging software to identify nanowells containing viable single cells and only wells with single cells are processed into sequencing libraries. Here, we report the performance and utility of ICELL8 using samples of increasing complexity from cultured cells to mouse solid tissue samples. Our assessment of the system to discriminate between mixed human and mouse cells showed that ICELL8 has a low cell multiplet rate (< 3%) and low cross-cell contamination. We characterized single-cell transcriptomes of more than a thousand cultured human and mouse cells as well as 468 mouse pancreatic islets cells. We were able to identify distinct cell types in pancreatic islets, including alpha, beta, delta and gamma cells. Overall, ICELL8 provides efficient and cost-effective single-cell expression profiling of thousands of cells, allowing researchers to decipher single-cell transcriptomes within complex biological samples.

  18. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE

    PubMed Central

    2011-01-01

    Background The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example for such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas on the transcriptome level. Results We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, witnessing a differential organ-specific response to stress. Profiting from recent pioneer works on massive cDNA sequencing in chickpea, more than 9,400 UniTags were able to be linked to UniProt entries. Additionally, gene ontology (GO) categories over-representation analysis enabled to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, here we focus exemplarily on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundreds of transcripts of potential interest is now available. Conclusions This report demonstrates, that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcripts isoforms. As an example for the high resolution of the employed technology that we coin deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules already 2 hours after onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts are proposed to be potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE. PMID:21320317

  19. The salt-responsive transcriptome of chickpea roots and nodules via deepSuperSAGE.

    PubMed

    Molina, Carlos; Zaman-Allah, Mainassara; Khan, Faheema; Fatnassi, Nadia; Horres, Ralf; Rotter, Björn; Steinhauer, Diana; Amenc, Laurie; Drevon, Jean-Jacques; Winter, Peter; Kahl, Günter

    2011-02-14

    The combination of high-throughput transcript profiling and next-generation sequencing technologies is a prerequisite for genome-wide comprehensive transcriptome analysis. Our recent innovation of deepSuperSAGE is based on an advanced SuperSAGE protocol and its combination with massively parallel pyrosequencing on Roche's 454 sequencing platform. As a demonstration of the power of this combination, we have chosen the salt stress transcriptomes of roots and nodules of the third most important legume crop chickpea (Cicer arietinum L.). While our report is more technology-oriented, it nevertheless addresses a major world-wide problem for crops generally: high salinity. Together with low temperatures and water stress, high salinity is responsible for crop losses of millions of tons of various legume (and other) crops. Continuously deteriorating environmental conditions will combine with salinity stress to further compromise crop yields. As a good example for such stress-exposed crop plants, we started to characterize salt stress responses of chickpeas on the transcriptome level. We used deepSuperSAGE to detect early global transcriptome changes in salt-stressed chickpea. The salt stress responses of 86,919 transcripts representing 17,918 unique 26 bp deepSuperSAGE tags (UniTags) from roots of the salt-tolerant variety INRAT-93 two hours after treatment with 25 mM NaCl were characterized. Additionally, the expression of 57,281 transcripts representing 13,115 UniTags was monitored in nodules of the same plants. From a total of 144,200 analyzed 26 bp tags in roots and nodules together, 21,401 unique transcripts were identified. Of these, only 363 and 106 specific transcripts, respectively, were commonly up- or down-regulated (>3.0-fold) under salt stress in both organs, witnessing a differential organ-specific response to stress.Profiting from recent pioneer works on massive cDNA sequencing in chickpea, more than 9,400 UniTags were able to be linked to UniProt entries. Additionally, gene ontology (GO) categories over-representation analysis enabled to filter out enriched biological processes among the differentially expressed UniTags. Subsequently, the gathered information was further cross-checked with stress-related pathways. From several filtered pathways, here we focus exemplarily on transcripts associated with the generation and scavenging of reactive oxygen species (ROS), as well as on transcripts involved in Na+ homeostasis. Although both processes are already very well characterized in other plants, the information generated in the present work is of high value. Information on expression profiles and sequence similarity for several hundreds of transcripts of potential interest is now available. This report demonstrates, that the combination of the high-throughput transcriptome profiling technology SuperSAGE with one of the next-generation sequencing platforms allows deep insights into the first molecular reactions of a plant exposed to salinity. Cross validation with recent reports enriched the information about the salt stress dynamics of more than 9,000 chickpea ESTs, and enlarged their pool of alternative transcripts isoforms. As an example for the high resolution of the employed technology that we coin deepSuperSAGE, we demonstrate that ROS-scavenging and -generating pathways undergo strong global transcriptome changes in chickpea roots and nodules already 2 hours after onset of moderate salt stress (25 mM NaCl). Additionally, a set of more than 15 candidate transcripts are proposed to be potential components of the salt overly sensitive (SOS) pathway in chickpea. Newly identified transcript isoforms are potential targets for breeding novel cultivars with high salinity tolerance. We demonstrate that these targets can be integrated into breeding schemes by micro-arrays and RT-PCR assays downstream of the generation of 26 bp tags by SuperSAGE.

  20. De novo assembly and characterization of transcriptomes of early-stage fruit from two genotypes of Annona squamosa L. with contrast in seed number.

    PubMed

    Gupta, Yogesh; Pathak, Ashish K; Singh, Kashmir; Mantri, Shrikant S; Singh, Sudhir P; Tuli, Rakesh

    2015-02-14

    Annona squamosa L., a popular fruit tree, is the most widely cultivated species of the genus Annona. The lack of transcriptomic and genomic information limits the scope of genome investigations in this important shrub. It bears aggregate fruits with numerous seeds. A few rare accessions with very few seeds have been reported for Annona. A massive pyrosequencing (Roche, 454 GS FLX+) of transcriptome from early stages of fruit development (0, 4, 8 and 12 days after pollination) was performed to produce expression datasets in two genotypes, Sitaphal and NMK-1, that show a contrast in the number of seeds set in fruits. The data reported here is the first source of genome-wide differential transcriptome sequence in two genotypes of A. squamosa, and identifies several candidate genes related to seed development. Approximately 1.9 million high-quality clean reads were obtained in the cDNA library from the developing fruits of both the genotypes, with an average length of about 568 bp. Quality-reads were assembled de novo into 2074 to 11004 contigs in the developing fruit samples at different stages of development. The contig sequence data of all the four stages of each genotype were combined into larger units resulting into 14921 (Sitaphal) and 14178 (NMK-1) unigenes, with a mean size of more than 1 Kb. Assembled unigenes were functionally annotated by querying against the protein sequences of five different public databases (NCBI non redundant, Prunus persica, Vitis vinifera, Fragaria vesca, and Amborella trichopoda), with an E-value cut-off of 10(-5). A total of 4588 (Sitaphal) and 2502 (NMK-1) unigenes did not match any known protein in the NR database. These sequences could be genes specific to Annona sp. or belong to untranslated regions. Several of the unigenes representing pathways related to primary and secondary metabolism, and seed and fruit development expressed at a higher level in Sitaphal, the densely seeded cultivar in comparison to the poorly seeded NMK-1. A total of 2629 (Sitaphal) and 3445 (NMK-1) Simple Sequence Repeat (SSR) motifs were identified respectively in the two genotypes. These could be potential candidates for transcript based microsatellite analysis in A. squamosa. The present work provides early-stage fruit specific transcriptome sequence resource for A. squamosa. This repository will serve as a useful resource for investigating the molecular mechanisms of fruit development, and improvement of fruit related traits in A. squamosa and related species.

  1. Single-cell RNA-seq of rheumatoid arthritis synovial tissue using low-cost microfluidic instrumentation.

    PubMed

    Stephenson, William; Donlin, Laura T; Butler, Andrew; Rozo, Cristina; Bracken, Bernadette; Rashidfarrokhi, Ali; Goodman, Susan M; Ivashkiv, Lionel B; Bykerk, Vivian P; Orange, Dana E; Darnell, Robert B; Swerdlow, Harold P; Satija, Rahul

    2018-02-23

    Droplet-based single-cell RNA-seq has emerged as a powerful technique for massively parallel cellular profiling. While this approach offers the exciting promise to deconvolute cellular heterogeneity in diseased tissues, the lack of cost-effective and user-friendly instrumentation has hindered widespread adoption of droplet microfluidic techniques. To address this, we developed a 3D-printed, low-cost droplet microfluidic control instrument and deploy it in a clinical environment to perform single-cell transcriptome profiling of disaggregated synovial tissue from five rheumatoid arthritis patients. We sequence 20,387 single cells revealing 13 transcriptomically distinct clusters. These encompass an unsupervised draft atlas of the autoimmune infiltrate that contribute to disease biology. Additionally, we identify previously uncharacterized fibroblast subpopulations and discern their spatial location within the synovium. We envision that this instrument will have broad utility in both research and clinical settings, enabling low-cost and routine application of microfluidic techniques.

  2. Transcriptome sequencing and de novo annotation of the critically endangered Adriatic sturgeon.

    PubMed

    Vidotto, Michele; Grapputo, Alessandro; Boscari, Elisa; Barbisan, Federica; Coppe, Alessandro; Grandi, Gilberto; Kumar, Abhishek; Congiu, Leonardo

    2013-06-18

    Sturgeons are a group of Condrostean fish with very high evolutionary, economical and conservation interest. The eggs of these living fossils represent one of the most high prized foods of animal origin. The intense fishing pressure on wild stocks to harvest caviar has caused in the last decades a dramatic decline of their distribution and abundance leading the International Union for Conservation of Nature to list them as the more endangered group of species. As a direct consequence, world-wide efforts have been made to develop sturgeon aquaculture programmes for caviar production. In this context, the characterization of the genes involved in sex determination could provide relevant information for the selective farming of the more profitable females. The 454 sequencing of two cDNA libraries from the gonads and brain of one male and one female full-sib A. naccarii, yielded 182,066 and 167,776 reads respectively, which, after strict quality control, were iterative assembled into more than 55,000 high quality ESTs. The average per-base coverage reached by assembling the two libraries was 4X. The multi-step annotation process resulted in 16% successfully annotated sequences with GO terms. We screened the transcriptome for 32 sex-related genes and highlighted 7 genes that are potentially specifically expressed, 5 in male and 2 in females, at the first life stage at which sex is histologically identifiable. In addition we identified 21,791 putative EST-linked SNPs and 5,295 SSRs. This study represents the first large massive release of sturgeon transcriptome information that we organized into the public database AnaccariiBase, which is freely available at http://compgen.bio.unipd.it/anaccariibase/. This transcriptomic data represents an important source of information for further studies on sturgeon species. The hundreds of putative EST-linked molecular makers discovered in this study will be invaluable for sturgeon reintroduction and breeding programs.

  3. Transcriptome sequencing and de novo annotation of the critically endangered Adriatic sturgeon

    PubMed Central

    2013-01-01

    Background Sturgeons are a group of Condrostean fish with very high evolutionary, economical and conservation interest. The eggs of these living fossils represent one of the most high prized foods of animal origin. The intense fishing pressure on wild stocks to harvest caviar has caused in the last decades a dramatic decline of their distribution and abundance leading the International Union for Conservation of Nature to list them as the more endangered group of species. As a direct consequence, world-wide efforts have been made to develop sturgeon aquaculture programmes for caviar production. In this context, the characterization of the genes involved in sex determination could provide relevant information for the selective farming of the more profitable females. Results The 454 sequencing of two cDNA libraries from the gonads and brain of one male and one female full-sib A. naccarii, yielded 182,066 and 167,776 reads respectively, which, after strict quality control, were iterative assembled into more than 55,000 high quality ESTs. The average per-base coverage reached by assembling the two libraries was 4X. The multi-step annotation process resulted in 16% successfully annotated sequences with GO terms. We screened the transcriptome for 32 sex-related genes and highlighted 7 genes that are potentially specifically expressed, 5 in male and 2 in females, at the first life stage at which sex is histologically identifiable. In addition we identified 21,791 putative EST-linked SNPs and 5,295 SSRs. Conclusions This study represents the first large massive release of sturgeon transcriptome information that we organized into the public database AnaccariiBase, which is freely available at http://compgen.bio.unipd.it/anaccariibase/. This transcriptomic data represents an important source of information for further studies on sturgeon species. The hundreds of putative EST-linked molecular makers discovered in this study will be invaluable for sturgeon reintroduction and breeding programs. PMID:23773438

  4. Getting the most out of parasitic helminth transcriptomes using HelmDB: implications for biology and biotechnology.

    PubMed

    Mangiola, Stefano; Young, Neil D; Korhonen, Pasi; Mondal, Alinda; Scheerlinck, Jean-Pierre; Sternberg, Paul W; Cantacessi, Cinzia; Hall, Ross S; Jex, Aaron R; Gasser, Robin B

    2013-12-01

    Compounded by a massive global food shortage, many parasitic diseases have a devastating, long-term impact on animal and human health and welfare worldwide. Parasitic helminths (worms) affect the health of billions of animals. Unlocking the systems biology of these neglected pathogens will underpin the design of new and improved interventions against them. Currently, the functional annotation of genomic and transcriptomic sequence data for socio-economically important parasitic worms relies almost exclusively on comparative bioinformatic analyses using model organism- and other databases. However, many genes and gene products of parasitic helminths (often >50%) cannot be annotated using this approach, because they are specific to parasites and/or do not have identifiable homologs in other organisms for which sequence data are available. This inability to fully annotate transcriptomes and predicted proteomes is a major challenge and constrains our understanding of the biology of parasites, interactions with their hosts and of parasitism and the pathogenesis of disease on a molecular level. In the present article, we compiled transcriptomic data sets of key, socioeconomically important parasitic helminths, and constructed and validated a curated database, called HelmDB (www.helmdb.org). We demonstrate how this database can be used effectively for the improvement of functional annotation by employing data integration and clustering. Importantly, HelmDB provides a practical and user-friendly toolkit for sequence browsing and comparative analyses among divergent helminth groups (including nematodes and trematodes), and should be readily adaptable and applicable to a wide range of other organisms. This web-based, integrative database should assist 'systems biology' studies of parasitic helminths, and the discovery and prioritization of novel drug and vaccine targets. This focus provides a pathway toward developing new and improved approaches for the treatment and control of parasitic diseases, with the potential for important biotechnological outcomes. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    PubMed

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-07-20

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Transcriptome-Wide Changes in Chlamydomonas reinhardtii Gene Expression Regulated by Carbon Dioxide and the CO2-Concentrating Mechanism Regulator CIA5/CCM1[W][OA

    PubMed Central

    Fang, Wei; Si, Yaqing; Douglass, Stephen; Casero, David; Merchant, Sabeeha S.; Pellegrini, Matteo; Ladunga, Istvan; Liu, Peng; Spalding, Martin H.

    2012-01-01

    We used RNA sequencing to query the Chlamydomonas reinhardtii transcriptome for regulation by CO2 and by the transcription regulator CIA5 (CCM1). Both CO2 and CIA5 are known to play roles in acclimation to low CO2 and in induction of an essential CO2-concentrating mechanism (CCM), but less is known about their interaction and impact on the whole transcriptome. Our comparison of the transcriptome of a wild type versus a cia5 mutant strain under three different CO2 conditions, high CO2 (5%), low CO2 (0.03 to 0.05%), and very low CO2 (<0.02%), provided an entry into global changes in the gene expression patterns occurring in response to the interaction between CO2 and CIA5. We observed a massive impact of CIA5 and CO2 on the transcriptome, affecting almost 25% of all Chlamydomonas genes, and we discovered an array of gene clusters with distinctive expression patterns that provide insight into the regulatory interaction between CIA5 and CO2. Several individual clusters respond primarily to either CIA5 or CO2, providing access to genes regulated by one factor but decoupled from the other. Three distinct clusters clearly associated with CCM-related genes may represent a rich source of candidates for new CCM components, including a small cluster of genes encoding putative inorganic carbon transporters. PMID:22634760

  7. Analysis of porcine adipose tissue transcriptome reveals differences in de novo fatty acid synthesis in pigs with divergent muscle fatty acid composition.

    PubMed

    Corominas, Jordi; Ramayo-Caldas, Yuliaxis; Puig-Oliveras, Anna; Estellé, Jordi; Castelló, Anna; Alves, Estefania; Pena, Ramona N; Ballester, Maria; Folch, Josep M

    2013-12-01

    In pigs, adipose tissue is one of the principal organs involved in the regulation of lipid metabolism. It is particularly involved in the overall fatty acid synthesis with consequences in other lipid-target organs such as muscles and the liver. With this in mind, we have used massive, parallel high-throughput sequencing technologies to characterize the porcine adipose tissue transcriptome architecture in six Iberian x Landrace crossbred pigs showing extreme phenotypes for intramuscular fatty acid composition (three per group). High-throughput RNA sequencing was used to generate a whole characterization of adipose tissue (backfat) transcriptome. A total of 4,130 putative unannotated protein-coding sequences were identified in the 20% of reads which mapped in intergenic regions. Furthermore, 36% of the unmapped reads were represented by interspersed repeats, SINEs being the most abundant elements. Differential expression analyses identified 396 candidate genes among divergent animals for intramuscular fatty acid composition. Sixty-two percent of these genes (247/396) presented higher expression in the group of pigs with higher content of intramuscular SFA and MUFA, while the remaining 149 showed higher expression in the group with higher content of PUFA. Pathway analysis related these genes to biological functions and canonical pathways controlling lipid and fatty acid metabolisms. In concordance with the phenotypic classification of animals, the major metabolic pathway differentially modulated between groups was de novo lipogenesis, the group with more PUFA being the one that showed lower expression of lipogenic genes. These results will help in the identification of genetic variants at loci that affect fatty acid composition traits. The implications of these results range from the improvement of porcine meat quality traits to the application of the pig as an animal model of human metabolic diseases.

  8. In-depth characterization of the microRNA transcriptome in a leukemia progression model

    PubMed Central

    Kuchenbauer, Florian; Morin, Ryan D.; Argiropoulos, Bob; Petriv, Oleh I.; Griffith, Malachi; Heuser, Michael; Yung, Eric; Piper, Jessica; Delaney, Allen; Prabhu, Anna-Liisa; Zhao, Yongjun; McDonald, Helen; Zeng, Thomas; Hirst, Martin; Hansen, Carl L.; Marra, Marco A.; Humphries, R. Keith

    2008-01-01

    MicroRNAs (miRNAs) have been shown to play important roles in physiological as well as multiple malignant processes, including acute myeloid leukemia (AML). In an effort to gain further insight into the role of miRNAs in AML, we have applied the Illumina massively parallel sequencing platform to carry out an in-depth analysis of the miRNA transcriptome in a murine leukemia progression model. This model simulates the stepwise conversion of a myeloid progenitor cell by an engineered overexpression of the nucleoporin 98 (NUP98)–homeobox HOXD13 fusion gene (ND13), to aggressive AML inducing cells upon transduction with the oncogenic collaborator Meis1. From this data set, we identified 307 miRNA/miRNA* species in the ND13 cells and 306 miRNA/miRNA* species in ND13+Meis1 cells, corresponding to 223 and 219 miRNA genes. Sequence counts varied between two and 136,558, indicating a remarkable expression range between the detected miRNA species. The large number of miRNAs expressed and the nature of differential expression suggest that leukemic progression as modeled here is dictated by the repertoire of shared, but differentially expressed miRNAs. Our finding of extensive sequence variations (isomiRs) for almost all miRNA and miRNA* species adds additional complexity to the miRNA transcriptome. A stringent target prediction analysis coupled with in vitro target validation revealed the potential for miRNA-mediated release of oncogenes that facilitates leukemic progression from the preleukemic to leukemia inducing state. Finally, 55 novel miRNAs species were identified in our data set, adding further complexity to the emerging world of small RNAs. PMID:18849523

  9. Deep Sequencing of the Medicago truncatula Root Transcriptome Reveals a Massive and Early Interaction between Nodulation Factor and Ethylene Signals1[OPEN

    PubMed Central

    Larrainzar, Estíbaliz; Riely, Brendan K.; Kim, Sang Cheol; Carrasquilla-Garcia, Noelia; Yu, Hee-Ju; Hwang, Hyun-Ju; Oh, Mijin; Kim, Goon Bo; Surendrarao, Anandkumar K.; Chasman, Deborah; Siahpirani, Alireza F.; Penmetsa, Ramachandra V.; Lee, Gang-Seob; Kim, Namshin; Roy, Sushmita; Mun, Jeong-Hwan; Cook, Douglas R.

    2015-01-01

    The legume-rhizobium symbiosis is initiated through the activation of the Nodulation (Nod) factor-signaling cascade, leading to a rapid reprogramming of host cell developmental pathways. In this work, we combine transcriptome sequencing with molecular genetics and network analysis to quantify and categorize the transcriptional changes occurring in roots of Medicago truncatula from minutes to days after inoculation with Sinorhizobium medicae. To identify the nature of the inductive and regulatory cues, we employed mutants with absent or decreased Nod factor sensitivities (i.e. Nodulation factor perception and Lysine motif domain-containing receptor-like kinase3, respectively) and an ethylene (ET)-insensitive, Nod factor-hypersensitive mutant (sickle). This unique data set encompasses nine time points, allowing observation of the symbiotic regulation of diverse biological processes with high temporal resolution. Among the many outputs of the study is the early Nod factor-induced, ET-regulated expression of ET signaling and biosynthesis genes. Coupled with the observation of massive transcriptional derepression in the ET-insensitive background, these results suggest that Nod factor signaling activates ET production to attenuate its own signal. Promoter:β-glucuronidase fusions report ET biosynthesis both in root hairs responding to rhizobium as well as in meristematic tissue during nodule organogenesis and growth, indicating that ET signaling functions at multiple developmental stages during symbiosis. In addition, we identified thousands of novel candidate genes undergoing Nod factor-dependent, ET-regulated expression. We leveraged the power of this large data set to model Nod factor- and ET-regulated signaling networks using MERLIN, a regulatory network inference algorithm. These analyses predict key nodes regulating the biological process impacted by Nod factor perception. We have made these results available to the research community through a searchable online resource. PMID:26175514

  10. Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile, and accurate RNA structure analysis

    PubMed Central

    Smola, Matthew J.; Rice, Greggory M.; Busan, Steven; Siegfried, Nathan A.; Weeks, Kevin M.

    2016-01-01

    SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues based on the ability of reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as for simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures. PMID:26426499

  11. High-throughput SNP genotyping in Cucurbita pepo for map construction and quantitative trait loci mapping

    PubMed Central

    2012-01-01

    Background Cucurbita pepo is a member of the Cucurbitaceae family, the second- most important horticultural family in terms of economic importance after Solanaceae. The "summer squash" types, including Zucchini and Scallop, rank among the highest-valued vegetables worldwide. There are few genomic tools available for this species. The first Cucurbita transcriptome, along with a large collection of Single Nucleotide Polymorphisms (SNP), was recently generated using massive sequencing. A set of 384 SNP was selected to generate an Illumina GoldenGate assay in order to construct the first SNP-based genetic map of Cucurbita and map quantitative trait loci (QTL). Results We herein present the construction of the first SNP-based genetic map of Cucurbita pepo using a population derived from the cross of two varieties with contrasting phenotypes, representing the main cultivar groups of the species' two subspecies: Zucchini (subsp. pepo) × Scallop (subsp. ovifera). The mapping population was genotyped with 384 SNP, a set of selected EST-SNP identified in silico after massive sequencing of the transcriptomes of both parents, using the Illumina GoldenGate platform. The global success rate of the assay was higher than 85%. In total, 304 SNP were mapped, along with 11 SSR from a previous map, giving a map density of 5.56 cM/marker. This map was used to infer syntenic relationships between C. pepo and cucumber and to successfully map QTL that control plant, flowering and fruit traits that are of benefit to squash breeding. The QTL effects were validated in backcross populations. Conclusion Our results show that massive sequencing in different genotypes is an excellent tool for SNP discovery, and that the Illumina GoldenGate platform can be successfully applied to constructing genetic maps and performing QTL analysis in Cucurbita. This is the first SNP-based genetic map in the Cucurbita genus and is an invaluable new tool for biological research, especially considering that most of these markers are located in the coding regions of genes involved in different physiological processes. The platform will also be useful for future mapping and diversity studies, and will be essential in order to accelerate the process of breeding new and better-adapted squash varieties. PMID:22356647

  12. iSeq: Web-Based RNA-seq Data Analysis and Visualization.

    PubMed

    Zhang, Chao; Fan, Caoqi; Gan, Jingbo; Zhu, Ping; Kong, Lei; Li, Cheng

    2018-01-01

    Transcriptome sequencing (RNA-seq) is becoming a standard experimental methodology for genome-wide characterization and quantification of transcripts at single base-pair resolution. However, downstream analysis of massive amount of sequencing data can be prohibitively technical for wet-lab researchers. A functionally integrated and user-friendly platform is required to meet this demand. Here, we present iSeq, an R-based Web server, for RNA-seq data analysis and visualization. iSeq is a streamlined Web-based R application under the Shiny framework, featuring a simple user interface and multiple data analysis modules. Users without programming and statistical skills can analyze their RNA-seq data and construct publication-level graphs through a standardized yet customizable analytical pipeline. iSeq is accessible via Web browsers on any operating system at http://iseq.cbi.pku.edu.cn .

  13. Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq

    PubMed Central

    Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim

    2014-01-01

    The advent of both microarray and massively parallel sequencing have revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.

  14. Insights into transcriptomes of Big and Low sagebrush

    Treesearch

    Mark D. Huynh; Justin T. Page; Bryce A. Richardson; Joshua A. Udall

    2015-01-01

    We report the sequencing and assembly of three transcriptomes from Big (Artemisia tridentatassp. wyomingensis and A. tridentatassp. tridentata) and Low (A. arbuscula ssp. arbuscula) sagebrush. The sequence reads are available in the Sequence Read Archive of NCBI. We demonstrate the utilities of these transcriptomes for gene discovery and phylogenomic analysis. An...

  15. Spinning Gland Transcriptomics from Two Main Clades of Spiders (Order: Araneae) - Insights on Their Molecular, Anatomical and Behavioral Evolution

    PubMed Central

    Prosdocimi, Francisco; Bittencourt, Daniela; da Silva, Felipe Rodrigues; Kirst, Matias; Motta, Paulo C.; Rech, Elibio L.

    2011-01-01

    Characterized by distinctive evolutionary adaptations, spiders provide a comprehensive system for evolutionary and developmental studies of anatomical organs, including silk and venom production. Here we performed cDNA sequencing using massively parallel sequencers (454 GS-FLX Titanium) to generate ∼80,000 reads from the spinning gland of Actinopus spp. (infraorder: Mygalomorphae) and Gasteracantha cancriformis (infraorder: Araneomorphae, Orbiculariae clade). Actinopus spp. retains primitive characteristics on web usage and presents a single undifferentiated spinning gland while the orbiculariae spiders have seven differentiated spinning glands and complex patterns of web usage. MIRA, Celera Assembler and CAP3 software were used to cluster NGS reads for each spider. CAP3 unigenes passed through a pipeline for automatic annotation, classification by biological function, and comparative transcriptomics. Genes related to spider silks were manually curated and analyzed. Although a single spidroin gene family was found in Actinopus spp., a vast repertoire of specialized spider silk proteins was encountered in orbiculariae. Astacin-like metalloproteases (meprin subfamily) were shown to be some of the most sampled unigenes and duplicated gene families in G. cancriformis since its evolutionary split from mygalomorphs. Our results confirm that the evolution of the molecular repertoire of silk proteins was accompanied by the (i) anatomical differentiation of spinning glands and (ii) behavioral complexification in the web usage. Finally, a phylogenetic tree was constructed to cluster most of the known spidroins in gene clades. This is the first large-scale, multi-organism transcriptome for spider spinning glands and a first step into a broad understanding of spider web systems biology and evolution. PMID:21738742

  16. Genome Evolution and Meiotic Maps by Massively Parallel DNA Sequencing: Spotted Gar, an Outgroup for the Teleost Genome Duplication

    PubMed Central

    Amores, Angel; Catchen, Julian; Ferrara, Allyse; Fontenot, Quenton; Postlethwait, John H.

    2011-01-01

    Genomic resources for hundreds of species of evolutionary, agricultural, economic, and medical importance are unavailable due to the expense of well-assembled genome sequences and difficulties with multigenerational studies. Teleost fish provide many models for human disease but possess anciently duplicated genomes that sometimes obfuscate connectivity. Genomic information representing a fish lineage that diverged before the teleost genome duplication (TGD) would provide an outgroup for exploring the mechanisms of evolution after whole-genome duplication. We exploited massively parallel DNA sequencing to develop meiotic maps with thrift and speed by genotyping F1 offspring of a single female and a single male spotted gar (Lepisosteus oculatus) collected directly from nature utilizing only polymorphisms existing in these two wild individuals. Using Stacks, software that automates the calling of genotypes from polymorphisms assayed by Illumina sequencing, we constructed a map containing 8406 markers. RNA-seq on two map-cross larvae provided a reference transcriptome that identified nearly 1000 mapped protein-coding markers and allowed genome-wide analysis of conserved synteny. Results showed that the gar lineage diverged from teleosts before the TGD and its genome is organized more similarly to that of humans than teleosts. Thus, spotted gar provides a critical link between medical models in teleost fish, to which gar is biologically similar, and humans, to which gar is genomically similar. Application of our F1 dense mapping strategy to species with no prior genome information promises to facilitate comparative genomics and provide a scaffold for ordering the numerous contigs arising from next generation genome sequencing. PMID:21828280

  17. The first genome-level transcriptome of the wood-degrading fungus Phanerochaete chrysosporium grown on red oak.

    PubMed

    Sato, Shin; Feltus, F Alex; Iyer, Prashanti; Tien, Ming

    2009-06-01

    As part of an effort to determine all the gene products involved in wood degradation, we have performed massively parallel pyrosequencing on an expression library from the white rot fungus Phanerochaete chrysosporium grown in shallow stationary cultures with red oak as the carbon source. Approximately 48,000 high quality sequence tags (246 bp average length) were generated. 53% of the sequence tags aligned to 4,262 P. chrysosporium gene models, and an additional 18.5% of the tags reliably aligned to the P. chrysosporium genome providing evidence for 961 putative novel fragmented gene models. Due to their role in lignocellulose degradation, the secreted proteins were focused upon. Our results show that the four enzymes required for cellulose degradation: endocellulase, exocellulase CBHI, exocellulase CBHII, and beta-glucosidase are all produced. For hemicellulose degradation, not all known enzymes were produced, but endoxylanases, acetyl xylan esterases and mannosidases were detected. For lignin degradation, the role of peroxidases has been questioned; however, our results show that lignin peroxidase is highly expressed along with the H(2)O(2) generating enzyme, alcohol oxidase. The transcriptome snapshot reveals that H(2)O(2) generation and utilization are central in wood degradation. Our results also reveal new transcripts that encode extracellular proteins with no known function.

  18. PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms.

    PubMed

    Gan, Ruei-Chi; Chen, Ting-Wen; Wu, Timothy H; Huang, Po-Jung; Lee, Chi-Ching; Yeh, Yuan-Ming; Chiu, Cheng-Hsun; Huang, Hsien-Da; Tang, Petrus

    2016-12-22

    Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interests. However, there are only a few organisms having reference genomic sequences and even fewer having well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. Here, we propose a new analysis strategy and quantification methods for quantifying expression level which not only generate a virtual reference from sequencing data, but also provide comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against NCBI NR databases to find potential homolog sequences. Based on the searched result, a set of virtual transcripts are generated and served as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. For better understanding of the biological meaning from the comparison among transcriptomes, PARRoT further provides linkage between these virtual transcripts and their potential function through showing best hits in SwissProt, NR database, assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours. In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms which offers the opportunity to quantify and compare transcriptome profiles through a homolog based virtual transcriptome reference. By using the homolog based reference, our strategy effectively avoids the problems that may cause from inconsistencies among transcriptomes. This strategy will shed lights on the field of comparative genomics for non-model organism. We have implemented PARRoT as a web service which is freely available at http://parrot.cgu.edu.tw .

  19. New in-depth rainbow trout transcriptome reference and digital atlas of gene expression

    USDA-ARS?s Scientific Manuscript database

    Sequencing the rainbow trout genome is underway and a transcriptome reference sequence is required to help in genome assembly and gene discovery. Previously, we reported a transcriptome reference sequence using a 19X coverage of 454-pyrosequencing data. Although this work added a great wealth of ann...

  20. Genomewide analysis of Drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation

    PubMed Central

    Westholm, Jakub O.; Miura, Pedro; Olson, Sara; Shenker, Sol; Joseph, Brian; Sanfilippo, Piero; Celniker, Susan E.; Graveley, Brenton R.; Lai, Eric C.

    2014-01-01

    Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues and cultured cells, to rigorously annotate >2500 fruitfly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs, and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase dramatically relative to linear isoforms during CNS aging, and constitute a novel aging biomarker. PMID:25544350

  1. Genome-wide Analysis of Drosophila Circular RNAs Reveals Their Structural and Sequence Properties and Age-Dependent Neural Accumulation

    DOE PAGES

    Westholm, Jakub  O.; Miura, Pedro; Olson, Sara; ...

    2014-11-26

    Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues, and cultured cells, to rigorously annotate >2,500 fruit fly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and the circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1,000 well-conserved canonical miRNA seed matches, especially within coding regions, and codingmore » conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase substantially relative to linear isoforms during CNS aging and constitute an aging biomarker.« less

  2. Genome-wide Analysis of Drosophila Circular RNAs Reveals Their Structural and Sequence Properties and Age-Dependent Neural Accumulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Westholm, Jakub  O.; Miura, Pedro; Olson, Sara

    Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues, and cultured cells, to rigorously annotate >2,500 fruit fly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and the circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1,000 well-conserved canonical miRNA seed matches, especially within coding regions, and codingmore » conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase substantially relative to linear isoforms during CNS aging and constitute an aging biomarker.« less

  3. Arkas: Rapid reproducible RNAseq analysis

    PubMed Central

    Colombo, Anthony R.; J. Triche Jr, Timothy; Ramsingh, Giridharan

    2017-01-01

    The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments.  We offer cloud-scale RNAseq pipelines Arkas-Quantification, and Arkas-Analysis available within Illumina’s BaseSpace cloud application platform which expedites Kallisto preparatory routines, reliably calculates differential expression, and performs gene-set enrichment of REACTOME pathways .  Due to inherit inefficiencies of scale, Illumina's BaseSpace computing platform offers a massively parallel distributive environment improving data management services and data importing.   Arkas-Quantification deploys Kallisto for parallel cloud computations and is conveniently integrated downstream from the BaseSpace Sequence Read Archive (SRA) import/conversion application titled SRA Import.  Arkas-Analysis annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata, calculates the differential expression and gene-set enrichment analysis on both coding genes and transcripts. The Arkas cloud pipeline supports ENSEMBL transcriptomes and can be used downstream from the SRA Import facilitating raw sequencing importing, SRA FASTQ conversion, RNA quantification and analysis steps. PMID:28868134

  4. The Embryonic Transcriptome of the Red-Eared Slider Turtle (Trachemys scripta)

    PubMed Central

    Kaplinsky, Nicholas J.; Gilbert, Scott F.; Cebra-Thomas, Judith; Lilleväli, Kersti; Saare, Merly; Chang, Eric Y.; Edelman, Hannah E.; Frick, Melissa A.; Guan, Yin; Hammond, Rebecca M.; Hampilos, Nicholas H.; Opoku, David S. B.; Sariahmed, Karim; Sherman, Eric A.; Watson, Ray

    2013-01-01

    The bony shell of the turtle is an evolutionary novelty not found in any other group of animals, however, research into its formation has suggested that it has evolved through modification of conserved developmental mechanisms. Although these mechanisms have been extensively characterized in model organisms, the tools for characterizing them in non-model organisms such as turtles have been limited by a lack of genomic resources. We have used a next generation sequencing approach to generate and assemble a transcriptome from stage 14 and 17 Trachemys scripta embryos, stages during which important events in shell development are known to take place. The transcriptome consists of 231,876 sequences with an N50 of 1,166 bp. GO terms and EC codes were assigned to the 61,643 unique predicted proteins identified in the transcriptome sequences. All major GO categories and metabolic pathways are represented in the transcriptome. Transcriptome sequences were used to amplify several cDNA fragments designed for use as RNA in situ probes. One of these, BMP5, was hybridized to a T. scripta embryo and exhibits both conserved and novel expression patterns. The transcriptome sequences should be of broad use for understanding the evolution and development of the turtle shell and for annotating any future T. scripta genome sequences. PMID:23840449

  5. Researches on Transcriptome Sequencing in the Study of Traditional Chinese Medicine

    PubMed Central

    Xin, Jie; Zhang, Rong-chao; Wang, Lei

    2017-01-01

    Due to its incomparable advantages, the application of transcriptome sequencing in the study of traditional Chinese medicine attracts more and more attention of researchers, which greatly promote the development of traditional Chinese medicine. In this paper, the applications of transcriptome sequencing in traditional Chinese medicine were summarized by reviewing recent related papers. PMID:28900463

  6. Genetic divergence between freshwater and marine morphs of alewife (Alosa pseudoharengus): a 'next-generation' sequencing analysis.

    PubMed

    Czesny, Sergiusz; Epifanio, John; Michalak, Pawel

    2012-01-01

    Alewife Alosa pseudoharengus, a small clupeid fish native to Atlantic Ocean, has recently (∼150 years ago) invaded the North American Great Lakes and despite challenges of freshwater environment its populations exploded and disrupted local food web structures. This range expansion has been accompanied by dramatic changes at all levels of organization. Growth rates, size at maturation, or fecundity are only a few of the most distinct morphological and life history traits that contrast the two alewife morphs. A question arises to what extent these rapidly evolving differences between marine and freshwater varieties result from regulatory (including phenotypic plasticity) or structural mutations. To gain insights into expression changes and sequence divergence between marine and freshwater alewives, we sequenced transcriptomes of individuals from Lake Michigan and Atlantic Ocean. Population specific single nucleotide polymorphisms were rare but interestingly occurred in sequences of genes that also tended to show large differences in expression. Our results show that the striking phenotypic divergence between anadromous and lake alewives can be attributed to massive regulatory modifications rather than coding changes.

  7. Genetic Divergence between Freshwater and Marine Morphs of Alewife (Alosa pseudoharengus): A ‘Next-Generation’ Sequencing Analysis

    PubMed Central

    Czesny, Sergiusz; Epifanio, John; Michalak, Pawel

    2012-01-01

    Alewife Alosa pseudoharengus, a small clupeid fish native to Atlantic Ocean, has recently (∼150 years ago) invaded the North American Great Lakes and despite challenges of freshwater environment its populations exploded and disrupted local food web structures. This range expansion has been accompanied by dramatic changes at all levels of organization. Growth rates, size at maturation, or fecundity are only a few of the most distinct morphological and life history traits that contrast the two alewife morphs. A question arises to what extent these rapidly evolving differences between marine and freshwater varieties result from regulatory (including phenotypic plasticity) or structural mutations. To gain insights into expression changes and sequence divergence between marine and freshwater alewives, we sequenced transcriptomes of individuals from Lake Michigan and Atlantic Ocean. Population specific single nucleotide polymorphisms were rare but interestingly occurred in sequences of genes that also tended to show large differences in expression. Our results show that the striking phenotypic divergence between anadromous and lake alewives can be attributed to massive regulatory modifications rather than coding changes. PMID:22438868

  8. Comparative 454 pyrosequencing of transcripts from two olive genotypes during fruit development

    PubMed Central

    Alagna, Fiammetta; D'Agostino, Nunzio; Torchia, Laura; Servili, Maurizio; Rao, Rosa; Pietrella, Marco; Giuliano, Giovanni; Chiusano, Maria Luisa; Baldoni, Luciana; Perrotta, Gaetano

    2009-01-01

    Background Despite its primary economic importance, genomic information on olive tree is still lacking. 454 pyrosequencing was used to enrich the very few sequence data currently available for the Olea europaea species and to identify genes involved in expression of fruit quality traits. Results Fruits of Coratina, a widely cultivated variety characterized by a very high phenolic content, and Tendellone, an oleuropein-lacking natural variant, were used as starting material for monitoring the transcriptome. Four different cDNA libraries were sequenced, respectively at the beginning and at the end of drupe development. A total of 261,485 reads were obtained, for an output of about 58 Mb. Raw sequence data were processed using a four step pipeline procedure and data were stored in a relational database with a web interface. Conclusion Massively parallel sequencing of different fruit cDNA collections has provided large scale information about the structure and putative function of gene transcripts accumulated during fruit development. Comparative transcript profiling allowed the identification of differentially expressed genes with potential relevance in regulating the fruit metabolism and phenolic content during ripening. PMID:19709400

  9. Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly.

    PubMed

    Garcia, T I; Shen, Y; Catchen, J; Amores, A; Schartl, M; Postlethwait, J; Walter, R B

    2012-01-01

    For many researchers, next generation sequencing data holds the key to answering a category of questions previously unassailable. One of the important and challenging steps in achieving these goals is accurately assembling the massive quantity of short sequencing reads into full nucleic acid sequences. For research groups working with non-model or wild systems, short read assembly can pose a significant challenge due to the lack of pre-existing EST or genome reference libraries. While many publications describe the overall process of sequencing and assembly, few address the topic of how many and what types of reads are best for assembly. The goal of this project was use real world data to explore the effects of read quantity and short read quality scores on the resulting de novo assemblies. Using several samples of short reads of various sizes and qualities we produced many assemblies in an automated manner. We observe how the properties of read length, read quality, and read quantity affect the resulting assemblies and provide some general recommendations based on our real-world data set. Published by Elsevier Inc.

  10. Transcriptome sequences resolve deep relationships of the grape family.

    PubMed

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  11. Next-Generation Sequencing: The Translational Medicine Approach from “Bench to Bedside to Population”

    PubMed Central

    Beigh, Mohammad Muzafar

    2016-01-01

    Humans have predicted the relationship between heredity and diseases for a long time. Only in the beginning of the last century, scientists begin to discover the connotations between different genes and disease phenotypes. Recent trends in next-generation sequencing (NGS) technologies have brought a great momentum in biomedical research that in turn has remarkably augmented our basic understanding of human biology and its associated diseases. State-of-the-art next generation biotechnologies have started making huge strides in our current understanding of mechanisms of various chronic illnesses like cancers, metabolic disorders, neurodegenerative anomalies, etc. We are experiencing a renaissance in biomedical research primarily driven by next generation biotechnologies like genomics, transcriptomics, proteomics, metabolomics, lipidomics etc. Although genomic discoveries are at the forefront of next generation omics technologies, however, their implementation into clinical arena had been painstakingly slow mainly because of high reaction costs and unavailability of requisite computational tools for large-scale data analysis. However rapid innovations and steadily lowering cost of sequence-based chemistries along with the development of advanced bioinformatics tools have lately prompted launching and implementation of large-scale massively parallel genome sequencing programs in different fields ranging from medical genetics, infectious biology, agriculture sciences etc. Recent advances in large-scale omics-technologies is bringing healthcare research beyond the traditional “bench to bedside” approach to more of a continuum that will include improvements, in public healthcare and will be primarily based on predictive, preventive, personalized, and participatory medicine approach (P4). Recent large-scale research projects in genetic and infectious disease biology have indicated that massively parallel whole-genome/whole-exome sequencing, transcriptome analysis, and other functional genomic tools can reveal large number of unique functional elements and/or markers that otherwise would be undetected by traditional sequencing methodologies. Therefore, latest trends in the biomedical research is giving birth to the new branch in medicine commonly referred to as personalized and/or precision medicine. Developments in the post-genomic era are believed to completely restructure the present clinical pattern of disease prevention and treatment as well as methods of diagnosis and prognosis. The next important step in the direction of the precision/personalized medicine approach should be its early adoption in clinics for future medical interventions. Consequently, in coming year’s next generation biotechnologies will reorient medical practice more towards disease prediction and prevention approaches rather than curing them at later stages of their development and progression, even at wider population level(s) for general public healthcare system. PMID:28930123

  12. Genomics of Adaptation to Multiple Concurrent Stresses: Insights from Comparative Transcriptomics of a Cichlid Fish from One of Earth's Most Extreme Environments, the Hypersaline Soda Lake Magadi in Kenya, East Africa.

    PubMed

    Kavembe, Geraldine D; Franchini, Paolo; Irisarri, Iker; Machado-Schiaffino, Gonzalo; Meyer, Axel

    2015-10-01

    The Magadi tilapia (Alcolapia grahami) is a cichlid fish that inhabits one of the Earth's most extreme aquatic environments, with high pH (~10), salinity (~60% of seawater), high temperatures (~40 °C), and fluctuating oxygen regimes. The Magadi tilapia evolved several unique behavioral, physiological, and anatomical adaptations, some of which are constituent and thus retained in freshwater conditions. We conducted a transcriptomic analysis on A. grahami to study the evolutionary basis of tolerance to multiple stressors. To identify the adaptive regulatory changes associated with stress responses, we massively sequenced gill transcriptomes (RNAseq) from wild and freshwater-acclimated specimens of A. grahami. As a control, corresponding transcriptome data from Oreochromis leucostictus, a closely related freshwater species, were generated. We found expression differences in a large number of genes with known functions related to osmoregulation, energy metabolism, ion transport, and chemical detoxification. Over-representation of metabolism-related gene ontology terms in wild individuals compared to laboratory-acclimated specimens suggested that freshwater conditions greatly decrease the metabolic requirements of this species. Twenty-five genes with diverse physiological functions related to responses to water stress showed signs of divergent natural selection between the Magadi tilapia and its freshwater relative, which shared a most recent common ancestor only about four million years ago. The complete set of genes responsible for urea excretion was identified in the gill transcriptome of A. grahami, making it the only fish species to have a functional ornithine-urea cycle pathway in the gills--a major innovation for increasing nitrogenous waste efficiency.

  13. Construction of Pará rubber tree genome and multi-transcriptome database accelerates rubber researches.

    PubMed

    Makita, Yuko; Kawashima, Mika; Lau, Nyok Sean; Othman, Ahmad Sofiman; Matsui, Minami

    2018-01-19

    Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies brought draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publically available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers for rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .

  14. Consequences of Normalizing Transcriptomic and Genomic Libraries of Plant Genomes Using a Duplex-Specific Nuclease and Tetramethylammonium Chloride

    PubMed Central

    Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

    2013-01-01

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce. PMID:23409088

  15. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride.

    PubMed

    Matvienko, Marta; Kozik, Alexander; Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

    2013-01-01

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.

  16. Transcriptome Sequences Resolve Deep Relationships of the Grape Family

    PubMed Central

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307

  17. A survey of the sorghum transcriptome using single-molecule long reads

    DOE PAGES

    Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...

    2016-06-24

    Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novelmore » splice isoforms. Additionally, we uncover APA ofB11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.« less

  18. A survey of the sorghum transcriptome using single-molecule long reads

    PubMed Central

    Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.

    2016-01-01

    Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290

  19. Sympatric ecological speciation meets pyrosequencing: sampling the transcriptome of the apple maggot Rhagoletis pomonella

    PubMed Central

    2009-01-01

    Background The full power of modern genetics has been applied to the study of speciation in only a small handful of genetic model species - all of which speciated allopatrically. Here we report the first large expressed sequence tag (EST) study of a candidate for ecological sympatric speciation, the apple maggot Rhagoletis pomonella, using massively parallel pyrosequencing on the Roche 454-FLX platform. To maximize transcript diversity we created and sequenced separate libraries from larvae, pupae, adult heads, and headless adult bodies. Results We obtained 239,531 sequences which assembled into 24,373 contigs. A total of 6810 unique protein coding genes were identified among the contigs and long singletons, corresponding to 48% of all known Drosophila melanogaster protein-coding genes. Their distribution across GO classes suggests that we have obtained a representative sample of the transcriptome. Among these sequences are many candidates for potential R. pomonella "speciation genes" (or "barrier genes") such as those controlling chemosensory and life-history timing processes. Furthermore, we identified important marker loci including more than 40,000 single nucleotide polymorphisms (SNPs) and over 100 microsatellites. An initial search for SNPs at which the apple and hawthorn host races differ suggested at least 75 loci warranting further work. We also determined that developmental expression differences remained even after normalization; transcripts expected to show different expression levels between larvae and pupae in D. melanogaster also did so in R. pomonella. Preliminary comparative analysis of transcript presences and absences revealed evidence of gene loss in Drosophila and gain in the higher dipteran clade Schizophora. Conclusions These data provide a much needed resource for exploring mechanisms of divergence in this important model for sympatric ecological speciation. Our description of ESTs from a substantial portion of the R. pomonella transcriptome will facilitate future functional studies of candidate genes for olfaction and diapause-related life history timing, and will enable large scale expression studies. Similarly, the identification of new SNP and microsatellite markers will facilitate future population and quantitative genetic studies of divergence between the apple and hawthorn-infesting host races. PMID:20035631

  20. System-Wide Hypersensitive Response-Associated Transcriptome and Metabolome Reprogramming in Tomato1[W][OA

    PubMed Central

    Etalo, Desalegn W.; Stulemeijer, Iris J.E.; Peter van Esse, H.; de Vos, Ric C.H.; Bouwmeester, Harro J.; Joosten, Matthieu H.A.J.

    2013-01-01

    The hypersensitive response (HR) is considered to be the hallmark of the resistance response of plants to pathogens. To study HR-associated transcriptome and metabolome reprogramming in tomato (Solanum lycopersicum), we used plants that express both a resistance gene to Cladosporium fulvum and the matching avirulence gene of this pathogen. In these plants, massive reprogramming occurred, and we found that the HR and associated processes are highly energy demanding. Ubiquitin-dependent protein degradation, hydrolysis of sugars, and lipid catabolism are used as alternative sources of amino acids, energy, and carbon skeletons, respectively. We observed strong accumulation of secondary metabolites, such as hydroxycinnamic acid amides. Coregulated expression of WRKY transcription factors and genes known to be involved in the HR, in addition to a strong enrichment of the W-box WRKY-binding motif in the promoter sequences of the coregulated genes, point to WRKYs as the most prominent orchestrators of the HR. Our study has revealed several novel HR-related genes, and reverse genetics tools will allow us to understand the role of each individual component in the HR. PMID:23719893

  1. Deep Sequencing of the Medicago truncatula Root Transcriptome Reveals a Massive and Early Interaction between Nodulation Factor and Ethylene Signals.

    PubMed

    Larrainzar, Estíbaliz; Riely, Brendan K; Kim, Sang Cheol; Carrasquilla-Garcia, Noelia; Yu, Hee-Ju; Hwang, Hyun-Ju; Oh, Mijin; Kim, Goon Bo; Surendrarao, Anandkumar K; Chasman, Deborah; Siahpirani, Alireza F; Penmetsa, Ramachandra V; Lee, Gang-Seob; Kim, Namshin; Roy, Sushmita; Mun, Jeong-Hwan; Cook, Douglas R

    2015-09-01

    The legume-rhizobium symbiosis is initiated through the activation of the Nodulation (Nod) factor-signaling cascade, leading to a rapid reprogramming of host cell developmental pathways. In this work, we combine transcriptome sequencing with molecular genetics and network analysis to quantify and categorize the transcriptional changes occurring in roots of Medicago truncatula from minutes to days after inoculation with Sinorhizobium medicae. To identify the nature of the inductive and regulatory cues, we employed mutants with absent or decreased Nod factor sensitivities (i.e. Nodulation factor perception and Lysine motif domain-containing receptor-like kinase3, respectively) and an ethylene (ET)-insensitive, Nod factor-hypersensitive mutant (sickle). This unique data set encompasses nine time points, allowing observation of the symbiotic regulation of diverse biological processes with high temporal resolution. Among the many outputs of the study is the early Nod factor-induced, ET-regulated expression of ET signaling and biosynthesis genes. Coupled with the observation of massive transcriptional derepression in the ET-insensitive background, these results suggest that Nod factor signaling activates ET production to attenuate its own signal. Promoter:β-glucuronidase fusions report ET biosynthesis both in root hairs responding to rhizobium as well as in meristematic tissue during nodule organogenesis and growth, indicating that ET signaling functions at multiple developmental stages during symbiosis. In addition, we identified thousands of novel candidate genes undergoing Nod factor-dependent, ET-regulated expression. We leveraged the power of this large data set to model Nod factor- and ET-regulated signaling networks using MERLIN, a regulatory network inference algorithm. These analyses predict key nodes regulating the biological process impacted by Nod factor perception. We have made these results available to the research community through a searchable online resource. © 2015 American Society of Plant Biologists. All Rights Reserved.

  2. Transcriptome assembly, gene annotation and tissue gene expression atlas of the rainbow trout

    USDA-ARS?s Scientific Manuscript database

    Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complimented by transcriptome information that will enhance genome assembly and annotation. Previously, we reported a transcriptome reference sequence using a 19X coverage of Sanger and 454-pyrosequencing dat...

  3. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    PubMed

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes lead to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  4. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    2013-01-01

    Background Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes lead to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  5. Gene Expression Profiles in Paired Gingival Biopsies from Periodontitis-Affected and Healthy Tissues Revealed by Massively Parallel Sequencing

    PubMed Central

    Båge, Tove; Lagervall, Maria; Jansson, Leif; Lundeberg, Joakim; Yucel-Lindberg, Tülay

    2012-01-01

    Periodontitis is a chronic inflammatory disease affecting the soft tissue and bone that surrounds the teeth. Despite extensive research, distinctive genes responsible for the disease have not been identified. The objective of this study was to elucidate transcriptome changes in periodontitis, by investigating gene expression profiles in gingival tissue obtained from periodontitis-affected and healthy gingiva from the same patient, using RNA-sequencing. Gingival biopsies were obtained from a disease-affected and a healthy site from each of 10 individuals diagnosed with periodontitis. Enrichment analysis performed among uniquely expressed genes for the periodontitis-affected and healthy tissues revealed several regulated pathways indicative of inflammation for the periodontitis-affected condition. Hierarchical clustering of the sequenced biopsies demonstrated clustering according to the degree of inflammation, as observed histologically in the biopsies, rather than clustering at the individual level. Among the top 50 upregulated genes in periodontitis-affected tissues, we investigated two genes which have not previously been demonstrated to be involved in periodontitis. These included interferon regulatory factor 4 and chemokine (C-C motif) ligand 18, which were also expressed at the protein level in gingival biopsies from patients with periodontitis. In conclusion, this study provides a first step towards a quantitative comprehensive insight into the transcriptome changes in periodontitis. We demonstrate for the first time site-specific local variation in gene expression profiles of periodontitis-affected and healthy tissues obtained from patients with periodontitis, using RNA-seq. Further, we have identified novel genes expressed in periodontitis tissues, which may constitute potential therapeutic targets for future treatment strategies of periodontitis. PMID:23029519

  6. Assembled contigs of the synganglion transcriptome from an Australian population of the cattle tick, Rhipicephalus microplus

    USDA-ARS?s Scientific Manuscript database

    In a collaboration with National Center for Genome Resources and University of Texas at El Paso researchers, we sequenced and assembled the transcriptome of the synganglion of the Texas strain (Deutsch) of the cattle tick Rhipicephalus microplus. This transcriptome contains 43, 468 sequences and wa...

  7. Swine transcriptome characterization by combined Iso-Seq and RNA-seq for annotating the emerging long read-based reference genome

    USDA-ARS?s Scientific Manuscript database

    PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...

  8. Assembled contigs of the synganglion transcriptome from a Texas population of the cattle tick, Rhipicephalus microplus.

    USDA-ARS?s Scientific Manuscript database

    In a collaboration with National Center for Genome Resources and University of Texas at El Paso researchers, we sequenced and assembled the transcriptome of the synganglion of the Texas strain (Deutsch) of the cattle tick Rhipicephalus microplus. This transcriptome contains 43, 468 sequences and wa...

  9. Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology

    PubMed Central

    Udy, Dylan B.; Voorhies, Mark; Chan, Patricia P.; Lowe, Todd M.; Dumont, Sophie

    2015-01-01

    The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain large and few chromosomes—and are thus ideal for imaging intra-cellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome have been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics. PMID:26252667

  10. Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology.

    PubMed

    Udy, Dylan B; Voorhies, Mark; Chan, Patricia P; Lowe, Todd M; Dumont, Sophie

    2015-01-01

    The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain large and few chromosomes-and are thus ideal for imaging intra-cellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome have been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics.

  11. The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

    PubMed

    Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

    2012-03-15

    Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.

  12. Functional Analysis of Mating Type Genes and Transcriptome Analysis during Fruiting Body Development of Botrytis cinerea

    PubMed Central

    2018-01-01

    ABSTRACT Botrytis cinerea is a plant-pathogenic fungus producing apothecia as sexual fruiting bodies. To study the function of mating type (MAT) genes, single-gene deletion mutants were generated in both genes of the MAT1-1 locus and both genes of the MAT1-2 locus. Deletion mutants in two MAT genes were entirely sterile, while mutants in the other two MAT genes were able to develop stipes but never formed an apothecial disk. Little was known about the reprogramming of gene expression during apothecium development. We analyzed transcriptomes of sclerotia, three stages of apothecium development (primordia, stipes, and apothecial disks), and ascospores by RNA sequencing. Ten secondary metabolite gene clusters were upregulated at the onset of sexual development and downregulated in ascospores released from apothecia. Notably, more than 3,900 genes were differentially expressed in ascospores compared to mature apothecial disks. Among the genes that were upregulated in ascospores were numerous genes encoding virulence factors, which reveals that ascospores are transcriptionally primed for infection prior to their arrival on a host plant. Strikingly, the massive transcriptional changes at the initiation and completion of the sexual cycle often affected clusters of genes, rather than randomly dispersed genes. Thirty-five clusters of genes were jointly upregulated during the onset of sexual reproduction, while 99 clusters of genes (comprising >900 genes) were jointly downregulated in ascospores. These transcriptional changes coincided with changes in expression of genes encoding enzymes participating in chromatin organization, hinting at the occurrence of massive epigenetic regulation of gene expression during sexual reproduction. PMID:29440571

  13. Massive dysregulation of genes involved in cell signaling and placental development in cloned cattle conceptus and maternal endometrium

    PubMed Central

    Biase, Fernando H.; Rabel, Chanaka; Guillomot, Michel; Hue, Isabelle; Andropolis, Kalista; Olmstead, Colleen A.; Oliveira, Rosane; Wallace, Richard; Le Bourhis, Daniel; Richard, Christophe; Campion, Evelyne; Chaulot-Talmon, Aurélie; Giraud-Delville, Corinne; Taghouti, Géraldine; Jammes, Hélène; Renard, Jean-Paul; Sandra, Olivier; Lewin, Harris A.

    2016-01-01

    A major unresolved issue in the cloning of mammals by somatic cell nuclear transfer (SCNT) is the mechanism by which the process fails after embryos are transferred to the uterus of recipients before or during the implantation window. We investigated this problem by using RNA sequencing (RNA-seq) to compare the transcriptomes in cattle conceptuses produced by SCNT and artificial insemination (AI) at day (d) 18 (preimplantation) and d 34 (postimplantation) of gestation. In addition, endometrium was profiled to identify the communication pathways that might be affected by the presence of a cloned conceptus, ultimately leading to mortality before or during the implantation window. At d 18, the effects on the transcriptome associated with SCNT were massive, involving more than 5,000 differentially expressed genes (DEGs). Among them are 121 genes that have embryonic lethal phenotypes in mice, cause defects in trophoblast and placental development, and/or affect conceptus survival in mice. In endometria at d 18, <0.4% of expressed genes were affected by the presence of a cloned conceptus, whereas at d 34, ∼36% and <0.7% of genes were differentially expressed in intercaruncular and caruncular tissues, respectively. Functional analysis of DEGs in placental and endometrial tissues suggests a major disruption of signaling between the cloned conceptus and the endometrium, particularly the intercaruncular tissue. Our results support a “bottleneck” model for cloned conceptus survival during the periimplantation period determined by gene expression levels in extraembryonic tissues and the endometrial response to altered signaling from clones. PMID:27940919

  14. SNP Discovery in the Transcriptome of White Pacific Shrimp Litopenaeus vannamei by Next Generation Sequencing

    PubMed Central

    Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2014-01-01

    The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies. PMID:24498047

  15. New approach for the study of mite reproduction: the first transcriptome analysis of a mite, Phytoseiulus persimilis (Acari: Phytoseiidae)

    USDA-ARS?s Scientific Manuscript database

    Many species of mites and ticks are of agricultural and medical importance. Much can be learned from the study of transcriptomes of acarines which can generate DNA-sequence information of potential target genes for the control of acarine pests. High throughput transcriptome sequencing can also yie...

  16. The Impact of Normalization Methods on RNA-Seq Data Analysis

    PubMed Central

    Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.

    2015-01-01

    High-throughput sequencing technologies, such as the Illumina Hi-seq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for development of statistical and computational methods that can tackle the analysis and management of data. The data normalization is one of the most crucial steps of data processing and this process must be carefully considered as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014

  17. Massive Collection of Full-Length Complementary DNA Clones and Microarray Analyses:. Keys to Rice Transcriptome Analysis

    NASA Astrophysics Data System (ADS)

    Kikuchi, Shoshi

    2009-02-01

    Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.

  18. Reptilian Transcriptomes v2.0: An Extensive Resource for Sauropsida Genomics and Transcriptomics

    PubMed Central

    Tzika, Athanasia C.; Ullate-Agote, Asier; Grbic, Djordje; Milinkovitch, Michel C.

    2015-01-01

    Despite the availability of deep-sequencing techniques, genomic and transcriptomic data remain unevenly distributed across phylogenetic groups. For example, reptiles are poorly represented in sequence databases, hindering functional evolutionary and developmental studies in these lineages substantially more diverse than mammals. In addition, different studies use different assembly and annotation protocols, inhibiting meaningful comparisons. Here, we present the “Reptilian Transcriptomes Database 2.0,” which provides extensive annotation of transcriptomes and genomes from species covering the major reptilian lineages. To this end, we sequenced normalized complementary DNA libraries of multiple adult tissues and various embryonic stages of the leopard gecko and the corn snake and gathered published reptilian sequence data sets from representatives of the four extant orders of reptiles: Squamata (snakes and lizards), the tuatara, crocodiles, and turtles. The LANE runner 2.0 software was implemented to annotate all assemblies within a single integrated pipeline. We show that this approach increases the annotation completeness of the assembled transcriptomes/genomes. We then built large concatenated protein alignments of single-copy genes and inferred phylogenetic trees that support the positions of turtles and the tuatara as sister groups of Archosauria and Squamata, respectively. The Reptilian Transcriptomes Database 2.0 resource will be updated to include selected new data sets as they become available, thus making it a reference for differential expression studies, comparative genomics and transcriptomics, linkage mapping, molecular ecology, and phylogenomic analyses involving reptiles. The database is available at www.reptilian-transcriptomes.org and can be enquired using a wwwblast server installed at the University of Geneva. PMID:26133641

  19. Sequencing and De Novo Assembly of the Toxicodendron radicans (Poison Ivy) Transcriptome

    PubMed Central

    Kim, Gunjune

    2017-01-01

    Contact with poison ivy plants is widely dreaded because they produce a natural product called urushiol that is responsible for allergenic contact delayed-dermatitis symptoms lasting for weeks. For this reason, the catchphrase most associated with poison ivy is “leaves of three, let it be”, which serves the purpose of both identification and an appeal for avoidance. Ironically, despite this notoriety, there is a dearth of specific knowledge about nearly all other aspects of poison ivy physiology and ecology. As a means of gaining a more molecular-oriented understanding of poison ivy physiology and ecology, Next Generation DNA sequencing technology was used to develop poison ivy root and leaf RNA-seq transcriptome resources. De novo assembled transcriptomes were analyzed to generate a core set of high quality expressed transcripts present in poison ivy tissue. The predicted protein sequences were evaluated for similarity to SwissProt homologs and InterProScan domains, as well as assigned both GO terms and KEGG annotations. Over 23,000 simple sequence repeats were identified in the transcriptome, and corresponding oligo nucleotide primer pairs were designed. A pan-transcriptome analysis of existing Anacardiaceae transcriptomes revealed conserved and unique transcripts among these species. PMID:29125533

  20. Sequencing and De Novo Assembly of the Toxicodendron radicans (Poison Ivy) Transcriptome.

    PubMed

    Weisberg, Alexandra J; Kim, Gunjune; Westwood, James H; Jelesko, John G

    2017-11-10

    Contact with poison ivy plants is widely dreaded because they produce a natural product called urushiol that is responsible for allergenic contact delayed-dermatitis symptoms lasting for weeks. For this reason, the catchphrase most associated with poison ivy is "leaves of three, let it be", which serves the purpose of both identification and an appeal for avoidance. Ironically, despite this notoriety, there is a dearth of specific knowledge about nearly all other aspects of poison ivy physiology and ecology. As a means of gaining a more molecular-oriented understanding of poison ivy physiology and ecology, Next Generation DNA sequencing technology was used to develop poison ivy root and leaf RNA-seq transcriptome resources. De novo assembled transcriptomes were analyzed to generate a core set of high quality expressed transcripts present in poison ivy tissue. The predicted protein sequences were evaluated for similarity to SwissProt homologs and InterProScan domains, as well as assigned both GO terms and KEGG annotations. Over 23,000 simple sequence repeats were identified in the transcriptome, and corresponding oligo nucleotide primer pairs were designed. A pan-transcriptome analysis of existing Anacardiaceae transcriptomes revealed conserved and unique transcripts among these species.

  1. ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.

    PubMed

    Zeng, Victor; Extavour, Cassandra G

    2012-01-01

    The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics. Database URL: asgard.rc.fas.harvard.edu.

  2. Elucidating and mining the Tulipa and Lilium transcriptomes.

    PubMed

    Moreno-Pachon, Natalia M; Leeggangers, Hendrika A C F; Nijveen, Harm; Severing, Edouard; Hilhorst, Henk; Immink, Richard G H

    2016-10-01

    Genome sequencing remains a challenge for species with large and complex genomes containing extensive repetitive sequences, of which the bulbous and monocotyledonous plants tulip and lily are examples. In such a case, sequencing of only the active part of the genome, represented by the transcriptome, is a good alternative to obtain information about gene content. In this study we aimed to generate a high quality transcriptome of tulip and lily and to make this data available as an open-access resource via a user-friendly web-based interface. The Illumina HiSeq 2000 platform was applied and the transcribed RNA was sequenced from a collection of different lily and tulip tissues, respectively. In order to obtain good transcriptome coverage and to facilitate effective data mining, assembly was done using different filtering parameters for clearing out contamination and noise of the RNAseq datasets. This analysis revealed limitations of commonly applied methods and parameter settings used in de novo transcriptome assembly. The final created transcriptomes are publicly available via a user friendly Transcriptome browser ( http://www.bioinformatics.nl/bulbs/db/species/index ). The usefulness of this resource has been exemplified by a search for all potential transcription factors in lily and tulip, with special focus on the TCP transcription factor family. This analysis and other quality parameters point out the quality of the transcriptomes, which can serve as a basis for further genomics studies in lily, tulip, and bulbous plants in general.

  3. Transcriptome sequencing of diverse peanut (arachis) wild species and the cultivated species reveals a wealth of untapped genetic variability

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencing technologies and improved bioinformatics methods have provided opportunities to study sequence variability in complex polyploid transcriptomes. In this study, we used a diverse panel of twenty-two Arachis accessions representing seven Arachis hypogaea market classes, A-, B...

  4. Using deep RNA sequencing for the structural annotation of the laccaria bicolor mycorrhizal transcriptome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derivedmore » from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.« less

  5. Single-Cell Sequencing for Drug Discovery and Drug Development.

    PubMed

    Wu, Hongjin; Wang, Charles; Wu, Shixiu

    2017-01-01

    Next-generation sequencing (NGS), particularly single-cell sequencing, has revolutionized the scale and scope of genomic and biomedical research. Recent technological advances in NGS and singlecell studies have made the deep whole-genome (DNA-seq), whole epigenome and whole-transcriptome sequencing (RNA-seq) at single-cell level feasible. NGS at the single-cell level expands our view of genome, epigenome and transcriptome and allows the genome, epigenome and transcriptome of any organism to be explored without a priori assumptions and with unprecedented throughput. And it does so with single-nucleotide resolution. NGS is also a very powerful tool for drug discovery and drug development. In this review, we describe the current state of single-cell sequencing techniques, which can provide a new, more powerful and precise approach for analyzing effects of drugs on treated cells and tissues. Our review discusses single-cell whole genome/exome sequencing (scWGS/scWES), single-cell transcriptome sequencing (scRNA-seq), single-cell bisulfite sequencing (scBS), and multiple omics of single-cell sequencing. We also highlight the advantages and challenges of each of these approaches. Finally, we describe, elaborate and speculate the potential applications of single-cell sequencing for drug discovery and drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  6. Massive sequencing of Ulmus minor’s transcriptome provides new molecular tools for a genus under the constant threat of Dutch elm disease

    PubMed Central

    Perdiguero, Pedro; Venturas, Martin; Cervera, María Teresa; Gil, Luis; Collada, Carmen

    2015-01-01

    Elms, especially Ulmus minor and U. americana, are carrying out a hard battle against Dutch elm disease (DED). This vascular wilt disease, caused by Ophiostoma ulmi and O. novo-ulmi, appeared in the twentieth century and killed millions of elms across North America and Europe. Elm breeding and conservation programmes have identified a reduced number of DED tolerant genotypes. In this study, three U. minor genotypes with contrasted levels of tolerance to DED were exposed to several biotic and abiotic stresses in order to (i) obtain a de novo assembled transcriptome of U. minor using 454 pyrosequencing, (ii) perform a functional annotation of the assembled transcriptome, (iii) identify genes potentially involved in the molecular response to environmental stress, and (iv) develop gene-based markers to support breeding programmes. A total of 58,429 putative unigenes were identified after assembly and filtering of the transcriptome. 32,152 of these unigenes showed homology with proteins identified in the genome from the most common plant model species. Well-known family proteins and transcription factors involved in abiotic, biotic or both stresses were identified after functional annotation. A total of 30,693 polymorphisms were identified in 7,125 isotigs, a large number of them corresponding to single nucleotide polymorphisms (SNPs; 27,359). In a subset randomly selected for validation, 87% of the SNPs were confirmed. The material generated may be valuable for future Ulmus gene expression, population genomics and association genetics studies, especially taking into account the scarce molecular information available for this genus and the great impact that DED has on elm populations. PMID:26257751

  7. From chromatogram to analyte to metabolite. How to pick horses for courses from the massive web resources for mass spectral plant metabolomics

    PubMed Central

    Perez de Souza, Leonardo; Naake, Thomas; Tohge, Takayuki; Fernie, Alisdair R

    2017-01-01

    Abstract The grand challenge currently facing metabolomics is the expansion of the coverage of the metabolome from a minor percentage of the metabolic complement of the cell toward the level of coverage afforded by other post-genomic technologies such as transcriptomics and proteomics. In plants, this problem is exacerbated by the sheer diversity of chemicals that constitute the metabolome, with the number of metabolites in the plant kingdom generally considered to be in excess of 200 000. In this review, we focus on web resources that can be exploited in order to improve analyte and ultimately metabolite identification and quantification. There is a wide range of available software that not only aids in this but also in the related area of peak alignment; however, for the uninitiated, choosing which program to use is a daunting task. For this reason, we provide an overview of the pros and cons of the software as well as comments regarding the level of programing skills required to effectively exploit their basic functions. In addition, the torrent of available genome and transcriptome sequences that followed the advent of next-generation sequencing has opened up further valuable resources for metabolite identification. All things considered, we posit that only via a continued communal sharing of information such as that deposited in the databases described within the article are we likely to be able to make significant headway toward improving our coverage of the plant metabolome. PMID:28520864

  8. In-cell RNA structure probing with SHAPE-MaP.

    PubMed

    Smola, Matthew J; Weeks, Kevin M

    2018-06-01

    This protocol is an extension to: Nat. Protoc. 10, 1643-1669 (2015); doi:10.1038/nprot.2015.103; published online 01 October 2015RNAs play key roles in many cellular processes. The underlying structure of RNA is an important determinant of how transcripts function, are processed, and interact with RNA-binding proteins and ligands. RNA structure analysis by selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) takes advantage of the reactivity of small electrophilic chemical probes that react with the 2'-hydroxyl group to assess RNA structure at nucleotide resolution. When coupled with mutational profiling (MaP), in which modified nucleotides are detected as internal miscodings during reverse transcription and then read out by massively parallel sequencing, SHAPE yields quantitative per-nucleotide measurements of RNA structure. Here, we provide an extension to our previous in vitro SHAPE-MaP protocol with detailed guidance for undertaking and analyzing SHAPE-MaP probing experiments in live cells. The MaP strategy works for both abundant-transcriptome experiments and for cellular RNAs of low to moderate abundance, which are not well examined by whole-transcriptome methods. In-cell SHAPE-MaP, performed in roughly 3 d, can be applied in cell types ranging from bacteria to cultured mammalian cells and is compatible with a variety of structure-probing reagents. We detail several strategies by which in-cell SHAPE-MaP can inform new biological hypotheses and emphasize downstream analyses that reveal sequence or structure motifs important for RNA interactions in cells.

  9. Genome and Transcriptome Sequencing of the Ostreid herpesvirus 1 From Tomales Bay, California

    NASA Astrophysics Data System (ADS)

    Burge, C. A.; Langevin, S.; Closek, C. J.; Roberts, S. B.; Friedman, C. S.

    2016-02-01

    Mass mortalities of larval and seed bivalve molluscs attributed to the Ostreid herpesvirus 1 (OsHV-1) occur globally. OsHV-1 was fully sequenced and characterized as a member of the Family Malacoherpesviridae. Multiple strains of OsHV-1 exist and may vary in virulence, i.e. OsHV-1 µvar. For most global variants of OsHV-1, sequence data is limited to PCR-based sequencing of segments, including two recent genomes. In the United States, OsHV-1 is limited to detection in adjacent embayments in California, Tomales and Drakes bays. Limited DNA sequence data of OsHV-1 infecting oysters in Tomales Bay indicates the virus detected in Tomales Bay is similar but not identical to any one global variant of OsHV-1. In order to better understand both strain variation and virulence of OsHV-1 infecting oysters in Tomales Bay, we used genomic and transcriptomic sequencing. Meta-genomic sequencing (Illumina MiSeq) was conducted from infected oysters (n=4 per year) collected in 2003, 2007, and 2014, where full OsHV-1 genome sequences and low overall microbial diversity were achieved from highly infected oysters. Increased microbial diversity was detected in three of four samples sequenced from 2003, where qPCR based genome copy numbers of OsHV-1 were lower. Expression analysis (SOLiD RNA sequencing) of OsHV-1 genes expressed in oyster larvae at 24 hours post exposure revealed a nearly complete transcriptome, with several highly expressed genes, which are similar to recent transcriptomic analyses of other OsHV-1 variants. Taken together, our results indicate that genome and transcriptome sequencing may be powerful tools in understanding both strain variation and virulence of non-culturable marine viruses.

  10. Transcriptome Assembly, Gene Annotation and Tissue Gene Expression Atlas of the Rainbow Trout

    PubMed Central

    Salem, Mohamed; Paneru, Bam; Al-Tobasei, Rafet; Abdouni, Fatima; Thorgaard, Gary H.; Rexroad, Caird E.; Yao, Jianbo

    2015-01-01

    Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler yielding 474,524 contigs > 500 base-pairs. Of them, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000–32,000 genes (35–71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with a large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the genome. PMID:25793877

  11. The genesis of an exceptionally lethal venom in the timber rattlesnake (Crotalus horridus) revealed through comparative venom-gland transcriptomics

    PubMed Central

    2013-01-01

    Background Snake venoms generally show sequence and quantitative variation within and between species, but some rattlesnakes have undergone exceptionally rapid, dramatic shifts in the composition, lethality, and pharmacological effects of their venoms. Such shifts have occurred within species, most notably in Mojave (Crotalus scutulatus), South American (C. durissus), and timber (C. horridus) rattlesnakes, resulting in some populations with extremely potent, neurotoxic venoms without the hemorrhagic effects typical of rattlesnake bites. Results To better understand the evolutionary changes that resulted in the potent venom of a population of C. horridus from northern Florida, we sequenced the venom-gland transcriptome of an animal from this population for comparison with the previously described transcriptome of the eastern diamondback rattlesnake (C. adamanteus), a congener with a more typical rattlesnake venom. Relative to the toxin transcription of C. adamanteus, which consisted primarily of snake-venom metalloproteinases, C-type lectins, snake-venom serine proteinases, and myotoxin-A, the toxin transcription of C. horridus was far simpler in composition and consisted almost entirely of snake-venom serine proteinases, phospholipases A2, and bradykinin-potentiating and C-type natriuretic peptides. Crotalus horridus lacked significant expression of the hemorrhagic snake-venom metalloproteinases and C-type lectins. Evolution of shared toxin families involved differential expansion and loss of toxin clades within each species and pronounced differences in the highly expressed toxin paralogs. Toxin genes showed significantly higher rates of nonsynonymous substitution than nontoxin genes. The expression patterns of nontoxin genes were conserved between species, despite the vast differences in toxin expression. Conclusions Our results represent the first complete, sequence-based comparison between the venoms of closely related snake species and reveal in unprecedented detail the rapid evolution of snake venoms. We found that the difference in venom properties resulted from major changes in expression levels of toxin gene families, differential gene-family expansion and loss, changes in which paralogs within gene families were expressed at high levels, and higher nonsynonymous substitution rates in the toxin genes relative to nontoxins. These massive alterations in the genetics of the venom phenotype emphasize the evolutionary lability and flexibility of this ecologically critical trait. PMID:23758969

  12. Comparison of ribosomal RNA removal methods for transcriptome sequencing workflows in teleost fish

    USDA-ARS?s Scientific Manuscript database

    RNA sequencing (RNA-Seq) is becoming the standard for transcriptome analysis. Removal of contaminating ribosomal RNA (rRNA) is a priority in the preparation of libraries suitable for sequencing. rRNAs are commonly removed from total RNA via either mRNA selection or rRNA depletion. These methods have...

  13. Transcriptome sequencing of newly molted adult female cattle ticks, Rhipicephalus microplus: Raw Illumina reads.

    USDA-ARS?s Scientific Manuscript database

    Illumina paired end oligo-dT sequencing technology was used to sequence the transcriptome from newly molted adult females from the cattle tick, Rhipicephalus microplus. These samples include newly molted unfed whole adult females, newly molted whole adult females feeding for 2 hours on a bovine host...

  14. CBrowse: a SAM/BAM-based contig browser for transcriptome assembly visualization and analysis.

    PubMed

    Li, Pei; Ji, Guoli; Dong, Min; Schmidt, Emily; Lenox, Douglas; Chen, Liangliang; Liu, Qi; Liu, Lin; Zhang, Jie; Liang, Chun

    2012-09-15

    To address the impending need for exploring rapidly increased transcriptomics data generated for non-model organisms, we developed CBrowse, an AJAX-based web browser for visualizing and analyzing transcriptome assemblies and contigs. Designed in a standard three-tier architecture with a data pre-processing pipeline, CBrowse is essentially a Rich Internet Application that offers many seamlessly integrated web interfaces and allows users to navigate, sort, filter, search and visualize data smoothly. The pre-processing pipeline takes the contig sequence file in FASTA format and its relevant SAM/BAM file as the input; detects putative polymorphisms, simple sequence repeats and sequencing errors in contigs and generates image, JSON and database-compatible CSV text files that are directly utilized by different web interfaces. CBowse is a generic visualization and analysis tool that facilitates close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors in transcriptome sequencing projects. CBrowse is distributed under the GNU General Public License, available at http://bioinfolab.muohio.edu/CBrowse/ liangc@muohio.edu or liangc.mu@gmail.com; glji@xmu.edu.cn Supplementary data are available at Bioinformatics online.

  15. A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach.

    PubMed

    Melicher, Dacotah; Torson, Alex S; Dworkin, Ian; Bowsher, Julia H

    2014-03-12

    The Sepsidae family of flies is a model for investigating how sexual selection shapes courtship and sexual dimorphism in a comparative framework. However, like many non-model systems, there are few molecular resources available. Large-scale sequencing and assembly have not been performed in any sepsid, and the lack of a closely related genome makes investigation of gene expression challenging. Our goal was to develop an automated pipeline for de novo transcriptome assembly, and to use that pipeline to assemble and analyze the transcriptome of the sepsid Themira biloba. Our bioinformatics pipeline uses cloud computing services to assemble and analyze the transcriptome with off-site data management, processing, and backup. It uses a multiple k-mer length approach combined with a second meta-assembly to extend transcripts and recover more bases of transcript sequences than standard single k-mer assembly. We used 454 sequencing to generate 1.48 million reads from cDNA generated from embryo, larva, and pupae of T. biloba and assembled a transcriptome consisting of 24,495 contigs. Annotation identified 16,705 transcripts, including those involved in embryogenesis and limb patterning. We assembled transcriptomes from an additional three non-model organisms to demonstrate that our pipeline assembled a higher-quality transcriptome than single k-mer approaches across multiple species. The pipeline we have developed for assembly and analysis increases contig length, recovers unique transcripts, and assembles more base pairs than other methods through the use of a meta-assembly. The T. biloba transcriptome is a critical resource for performing large-scale RNA-Seq investigations of gene expression patterns, and is the first transcriptome sequenced in this Dipteran family.

  16. Sequencing, Annotation and Analysis of the Syrian Hamster (Mesocricetus auratus) Transcriptome

    PubMed Central

    Tchitchek, Nicolas; Safronetz, David; Rasmussen, Angela L.; Martens, Craig; Virtaneva, Kimmo; Porcella, Stephen F.; Feldmann, Heinz

    2014-01-01

    Background The Syrian hamster (golden hamster, Mesocricetus auratus) is gaining importance as a new experimental animal model for multiple pathogens, including emerging zoonotic diseases such as Ebola. Nevertheless there are currently no publicly available transcriptome reference sequences or genome for this species. Results A cDNA library derived from mRNA and snRNA isolated and pooled from the brains, lungs, spleens, kidneys, livers, and hearts of three adult female Syrian hamsters was sequenced. Sequence reads were assembled into 62,482 contigs and 111,796 reads remained unassembled (singletons). This combined contig/singleton dataset, designated as the Syrian hamster transcriptome, represents a total of 60,117,204 nucleotides. Our Mesocricetus auratus Syrian hamster transcriptome mapped to 11,648 mouse transcripts representing 9,562 distinct genes, and mapped to a similar number of transcripts and genes in the rat. We identified 214 quasi-complete transcripts based on mouse annotations. Canonical pathways involved in a broad spectrum of fundamental biological processes were significantly represented in the library. The Syrian hamster transcriptome was aligned to the current release of the Chinese hamster ovary (CHO) cell transcriptome and genome to improve the genomic annotation of this species. Finally, our Syrian hamster transcriptome was aligned against 14 other rodents, primate and laurasiatheria species to gain insights about the genetic relatedness and placement of this species. Conclusions This Syrian hamster transcriptome dataset significantly improves our knowledge of the Syrian hamster's transcriptome, especially towards its future use in infectious disease research. Moreover, this library is an important resource for the wider scientific community to help improve genome annotation of the Syrian hamster and other closely related species. Furthermore, these data provide the basis for development of expression microarrays that can be used in functional genomics studies. PMID:25398096

  17. New approach for the study of mite reproduction: The first transcriptome analysis of a mite, Phytoseiulus persimilis (Acari: Phytoseiidae).

    PubMed

    Cabrera, Ana R; Donohue, Kevin V; Khalil, Sayed M S; Scholl, Elizabeth; Opperman, Charles; Sonenshine, Daniel E; Roe, R Michael

    2011-01-01

    Many species of mites and ticks are of agricultural and medical importance. Much can be learned from the study of transcriptomes of acarines which can generate DNA-sequence information of potential target genes for the control of acarine pests. High throughput transcriptome sequencing can also yield sequences of genes critical during physiological processes poorly understood in acarines, i.e., the regulation of female reproduction in mites. The predatory mite, Phytoseiulus persimilis, was selected to conduct a transcriptome analysis using 454 pyrosequencing. The objective of this project was to obtain DNA-sequence information of expressed genes from P. persimilis with special interest in sequences corresponding to vitellogenin (Vg) and the vitellogenin receptor (VgR). These genes are critical to the understanding of vitellogenesis, and they will facilitate the study of the regulation of mite female reproduction. A total of 12,556 contiguous sequences (contigs) were assembled with an average size of 935bp. From these sequences, the putative translated peptides of 11 contigs were similar in amino acid sequences to other arthropod Vgs, while 6 were similar to VgRs. We selected some of these sequences to conduct stage-specific expression studies to further determine their function. 2010 Elsevier Ltd. All rights reserved.

  18. Characterization of mango (Mangifera indica L.) transcriptome and chloroplast genome.

    PubMed

    Azim, M Kamran; Khan, Ishtaiq A; Zhang, Yong

    2014-05-01

    We characterized mango leaf transcriptome and chloroplast genome using next generation DNA sequencing. The RNA-seq output of mango transcriptome generated >12 million reads (total nucleotides sequenced >1 Gb). De novo transcriptome assembly generated 30,509 unigenes with lengths in the range of 300 to ≥3,000 nt and 67× depth of coverage. Blast searching against nonredundant nucleotide databases and several Viridiplantae genomic datasets annotated 24,593 mango unigenes (80% of total) and identified Citrus sinensis as closest neighbor of mango with 9,141 (37%) matched sequences. The annotation with gene ontology and Clusters of Orthologous Group terms categorized unigene sequences into 57 and 25 classes, respectively. More than 13,500 unigenes were assigned to 293 KEGG pathways. Besides major plant biology related pathways, KEGG based gene annotation pointed out active presence of an array of biochemical pathways involved in (a) biosynthesis of bioactive flavonoids, flavones and flavonols, (b) biosynthesis of terpenoids and lignins and (c) plant hormone signal transduction. The mango transcriptome sequences revealed 235 proteases belonging to five catalytic classes of proteolytic enzymes. The draft genome of mango chloroplast (cp) was obtained by a combination of Sanger and next generation sequencing. The draft mango cp genome size is 151,173 bp with a pair of inverted repeats of 27,093 bp separated by small and large single copy regions, respectively. Out of 139 genes in mango cp genome, 91 found to be protein coding. Sequence analysis revealed cp genome of C. sinensis as closest neighbor of mango. We found 51 short repeats in mango cp genome supposed to be associated with extensive rearrangements. This is the first report of transcriptome and chloroplast genome analysis of any Anacardiaceae family member.

  19. RNA-seq analysis of Rubus idaeus cv. Nova: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

    PubMed

    Hyun, Tae Kyung; Lee, Sarah; Kumar, Dhinesh; Rim, Yeonggil; Kumar, Ritesh; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

    2014-10-01

    Using Illumina sequencing technology, we have generated the large-scale transcriptome sequencing data containing abundant information on genes involved in the metabolic pathways in R. idaeus cv. Nova fruits. Rubus idaeus (Red raspberry) is one of the important economical crops that possess numerous nutrients, micronutrients and phytochemicals with essential health benefits to human. The molecular mechanism underlying the ripening process and phytochemical biosynthesis in red raspberry is attributed to the changes in gene expression, but very limited transcriptomic and genomic information in public databases is available. To address this issue, we generated more than 51 million sequencing reads from R. idaeus cv. Nova fruit using Illumina RNA-Seq technology. After de novo assembly, we obtained 42,604 unigenes with an average length of 812 bp. At the protein level, Nova fruit transcriptome showed 77 and 68 % sequence similarities with Rubus coreanus and Fragaria versa, respectively, indicating the evolutionary relationship between them. In addition, 69 % of assembled unigenes were annotated using public databases including NCBI non-redundant, Cluster of Orthologous Groups and Gene ontology database, suggesting that our transcriptome dataset provides a valuable resource for investigating metabolic processes in red raspberry. To analyze the relationship between several novel transcripts and the amounts of metabolites such as γ-aminobutyric acid and anthocyanins, real-time PCR and target metabolite analysis were performed on two different ripening stages of Nova. This is the first attempt using Illumina sequencing platform for RNA sequencing and de novo assembly of Nova fruit without reference genome. Our data provide the most comprehensive transcriptome resource available for Rubus fruits, and will be useful for understanding the ripening process and for breeding R. idaeus cultivars with improved fruit quality.

  20. Transcriptome assembly and digital gene expression atlas of the rainbow trout

    USDA-ARS?s Scientific Manuscript database

    Background: Transcriptome analysis is a preferred method for gene discovery, marker development and gene expression profiling in non-model organisms. Previously, we sequenced a transcriptome reference using Sanger-based and 454-pyrosequencing, however, a transcriptome assembly is still incomplete an...

  1. A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity.

    PubMed

    Hobbs, Matthew; Pavasovic, Ana; King, Andrew G; Prentis, Peter J; Eldridge, Mark D B; Chen, Zhiliang; Colgan, Donald J; Polkinghorne, Adam; Wilkins, Marc R; Flanagan, Cheyne; Gillett, Amber; Hanger, Jon; Johnson, Rebecca N; Timms, Peter

    2014-09-11

    The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene.Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. This transcriptomic dataset is a useful resource for molecular genetic studies of the koala, for evolutionary genetic studies of marsupials, for validation and annotation of the koala genome sequence, and for investigation of koala retrovirus. Annotated transcripts can be browsed and queried at http://koalagenome.org.

  2. Yellow lupin (Lupinus luteus L.) transcriptome sequencing: molecular marker development and comparative studies

    PubMed Central

    2012-01-01

    Background Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we report the construction of 454-expresed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility on diversity studies and transferability to related species. Results Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for L1 (young leaves, buds and flowers) and L2 (immature seeds) EST- libraries. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. 38,200 isotigs translated into proteins and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering patterns followed the accession’s origin. Conclusion L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection. PMID:22920992

  3. Yellow lupin (Lupinus luteus L.) transcriptome sequencing: molecular marker development and comparative studies.

    PubMed

    Parra-González, Lorena B; Aravena-Abarzúa, Gabriela A; Navarro-Navarro, Cristell S; Udall, Joshua; Maughan, Jeff; Peterson, Louis M; Salvo-Garrido, Haroldo E; Maureira-Butler, Iván J

    2012-08-24

    Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we report the construction of 454-expresed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility on diversity studies and transferability to related species. Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for L1 (young leaves, buds and flowers) and L2 (immature seeds) EST- libraries. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. 38,200 isotigs translated into proteins and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering patterns followed the accession's origin. L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection.

  4. RNA-Seq Technology and Its Application in Fish Transcriptomics

    PubMed Central

    Ba, Yi; Zhuang, Qianfeng

    2014-01-01

    Abstract High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing method. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptome but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species. PMID:24380445

  5. Comparative expression profiling in grape (Vitis vinifera) berries derived from frequency analysis of ESTs and MPSS signatures.

    PubMed

    Iandolino, Alberto; Nobuta, Kan; da Silva, Francisco Goes; Cook, Douglas R; Meyers, Blake C

    2008-05-12

    Vitis vinifera (V. vinifera) is the primary grape species cultivated for wine production, with an industry valued annually in the billions of dollars worldwide. In order to sustain and increase grape production, it is necessary to understand the genetic makeup of grape species. Here we performed mRNA profiling using Massively Parallel Signature Sequencing (MPSS) and combined it with available Expressed Sequence Tag (EST) data. These tag-based technologies, which do not require a priori knowledge of genomic sequence, are well-suited for transcriptional profiling. The sequence depth of MPSS allowed us to capture and quantify almost all the transcripts at a specific stage in the development of the grape berry. The number and relative abundance of transcripts from stage II grape berries was defined using Massively Parallel Signature Sequencing (MPSS). A total of 2,635,293 17-base and 2,259,286 20-base signatures were obtained, representing at least 30,737 and 26,878 distinct sequences. The average normalized abundance per signature was approximately 49 TPM (Transcripts Per Million). Comparisons of the MPSS signatures with available Vitis species' ESTs and a unigene set demonstrated that 6,430 distinct contigs and 2,190 singletons have a perfect match to at least one MPSS signature. Among the matched sequences, ESTs were identified from tissues other than berries or from berries at different developmental stages. Additional MPSS signatures not matching to known grape ESTs can extend our knowledge of the V. vinifera transcriptome, particularly when these data are used to assist in annotation of whole genome sequences from Vitis vinifera. The MPSS data presented here not only achieved a higher level of saturation than previous EST based analyses, but in doing so, expand the known set of transcripts of grape berries during the unique stage in development that immediately precedes the onset of ripening. The MPSS dataset also revealed evidence of antisense expression not previously reported in grapes but comparable to that reported in other plant species. Finally, we developed a novel web-based, public resource for utilization of the grape MPSS data [1].

  6. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas

    PubMed Central

    Hou, Yu; Guo, Huahu; Cao, Chen; Li, Xianlong; Hu, Boqiang; Zhu, Ping; Wu, Xinglong; Wen, Lu; Tang, Fuchou; Huang, Yanyi; Peng, Jirun

    2016-01-01

    Single-cell genome, DNA methylome, and transcriptome sequencing methods have been separately developed. However, to accurately analyze the mechanism by which transcriptome, genome and DNA methylome regulate each other, these omic methods need to be performed in the same single cell. Here we demonstrate a single-cell triple omics sequencing technique, scTrio-seq, that can be used to simultaneously analyze the genomic copy-number variations (CNVs), DNA methylome, and transcriptome of an individual mammalian cell. We show that large-scale CNVs cause proportional changes in RNA expression of genes within the gained or lost genomic regions, whereas these CNVs generally do not affect DNA methylation in these regions. Furthermore, we applied scTrio-seq to 25 single cancer cells derived from a human hepatocellular carcinoma tissue sample. We identified two subpopulations within these cells based on CNVs, DNA methylome, or transcriptome of individual cells. Our work offers a new avenue of dissecting the complex contribution of genomic and epigenomic heterogeneities to the transcriptomic heterogeneity within a population of cells. PMID:26902283

  7. The testes transcriptome derived from the New World Screwworm, Cochliomyia hominivorax TSA

    USDA-ARS?s Scientific Manuscript database

    In a collaboration with National Center for Genome Resources researchers, we sequenced and assembled the testes transcriptome derived from the Pacora, Panama, production plant strain of the New World Screwworm, Cochliomyia hominivorax. This transcriptome contains 4,149 unigenes and the Transcriptome...

  8. Allele Identification for Transcriptome-Based Population Genomics in the Invasive Plant Centaurea solstitialis

    PubMed Central

    Dlugosch, Katrina M.; Lai, Zhao; Bonin, Aurélie; Hierro, José; Rieseberg, Loren H.

    2013-01-01

    Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Science Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11−430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea solstitialis, yellow starthistle, from across its worldwide distribution. We examined the impact of sequencing effort on transcriptome recovery and overlap among individuals. To do this, we developed two novel publicly available software pipelines: SnoWhite for read cleaning before assembly, and AllelePipe for clustering of loci and allele identification in assembled datasets with or without a reference genome. AllelePipe is designed specifically for cases in which read depth information is not appropriate or available to assist with disentangling closely related paralogs from allelic variation, as in transcriptome or previously assembled libraries. We find that modest applications of sequencing effort recover most of the novel sequences present in the transcriptome of this species, including single-copy loci and a representative distribution of functional groups. In contrast, the coverage of variable sites, observation of heterozygosity, and overlap among different libraries are all highly dependent on sequencing effort. Nevertheless, the information gained from overlapping regions was informative regarding coarse population structure and variation across our small number of population samples, providing the first genetic evidence in support of hypothesized invasion scenarios. PMID:23390612

  9. Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species.

    PubMed

    Wang, Xiao-Wei; Zhao, Qiong-Yi; Luan, Jun-Bo; Wang, Yu-Jun; Yan, Gen-Hong; Liu, Shu-Sheng

    2012-10-04

    Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identify data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for the investigation of association between allelic and phenotypes. Our data present the most comprehensive sequences for the indigenous whitefly species Asia II 3. The extensive comparisons of Asia II 3, MEAM1 and MED transcriptomes will serve as an invaluable resource for revealing the genetic basis of whitefly invasion and the molecular mechanisms underlying their biological differences.

  10. Analysis of a native whitefly transcriptome and its sequence divergence with two invasive whitefly species

    PubMed Central

    2012-01-01

    Background Genomic divergence between invasive and native species may provide insight into the molecular basis underlying specific characteristics that drive the invasion and displacement of closely related species. In this study, we sequenced the transcriptome of an indigenous species, Asia II 3, of the Bemisia tabaci complex and compared its genetic divergence with the transcriptomes of two invasive whiteflies species, Middle East Asia Minor 1 (MEAM1) and Mediterranean (MED), respectively. Results More than 16 million reads of 74 base pairs in length were obtained for the Asia II 3 species using the Illumina sequencing platform. These reads were assembled into 52,535 distinct sequences (mean size: 466 bp) and 16,596 sequences were annotated with an E-value above 10-5. Protein family comparisons revealed obvious diversification among the transcriptomes of these species suggesting species-specific adaptations during whitefly evolution. On the contrary, substantial conservation of the whitefly transcriptomes was also evident, despite their differences. The overall divergence of coding sequences between the orthologous gene pairs of Asia II 3 and MEAM1 is 1.73%, which is comparable to the average divergence of Asia II 3 and MED transcriptomes (1.84%) and much higher than that of MEAM1 and MED (0.83%). This is consistent with the previous phylogenetic analyses and crossing experiments suggesting these are distinct species. We also identified hundreds of highly diverged genes and compiled sequence identify data into gene functional groups and found the most divergent gene classes are Cytochrome P450, Glutathione metabolism and Oxidative phosphorylation. These results strongly suggest that the divergence of genes related to metabolism might be the driving force of the MEAM1 and Asia II 3 differentiation. We also analyzed single nucleotide polymorphisms within the orthologous gene pairs of indigenous and invasive whiteflies which are helpful for the investigation of association between allelic and phenotypes. Conclusions Our data present the most comprehensive sequences for the indigenous whitefly species Asia II 3. The extensive comparisons of Asia II 3, MEAM1 and MED transcriptomes will serve as an invaluable resource for revealing the genetic basis of whitefly invasion and the molecular mechanisms underlying their biological differences. PMID:23036081

  11. Haematobia irritans dataset of raw sequence reads from Illumina-based transcriptome sequencing of specific tissues and life stages

    USDA-ARS?s Scientific Manuscript database

    Illumina HiSeq technology was used to sequence the transcriptome from various dissected tissues and life stages from the horn fly, Haematobia irritans. These samples include eggs (0, 2, 4, and 9 hours post-oviposition), adult fly gut, adult fly legs, adult fly malpighian tubule, adult fly ovary, adu...

  12. Deep sequencing reveals cell-type-specific patterns of single-cell transcriptome variation.

    PubMed

    Dueck, Hannah; Khaladkar, Mugdha; Kim, Tae Kyung; Spaethling, Jennifer M; Francis, Chantal; Suresh, Sangita; Fisher, Stephen A; Seale, Patrick; Beck, Sheryl G; Bartfai, Tamas; Kuhn, Bernhard; Eberwine, James; Kim, Junhyong

    2015-06-09

    Differentiation of metazoan cells requires execution of different gene expression programs but recent single-cell transcriptome profiling has revealed considerable variation within cells of seeming identical phenotype. This brings into question the relationship between transcriptome states and cell phenotypes. Additionally, single-cell transcriptomics presents unique analysis challenges that need to be addressed to answer this question. We present high quality deep read-depth single-cell RNA sequencing for 91 cells from five mouse tissues and 18 cells from two rat tissues, along with 30 control samples of bulk RNA diluted to single-cell levels. We find that transcriptomes differ globally across tissues with regard to the number of genes expressed, the average expression patterns, and within-cell-type variation patterns. We develop methods to filter genes for reliable quantification and to calibrate biological variation. All cell types include genes with high variability in expression, in a tissue-specific manner. We also find evidence that single-cell variability of neuronal genes in mice is correlated with that in rats consistent with the hypothesis that levels of variation may be conserved. Single-cell RNA-sequencing data provide a unique view of transcriptome function; however, careful analysis is required in order to use single-cell RNA-sequencing measurements for this purpose. Technical variation must be considered in single-cell RNA-sequencing studies of expression variation. For a subset of genes, biological variability within each cell type appears to be regulated in order to perform dynamic functions, rather than solely molecular noise.

  13. Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts

    PubMed Central

    Cheng, Bing; Furtado, Agnelo

    2017-01-01

    Abstract Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by Pacbio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam resulted in 1213 sequences without hits, were potential novel genes in coffee. Longer UTRs were captured, especially in the 5΄UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kilo base) were poorly annotated. LRS technology shows the limitation of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee. PMID:29048540

  14. Developmental Gene Discovery in a Hemimetabolous Insect: De Novo Assembly and Annotation of a Transcriptome for the Cricket Gryllus bimaculatus

    PubMed Central

    Zeng, Victor; Ewen-Campen, Ben; Horch, Hadley W.; Roth, Siegfried; Mito, Taro; Extavour, Cassandra G.

    2013-01-01

    Most genomic resources available for insects represent the Holometabola, which are insects that undergo complete metamorphosis like beetles and flies. In contrast, the Hemimetabola (direct developing insects), representing the basal branches of the insect tree, have very few genomic resources. We have therefore created a large and publicly available transcriptome for the hemimetabolous insect Gryllus bimaculatus (cricket), a well-developed laboratory model organism whose potential for functional genetic experiments is currently limited by the absence of genomic resources. cDNA was prepared using mRNA obtained from adult ovaries containing all stages of oogenesis, and from embryo samples on each day of embryogenesis. Using 454 Titanium pyrosequencing, we sequenced over four million raw reads, and assembled them into 21,512 isotigs (predicted transcripts) and 120,805 singletons with an average coverage per base pair of 51.3. We annotated the transcriptome manually for over 400 conserved genes involved in embryonic patterning, gametogenesis, and signaling pathways. BLAST comparison of the transcriptome against the NCBI non-redundant protein database (nr) identified significant similarity to nr sequences for 55.5% of transcriptome sequences, and suggested that the transcriptome may contain 19,874 unique transcripts. For predicted transcripts without significant similarity to known sequences, we assessed their similarity to other orthopteran sequences, and determined that these transcripts contain recognizable protein domains, largely of unknown function. We created a searchable, web-based database to allow public access to all raw, assembled and annotated data. This database is to our knowledge the largest de novo assembled and annotated transcriptome resource available for any hemimetabolous insect. We therefore anticipate that these data will contribute significantly to more effective and higher-throughput deployment of molecular analysis tools in Gryllus. PMID:23671567

  15. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

    PubMed Central

    Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amendable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676

  16. Simultaneous transcriptome analysis of Colletotrichum gloeosporioides and tomato fruit pathosystem reveals novel fungal pathogenicity and fruit defense strategies.

    PubMed

    Alkan, Noam; Friedlander, Gilgi; Ment, Dana; Prusky, Dov; Fluhr, Robert

    2015-01-01

    The fungus Colletotrichum gloeosporioides breaches the fruit cuticle but remains quiescent until fruit ripening signals a switch to necrotrophy, culminating in devastating anthracnose disease. There is a need to understand the distinct fungal arms strategy and the simultaneous fruit response. Transcriptome analysis of fungal-fruit interactions was carried out concurrently in the appressoria, quiescent and necrotrophic stages. Conidia germinating on unripe fruit cuticle showed stage-specific transcription that was accompanied by massive fruit defense responses. The subsequent quiescent stage showed the development of dendritic-like structures and swollen hyphae within the fruit epidermis. The quiescent fungal transcriptome was characterized by activation of chromatin remodeling genes and unsuspected environmental alkalization. Fruit response was portrayed by continued highly integrated massive up-regulation of defense genes. During cuticle infection of green or ripe fruit, fungi recapitulate the same developmental stages but with differing quiescent time spans. The necrotrophic stage showed a dramatic shift in fungal metabolism and up-regulation of pathogenicity factors. Fruit response to necrotrophy showed activation of the salicylic acid pathway, climaxing in cell death. Transcriptome analysis of C. gloeosporioides infection of fruit reveals its distinct stage-specific lifestyle and the concurrent changing fruit response, deepening our perception of the unfolding fungal-fruit arms and defenses race. © 2014 The Authors. New Phytologist © 2014 New Phytologist Trust.

  17. De novo assembly and characterization of the leaf, bud, and fruit transcriptome from the vulnerable tree Juglans mandshurica for the development of 20 new microsatellite markers using Illumina sequencing

    Treesearch

    Zhuang Hu; Tian Zhang; Xiao-Xiao Gao; Yang Wang; Qiang Zhang; Hui-Juan Zhou; Gui-Fang Zhao; Ma-Li Wang; Keith E. Woeste; Peng Zhao

    2016-01-01

    Manchurian walnut (Juglans mandshurica Maxim.) is a vulnerable, temperate deciduous tree valued for its wood and nut, but transcriptomic and genomic data for the species are very limited. Next generation sequencing (NGS) has made it possible to develop molecular markers for this species rapidly and efficiently. Our goal is to use transcriptome...

  18. De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    PubMed Central

    2011-01-01

    Background Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales - a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations Fagopyrum species have not been the subject of large-scale sequencing projects. Results Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 (for F. esculentum) and 229 (F. tataricum) thousands of reads with average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousands of contigs for each species, with 7.5-8.2× coverage. Comparative analysis of two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationships of the Caryophyllales and asterids now gained high support from nuclear gene sequences. Conclusions 454 transcriptome sequencing and de novo assembly was performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated. PMID:21232141

  19. De Novo Sequencing and Characterization of the Floral Transcriptome of Dendrocalamus latiflorus (Poaceae: Bambusoideae)

    PubMed Central

    Li, De-Zhu; Guo, Zhen-Hua

    2012-01-01

    Background Transcriptome sequencing can be used to determine gene sequences and transcript abundance in non-model species, and the advent of next-generation sequencing (NGS) technologies has greatly decreased the cost and time required for this process. Transcriptome data are especially desirable in bamboo species, as certain members constitute an economically and culturally important group of mostly semelparous plants with remarkable flowering features, yet little bamboo genomic research has been performed. Here we present, for the first time, extensive sequence and transcript abundance data for the floral transcriptome of a key bamboo species, Dendrocalamus latiflorus, obtained using the Illumina GAII sequencing platform. Our further goal was to identify patterns of gene expression during bamboo flower development. Results Approximately 96 million sequencing reads were generated and assembled de novo, yielding 146,395 high quality unigenes with an average length of 461 bp. Of these, 80,418 were identified as putative homologs of annotated sequences in the public protein databases, of which 290 were associated with the floral transition and 47 were related to flower development. Digital abundance analysis identified 26,529 transcripts differentially enriched between two developmental stages, young flower buds and older developing flowers. Unigenes found at each stage were categorized according to their putative functional categories. These sequence and putative function data comprise a resource for future investigation of the floral transition and flower development in bamboo species. Conclusions Our results present the first broad survey of a bamboo floral transcriptome. Although it will be necessary to validate the functions carried out by these genes, these results represent a starting point for future functional research on D. latiflorus and related species. PMID:22916120

  20. Transcriptome sequence analysis of an ornamental plant, Ananas comosus var. bracteatus, revealed the potential unigenes involved in terpenoid and phenylpropanoid biosynthesis.

    PubMed

    Ma, Jun; Kanakala, S; He, Yehua; Zhang, Junli; Zhong, Xiaolan

    2015-01-01

    Ananas comosus var. bracteatus (Red Pineapple) is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies. The Ananas comosus var. bracteatus transcriptome was sequenced by the Illumina paired-end sequencing technology. We obtained a total of 23.5 million high quality sequencing reads, 1,555,808 contigs and 41,052 unigenes. In total 41,052 unigenes of Ananas comosus var. bracteatus, 23,275 unigenes were annotated in the NCBI non-redundant protein database and 23,134 unigenes were annotated in the Swiss-Port database. Out of these, 17,748 and 8,505 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. Functional annotation against Kyoto Encyclopedia of Genes and Genomes Pathway database identified 5,825 unigenes which were mapped to 117 pathways. The assembly predicted many unigenes that were previously unknown. The annotated unigenes were compared against pineapple, rice, maize, Arabidopsis, and sorghum. Unigenes that did not match any of those five sequence datasets are considered to be Ananas comosus var. bracteatus unique. We predicted unigenes encoding enzymes involved in terpenoid and phenylpropanoid biosynthesis. The sequence data provide the most comprehensive transcriptomic resource currently available for Ananas comosus var. bracteatus. To our knowledge; this is the first report on the de novo transcriptome sequencing of the Ananas comosus var. bracteatus. Unigenes obtained in this study, may help improve future gene expression, genetic and genomics studies in Ananas comosus var. bracteatus.

  1. Transcriptome Sequence Analysis of an Ornamental Plant, Ananas comosus var. bracteatus, Revealed the Potential Unigenes Involved in Terpenoid and Phenylpropanoid Biosynthesis

    PubMed Central

    Ma, Jun; Kanakala, S.; He, Yehua; Zhang, Junli; Zhong, Xiaolan

    2015-01-01

    Background Ananas comosus var. bracteatus (Red Pineapple) is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies. Results The Ananas comosus var. bracteatus transcriptome was sequenced by the Illumina paired-end sequencing technology. We obtained a total of 23.5 million high quality sequencing reads, 1,555,808 contigs and 41,052 unigenes. In total 41,052 unigenes of Ananas comosus var. bracteatus, 23,275 unigenes were annotated in the NCBI non-redundant protein database and 23,134 unigenes were annotated in the Swiss-Port database. Out of these, 17,748 and 8,505 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. Functional annotation against Kyoto Encyclopedia of Genes and Genomes Pathway database identified 5,825 unigenes which were mapped to 117 pathways. The assembly predicted many unigenes that were previously unknown. The annotated unigenes were compared against pineapple, rice, maize, Arabidopsis, and sorghum. Unigenes that did not match any of those five sequence datasets are considered to be Ananas comosus var. bracteatus unique. We predicted unigenes encoding enzymes involved in terpenoid and phenylpropanoid biosynthesis. Conclusion The sequence data provide the most comprehensive transcriptomic resource currently available for Ananas comosus var. bracteatus. To our knowledge; this is the first report on the de novo transcriptome sequencing of the Ananas comosus var. bracteatus. Unigenes obtained in this study, may help improve future gene expression, genetic and genomics studies in Ananas comosus var. bracteatus. PMID:25769053

  2. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    PubMed

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary data are available at Bioinformatics online.

  3. Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

    PubMed

    Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

    2005-01-01

    We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.

  4. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella.

    PubMed

    Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki

    2013-07-09

    The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. KONAGAbase provides DBM comprehensive transcriptomic and draft genomic sequences with useful annotation information with easy-to-use web interfaces, which helps researchers to efficiently search for target sequences such as insect resistance-related genes. KONAGAbase will be continuously updated and additional genomic/transcriptomic resources and analysis tools will be provided for further efficient analysis of the mechanism of insecticide resistance and the development of effective insecticides with a novel mode of action for DBM.

  5. Enabling large-scale next-generation sequence assembly with Blacklight

    PubMed Central

    Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.

    2014-01-01

    Summary A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported previously at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974

  6. Whole Transcriptome Sequencing Enables Discovery and Analysis of Viruses in Archived Primary Central Nervous System Lymphomas

    PubMed Central

    DeBoever, Christopher; Reid, Erin G.; Smith, Erin N.; Wang, Xiaoyun; Dumaop, Wilmar; Harismendy, Olivier; Carson, Dennis; Richman, Douglas; Masliah, Eliezer; Frazer, Kelly A.

    2013-01-01

    Primary central nervous system lymphomas (PCNSL) have a dramatically increased prevalence among persons living with AIDS and are known to be associated with human Epstein Barr virus (EBV) infection. Previous work suggests that in some cases, co-infection with other viruses may be important for PCNSL pathogenesis. Viral transcription in tumor samples can be measured using next generation transcriptome sequencing. We demonstrate the ability of transcriptome sequencing to identify viruses, characterize viral expression, and identify viral variants by sequencing four archived AIDS-related PCNSL tissue samples and analyzing raw sequencing reads. EBV was detected in all four PCNSL samples and cytomegalovirus (CMV), JC polyomavirus (JCV), and HIV were also discovered, consistent with clinical diagnoses. CMV was found to express three long non-coding RNAs recently reported as expressed during active infection. Single nucleotide variants were observed in each of the viruses observed and three indels were found in CMV. No viruses were found in several control tumor types including 32 diffuse large B-cell lymphoma samples. This study demonstrates the ability of next generation transcriptome sequencing to accurately identify viruses, including DNA viruses, in solid human cancer tissue samples. PMID:24023918

  7. Bacillus anthracis genome organization in light of whole transcriptome sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.

    2010-03-22

    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computationalmore » predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.« less

  8. Optimized approach for Ion Proton RNA sequencing reveals details of RNA splicing and editing features of the transcriptome.

    PubMed

    Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A

    2017-01-01

    RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.

  9. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

    PubMed

    Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

    2013-12-29

    Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants.

  10. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing

    PubMed Central

    2013-01-01

    Background Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Results Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. Conclusions The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants. PMID:24373163

  11. Resolving Relationships among the Megadiverse Butterflies and Moths with a Novel Pipeline for Anchored Phylogenomics.

    PubMed

    Breinholt, Jesse W; Earl, Chandra; Lemmon, Alan R; Lemmon, Emily Moriarty; Xiao, Lei; Kawahara, Akito Y

    2018-01-01

    The advent of next-generation sequencing technology has allowed for thecollection of large portions of the genome for phylogenetic analysis. Hybrid enrichment and transcriptomics are two techniques that leverage next-generation sequencing and have shown much promise. However, methods for processing hybrid enrichment data are still limited. We developed a pipeline for anchored hybrid enrichment (AHE) read assembly, orthology determination, contamination screening, and data processing for sequences flanking the target "probe" region. We apply this approach to study the phylogeny of butterflies and moths (Lepidoptera), a megadiverse group of more than 157,000 described species with poorly understood deep-level phylogenetic relationships. We introduce a new, 855 locus AHE kit for Lepidoptera phylogenetics and compare resulting trees to those from transcriptomes. The enrichment kit was designed from existing genomes, transcriptomes, and expressed sequence tags and was used to capture sequence data from 54 species from 23 lepidopteran families. Phylogenies estimated from AHE data were largely congruent with trees generated from transcriptomes, with strong support for relationships at all but the deepest taxonomic levels. We combine AHE and transcriptomic data to generate a new Lepidoptera phylogeny, representing 76 exemplar species in 42 families. The tree provides robust support for many relationships, including those among the seven butterfly families. The addition of AHE data to an existing transcriptomic dataset lowers node support along the Lepidoptera backbone, but firmly places taxa with AHE data on the phylogeny. Combining taxa sequenced for AHE with existing transcriptomes and genomes resulted in a tree with strong support for (Calliduloidea $+$ Gelechioidea $+$ Thyridoidea) $+$ (Papilionoidea $+$ Pyraloidea $+$ Macroheterocera). To examine the efficacy of AHE at a shallow taxonomic level, phylogenetic analyses were also conducted on a sister group representing a more recent divergence, the Saturniidae and Sphingidae. These analyses utilized sequences from the probe region and data flanking it, nearly doubled the size of the dataset; resulting trees supported new phylogenetics relationships, especially within the Saturniidae and Sphingidae (e.g., Hemarina derived in the latter). We hope that our data processing pipeline, hybrid enrichment gene set, and approach of combining AHE data with transcriptomes will be useful for the broader systematics community. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  12. Short communication: development and characterization of novel transcriptome-derived microsatellites for genetic analysis of persimmon.

    PubMed

    Luo, C; Zhang, Q L; Luo, Z R

    2014-04-16

    Oriental persimmon (Diospyros kaki Thunb.) (2n = 6x = 90) is a major commercial and deciduous fruit tree that is believed to have originated in China. However, rare transcriptomic and genomic information on persimmon is available. Using Roche 454 sequencing technology, the transcriptome from RNA of the flowers of D. kaki was analyzed. A total of 1,250,893 reads were generated and 83,898 unigenes were assembled. A total of 42,711 SSR loci were identified from 23,494 unigenes and 289 polymerase chain reaction primer pairs were designed. Of these 289 primers, 155 (53.6%) showed robust PCR amplification and 98 revealed polymorphism between 15 persimmon genotypes, indicating a polymorphic rate of 63.23% of the productive primers for characterization and genotyping of the genus Diospyros. Transcriptome sequence data generated from next-generation sequencing technology to identify microsatellite loci appears to be rapid and cost-efficient, particularly for species with no genomic sequence information available.

  13. Digital Marine Bioprospecting: Mining New Neurotoxin Drug Candidates from the Transcriptomes of Cold-Water Sea Anemones

    PubMed Central

    Urbarova, Ilona; Karlsen, Bård Ove; Okkenhaug, Siri; Seternes, Ole Morten; Johansen, Steinar D.; Emblem, Åse

    2012-01-01

    Marine bioprospecting is the search for new marine bioactive compounds and large-scale screening in extracts represents the traditional approach. Here, we report an alternative complementary protocol, called digital marine bioprospecting, based on deep sequencing of transcriptomes. We sequenced the transcriptomes from the adult polyp stage of two cold-water sea anemones, Bolocera tuediae and Hormathia digitata. We generated approximately 1.1 million quality-filtered sequencing reads by 454 pyrosequencing, which were assembled into approximately 120,000 contigs and 220,000 single reads. Based on annotation and gene ontology analysis we profiled the expressed mRNA transcripts according to known biological processes. As a proof-of-concept we identified polypeptide toxins with a potential blocking activity on sodium and potassium voltage-gated channels from digital transcriptome libraries. PMID:23170083

  14. Transcriptome Sequencing and Developmental Regulation of Gene Expression in Anopheles aquasalis

    PubMed Central

    Silva, Maria C. P.; Lopes, Adriana R.; Barros, Michele S.; Sá-Nunes, Anderson; Kojin, Bianca B.; Carvalho, Eneas; Suesdek, Lincoln; Silva-Neto, Mário Alberto C.; James, Anthony A.; Capurro, Margareth L.

    2014-01-01

    Background Anopheles aquasalis is a major malaria vector in coastal areas of South and Central America where it breeds preferentially in brackish water. This species is very susceptible to Plasmodium vivax and it has been already incriminated as responsible vector in malaria outbreaks. There has been no high-throughput investigation into the sequencing of An. aquasalis genes, transcripts and proteins despite its epidemiological relevance. Here we describe the sequencing, assembly and annotation of the An. aquasalis transcriptome. Methodology/Principal Findings A total of 419 thousand cDNA sequence reads, encompassing 164 million nucleotides, were assembled in 7544 contigs of ≥2 sequences, and 1999 singletons. The majority of the An. aquasalis transcripts encode proteins with their closest counterparts in another neotropical malaria vector, An. darlingi. Several analyses in different protein databases were used to annotate and predict the putative functions of the deduced An. aquasalis proteins. Larval and adult-specific transcripts were represented by 121 and 424 contig sequences, respectively. Fifty-one transcripts were only detected in blood-fed females. The data also reveal a list of transcripts up- or down-regulated in adult females after a blood meal. Transcripts associated with immunity, signaling networks and blood feeding and digestion are discussed. Conclusions/Significance This study represents the first large-scale effort to sequence the transcriptome of An. aquasalis. It provides valuable information that will facilitate studies on the biology of this species and may lead to novel strategies to reduce malaria transmission on the South American continent. The An. aquasalis transcriptome is accessible at http://exon.niaid.nih.gov/transcriptome/An_aquasalis/Anaquexcel.xlsx. PMID:25033462

  15. Microfluidic single-cell whole-transcriptome sequencing.

    PubMed

    Streets, Aaron M; Zhang, Xiannian; Cao, Chen; Pang, Yuhong; Wu, Xinglong; Xiong, Liang; Yang, Lu; Fu, Yusi; Zhao, Liang; Tang, Fuchou; Huang, Yanyi

    2014-05-13

    Single-cell whole-transcriptome analysis is a powerful tool for quantifying gene expression heterogeneity in populations of cells. Many techniques have, thus, been recently developed to perform transcriptome sequencing (RNA-Seq) on individual cells. To probe subtle biological variation between samples with limiting amounts of RNA, more precise and sensitive methods are still required. We adapted a previously developed strategy for single-cell RNA-Seq that has shown promise for superior sensitivity and implemented the chemistry in a microfluidic platform for single-cell whole-transcriptome analysis. In this approach, single cells are captured and lysed in a microfluidic device, where mRNAs with poly(A) tails are reverse-transcribed into cDNA. Double-stranded cDNA is then collected and sequenced using a next generation sequencing platform. We prepared 94 libraries consisting of single mouse embryonic cells and technical replicates of extracted RNA and thoroughly characterized the performance of this technology. Microfluidic implementation increased mRNA detection sensitivity as well as improved measurement precision compared with tube-based protocols. With 0.2 M reads per cell, we were able to reconstruct a majority of the bulk transcriptome with 10 single cells. We also quantified variation between and within different types of mouse embryonic cells and found that enhanced measurement precision, detection sensitivity, and experimental throughput aided the distinction between biological variability and technical noise. With this work, we validated the advantages of an early approach to single-cell RNA-Seq and showed that the benefits of combining microfluidic technology with high-throughput sequencing will be valuable for large-scale efforts in single-cell transcriptome analysis.

  16. Characterization and analysis of a transcriptome from the boreal spider crab Hyas araneus.

    PubMed

    Harms, Lars; Frickenhaus, Stephan; Schiffer, Melanie; Mark, Felix C; Storch, Daniela; Pörtner, Hans-Otto; Held, Christoph; Lucassen, Magnus

    2013-12-01

    Research investigating the genetic basis of physiological responses has significantly broadened our understanding of the mechanisms underlying organismic response to environmental change. However, genomic data are currently available for few taxa only, thus excluding physiological model species from this approach. In this study we report the transcriptome of the model organism Hyas araneus from Spitsbergen (Arctic). We generated 20,479 transcripts, using the 454 GS FLX sequencing technology in combination with an Illumina HiSeq sequencing approach. Annotation by Blastx revealed 7159 blast hits in the NCBI non-redundant protein database. The comparison between the spider crab H. araneus transcriptome and EST libraries of the European lobster Homarus americanus and the porcelain crab Petrolisthes cinctipes yielded 3229/2581 sequences with a significant hit, respectively. The clustering by the Markov Clustering Algorithm (MCL) revealed a common core of 1710 clusters present in all three species and 5903 unique clusters for H. araneus. The combined sequencing approaches generated transcripts that will greatly expand the limited genomic data available for crustaceans. We introduce the MCL clustering for transcriptome comparisons as a simple approach to estimate similarities between transcriptomic libraries of different size and quality and to analyze homologies within the selected group of species. In particular, we identified a large variety of reverse transcriptase (RT) sequences not only in the H. araneus transcriptome and other decapod crustaceans, but also sea urchin, supporting the hypothesis of a heritable, anti-viral immunity and the proposed viral fragment integration by host-derived RTs in marine invertebrates. © 2013.

  17. Transcriptome analysis by strand-specific sequencing of complementary DNA

    PubMed Central

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-01-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online. PMID:19620212

  18. Transcriptome analysis by strand-specific sequencing of complementary DNA.

    PubMed

    Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey

    2009-10-01

    High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online.

  19. Assessing the Gene Content of the Megagenome: Sugar Pine (Pinus lambertiana)

    PubMed Central

    Gonzalez-Ibeas, Daniel; Martinez-Garcia, Pedro J.; Famula, Randi A.; Delfino-Mix, Annette; Stevens, Kristian A.; Loopstra, Carol A.; Langley, Charles H.; Neale, David B.; Wegrzyn, Jill L.

    2016-01-01

    Sugar pine (Pinus lambertiana Douglas) is within the subgenus Strobus with an estimated genome size of 31 Gbp. Transcriptomic resources are of particular interest in conifers due to the challenges presented in their megagenomes for gene identification. In this study, we present the first comprehensive survey of the P. lambertiana transcriptome through deep sequencing of a variety of tissue types to generate more than 2.5 billion short reads. Third generation, long reads generated through PacBio Iso-Seq have been included for the first time in conifers to combat the challenges associated with de novo transcriptome assembly. A technology comparison is provided here to contribute to the otherwise scarce comparisons of second and third generation transcriptome sequencing approaches in plant species. In addition, the transcriptome reference was essential for gene model identification and quality assessment in the parallel project responsible for sequencing and assembly of the entire genome. In this study, the transcriptomic data were also used to address questions surrounding lineage-specific Dicer-like proteins in conifers. These proteins play a role in the control of transposable element proliferation and the related genome expansion in conifers. PMID:27799338

  20. RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome

    USDA-ARS?s Scientific Manuscript database

    A first analysis of the Glycine max (L.) Merr. (soybean) transcriptome using next generation sequencing technology and RNA-Sequencing (RNA-Seq) is presented. This analysis will provide an important resource for understanding transcription and gene co-regulatory networks in soybean, the most economic...

  1. Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

    PubMed Central

    2011-01-01

    Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to the ecological significance, Phalaenopsis has been considered as an economically important floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies. PMID:21749684

  2. Advances in single-cell RNA sequencing and its applications in cancer research.

    PubMed

    Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming

    2017-08-08

    Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years' development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise target therapy by identifying druggable subclones. Indeed, progresses have been made regarding solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances and progresses of single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5.

  3. Advances in single-cell RNA sequencing and its applications in cancer research

    PubMed Central

    Zhu, Sibo; Qing, Tao; Zheng, Yuanting; Jin, Li; Shi, Leming

    2017-01-01

    Unlike population-level approaches, single-cell RNA sequencing enables transcriptomic analysis of an individual cell. Through the combination of high-throughput sequencing and bioinformatic tools, single-cell RNA-seq can detect more than 10,000 transcripts in one cell to distinguish cell subsets and dynamic cellular changes. After several years’ development, single-cell RNA-seq can now achieve massively parallel, full-length mRNA sequencing as well as in situ sequencing and even has potential for multi-omic detection. One appealing area of single-cell RNA-seq is cancer research, and it is regarded as a promising way to enhance prognosis and provide more precise target therapy by identifying druggable subclones. Indeed, progresses have been made regarding solid tumor analysis to reveal intratumoral heterogeneity, correlations between signaling pathways, stemness, drug resistance, and tumor architecture shaping the microenvironment. Furthermore, through investigation into circulating tumor cells, many genes have been shown to promote a propensity toward stemness and the epithelial-mesenchymal transition, to enhance anchoring and adhesion, and to be involved in mechanisms of anoikis resistance and drug resistance. This review focuses on advances and progresses of single-cell RNA-seq with regard to the following aspects: 1. Methodologies of single-cell RNA-seq 2. Single-cell isolation techniques 3. Single-cell RNA-seq in solid tumor research 4. Single-cell RNA-seq in circulating tumor cell research 5. Perspectives PMID:28881849

  4. From chromatogram to analyte to metabolite. How to pick horses for courses from the massive web resources for mass spectral plant metabolomics.

    PubMed

    Perez de Souza, Leonardo; Naake, Thomas; Tohge, Takayuki; Fernie, Alisdair R

    2017-07-01

    The grand challenge currently facing metabolomics is the expansion of the coverage of the metabolome from a minor percentage of the metabolic complement of the cell toward the level of coverage afforded by other post-genomic technologies such as transcriptomics and proteomics. In plants, this problem is exacerbated by the sheer diversity of chemicals that constitute the metabolome, with the number of metabolites in the plant kingdom generally considered to be in excess of 200 000. In this review, we focus on web resources that can be exploited in order to improve analyte and ultimately metabolite identification and quantification. There is a wide range of available software that not only aids in this but also in the related area of peak alignment; however, for the uninitiated, choosing which program to use is a daunting task. For this reason, we provide an overview of the pros and cons of the software as well as comments regarding the level of programing skills required to effectively exploit their basic functions. In addition, the torrent of available genome and transcriptome sequences that followed the advent of next-generation sequencing has opened up further valuable resources for metabolite identification. All things considered, we posit that only via a continued communal sharing of information such as that deposited in the databases described within the article are we likely to be able to make significant headway toward improving our coverage of the plant metabolome. © The Authors 2017. Published by Oxford University Press.

  5. High-throughput illumina strand-specific RNA sequencing library preparation

    USDA-ARS?s Scientific Manuscript database

    Conventional Illumina RNA-Seq does not have the resolution to decode the complex eukaryote transcriptome due to the lack of RNA polarity information. Strand-specific RNA sequencing (ssRNA-Seq) can overcome these limitations and as such is better suited for genome annotation, de novo transcriptome as...

  6. Developing Single Nucleotide Polymorphism (SNP) markers from transcriptome sequences for the identification of longan (Dimocarpus longan) germplasm

    USDA-ARS?s Scientific Manuscript database

    Longan (Dimocarpus longan Lour.) is an important tropical fruit tree crop. Accurate varietal identification is essential for germplasm management and breeding. Using longan transcriptome sequences from public databases, we developed single nucleotide polymorphism (SNP) markers; validated 60 SNPs in...

  7. The testes transcriptome derived from the New World Screwworm, Cochliomyia hominivorax SRA

    USDA-ARS?s Scientific Manuscript database

    In a collaboration with National Center for Genome Resources researchers, we sequenced and assembled the testes transcriptome derived from the Pacora, Panama, production plant strain J06 of the New World Screwworm, Cochliomyia hominivorax. This sequencing project produced 72,750,822 raw reads and th...

  8. Comparative transcriptome analysis in Sclerotinia sclerotiorum and S. trifoliorum by 454 Titanium RNA sequencing

    USDA-ARS?s Scientific Manuscript database

    Sclerotinia sclerotiorum and S. trifoliorum are two closely related devastating plant pathogens. Extensive research has been conducted on S. sclerotiorum and its genome sequences are available. To take advantages of the genomic information of S. sclerotiorum, we compared the transcriptome of S. tr...

  9. Additional annotation of the pig transcriptome using integrated Iso-seq and Illumina RNA-seq analysis

    USDA-ARS?s Scientific Manuscript database

    Alternative splicing is a well-known phenomenon that dramatically increases eukaryotic transcriptome diversity. The extent of mRNA isoform diversity among porcine tissues was assessed using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) and Illumina short read sequencing ...

  10. Developmental Transcriptome Analysis and Identification of Genes Involved in Larval Metamorphosis of the Razor Clam, Sinonovacula constricta.

    PubMed

    Niu, Donghong; Wang, Fei; Xie, Shumei; Sun, Fanyue; Wang, Ze; Peng, Maoxiao; Li, Jiale

    2016-04-01

    The razor clam Sinonovacula constricta is an important commercial species. The deficiency of developmental transcriptomic data is becoming the bottleneck of further researches on the mechanisms underlying settlement and metamorphosis in early development. In this study, de novo transcriptome sequencing was performed for S. constricta at different early developmental stages by using Illumina HiSeq 2000 paired-end (PE) sequencing technology. A total of 112,209,077 PE clean reads were generated. De novo assembly generated 249,795 contigs with an average length of 585 bp. Gene annotation resulted in the identification of 22,870 unigene hits against the NCBI database. Eight unique sequences related to metamorphosis were identified and analyzed using real-time PCR. The razor clam reference transcriptome would provide useful information on early developmental and metamorphosis mechanisms and could be used in the genetic breeding of shellfish.

  11. Transcriptome Analysis at the Single-Cell Level Using SMART Technology.

    PubMed

    Fish, Rachel N; Bostick, Magnolia; Lehman, Alisa; Farmer, Andrew

    2016-10-10

    RNA sequencing (RNA-seq) is a powerful method for analyzing cell state, with minimal bias, and has broad applications within the biological sciences. However, transcriptome analysis of seemingly homogenous cell populations may in fact overlook significant heterogeneity that can be uncovered at the single-cell level. The ultra-low amount of RNA contained in a single cell requires extraordinarily sensitive and reproducible transcriptome analysis methods. As next-generation sequencing (NGS) technologies mature, transcriptome profiling by RNA-seq is increasingly being used to decipher the molecular signature of individual cells. This unit describes an ultra-sensitive and reproducible protocol to generate cDNA and sequencing libraries directly from single cells or RNA inputs ranging from 10 pg to 10 ng. Important considerations for working with minute RNA inputs are given. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  12. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.

    PubMed

    Kang, Hyun Min; Subramaniam, Meena; Targ, Sasha; Nguyen, Michelle; Maliskova, Lenka; McCarthy, Elizabeth; Wan, Eunice; Wong, Simon; Byrnes, Lauren; Lanata, Cristina M; Gate, Rachel E; Mostafavi, Sara; Marson, Alexander; Zaitlen, Noah; Criswell, Lindsey A; Ye, Chun Jimmie

    2018-01-01

    Droplet single-cell RNA-sequencing (dscRNA-seq) has enabled rapid, massively parallel profiling of transcriptomes. However, assessing differential expression across multiple individuals has been hampered by inefficient sample processing and technical batch effects. Here we describe a computational tool, demuxlet, that harnesses natural genetic variation to determine the sample identity of each droplet containing a single cell (singlet) and detect droplets containing two cells (doublets). These capabilities enable multiplexed dscRNA-seq experiments in which cells from unrelated individuals are pooled and captured at higher throughput than in standard workflows. Using simulated data, we show that 50 single-nucleotide polymorphisms (SNPs) per cell are sufficient to assign 97% of singlets and identify 92% of doublets in pools of up to 64 individuals. Given genotyping data for each of eight pooled samples, demuxlet correctly recovers the sample identity of >99% of singlets and identifies doublets at rates consistent with previous estimates. We apply demuxlet to assess cell-type-specific changes in gene expression in 8 pooled lupus patient samples treated with interferon (IFN)-β and perform eQTL analysis on 23 pooled samples.

  13. Single-cell barcoding and sequencing using droplet microfluidics.

    PubMed

    Zilionis, Rapolas; Nainys, Juozas; Veres, Adrian; Savova, Virginia; Zemmour, David; Klein, Allon M; Mazutis, Linas

    2017-01-01

    Single-cell RNA sequencing has recently emerged as a powerful tool for mapping cellular heterogeneity in diseased and healthy tissues, yet high-throughput methods are needed for capturing the unbiased diversity of cells. Droplet microfluidics is among the most promising candidates for capturing and processing thousands of individual cells for whole-transcriptome or genomic analysis in a massively parallel manner with minimal reagent use. We recently established a method called inDrops, which has the capability to index >15,000 cells in an hour. A suspension of cells is first encapsulated into nanoliter droplets with hydrogel beads (HBs) bearing barcoding DNA primers. Cells are then lysed and mRNA is barcoded (indexed) by a reverse transcription (RT) reaction. Here we provide details for (i) establishing an inDrops platform (1 d); (ii) performing hydrogel bead synthesis (4 d); (iii) encapsulating and barcoding cells (1 d); and (iv) RNA-seq library preparation (2 d). inDrops is a robust and scalable platform, and it is unique in its ability to capture and profile >75% of cells in even very small samples, on a scale of thousands or tens of thousands of cells.

  14. The first whole transcriptomic exploration of pre-oviposited early chicken embryos using single and bulked embryonic RNA-sequencing.

    PubMed

    Hwang, Young Sun; Seo, Minseok; Choi, Hee Jung; Kim, Sang Kyung; Kim, Heebal; Han, Jae Yong

    2018-04-01

    The chicken is a valuable model organism, especially in evolutionary and embryology research because its embryonic development occurs in the egg. However, despite its scientific importance, no transcriptome data have been generated for deciphering the early developmental stages of the chicken because of practical and technical constraints in accessing pre-oviposited embryos. Here, we determine the entire transcriptome of pre-oviposited avian embryos, including oocyte, zygote, and intrauterine embryos from Eyal-giladi and Kochav stage I (EGK.I) to EGK.X collected using a noninvasive approach for the first time. We also compare RNA-sequencing data obtained using a bulked embryo sequencing and single embryo/cell sequencing technique. The raw sequencing data were preprocessed with two genome builds, Galgal4 and Galgal5, and the expression of 17,108 and 26,102 genes was quantified in the respective builds. There were some differences between the two techniques, as well as between the two genome builds, and these were affected by the emergence of long intergenic noncoding RNA annotations. The first transcriptome datasets of pre-oviposited early chicken embryos based on bulked and single embryo sequencing techniques will serve as a valuable resource for investigating early avian embryogenesis, for comparative studies among vertebrates, and for novel gene annotation in the chicken genome.

  15. Transcriptome Sequencing of Gracilariopsis lemaneiformis to Analyze the Genes Related to Optically Active Phycoerythrin Synthesis.

    PubMed

    Huang, Xiaoyun; Zang, Xiaonan; Wu, Fei; Jin, Yuming; Wang, Haitao; Liu, Chang; Ding, Yating; He, Bangxiang; Xiao, Dongfang; Song, Xinwei; Liu, Zhu

    2017-01-01

    Gracilariopsis lemaneiformis (aka Gracilaria lemaneiformis) is a red macroalga rich in phycoerythrin, which can capture light efficiently and transfer it to photosystemⅡ. However, little is known about the synthesis of optically active phycoerythrinin in G. lemaneiformis at the molecular level. With the advent of high-throughput sequencing technology, analysis of genetic information for G. lemaneiformis by transcriptome sequencing is an effective means to get a deeper insight into the molecular mechanism of phycoerythrin synthesis. Illumina technology was employed to sequence the transcriptome of two strains of G. lemaneiformis- the wild type and a green-pigmented mutant. We obtained a total of 86915 assembled unigenes as a reference gene set, and 42884 unigenes were annotated in at least one public database. Taking the above transcriptome sequencing as a reference gene set, 4041 differentially expressed genes were screened to analyze and compare the gene expression profiles of the wild type and green mutant. By GO and KEGG pathway analysis, we concluded that three factors, including a reduction in the expression level of apo-phycoerythrin, an increase of chlorophyll light-harvesting complex synthesis, and reduction of phycoerythrobilin by competitive inhibition, caused the reduction of optically active phycoerythrin in the green-pigmented mutant.

  16. Transcriptome analysis of nitric oxide-responsive genes in upland cotton (Gossypium hirsutum).

    PubMed

    Huang, Juan; Wei, Hengling; Li, Libei; Yu, Shuxun

    2018-01-01

    Nitric oxide (NO) is an important signaling molecule with diverse physiological functions in plants. It is therefore important to characterize the downstream genes and signal transduction networks modulated by NO. Here, we identified 1,932 differentially expressed genes (DEGs) responding to NO in upland cotton using high throughput tag sequencing. The results of quantitative real-time polymerase chain reaction (qRT-PCR) analysis of 25 DEGs showed good consistency. Gene Ontology (GO) and KEGG pathway were analyzed to gain a better understanding of these DEGs. We identified 157 DEGs belonging to 36 transcription factor (TF) families and 72 DEGs related to eight plant hormones, among which several TF families and hormones were involved in stress responses. Hydrogen peroxide and malondialdehyde (MDA) contents were increased, as well related genes after treatment with sodium nitroprusside (SNP) (an NO donor), suggesting a role for NO in the plant stress response. Finally, we compared of the current and previous data indicating a massive number of NO-responsive genes at the large-scale transcriptome level. This study evaluated the landscape of NO-responsive genes in cotton and identified the involvement of NO in the stress response. Some of the identified DEGs represent good candidates for further functional analysis in cotton.

  17. The transcriptome recipe for the venom cocktail of Tityus bahiensis scorpion.

    PubMed

    de Oliveira, Ursula Castro; Candido, Denise Maria; Dorce, Valquíria Abrão Coronado; Junqueira-de-Azevedo, Inácio de Loiola Meirelles

    2015-03-01

    Scorpion venom is a mixture of peptides, including antimicrobial, bradykinin-potentiating and anionic peptides and small to medium proteins, such as ion channel toxins, metalloproteinases and phospholipases that together cause severe clinical manifestation. Tityus bahiensis is the second most medically important scorpion species in Brazil and it is widely distributed in the country with the exception of the North Region. Here we sequenced and analyzed the transcripts from the venom glands of T. bahiensis, aiming at identifying and annotating venom gland expressed genes. A total of 116,027 long reads were generated by pyrosequencing and assembled in 2891 isotigs. An annotation process identified transcripts by similarity to known toxins, revealing that putative venom components represent 7.4% of gene expression. The major toxins identified are potassium and sodium channel toxins, whereas metalloproteinases showed an unexpected high abundance. Phylogenetic analysis of deduced metalloproteinases from T. bahiensis and other scorpions revealed a pattern of ancient and intraspecific gene expansions. Other venom molecules identified include antimicrobial, anionic and bradykinin-potentiating peptides, besides several putative new venom components. This report provides the first attempt to massively identify the venom components of this species and constitutes one of the few transcriptomic efforts on the genus Tityus. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. De novo characterization of fall dormant and nondormant alfalfa (Medicago sativa L.) leaf transcriptome and identification of candidate genes related to fall dormancy.

    PubMed

    Zhang, Senhao; Shi, Yinghua; Cheng, Ningning; Du, Hongqi; Fan, Wenna; Wang, Chengzhang

    2015-01-01

    Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide. Fall dormancy is an adaptive character related to the biomass production and winter survival in alfalfa. The physiological, biochemical and molecular mechanisms causing fall dormancy and the related genes have not been well studied. In this study, we sequenced two standard varieties of alfalfa (dormant and non-dormant) at two time points and generated approximately 160 million high quality paired-end sequence reads using sequencing by synthesis (SBS) technology. The de novo transcriptome assembly generated a set of 192,875 transcripts with an average length of 856 bp representing about 165.1 Mb of the alfalfa leaf transcriptome. After assembly, 111,062 (57.6%) transcripts were annotated against the NCBI non-redundant database. A total of 30,165 (15.6%) transcripts were mapped to 323 Kyoto Encyclopedia of Genes and Genomes pathways. We also identified 41,973 simple sequence repeats, which can be used to generate markers for alfalfa, and 1,541 transcription factors were identified across 1,350 transcripts. Gene expression between dormant and non-dormant alfalfa at different time points were performed, and we identified several differentially expressed genes potentially related to fall dormancy. The Gene Ontology and pathways information were also identified. We sequenced and assembled the leaf transcriptome of alfalfa related to fall dormancy, and also identified some genes of interest involved in the fall dormancy mechanism. Thus, our research focused on studying fall dormancy in alfalfa through transcriptome sequencing. The sequencing and gene expression data generated in this study may be used further to elucidate the complete mechanisms governing fall dormancy in alfalfa.

  19. De Novo Characterization of Fall Dormant and Nondormant Alfalfa (Medicago sativa L.) Leaf Transcriptome and Identification of Candidate Genes Related to Fall Dormancy

    PubMed Central

    Cheng, Ningning; Du, Hongqi; Fan, Wenna; Wang, Chengzhang

    2015-01-01

    Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide. Fall dormancy is an adaptive character related to the biomass production and winter survival in alfalfa. The physiological, biochemical and molecular mechanisms causing fall dormancy and the related genes have not been well studied. In this study, we sequenced two standard varieties of alfalfa (dormant and non-dormant) at two time points and generated approximately 160 million high quality paired-end sequence reads using sequencing by synthesis (SBS) technology. The de novo transcriptome assembly generated a set of 192,875 transcripts with an average length of 856 bp representing about 165.1 Mb of the alfalfa leaf transcriptome. After assembly, 111,062 (57.6%) transcripts were annotated against the NCBI non-redundant database. A total of 30,165 (15.6%) transcripts were mapped to 323 Kyoto Encyclopedia of Genes and Genomes pathways. We also identified 41,973 simple sequence repeats, which can be used to generate markers for alfalfa, and 1,541 transcription factors were identified across 1,350 transcripts. Gene expression between dormant and non-dormant alfalfa at different time points were performed, and we identified several differentially expressed genes potentially related to fall dormancy. The Gene Ontology and pathways information were also identified. We sequenced and assembled the leaf transcriptome of alfalfa related to fall dormancy, and also identified some genes of interest involved in the fall dormancy mechanism. Thus, our research focused on studying fall dormancy in alfalfa through transcriptome sequencing. The sequencing and gene expression data generated in this study may be used further to elucidate the complete mechanisms governing fall dormancy in alfalfa. PMID:25799491

  20. A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica

    PubMed Central

    2012-01-01

    Background The Azadirachta indica (neem) tree is a source of a wide number of natural products, including the potent biopesticide azadirachtin. In spite of its widespread applications in agriculture and medicine, the molecular aspects of the biosynthesis of neem terpenoids remain largely unexplored. The current report describes the draft genome and four transcriptomes of A. indica and attempts to contextualise the sequence information in terms of its molecular phylogeny, transcript expression and terpenoid biosynthesis pathways. A. indica is the first member of the family Meliaceae to be sequenced using next generation sequencing approach. Results The genome and transcriptomes of A. indica were sequenced using multiple sequencing platforms and libraries. The A. indica genome is AT-rich, bears few repetitive DNA elements and comprises about 20,000 genes. The molecular phylogenetic analyses grouped A. indica together with Citrus sinensis from the Rutaceae family validating its conventional taxonomic classification. Comparative transcript expression analysis showed either exclusive or enhanced expression of known genes involved in neem terpenoid biosynthesis pathways compared to other sequenced angiosperms. Genome and transcriptome analyses in A. indica led to the identification of repeat elements, nucleotide composition and expression profiles of genes in various organs. Conclusions This study on A. indica genome and transcriptomes will provide a model for characterization of metabolic pathways involved in synthesis of bioactive compounds, comparative evolutionary studies among various Meliaceae family members and help annotate their genomes. A better understanding of molecular pathways involved in the azadirachtin synthesis in A. indica will pave ways for bulk production of environment friendly biopesticides. PMID:22958331

  1. De novo Transcriptome Assembly of Common Wild Rice (Oryza rufipogon Griff.) and Discovery of Drought-Response Genes in Root Tissue Based on Transcriptomic Data.

    PubMed

    Tian, Xin-Jie; Long, Yan; Wang, Jiao; Zhang, Jing-Wen; Wang, Yan-Yan; Li, Wei-Min; Peng, Yu-Fa; Yuan, Qian-Hua; Pei, Xin-Wu

    2015-01-01

    The perennial O. rufipogon (common wild rice), which is considered to be the ancestor of Asian cultivated rice species, contains many useful genetic resources, including drought resistance genes. However, few studies have identified the drought resistance and tissue-specific genes in common wild rice. In this study, transcriptome sequencing libraries were constructed, including drought-treated roots (DR) and control leaves (CL) and roots (CR). Using Illumina sequencing technology, we generated 16.75 million bases of high-quality sequence data for common wild rice and conducted de novo assembly and annotation of genes without prior genome information. These reads were assembled into 119,332 unigenes with an average length of 715 bp. A total of 88,813 distinct sequences (74.42% of unigenes) significantly matched known genes in the NCBI NT database. Differentially expressed gene (DEG) analysis showed that 3617 genes were up-regulated and 4171 genes were down-regulated in the CR library compared with the CL library. Among the DEGs, 535 genes were expressed in roots but not in shoots. A similar comparison between the DR and CR libraries showed that 1393 genes were up-regulated and 315 genes were down-regulated in the DR library compared with the CR library. Finally, 37 genes that were specifically expressed in roots were screened after comparing the DEGs identified in the above-described analyses. This study provides a transcriptome sequence resource for common wild rice plants and establishes a digital gene expression profile of wild rice plants under drought conditions using the assembled transcriptome data as a reference. Several tissue-specific and drought-stress-related candidate genes were identified, representing a fully characterized transcriptome and providing a valuable resource for genetic and genomic studies in plants.

  2. De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

    PubMed Central

    2013-01-01

    Background Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration does not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514

  3. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

    PubMed

    Chen, Shi-Yi; Deng, Feilong; Jia, Xianbo; Li, Cao; Lai, Song-Jia

    2017-08-09

    It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, it still remains a huge challenge for obtaining full-length transcripts because of difficulties in the short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). We totally obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not been annotated yet within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome, respectively. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of rabbit genome.

  4. First Insights into the Subterranean Crustacean Bathynellacea Transcriptome: Transcriptionally Reduced Opsin Repertoire and Evidence of Conserved Homeostasis Regulatory Mechanisms

    PubMed Central

    Kim, Bo-Mi; Kang, Seunghyun; Ahn, Do-Hwan; Kim, Jin-Hyoung; Ahn, Inhye; Lee, Chi-Woo; Cho, Joo-Lae; Min, Gi-Sik; Park, Hyun

    2017-01-01

    Bathynellacea (Crustacea, Syncarida, Parabathynellidae) are subterranean aquatic crustaceans that typically inhabit freshwater interstitial spaces (e.g., groundwater) and are occasionally found in caves and even hot springs. In this study, we sequenced the whole transcriptome of Allobathynella bangokensis using RNA-seq. De novo sequence assembly produced 74,866 contigs including 28,934 BLAST hits. Overall, the gene sequences were most similar to those of the waterflea Daphnia pulex. In the A. bangokensis transcriptome, no opsin or related sequences were identified, and no contig aligned to the crustacean visual opsins and non-visual opsins (i.e. arthropsins, peropsins, and melaopsins), suggesting potential regressive adaptation to the dark environment. However, A. bangokensis expressed conserved gene family sets, such as heat shock proteins and those related to key innate immunity pathways and antioxidant defense systems, at the transcriptional level, suggesting that this species has evolved adaptations involving molecular mechanisms of homeostasis. The transcriptomic information of A. bangokensis will be useful for investigating molecular adaptations and response mechanisms to subterranean environmental conditions. PMID:28107438

  5. Analysis of the Citrullus colocynthis Transcriptome during Water Deficit Stress

    PubMed Central

    Wang, Zhuoyu; Hu, Hongtao; Goertzen, Leslie R.; McElroy, J. Scott; Dane, Fenny

    2014-01-01

    Citrullus colocynthis is a very drought tolerant species, closely related to watermelon (C. lanatus var. lanatus), an economically important cucurbit crop. Drought is a threat to plant growth and development, and the discovery of drought inducible genes with various functions is of great importance. We used high throughput mRNA Illumina sequencing technology and bioinformatic strategies to analyze the C. colocynthis leaf transcriptome under drought treatment. Leaf samples at four different time points (0, 24, 36, or 48 hours of withholding water) were used for RNA extraction and Illumina sequencing. qRT-PCR of several drought responsive genes was performed to confirm the accuracy of RNA sequencing. Leaf transcriptome analysis provided the first glimpse of the drought responsive transcriptome of this unique cucurbit species. A total of 5038 full-length cDNAs were detected, with 2545 genes showing significant changes during drought stress. Principle component analysis indicated that drought was the major contributing factor regulating transcriptome changes. Up regulation of many transcription factors, stress signaling factors, detoxification genes, and genes involved in phytohormone signaling and citrulline metabolism occurred under the water deficit conditions. The C. colocynthis transcriptome data highlight the activation of a large set of drought related genes in this species, thus providing a valuable resource for future functional analysis of candidate genes in defense of drought stress. PMID:25118696

  6. RNA sequencing, de novo assembly and differential analysis of the gill transcriptome of freshwater climbing perch Anabas testudineus after six days of seawater exposure.

    PubMed

    Chen, X L; Lui, E Y; Ip, Y Kwong; Lam, S H

    2018-06-21

    To obtain transcriptomic insights into branchial responses to salinity challenge in Anabas testudineus, this study employed RNA sequencing (RNA-Seq) to analyse the gill transcriptome of A. testudineus exposed to seawater (SW) for 6 days compared with the freshwater (FW) control group. A combined FW and SW gill transcriptome was de novo assembled from 169.9 million 101 bp paired-end reads. In silico validation employing 17 A. testudineus Sanger full-length coding sequences showed that 15/17 of them had greater than 80% of their sequences aligned to the de novo assembled contigs where 5/17 had their full-length (100%) aligned and 9/17 had greater than 90% of their sequences aligned. The combined FW and SW gill transcriptome was mapped to 13780 unique human identifiers at E-value < 1.0E-20 while 952 and 886 identifiers were determined as up and down-regulated by 1.5 fold, respectively, in the gills of A. testudineus in SW when compared with FW. These genes were found to be associated with at least 23 biological processes. A larger proportion of genes encoding enzymes and transporters associated with molecular transport, energy production, metabolisms were up-regulated, while a larger proportion of genes encoding transmembrane receptors, G-protein coupled receptors, kinases and transcription regulators associated with cell cycle, growth, development, signalling, morphology and gene expression were relatively lower in the gills of A. testudineus in SW when compared with FW. High correlation (R = 0.99) was observed between RNA-Seq data and real-time quantitative PCR validation for 13 selected genes. The transcriptomic sequence information will facilitate development of molecular resources and tools while the findings will provide insights for future studies into branchial iono-osmoregulation and related cellular processes in A. testudineus. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.

  7. Transcriptome sequencing and identification of cold tolerance genes in hardy Corylus species (C. heterophylla Fisch) floral buds.

    PubMed

    Chen, Xin; Zhang, Jin; Liu, Qingzhong; Guo, Wei; Zhao, Tiantian; Ma, Qinghua; Wang, Guixi

    2014-01-01

    The genus Corylus is an important woody species in Northeast China. Its products, hazelnuts, constitute one of the most important raw materials for the pastry and chocolate industry. However, limited genetic research has focused on Corylus because of the lack of genomic resources. The advent of high-throughput sequencing technologies provides a turning point for Corylus research. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive database for the Corylus heterophylla Fisch floral buds. The C. heterophylla Fisch floral buds transcriptome was sequenced using the Illumina paired-end sequencing technology. We produced 28,930,890 raw reads and assembled them into 82,684 contigs. A total of 40,941 unigenes were identified, among which 30,549 were annotated in the NCBI Non-redundant (Nr) protein database and 18,581 were annotated in the Swiss-Prot database. Of these annotated unigenes, 25,311 and 10,514 unigenes were assigned to gene ontology (GO) categories and clusters of orthologous groups (COG), respectively. We could map 17,207 unigenes onto 128 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Additionally, based on the transcriptome, we constructed a candidate cold tolerance gene set of C. heterophylla Fisch floral buds. The expression patterns of selected genes during four stages of cold acclimation suggested that these genes might be involved in different cold responsive stages in C. heterophylla Fisch floral buds. The transcriptome of C. heterophylla Fisch floral buds was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the C. heterophylla Fisch floral buds transcriptome. Candidate genes potentially involved in cold tolerance were identified, providing a material basis for future molecular mechanism analysis of C. heterophylla Fisch floral buds tolerant to cold stress.

  8. De novo characterization of Lentinula edodes C(91-3) transcriptome by deep Solexa sequencing.

    PubMed

    Zhong, Mintao; Liu, Ben; Wang, Xiaoli; Liu, Lei; Lun, Yongzhi; Li, Xingyun; Ning, Anhong; Cao, Jing; Huang, Min

    2013-02-01

    Lentinula edodes, has been utilized as food, as well as, in popular medicine, moreover, its extract isolated from its mycelium and fruiting body have shown several therapeutic properties. Yet little is understood about its genes involved in these properties, and the absence of L.edodes genomes has been a barrier to the development of functional genomics research. However, high throughput sequencing technologies are now being widely applied to non-model species. To facilitate research on L.edodes, we leveraged Solexa sequencing technology in de novo assembly of L.edodes C(91-3) transcriptome. In a single run, we produced more than 57 million sequencing reads. These reads were assembled into 28,923 unigene sequences (mean size=689bp) including 18,120 unigenes with coding sequence (CDS). Based on similarity search with known proteins, assembled unigene sequences were annotated with gene descriptions, gene ontology (GO) and clusters of orthologous group (COG) terms. Our data provides the first comprehensive sequence resource available for functional genomics studies in L.edodes, and demonstrates the utility of Illumina/Solexa sequencing for de novo transcriptome characterization and gene discovery in a non-model mushroom. Copyright © 2012 Elsevier Inc. All rights reserved.

  9. Reefgenomics.Org - a repository for marine genomics data.

    PubMed

    Liew, Yi Jin; Aranda, Manuel; Voolstra, Christian R

    2016-01-01

    Over the last decade, technological advancements have substantially decreased the cost and time of obtaining large amounts of sequencing data. Paired with the exponentially increased computing power, individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data. Although the bulk of data published in articles are stored in public sequence databases, very often, only raw sequencing data are available; miscellaneous data such as assembled transcriptomes, genome annotations etc. are not easily obtainable through the same means. Here, we introduce our website (http://reefgenomics.org) that aims to centralize genomic and transcriptomic data from marine organisms. Besides providing convenient means to download sequences, we provide (where applicable) a genome browser to explore available genomic features, and a BLAST interface to search through the hosted sequences. Through the interface, multiple datasets can be queried simultaneously, allowing for the retrieval of matching sequences from organisms of interest. The minimalistic, no-frills interface reduces visual clutter, making it convenient for end-users to search and explore processed sequence data. DATABASE URL: http://reefgenomics.org. © The Author(s) 2016. Published by Oxford University Press.

  10. Characterization of adult transcriptomes from the omnivorous lady beetle Coleomegilla maculata fed pollen or insect Egg Diet

    USDA-ARS?s Scientific Manuscript database

    32 reference transcriptome sequences described herein are filed with the National Center for Biotechnology Information (NCBI), GenBank Bioproject PRJNA236444. Transcriptome Shotgun Assembly (TSA) will also be submitted when upload instructions are received from gb-admin....

  11. Plant genome and transcriptome annotations: from misconceptions to simple solutions

    PubMed Central

    Bolger, Marie E; Arsova, Borjana; Usadel, Björn

    2018-01-01

    Abstract Next-generation sequencing has triggered an explosion of available genomic and transcriptomic resources in the plant sciences. Although genome and transcriptome sequencing has become orders of magnitudes cheaper and more efficient, often the functional annotation process is lagging behind. This might be hampered by the lack of a comprehensive enumeration of simple-to-use tools available to the plant researcher. In this comprehensive review, we present (i) typical ontologies to be used in the plant sciences, (ii) useful databases and resources used for functional annotation, (iii) what to expect from an annotated plant genome, (iv) an automated annotation pipeline and (v) a recipe and reference chart outlining typical steps used to annotate plant genomes/transcriptomes using publicly available resources. PMID:28062412

  12. The green ash transcriptome and identification of genes responding to abiotic and biotic stresses

    Treesearch

    Thomas Lane; Teodora Best; Nicole Zembower; Jack Davitt; Nathan Henry; Yi Xu; Jennifer Koch; Haiying Liang; John McGraw; Stephan Schuster; Donghwan Shim; Mark V. Coggeshall; John E. Carlson; Margaret E. Staton

    2016-01-01

    Background: To develop a set of transcriptome sequences to support research on environmental stress responses in green ash (Fraxinus pennsylvanica), we undertook deep RNA sequencing of green ash tissues under various stress treatments. The treatments, including emerald ash borer (EAB) feeding, heat, drought, cold and ozone, were selected to mimic...

  13. Population genetics, phylogenomics and hybrid speciation of Juglans in China determined from whole chloroplast genomes, transcriptomes, and genotyping-by-sequencing (GBS)

    Treesearch

    Peng Zhao; Hui-Juan Zhou; Daniel Potter; Yi-Heng Hu; Xiao-Jia Feng; Meng Dang; Li Feng; Saman Zulfiqar; Wen-Zhe Liu; Gui-Fang Zhao; Keith Woeste

    2018-01-01

    Genomic data are a powerful tool for elucidating the processes involved in the evolution and divergence of species. The speciation and phylogenetic relationships among Chinese Juglans remain unclear. Here, we used results from phylogenomic and population genetic analyses, transcriptomics, Genotyping-By-Sequencing (GBS), and whole chloroplast...

  14. De novo assembly and characterization of the Trichuris trichiura adult worm transcriptome using Ion Torrent sequencing.

    PubMed

    Santos, Leonardo N; Silva, Eduardo S; Santos, André S; De Sá, Pablo H; Ramos, Rommel T; Silva, Artur; Cooper, Philip J; Barreto, Maurício L; Loureiro, Sebastião; Pinheiro, Carina S; Alcantara-Neves, Neuza M; Pacheco, Luis G C

    2016-07-01

    Infection with helminthic parasites, including the soil-transmitted helminth Trichuris trichiura (human whipworm), has been shown to modulate host immune responses and, consequently, to have an impact on the development and manifestation of chronic human inflammatory diseases. De novo derivation of helminth proteomes from sequencing of transcriptomes will provide valuable data to aid identification of parasite proteins that could be evaluated as potential immunotherapeutic molecules in near future. Herein, we characterized the transcriptome of the adult stage of the human whipworm T. trichiura, using next-generation sequencing technology and a de novo assembly strategy. Nearly 17.6 million high-quality clean reads were assembled into 6414 contiguous sequences, with an N50 of 1606bp. In total, 5673 protein-encoding sequences were confidentially identified in the T. trichiura adult worm transcriptome; of these, 1013 sequences represent potential newly discovered proteins for the species, most of which presenting orthologs already annotated in the related species T. suis. A number of transcripts representing probable novel non-coding transcripts for the species T. trichiura were also identified. Among the most abundant transcripts, we found sequences that code for proteins involved in lipid transport, such as vitellogenins, and several chitin-binding proteins. Through a cross-species expression analysis of gene orthologs shared by T. trichiura and the closely related parasites T. suis and T. muris it was possible to find twenty-six protein-encoding genes that are consistently highly expressed in the adult stages of the three helminth species. Additionally, twenty transcripts could be identified that code for proteins previously detected by mass spectrometry analysis of protein fractions of the whipworm somatic extract that present immunomodulatory activities. Five of these transcripts were amongst the most highly expressed protein-encoding sequences in the T. trichiura adult worm. Besides, orthologs of proteins demonstrated to have potent immunomodulatory properties in related parasitic helminths were also predicted from the T. trichiura de novo assembled transcriptome. Copyright © 2016. Published by Elsevier B.V.

  15. SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells.

    PubMed

    Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

    2018-01-01

    Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. © 2018 Han et al.; Published by Cold Spring Harbor Laboratory Press.

  16. SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells

    PubMed Central

    Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

    2018-01-01

    Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. PMID:29208629

  17. De novo assembly and characterization of bark transcriptome using Illumina sequencing and development of EST-SSR markers in rubber tree (Hevea brasiliensis Muell. Arg.)

    PubMed Central

    2012-01-01

    Background In rubber tree, bark is one of important agricultural and biological organs. However, the molecular mechanism involved in the bark formation and development in rubber tree remains largely unknown, which is at least partially due to lack of bark transcriptomic and genomic information. Therefore, it is necessary to carried out high-throughput transcriptome sequencing of rubber tree bark to generate enormous transcript sequences for the functional characterization and molecular marker development. Results In this study, more than 30 million sequencing reads were generated using Illumina paired-end sequencing technology. In total, 22,756 unigenes with an average length of 485 bp were obtained with de novo assembly. The similarity search indicated that 16,520 and 12,558 unigenes showed significant similarities to known proteins from NCBI non-redundant and Swissprot protein databases, respectively. Among these annotated unigenes, 6,867 and 5,559 unigenes were separately assigned to Gene Ontology (GO) and Clusters of Orthologous Group (COG). When 22,756 unigenes searched against the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database, 12,097 unigenes were assigned to 5 main categories including 123 KEGG pathways. Among the main KEGG categories, metabolism was the biggest category (9,043, 74.75%), suggesting the active metabolic processes in rubber tree bark. In addition, a total of 39,257 EST-SSRs were identified from 22,756 unigenes, and the characterizations of EST-SSRs were further analyzed in rubber tree. 110 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among 13 Hevea germplasms, PCR success rate and polymorphism rate of 110 markers were separately 96.36% and 55.45% in this study. Conclusion By assembling and analyzing de novo transcriptome sequencing data, we reported the comprehensive functional characterization of rubber tree bark. This research generated a substantial fraction of rubber tree transcriptome sequences, which were very useful resources for gene annotation and discovery, molecular markers development, genome assembly and annotation, and microarrays development in rubber tree. The EST-SSR markers identified and developed in this study will facilitate marker-assisted selection breeding in rubber tree. Moreover, this study also supported that transcriptome analysis based on Illumina paired-end sequencing is a powerful tool for transcriptome characterization and molecular marker development in non-model species, especially those with large and complex genomes. PMID:22607098

  18. De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes.

    PubMed

    Ashrafi, Hamid; Hill, Theresa; Stoffel, Kevin; Kozik, Alexander; Yao, Jiqiang; Chin-Wo, Sebastian Reyes; Van Deynze, Allen

    2012-10-30

    Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80-120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Before availability of pepper genome sequence, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. In order to have a better understanding of the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of Sanger-EST and Illumina transcriptome assemblies. These and other information have been curated in a database that we have dedicated for pepper project.

  19. Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

    PubMed

    Ning, Juan; Wang, Minxiao; Li, Chaolun; Sun, Song

    2013-01-01

    Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, which is a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for the copepod C. sinicus. Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained, which were assembled into 56,809 high quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts that were putatively involved with growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found in C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs) that provide a resource for gene function studies. Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, which facilitated the quantitative analysis of differential gene expression. These data should provide foundation for future genetic and genomic studies of this and related species.

  20. VCGDB: a dynamic genome database of the Chinese population

    PubMed Central

    2014-01-01

    Background The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases. PMID:24708222

  1. The First Chameleon Transcriptome: Comparative Genomic Analysis of the OXPHOS System Reveals Loss of COX8 in Iguanian Lizards

    PubMed Central

    Bar-Yaacov, Dan; Bouskila, Amos; Mishmar, Dan

    2013-01-01

    Recently, we found dramatic mitochondrial DNA divergence of Israeli Chamaeleo chamaeleon populations into two geographically distinct groups. We aimed to examine whether the same pattern of divergence could be found in nuclear genes. However, no genomic resource is available for any chameleon species. Here we present the first chameleon transcriptome, obtained using deep sequencing (SOLiD). Our analysis identified 164,000 sequence contigs of which 19,000 yielded unique BlastX hits. To test the efficacy of our sequencing effort, we examined whether the chameleon and other available reptilian transcriptomes harbored complete sets of genes comprising known biochemical pathways, focusing on the nDNA-encoded oxidative phosphorylation (OXPHOS) genes as a model. As a reference for the screen, we used the human 86 (including isoforms) known structural nDNA-encoded OXPHOS subunits. Analysis of 34 publicly available vertebrate transcriptomes revealed orthologs for most human OXPHOS genes. However, OXPHOS subunit COX8 (Cytochrome C oxidase subunit 8), including all its known isoforms, was consistently absent in transcriptomes of iguanian lizards, implying loss of this subunit during the radiation of this suborder. The lack of COX8 in the suborder Iguania is intriguing, since it is important for cellular respiration and ATP production. Our sequencing effort added a new resource for comparative genomic studies, and shed new light on the evolutionary dynamics of the OXPHOS system. PMID:24009133

  2. The first Chameleon transcriptome: comparative genomic analysis of the OXPHOS system reveals loss of COX8 in Iguanian lizards.

    PubMed

    Bar-Yaacov, Dan; Bouskila, Amos; Mishmar, Dan

    2013-01-01

    Recently, we found dramatic mitochondrial DNA divergence of Israeli Chamaeleo chamaeleon populations into two geographically distinct groups. We aimed to examine whether the same pattern of divergence could be found in nuclear genes. However, no genomic resource is available for any chameleon species. Here we present the first chameleon transcriptome, obtained using deep sequencing (SOLiD). Our analysis identified 164,000 sequence contigs of which 19,000 yielded unique BlastX hits. To test the efficacy of our sequencing effort, we examined whether the chameleon and other available reptilian transcriptomes harbored complete sets of genes comprising known biochemical pathways, focusing on the nDNA-encoded oxidative phosphorylation (OXPHOS) genes as a model. As a reference for the screen, we used the human 86 (including isoforms) known structural nDNA-encoded OXPHOS subunits. Analysis of 34 publicly available vertebrate transcriptomes revealed orthologs for most human OXPHOS genes. However, OXPHOS subunit COX8 (Cytochrome C oxidase subunit 8), including all its known isoforms, was consistently absent in transcriptomes of iguanian lizards, implying loss of this subunit during the radiation of this suborder. The lack of COX8 in the suborder Iguania is intriguing, since it is important for cellular respiration and ATP production. Our sequencing effort added a new resource for comparative genomic studies, and shed new light on the evolutionary dynamics of the OXPHOS system.

  3. Exome capture from the spruce and pine giga-genomes.

    PubMed

    Suren, H; Hodgins, K A; Yeaman, S; Nurkowski, K A; Smets, P; Rieseberg, L H; Aitken, S N; Holliday, J A

    2016-09-01

    Sequence capture is a flexible tool for generating reduced representation libraries, particularly in species with massive genomes. We used an exome capture approach to sequence the gene space of two of the dominant species in Canadian boreal and montane forests - interior spruce (Picea glauca x engelmanii) and lodgepole pine (Pinus contorta). Transcriptome data generated with RNA-seq were coupled with draft genome sequences to design baits corresponding to 26 824 genes from pine and 28 649 genes from spruce. A total of 579 samples for spruce and 631 samples for pine were included, as well as two pine congeners and six spruce congeners. More than 50% of targeted regions were sequenced at >10× depth in each species, while ~12% captured near-target regions within 500 bp of a bait position were sequenced to a depth >10×. Much of our read data arose from off-target regions, which was likely due to the fragmented and incomplete nature of the draft genome assemblies. Capture in general was successful for the related species, suggesting that baits designed for a single species are likely to successfully capture sequences from congeners. From these data, we called approximately 10 million SNPs and INDELs in each species from coding regions, introns, untranslated and flanking regions, as well as from the intergenic space. Our study demonstrates the utility of sequence capture for resequencing in complex conifer genomes, suggests guidelines for improving capture efficiency and provides a rich resource of genetic variants for studies of selection and local adaptation in these species. © 2016 John Wiley & Sons Ltd.

  4. Sma3s: A universal tool for easy functional annotation of proteomes and transcriptomes.

    PubMed

    Casimiro-Soriguer, Carlos S; Muñoz-Mérida, Antonio; Pérez-Pulido, Antonio J

    2017-06-01

    The current cheapening of next-generation sequencing has led to an enormous growth in the number of sequenced genomes and transcriptomes, allowing wet labs to get the sequences from their organisms of study. To make the most of these data, one of the first things that should be done is the functional annotation of the protein-coding genes. But it used to be a slow and tedious step that can involve the characterization of thousands of sequences. Sma3s is an accurate computational tool for annotating proteins in an unattended way. Now, we have developed a completely new version, which includes functionalities that will be of utility for fundamental and applied science. Currently, the results provide functional categories such as biological processes, which become useful for both characterizing particular sequence datasets and comparing results from different projects. But one of the most important implemented innovations is that it has now low computational requirements, and the complete annotation of a simple proteome or transcriptome usually takes around 24 hours in a personal computer. Sma3s has been tested with a large amount of complete proteomes and transcriptomes, and it has demonstrated its potential in health science and other specific projects. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

    PubMed Central

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-01-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073

  6. De novo assembly of maritime pine transcriptome: implications for forest breeding and biotechnology.

    PubMed

    Canales, Javier; Bautista, Rocio; Label, Philippe; Gómez-Maldonado, Josefa; Lesur, Isabelle; Fernández-Pozo, Noe; Rueda-López, Marina; Guerrero-Fernández, Dario; Castro-Rodríguez, Vanessa; Benzekri, Hicham; Cañas, Rafael A; Guevara, María-Angeles; Rodrigues, Andreia; Seoane, Pedro; Teyssier, Caroline; Morel, Alexandre; Ehrenmann, François; Le Provost, Grégoire; Lalanne, Céline; Noirot, Céline; Klopp, Christophe; Reymond, Isabelle; García-Gutiérrez, Angel; Trontin, Jean-François; Lelu-Walter, Marie-Anne; Miguel, Celia; Cervera, María Teresa; Cantón, Francisco R; Plomion, Christophe; Harvengt, Luc; Avila, Concepción; Gonzalo Claros, M; Cánovas, Francisco M

    2014-04-01

    Maritime pine (Pinus pinasterAit.) is a widely distributed conifer species in Southwestern Europe and one of the most advanced models for conifer research. In the current work, comprehensive characterization of the maritime pine transcriptome was performed using a combination of two different next-generation sequencing platforms, 454 and Illumina. De novo assembly of the transcriptome provided a catalogue of 26 020 unique transcripts in maritime pine trees and a collection of 9641 full-length cDNAs. Quality of the transcriptome assembly was validated by RT-PCR amplification of selected transcripts for structural and regulatory genes. Transcription factors and enzyme-encoding transcripts were annotated. Furthermore, the available sequencing data permitted the identification of polymorphisms and the establishment of robust single nucleotide polymorphism (SNP) and simple-sequence repeat (SSR) databases for genotyping applications and integration of translational genomics in maritime pine breeding programmes. All our data are freely available at SustainpineDB, the P. pinaster expressional database. Results reported here on the maritime pine transcriptome represent a valuable resource for future basic and applied studies on this ecological and economically important pine species. © 2013 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.

  7. Chloroplast microsatellite markers for Artocarpus (Moraceae) developed from transcriptome sequences1

    PubMed Central

    Gardner, Elliot M.; Laricchia, Kristen M.; Murphy, Matthew; Ragone, Diane; Scheffler, Brian E.; Simpson, Sheron; Williams, Evelyn W.; Zerega, Nyree J. C.

    2015-01-01

    Premise of the study: Chloroplast microsatellite loci were characterized from transcriptomes of Artocarpus altilis (breadfruit) and A. camansi (breadnut). They were tested in A. odoratissimus (terap) and A. altilis and evaluated in silico for two congeners. Methods and Results: Fifteen simple sequence repeats (SSRs) were identified in chloroplast sequences from four Artocarpus transcriptome assemblies. The markers were evaluated using capillary electrophoresis in A. odoratissimus (105 accessions) and A. altilis (73). They were also evaluated in silico in A. altilis (10), A. camansi (6), and A. altilis × A. mariannensis (7) transcriptomes. All loci were polymorphic in at least one species, with all 15 polymorphic in A. camansi. Per species, average alleles per locus ranged between 2.2 and 2.5. Three loci had evidence of fragment-length homoplasy. Conclusions: These markers will complement existing nuclear markers by enabling confident identification of maternal and clone lines, which are often important in vegetatively propagated crops such as breadfruit. PMID:26421253

  8. Isoform Sequencing and State-of-Art Applications for Unravelling Complexity of Plant Transcriptomes

    PubMed Central

    An, Dong; Li, Changsheng; Humbeck, Klaus

    2018-01-01

    Single-molecule real-time (SMRT) sequencing developed by PacBio, also called third-generation sequencing (TGS), offers longer reads than the second-generation sequencing (SGS). Given its ability to obtain full-length transcripts without assembly, isoform sequencing (Iso-Seq) of transcriptomes by PacBio is advantageous for genome annotation, identification of novel genes and isoforms, as well as the discovery of long non-coding RNA (lncRNA). In addition, Iso-Seq gives access to the direct detection of alternative splicing, alternative polyadenylation (APA), gene fusion, and DNA modifications. Such applications of Iso-Seq facilitate the understanding of gene structure, post-transcriptional regulatory networks, and subsequently proteomic diversity. In this review, we summarize its applications in plant transcriptome study, specifically pointing out challenges associated with each step in the experimental design and highlight the development of bioinformatic pipelines. We aim to provide the community with an integrative overview and a comprehensive guidance to Iso-Seq, and thus to promote its applications in plant research. PMID:29346292

  9. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    PubMed

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly requires working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone free software.

  10. Transcriptome characterisation of Pinus tabuliformis and evolution of genes in the Pinus phylogeny

    PubMed Central

    2013-01-01

    Background The Chinese pine (Pinus tabuliformis) is an indigenous conifer species in northern China but is relatively underdeveloped as a genomic resource; thus, limiting gene discovery and breeding. Large-scale transcriptome data were obtained using a next-generation sequencing platform to compensate for the lack of P. tabuliformis genomic information. Results The increasing amount of transcriptome data on Pinus provides an excellent resource for multi-gene phylogenetic analysis and studies on how conserved genes and functions are maintained in the face of species divergence. The first P. tabuliformis transcriptome from a normalised cDNA library of multiple tissues and individuals was sequenced in a full 454 GS-FLX run, producing 911,302 sequencing reads. The high quality overlapping expressed sequence tags (ESTs) were assembled into 46,584 putative transcripts, and more than 700 SSRs and 92,000 SNPs/InDels were characterised. Comparative analysis of the transcriptome of six conifer species yielded 191 orthologues, from which we inferred a phylogenetic tree, evolutionary patterns and calculated rates of gene diversion. We also identified 938 fast evolving sequences that may be useful for identifying genes that perhaps evolved in response to positive selection and might be responsible for speciation in the Pinus lineage. Conclusions A large collection of high-quality ESTs was obtained, de novo assembled and characterised, which represents a dramatic expansion of the current transcript catalogues of P. tabuliformis and which will gradually be applied in breeding programs of P. tabuliformis. Furthermore, these data will facilitate future studies of the comparative genomics of P. tabuliformis and other related species. PMID:23597112

  11. Quantitative high-throughput profiling of snake venom gland transcriptomes and proteomes (Ovophis okinavensis and Protobothrops flavoviridis)

    PubMed Central

    2013-01-01

    Background Advances in DNA sequencing and proteomics have facilitated quantitative comparisons of snake venom composition. Most studies have employed one approach or the other. Here, both Illumina cDNA sequencing and LC/MS were used to compare the transcriptomes and proteomes of two pit vipers, Protobothrops flavoviridis and Ovophis okinavensis, which differ greatly in their biology. Results Sequencing of venom gland cDNA produced 104,830 transcripts. The Protobothrops transcriptome contained transcripts for 103 venom-related proteins, while the Ovophis transcriptome contained 95. In both, transcript abundances spanned six orders of magnitude. Mass spectrometry identified peptides from 100% of transcripts that occurred at higher than contaminant (e.g. human keratin) levels, including a number of proteins never before sequenced from snakes. These transcriptomes reveal fundamentally different envenomation strategies. Adult Protobothrops venom promotes hemorrhage, hypotension, incoagulable blood, and prey digestion, consistent with mammalian predation. Ovophis venom composition is less readily interpreted, owing to insufficient pharmacological data for venom serine and metalloproteases, which comprise more than 97.3% of Ovophis transcripts, but only 38.0% of Protobothrops transcripts. Ovophis venom apparently represents a hybrid strategy optimized for frogs and small mammals. Conclusions This study illustrates the power of cDNA sequencing combined with MS profiling. The former quantifies transcript composition, allowing detection of novel proteins, but cannot indicate which proteins are actually secreted, as does MS. We show, for the first time, that transcript and peptide abundances are correlated. This means that MS can be used for quantitative, non-invasive venom profiling, which will be beneficial for studies of endangered species. PMID:24224955

  12. Transcriptome sequencing of different narrow-leafed lupin tissue types provides a comprehensive uni-gene assembly and extensive gene-based molecular markers

    PubMed Central

    Kamphuis, Lars G; Hane, James K; Nelson, Matthew N; Gao, Lingling; Atkins, Craig A; Singh, Karam B

    2015-01-01

    Narrow-leafed lupin (NLL; Lupinus angustifolius L.) is an important grain legume crop that is valuable for sustainable farming and is becoming recognized as a human health food. NLL breeding is directed at improving grain production, disease resistance, drought tolerance and health benefits. However, genetic and genomic studies have been hindered by a lack of extensive genomic resources for the species. Here, the generation, de novo assembly and annotation of transcriptome datasets derived from five different NLL tissue types of the reference accession cv. Tanjil are described. The Tanjil transcriptome was compared to transcriptomes of an early domesticated cv. Unicrop, a wild accession P27255, as well as accession 83A:476, together being the founding parents of two recombinant inbred line (RIL) populations. In silico predictions for transcriptome-derived gene-based length and SNP polymorphic markers were conducted and corroborated using a survey assembly sequence for NLL cv. Tanjil. This yielded extensive indel and SNP polymorphic markers for the two RIL populations. A total of 335 transcriptome-derived markers and 66 BAC-end sequence-derived markers were evaluated, and 275 polymorphic markers were selected to genotype the reference NLL 83A:476 × P27255 RIL population. This significantly improved the completeness, marker density and quality of the reference NLL genetic map. PMID:25060816

  13. iMETHYL: an integrative database of human DNA methylation, gene expression, and genomic variation.

    PubMed

    Komaki, Shohei; Shiwa, Yuh; Furukawa, Ryohei; Hachiya, Tsuyoshi; Ohmomo, Hideki; Otomo, Ryo; Satoh, Mamoru; Hitomi, Jiro; Sobue, Kenji; Sasaki, Makoto; Shimizu, Atsushi

    2018-01-01

    We launched an integrative multi-omics database, iMETHYL (http://imethyl.iwate-megabank.org). iMETHYL provides whole-DNA methylation (~24 million autosomal CpG sites), whole-genome (~9 million single-nucleotide variants), and whole-transcriptome (>14 000 genes) data for CD4 + T-lymphocytes, monocytes, and neutrophils collected from approximately 100 subjects. These data were obtained from whole-genome bisulfite sequencing, whole-genome sequencing, and whole-transcriptome sequencing, making iMETHYL a comprehensive database.

  14. Discovery of the "RNA continent" through a contrarian's research strategy.

    PubMed

    Hayashizaki, Yoshihide

    2011-01-01

    The International Human Genome Sequencing Consortium completed the decoding of the human genome sequence in 2003. Readers will be aware of the paradigm shift which has occurred since then in the field of life science research. At last, mankind has been able to focus on a complete picture of the full extent of the genome, on which is recorded the basic information that controls all life. Meanwhile, another genome project, centered on Japan and known as the mouse genome encyclopedia project, was progressing with participation from around the world. Led by our research group at RIKEN, it was a full-length cDNA project which aimed to decode the whole RNA (transcriptome) using the mouse as a model. The basic information that controls all life is recorded on the genome, but in order to obtain a complete picture of this extensive information, the decoding of the genome alone is far from sufficient. These two genome projects established that the number of letters in the genome, which is the blueprint of life, is finite, that the number of RNA molecules derived from it is also finite, and that the number of protein molecules derived from the RNA is probably finite too. A massive number of combinations is still involved, but we are now able to understand one section of the network formed by these data. Once an object of study has been understood to be finite, establishing an image of the whole is certain to lead us to an understanding of the whole. Omics is an approach that views the information controlling life as finite and seeks to assemble and analyze it as a whole. Here, I would like to present our transcriptome research while making reference to our unique research strategy.

  15. Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms.

    PubMed

    Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H

    2014-11-19

    Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.

  16. De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

    PubMed Central

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689

  17. De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

    PubMed

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum.

  18. Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

    PubMed Central

    Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

    2013-01-01

    Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotypes analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799

  19. Transcriptome Analysis in Venom Gland of the Predatory Giant Ant Dinoponera quadriceps: Insights into the Polypeptide Toxin Arsenal of Hymenopterans

    PubMed Central

    Chong, Cheong-Meng; Leung, Siu Wai; Prieto-da-Silva, Álvaro R. B.; Havt, Alexandre; Quinet, Yves P.; Martins, Alice M. C.; Lee, Simon M. Y.; Rádis-Baptista, Gandhi

    2014-01-01

    Background Dinoponera quadriceps is a predatory giant ant that inhabits the Neotropical region and subdues its prey (insects) with stings that deliver a toxic cocktail of molecules. Human accidents occasionally occur and cause local pain and systemic symptoms. A comprehensive study of the D. quadriceps venom gland transcriptome is required to advance our knowledge about the toxin repertoire of the giant ant venom and to understand the physiopathological basis of Hymenoptera envenomation. Results We conducted a transcriptome analysis of a cDNA library from the D. quadriceps venom gland with Sanger sequencing in combination with whole-transcriptome shotgun deep sequencing. From the cDNA library, a total of 420 independent clones were analyzed. Although the proportion of dinoponeratoxin isoform precursors was high, the first giant ant venom inhibitor cysteine-knot (ICK) toxin was found. The deep next generation sequencing yielded a total of 2,514,767 raw reads that were assembled into 18,546 contigs. A BLAST search of the assembled contigs against non-redundant and Swiss-Prot databases showed that 6,463 contigs corresponded to BLASTx hits and indicated an interesting diversity of transcripts related to venom gene expression. The majority of these venom-related sequences code for a major polypeptide core, which comprises venom allergens, lethal-like proteins and esterases, and a minor peptide framework composed of inter-specific structurally conserved cysteine-rich toxins. Both the cDNA library and deep sequencing yielded large proportions of contigs that showed no similarities with known sequences. Conclusions To our knowledge, this is the first report of the venom gland transcriptome of the New World giant ant D. quadriceps. The glandular venom system was dissected, and the toxin arsenal was revealed; this process brought to light novel sequences that included an ICK-folded toxins, allergen proteins, esterases (phospholipases and carboxylesterases), and lethal-like toxins. These findings contribute to the understanding of the ecology, behavior and venomics of hymenopterans. PMID:24498135

  20. Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles.

    PubMed

    Tzika, Athanasia C; Helaers, Raphaël; Schramm, Gerrit; Milinkovitch, Michel C

    2011-09-26

    Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation. Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes. The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics.

  1. PeanutDB: an integrated bioinformatics web portal for Arachis hypogaea transcriptomics

    PubMed Central

    2012-01-01

    Background The peanut (Arachis hypogaea) is an important crop cultivated worldwide for oil production and food sources. Its complex genetic architecture (e.g., the large and tetraploid genome possibly due to unique cross of wild diploid relatives and subsequent chromosome duplication: 2n = 4x = 40, AABB, 2800 Mb) presents a major challenge for its genome sequencing and makes it a less-studied crop. Without a doubt, transcriptome sequencing is the most effective way to harness the genome structure and gene expression dynamics of this non-model species that has a limited genomic resource. Description With the development of next generation sequencing technologies such as 454 pyro-sequencing and Illumina sequencing by synthesis, the transcriptomics data of peanut is rapidly accumulated in both the public databases and private sectors. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly that contains 32,619 contigs. We provided EC, KEGG and GO functional annotations to these contigs and detected SSRs, SNPs and other genetic polymorphisms for each contig. Based on both open-source and our in-house tools, PeanutDB presents many seamlessly integrated web interfaces that allow users to search, filter, navigate and visualize easily the whole transcript assembly, its annotations and detected polymorphisms and simple sequence repeats. For each contig, sequence alignment is presented in both bird’s-eye view and nucleotide level resolution, with colorfully highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors. Conclusion As a public genomic database that integrates peanut transcriptome data from different sources, PeanutDB (http://bioinfolab.muohio.edu/txid3818v1) provides the Peanut research community with an easy-to-use web portal that will definitely facilitate genomics research and molecular breeding in this less-studied crop. PMID:22712730

  2. Mining genes involved in insecticide resistance of Liposcelis bostrychophila Badonnel by transcriptome and expression profile analysis.

    PubMed

    Dou, Wei; Shen, Guang-Mao; Niu, Jin-Zhi; Ding, Tian-Bo; Wei, Dan-Dan; Wang, Jin-Jun

    2013-01-01

    Recent studies indicate that infestations of psocids pose a new risk for global food security. Among the psocids species, Liposcelis bostrychophila Badonnel has gained recognition in importance because of its parthenogenic reproduction, rapid adaptation, and increased worldwide distribution. To date, the molecular data available for L. bostrychophila is largely limited to genes identified through homology. Also, no transcriptome data relevant to psocids infection is available. In this study, we generated de novo assembly of L. bostrychophila transcriptome performed through the short read sequencing technology (Illumina). In a single run, we obtained more than 51 million sequencing reads that were assembled into 60,012 unigenes (mean size = 711 bp) by Trinity. The transcriptome sequences from different developmental stages of L. bostrychophila including egg, nymph and adult were annotated with non-redundant (Nr) protein database, gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). The analysis revealed three major enzyme families involved in insecticide metabolism as differentially expressed in the L. bostrychophila transcriptome. A total of 49 P450-, 31 GST- and 21 CES-specific genes representing the three enzyme families were identified. Besides, 16 transcripts were identified to contain target site sequences of resistance genes. Furthermore, we profiled gene expression patterns upon insecticide (malathion and deltamethrin) exposure using the tag-based digital gene expression (DGE) method. The L. bostrychophila transcriptome and DGE data provide gene expression data that would further our understanding of molecular mechanisms in psocids. In particular, the findings of this investigation will facilitate identification of genes involved in insecticide resistance and designing of new compounds for control of psocids.

  3. Mining Genes Involved in Insecticide Resistance of Liposcelis bostrychophila Badonnel by Transcriptome and Expression Profile Analysis

    PubMed Central

    Dou, Wei; Shen, Guang-Mao; Niu, Jin-Zhi; Ding, Tian-Bo; Wei, Dan-Dan; Wang, Jin-Jun

    2013-01-01

    Background Recent studies indicate that infestations of psocids pose a new risk for global food security. Among the psocids species, Liposcelis bostrychophila Badonnel has gained recognition in importance because of its parthenogenic reproduction, rapid adaptation, and increased worldwide distribution. To date, the molecular data available for L. bostrychophila is largely limited to genes identified through homology. Also, no transcriptome data relevant to psocids infection is available. Methodology and Principal Findings In this study, we generated de novo assembly of L. bostrychophila transcriptome performed through the short read sequencing technology (Illumina). In a single run, we obtained more than 51 million sequencing reads that were assembled into 60,012 unigenes (mean size = 711 bp) by Trinity. The transcriptome sequences from different developmental stages of L. bostrychophila including egg, nymph and adult were annotated with non-redundant (Nr) protein database, gene ontology (GO), cluster of orthologous groups of proteins (COG), and KEGG orthology (KO). The analysis revealed three major enzyme families involved in insecticide metabolism as differentially expressed in the L. bostrychophila transcriptome. A total of 49 P450-, 31 GST- and 21 CES-specific genes representing the three enzyme families were identified. Besides, 16 transcripts were identified to contain target site sequences of resistance genes. Furthermore, we profiled gene expression patterns upon insecticide (malathion and deltamethrin) exposure using the tag-based digital gene expression (DGE) method. Conclusion The L. bostrychophila transcriptome and DGE data provide gene expression data that would further our understanding of molecular mechanisms in psocids. In particular, the findings of this investigation will facilitate identification of genes involved in insecticide resistance and designing of new compounds for control of psocids. PMID:24278202

  4. Transcriptome analysis in cotton boll weevil (Anthonomus grandis) and RNA interference in insect pests.

    PubMed

    Firmino, Alexandre Augusto Pereira; Fonseca, Fernando Campos de Assis; de Macedo, Leonardo Lima Pepino; Coelho, Roberta Ramos; Antonino de Souza, José Dijair; Togawa, Roberto Coiti; Silva-Junior, Orzenil Bonfim; Pappas, Georgios Joannis; da Silva, Maria Cristina Mattar; Engler, Gilbert; Grossi-de-Sa, Maria Fatima

    2013-01-01

    Cotton plants are subjected to the attack of several insect pests. In Brazil, the cotton boll weevil, Anthonomus grandis, is the most important cotton pest. The use of insecticidal proteins and gene silencing by interference RNA (RNAi) as techniques for insect control are promising strategies, which has been applied in the last few years. For this insect, there are not much available molecular information on databases. Using 454-pyrosequencing methodology, the transcriptome of all developmental stages of the insect pest, A. grandis, was analyzed. The A. grandis transcriptome analysis resulted in more than 500.000 reads and a data set of high quality 20,841 contigs. After sequence assembly and annotation, around 10,600 contigs had at least one BLAST hit against NCBI non-redundant protein database and 65.7% was similar to Tribolium castaneum sequences. A comparison of A. grandis, Drosophila melanogaster and Bombyx mori protein families' data showed higher similarity to dipteran than to lepidopteran sequences. Several contigs of genes encoding proteins involved in RNAi mechanism were found. PAZ Domains sequences extracted from the transcriptome showed high similarity and conservation for the most important functional and structural motifs when compared to PAZ Domains from 5 species. Two SID-like contigs were phylogenetically analyzed and grouped with T. castaneum SID-like proteins. No RdRP gene was found. A contig matching chitin synthase 1 was mined from the transcriptome. dsRNA microinjection of a chitin synthase gene to A. grandis female adults resulted in normal oviposition of unviable eggs and malformed alive larvae that were unable to develop in artificial diet. This is the first study that characterizes the transcriptome of the coleopteran, A. grandis. A new and representative transcriptome database for this insect pest is now available. All data support the state of the art of RNAi mechanism in insects.

  5. Transcriptome Analysis in Cotton Boll Weevil (Anthonomus grandis) and RNA Interference in Insect Pests

    PubMed Central

    Coelho, Roberta Ramos; Antonino de Souza Jr, José Dijair; Togawa, Roberto Coiti; Silva-Junior, Orzenil Bonfim; Pappas-Jr, Georgios Joannis; da Silva, Maria Cristina Mattar; Engler, Gilbert; Grossi-de-Sa, Maria Fatima

    2013-01-01

    Cotton plants are subjected to the attack of several insect pests. In Brazil, the cotton boll weevil, Anthonomus grandis, is the most important cotton pest. The use of insecticidal proteins and gene silencing by interference RNA (RNAi) as techniques for insect control are promising strategies, which has been applied in the last few years. For this insect, there are not much available molecular information on databases. Using 454-pyrosequencing methodology, the transcriptome of all developmental stages of the insect pest, A. grandis, was analyzed. The A. grandis transcriptome analysis resulted in more than 500.000 reads and a data set of high quality 20,841 contigs. After sequence assembly and annotation, around 10,600 contigs had at least one BLAST hit against NCBI non-redundant protein database and 65.7% was similar to Tribolium castaneum sequences. A comparison of A. grandis, Drosophila melanogaster and Bombyx mori protein families’ data showed higher similarity to dipteran than to lepidopteran sequences. Several contigs of genes encoding proteins involved in RNAi mechanism were found. PAZ Domains sequences extracted from the transcriptome showed high similarity and conservation for the most important functional and structural motifs when compared to PAZ Domains from 5 species. Two SID-like contigs were phylogenetically analyzed and grouped with T. castaneum SID-like proteins. No RdRP gene was found. A contig matching chitin synthase 1 was mined from the transcriptome. dsRNA microinjection of a chitin synthase gene to A. grandis female adults resulted in normal oviposition of unviable eggs and malformed alive larvae that were unable to develop in artificial diet. This is the first study that characterizes the transcriptome of the coleopteran, A. grandis. A new and representative transcriptome database for this insect pest is now available. All data support the state of the art of RNAi mechanism in insects. PMID:24386449

  6. Next-generation transcriptome sequencing, SNP discovery and validation in four market classes of peanut, Arachis hypogaea L.

    PubMed

    Chopra, Ratan; Burow, Gloria; Farmer, Andrew; Mudge, Joann; Simpson, Charles E; Wilkins, Thea A; Baring, Michael R; Puppala, Naveen; Chamberlin, Kelly D; Burow, Mark D

    2015-06-01

    Single-nucleotide polymorphisms, which can be identified in the thousands or millions from comparisons of transcriptome or genome sequences, are ideally suited for making high-resolution genetic maps, investigating population evolutionary history, and discovering marker-trait linkages. Despite significant results from their use in human genetics, progress in identification and use in plants, and particularly polyploid plants, has lagged. As part of a long-term project to identify and use SNPs suitable for these purposes in cultivated peanut, which is tetraploid, we generated transcriptome sequences of four peanut cultivars, namely OLin, New Mexico Valencia C, Tamrun OL07 and Jupiter, which represent the four major market classes of peanut grown in the world, and which are important economically to the US southwest peanut growing region. CopyDNA libraries of each genotype were used to generate 2 × 54 paired-end reads using an Illumina GAIIx sequencer. Raw reads were mapped to a custom reference consisting of Tifrunner 454 sequences plus peanut ESTs in GenBank, compromising 43,108 contigs; 263,840 SNP and indel variants were identified among four genotypes compared to the reference. A subset of 6 variants was assayed across 24 genotypes representing four market types using KASP chemistry to assess the criteria for SNP selection. Results demonstrated that transcriptome sequencing can identify SNPs usable as selectable DNA-based markers in complex polyploid species such as peanut. Criteria for effective use of SNPs as markers are discussed in this context.

  7. Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis

    USDA-ARS?s Scientific Manuscript database

    Flesh flies in the genus Sarcophaga are important models for investigating endocrinology, diapause, cold hardiness, reproduction, and immunity. Despite the prominence of Sarcophaga flesh flies as models for insect physiology and biochemistry, and in forensic studies, little genomic or transcriptom...

  8. Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq.

    PubMed

    Macaulay, Iain C; Teng, Mabel J; Haerty, Wilfried; Kumar, Parveen; Ponting, Chris P; Voet, Thierry

    2016-11-01

    Parallel sequencing of a single cell's genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ∼3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.

  9. Genome-Scale Transcriptome Analysis in Response to Nitric Oxide in Birch Cells: Implications of the Triterpene Biosynthetic Pathway

    PubMed Central

    Zeng, Fansuo; Sun, Fengkun; Li, Leilei; Liu, Kun; Zhan, Yaguang

    2014-01-01

    Evidence supporting nitric oxide (NO) as a mediator of plant biochemistry continues to grow, but its functions at the molecular level remains poorly understood and, in some cases, controversial. To study the role of NO at the transcriptional level in Betula platyphylla cells, we conducted a genome-scale transcriptome analysis of these cells. The transcriptome of untreated birch cells and those treated by sodium nitroprusside (SNP) were analyzed using the Solexa sequencing. Data were collected by sequencing cDNA libraries of birch cells, which had a long period to adapt to the suspension culture conditions before SNP-treated cells and untreated cells were sampled. Among the 34,100 UniGenes detected, BLASTX search revealed that 20,631 genes showed significant (E-values≤10−5) sequence similarity with proteins from the NR-database. Numerous expressed sequence tags (i.e., 1374) were identified as differentially expressed between the 12 h SNP-treated cells and control cells samples: 403 up-regulated and 971 down-regulated. From this, we specifically examined a core set of NO-related transcripts. The altered expression levels of several transcripts, as determined by transcriptome analysis, was confirmed by qRT-PCR. The results of transcriptome analysis, gene expression quantification, the content of triterpenoid and activities of defensive enzymes elucidated NO has a significant effect on many processes including triterpenoid production, carbohydrate metabolism and cell wall biosynthesis. PMID:25551661

  10. De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses

    PubMed Central

    Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu

    2015-01-01

    Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. As the outer ear size of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement. PMID:26208029

  11. De Novo Assembly of the Donkey White Blood Cell Transcriptome and a Comparative Analysis of Phenotype-Associated Genes between Donkeys and Horses.

    PubMed

    Xie, Feng-Yun; Feng, Yu-Long; Wang, Hong-Hui; Ma, Yun-Feng; Yang, Yang; Wang, Yin-Chao; Shen, Wei; Pan, Qing-Jie; Yin, Shen; Sun, Yu-Jiang; Ma, Jun-Yu

    2015-01-01

    Prior to the mechanization of agriculture and labor-intensive tasks, humans used donkeys (Equus africanus asinus) for farm work and packing. However, as mechanization increased, donkeys have been increasingly raised for meat, milk, and fur in China. To maintain the development of the donkey industry, breeding programs should focus on traits related to these new uses. Compared to conventional marker-assisted breeding plans, genome- and transcriptome-based selection methods are more efficient and effective. To analyze the coding genes of the donkey genome, we assembled the transcriptome of donkey white blood cells de novo. Using transcriptomic deep-sequencing data, we identified 264,714 distinct donkey unigenes and predicted 38,949 protein fragments. We annotated the donkey unigenes by BLAST searches against the non-redundant (NR) protein database. We also compared the donkey protein sequences with those of the horse (E. caballus) and wild horse (E. przewalskii), and linked the donkey protein fragments with mammalian phenotypes. As the outer ear size of donkeys and horses are obviously different, we compared the outer ear size-associated proteins in donkeys and horses. We identified three ear size-associated proteins, HIC1, PRKRA, and KMT2A, with sequence differences among the donkey, horse, and wild horse loci. Since the donkey genome sequence has not been released, the de novo assembled donkey transcriptome is helpful for preliminary investigations of donkey cultivars and for genetic improvement.

  12. Discovery and Annotation of Plant Endogenous Target Mimicry Sequences from Public Transcriptome Libraries: A Case Study of Prunus persica.

    PubMed

    Karakülah, Gökhan

    2017-06-28

    Novel transcript discovery through RNA sequencing has substantially improved our understanding of the transcriptome dynamics of biological systems. Endogenous target mimicry (eTM) transcripts, a novel class of regulatory molecules, bind to their target microRNAs (miRNAs) by base pairing and block their biological activity. The objective of this study was to provide a computational analysis framework for the prediction of putative eTM sequences in plants, and as an example, to discover previously un-annotated eTMs in Prunus persica (peach) transcriptome. Therefore, two public peach transcriptome libraries downloaded from Sequence Read Archive (SRA) and a previously published set of long non-coding RNAs (lncRNAs) were investigated with multi-step analysis pipeline, and 44 putative eTMs were found. Additionally, an eTM-miRNA-mRNA regulatory network module associated with peach fruit organ development was built via integration of the miRNA target information and predicted eTM-miRNA interactions. My findings suggest that one of the most widely expressed miRNA families among diverse plant species, miR156, might be potentially sponged by seven putative eTMs. Besides, the study indicates eTMs potentially play roles in the regulation of development processes in peach fruit via targeting specific miRNAs. In conclusion, by following the step-by step instructions provided in this study, novel eTMs can be identified and annotated effectively in public plant transcriptome libraries.

  13. A method for the further assembly of targeted unigenes in a transcriptome after assembly by Trinity

    PubMed Central

    Xiao, Xinlong; Ma, Jinbiao; Sun, Yufang; Yao, Yinan

    2015-01-01

    RNA-sequencing has been widely used to obtain high throughput transcriptome sequences in various species, but the assembly of a full set of complete transcripts is still a significant challenge. Judging by the number of expected transcripts and assembled unigenes in a transcriptome library, we believe that some unigenes could be reassembled. In this study, using the nitrate transporter (NRT) gene family and phosphate transporter (PHT) gene family in Salicornia europaea as examples, we introduced an approach to further assemble unigenes found in transcriptome libraries which had been previously generated by Trinity. To find the unigenes of a particular transcript that contained gaps, we respectively selected 16 NRT candidate unigene pairs and 12 PHT candidate unigene pairs for which the two unigenes had the same annotations, the same expression patterns among various RNA-seq samples, and different positions of the proteins coded as mapped to a reference protein. To fill a gap between the two unigenes, PCR was performed using primers that mapped to the two unigenes and the PCR products were sequenced, which demonstrated that 5 unigene pairs of NRT and 3 unigene pairs of PHT could be reassembled when the gaps were filled using the corresponding PCR product sequences. This fast and simple method will reduce the redundancy of targeted unigenes and allow acquisition of complete coding sequences (CDS). PMID:26528307

  14. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    PubMed

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  15. Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome

    PubMed Central

    Chaudhuri, Roy R.; Yu, Lu; Kanji, Alpa; Perkins, Timothy T.; Gardner, Paul P.; Choudhary, Jyoti; Maskell, Duncan J.

    2011-01-01

    Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community. PMID:21816880

  16. De novo assembly of the pepper transcriptome (Capsicum annuum): a benchmark for in silico discovery of SNPs, SSRs and candidate genes

    PubMed Central

    2012-01-01

    Background Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Conclusions Before availability of pepper genome sequence, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. In order to have a better understanding of the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of Sanger-EST and Illumina transcriptome assemblies. These and other information have been curated in a database that we have dedicated for pepper project. PMID:23110314

  17. Comparison of next generation sequencing technologies for transcriptome characterization

    PubMed Central

    2009-01-01

    Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272

  18. Transcriptome Profiling of Buffalograss Challenged with the Leaf Spot Pathogen Curvularia inaequalis.

    PubMed

    Amaradasa, Bimal S; Amundsen, Keenan

    2016-01-01

    Buffalograss (Bouteloua dactyloides) is a low maintenance U. S. native turfgrass species with exceptional drought, heat, and cold tolerance. Leaf spot caused by Curvularia inaequalis negatively impacts buffalograss visual quality. Two leaf spot susceptible and two resistant buffalograss lines were challenged with C. inaequalis. Samples were collected from treated and untreated leaves when susceptible lines showed symptoms. Transcriptome sequencing was done and differentially expressed genes were identified. Approximately 27 million raw sequencing reads were produced per sample. More than 86% of the sequencing reads mapped to an existing buffalograss reference transcriptome. De novo assembly of unmapped reads was merged with the existing reference to produce a more complete transcriptome. There were 461 differentially expressed transcripts between the resistant and susceptible lines when challenged with the pathogen and 1552 in its absence. Previously characterized defense-related genes were identified among the differentially expressed transcripts. Twenty one resistant line transcripts were similar to genes regulating pattern triggered immunity and 20 transcripts were similar to genes regulating effector triggered immunity. There were also nine up-regulated transcripts in resistance lines which showed potential to initiate systemic acquired resistance (SAR) and three transcripts encoding pathogenesis-related proteins which are downstream products of SAR. This is the first study characterizing changes in the buffalograss transcriptome when challenged with C. inaequalis.

  19. IsoPlot: a database for comparison of mRNA isoforms in fruit fly and mosquitoes

    PubMed Central

    Ng, I-Man; Tsai, Shang-Chi

    2017-01-01

    Abstract Alternative splicing (AS), a mechanism by which different forms of mature messenger RNAs (mRNAs) are generated from the same gene, widely occurs in the metazoan genomes. Knowledge about isoform variants and abundance is crucial for understanding the functional context in the molecular diversity of the species. With increasing transcriptome data of model and non-model species, a database for visualization and comparison of AS events with up-to-date information is needed for further research. IsoPlot is a publicly available database with visualization tools for exploration of AS events, including three major species of mosquitoes, Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus, and fruit fly Drosophila melanogaster, the model insect species. IsoPlot includes not only 88,663 annotated transcripts but also 17,037 newly predicted transcripts from massive transcriptome data at different developmental stages of mosquitoes. The web interface enables users to explore the patterns and abundance of isoforms in different experimental conditions as well as cross-species sequence comparison of orthologous transcripts. IsoPlot provides a platform for researchers to access comprehensive information about AS events in mosquitoes and fruit fly. Our database is available on the web via an interactive user interface with an intuitive graphical design, which is applicable for the comparison of complex isoforms within or between species. Database URL: http://isoplot.iis.sinica.edu.tw/ PMID:29220459

  20. Comparative Transcriptome Analysis of the Accessory Sex Gland and Testis from the Chinese Mitten Crab (Eriocheir sinensis)

    PubMed Central

    He, Lin; Jiang, Hui; Cao, Dandan; Liu, Lihua; Hu, Songnian; Wang, Qun

    2013-01-01

    The accessory sex gland (ASG) is an important component of the male reproductive system, which functions to enhance the fertility of spermatozoa during male reproduction. Certain proteins secreted by the ASG are known to bind to the spermatozoa membrane and affect its function. The ASG gene expression profile in Chinese mitten crab (Eriocheir sinensis) has not been extensively studied, and limited genetic research has been conducted on this species. The advent of high-throughput sequencing technologies enables the generation of genomic resources within a short period of time and at minimal cost. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for the ASG of E. sinensis using Illumina sequencing technology. This analysis yielded a total of 33,221,284 sequencing reads, including 2.6 Gb of total nucleotides. Reads were assembled into 85,913 contigs (average 218 bp), or 58,567 scaffold sequences (average 292 bp), that identified 37,955 unigenes (average 385 bp). We assembled all unigenes and compared them with the published testis transcriptome from E. sinensis. In order to identify which genes may be involved in ASG function, as it pertains to modification of spermatozoa, we compared the ASG and testis transcriptome of E. sinensis. Our analysis identified specific genes with both higher and lower tissue expression levels in the two tissues, and the functions of these genes were analyzed to elucidate their potential roles during maturation of spermatozoa. Availability of detailed transcriptome data from ASG and testis in E. sinensis can assist our understanding of the molecular mechanisms involved with spermatozoa conservation, transport, maturation and capacitation and potentially acrosome activation. PMID:23342039

  1. De novo assembly and characterization of the leaf, bud, and fruit transcriptome from the vulnerable tree Juglans mandshurica for the development of 20 new microsatellite markers using Illumina sequencing.

    PubMed

    Hu, Zhuang; Zhang, Tian; Gao, Xiao-Xiao; Wang, Yang; Zhang, Qiang; Zhou, Hui-Juan; Zhao, Gui-Fang; Wang, Ma-Li; Woeste, Keith E; Zhao, Peng

    2016-04-01

    Manchurian walnut (Juglans mandshurica Maxim.) is a vulnerable, temperate deciduous tree valued for its wood and nut, but transcriptomic and genomic data for the species are very limited. Next generation sequencing (NGS) has made it possible to develop molecular markers for this species rapidly and efficiently. Our goal is to use transcriptome information from RNA-Seq to understand development in J. mandshurica and develop polymorphic simple sequence repeats (SSRs, microsatellites) to understand the species' population genetics. In this study, more than 47.7 million clean reads were generated using Illumina sequencing technology. De novo assembly yielded 99,869 unigenes with an average length of 747 bp. Based on sequence similarity search with known proteins, a total of 39,708 (42.32 %) genes were identified. Searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) identified 15,903 (16.9 %) unigenes. Further, we identified and characterized 63 new transcriptome-derived microsatellite markers. By testing the markers on 4 to 14 individuals from four populations, we found that 20 were polymorphic and easily amplified. The number of alleles per locus ranged from 2 to 8. The observed and expected heterozygosity per locus ranged from 0.209 to 0.813 and 0.335 to 0.842, respectively. These twenty microsatellite markers will be useful for studies of population genetics, diversity, and genetic structure, and they will undoubtedly benefit future breeding studies of this walnut species. Moreover, the information uncovered in this research will also serve as a useful genetic resource for understanding the transcriptome and development of J. mandshurica and other Juglans species.

  2. Girardia dorotocephala transcriptome sequence, assembly, and validation through characterization of piwi homologs and stem cell progeny markers

    PubMed Central

    Almazan, Eugene Matthew P.; Lesko, Sydney L.; Markey, Michael P.; Rouhana, Labib

    2017-01-01

    Planarian flatworms are popular models for the study of regeneration and stem cell biology in vivo. Technical advances and increased availability of genetic information have fueled the discovery of molecules responsible for stem cell pluripotency and regeneration in flatworms. Unfortunately, most of the planarian research performed worldwide utilizes species that are not natural habitants of North America, which limits their availability to newcomer laboratories and impedes their distribution for educational activities. In order to circumvent these limitations and increase the genetic information available for comparative studies, we sequenced the transcriptome of Girardia dorotocephala, a planarian species pandemic and commercially available in North America. A total of 254,802,670 paired sequence reads were obtained from RNA extracted from intact individuals, regenerating fragments, as well as freshly excised auricles of a clonal line of G. dorotocephala (MA-C2), and used for de novo assembly of its transcriptome. The resulting transcriptome draft was validated through functional analysis of genetic markers of stem cells and their progeny in G. dorotocephala. Akin to orthologs in other planarian species, G. dorotocephala Piwi1 (GdPiwi1) was found to be a robust marker of the planarian stem cell population and GdPiwi2 an essential component for stem cell-driven regeneration. Identification of G. dorotocephala homologs of the early stem cell descendent marker PROG-1 revealed a family of lysine-rich proteins expressed during epithelial cell differentiation. Sequences from the MA-C2 transcriptome were found to be 98–99% identical to nucleotide sequences from G. dorotocephala populations with different chromosomal number, demonstrating strong conservation regardless of karyotype evolution. Altogether, this work establishes G. dorotocephala as a viable and accessible option for analysis of gene function in North America. PMID:28774726

  3. Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance.

    PubMed

    Bashir, Ali; Bansal, Vikas; Bafna, Vineet

    2010-06-18

    Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 Million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.

  4. De novo characterization of the Chinese fir (Cunninghamia lanceolata) transcriptome and analysis of candidate genes involved in cellulose and lignin biosynthesis

    PubMed Central

    2012-01-01

    Background Chinese fir (Cunninghamia lanceolata) is an important timber species that accounts for 20–30% of the total commercial timber production in China. However, the available genomic information of Chinese fir is limited, and this severely encumbers functional genomic analysis and molecular breeding in Chinese fir. Recently, major advances in transcriptome sequencing have provided fast and cost-effective approaches to generate large expression datasets that have proven to be powerful tools to profile the transcriptomes of non-model organisms with undetermined genomes. Results In this study, the transcriptomes of nine tissues from Chinese fir were analyzed using the Illumina HiSeq™ 2000 sequencing platform. Approximately 40 million paired-end reads were obtained, generating 3.62 gigabase pairs of sequencing data. These reads were assembled into 83,248 unique sequences (i.e. Unigenes) with an average length of 449 bp, amounting to 37.40 Mb. A total of 73,779 Unigenes were supported by more than 5 reads, 42,663 (57.83%) had homologs in the NCBI non-redundant and Swiss-Prot protein databases, corresponding to 27,224 unique protein entries. Of these Unigenes, 16,750 were assigned to Gene Ontology classes, and 14,877 were clustered into orthologous groups. A total of 21,689 (29.40%) were mapped to 119 pathways by BLAST comparison against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The majority of the genes encoding the enzymes in the biosynthetic pathways of cellulose and lignin were identified in the Unigene dataset by targeted searches of their annotations. And a number of candidate Chinese fir genes in the two metabolic pathways were discovered firstly. Eighteen genes related to cellulose and lignin biosynthesis were cloned for experimental validating of transcriptome data. Overall 49 Unigenes, covering different regions of these selected genes, were found by alignment. Their expression patterns in different tissues were analyzed by qRT-PCR to explore their putative functions. Conclusions A substantial fraction of transcript sequences was obtained from the deep sequencing of Chinese fir. The assembled Unigene dataset was used to discover candidate genes of cellulose and lignin biosynthesis. This transcriptome dataset will provide a comprehensive sequence resource for molecular genetics research of C. lanceolata. PMID:23171398

  5. A detailed gene expression study of the Miscanthus genus reveals changes in the transcriptome associated with the rejuvenation of spring rhizomes.

    PubMed

    Barling, Adam; Swaminathan, Kankshita; Mitros, Therese; James, Brandon T; Morris, Juliette; Ngamboma, Ornella; Hall, Megan C; Kirkpatrick, Jessica; Alabady, Magdy; Spence, Ashley K; Hudson, Matthew E; Rokhsar, Daniel S; Moose, Stephen P

    2013-12-09

    The Miscanthus genus of perennial C4 grasses contains promising biofuel crops for temperate climates. However, few genomic resources exist for Miscanthus, which limits understanding of its interesting biology and future genetic improvement. A comprehensive catalog of expressed sequences were generated from a variety of Miscanthus species and tissue types, with an emphasis on characterizing gene expression changes in spring compared to fall rhizomes. Illumina short read sequencing technology was used to produce transcriptome sequences from different tissues and organs during distinct developmental stages for multiple Miscanthus species, including Miscanthus sinensis, Miscanthus sacchariflorus, and their interspecific hybrid Miscanthus × giganteus. More than fifty billion base-pairs of Miscanthus transcript sequence were produced. Overall, 26,230 Sorghum gene models (i.e., ~ 96% of predicted Sorghum genes) had at least five Miscanthus reads mapped to them, suggesting that a large portion of the Miscanthus transcriptome is represented in this dataset. The Miscanthus × giganteus data was used to identify genes preferentially expressed in a single tissue, such as the spring rhizome, using Sorghum bicolor as a reference. Quantitative real-time PCR was used to verify examples of preferential expression predicted via RNA-Seq. Contiguous consensus transcript sequences were assembled for each species and annotated using InterProScan. Sequences from the assembled transcriptome were used to amplify genomic segments from a doubled haploid Miscanthus sinensis and from Miscanthus × giganteus to further disentangle the allelic and paralogous variations in genes. This large expressed sequence tag collection creates a valuable resource for the study of Miscanthus biology by providing detailed gene sequence information and tissue preferred expression patterns. We have successfully generated a database of transcriptome assemblies and demonstrated its use in the study of genes of interest. Analysis of gene expression profiles revealed biological pathways that exhibit altered regulation in spring compared to fall rhizomes, which are consistent with their different physiological functions. The expression profiles of the subterranean rhizome provides a better understanding of the biological activities of the underground stem structures that are essentials for perenniality and the storage or remobilization of carbon and nutrient resources.

  6. Exploring single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes in the jellyfish (Rhopilema esculentum) by transcriptome sequencing.

    PubMed

    Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo

    2017-08-01

    In this study, single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq technology. A total of 76.4 million raw reads and 72.1 million clean reads were generated from deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. For microsatellite mining, 5088 SSRs were identified from the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93%. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the greatest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Single-cell sequencing and tumorigenesis: improved understanding of tumor evolution and metastasis.

    PubMed

    Ellsworth, Darrell L; Blackburn, Heather L; Shriver, Craig D; Rabizadeh, Shahrooz; Soon-Shiong, Patrick; Ellsworth, Rachel E

    2017-12-01

    Extensive genomic and transcriptomic heterogeneity in human cancer often negatively impacts treatment efficacy and survival, thus posing a significant ongoing challenge for modern treatment regimens. State-of-the-art DNA- and RNA-sequencing methods now provide high-resolution genomic and gene expression portraits of individual cells, facilitating the study of complex molecular heterogeneity in cancer. Important developments in single-cell sequencing (SCS) technologies over the past 5 years provide numerous advantages over traditional sequencing methods for understanding the complexity of carcinogenesis, but significant hurdles must be overcome before SCS can be clinically useful. In this review, we: (1) highlight current methodologies and recent technological advances for isolating single cells, single-cell whole-genome and whole-transcriptome amplification using minute amounts of nucleic acids, and SCS, (2) summarize research investigating molecular heterogeneity at the genomic and transcriptomic levels and how this heterogeneity affects clonal evolution and metastasis, and (3) discuss the promise for integrating SCS in the clinical care arena for improved patient care.

  8. Whole transcriptome analysis using next-generation sequencing of model species Setaria viridis to support C4 photosynthesis research.

    PubMed

    Xu, Jiajia; Li, Yuanyuan; Ma, Xiuling; Ding, Jianfeng; Wang, Kai; Wang, Sisi; Tian, Ye; Zhang, Hui; Zhu, Xin-Guang

    2013-09-01

    Setaria viridis is an emerging model species for genetic studies of C4 photosynthesis. Many basic molecular resources need to be developed to support for this species. In this paper, we performed a comprehensive transcriptome analysis from multiple developmental stages and tissues of S. viridis using next-generation sequencing technologies. Sequencing of the transcriptome from multiple tissues across three developmental stages (seed germination, vegetative growth, and reproduction) yielded a total of 71 million single end 100 bp long reads. Reference-based assembly using Setaria italica genome as a reference generated 42,754 transcripts. De novo assembly generated 60,751 transcripts. In addition, 9,576 and 7,056 potential simple sequence repeats (SSRs) covering S. viridis genome were identified when using the reference based assembled transcripts and the de novo assembled transcripts, respectively. This identified transcripts and SSR provided by this study can be used for both reverse and forward genetic studies based on S. viridis.

  9. Pyrosequencing the Manduca sexta larval midgut transcriptome: messages for digestion, detoxification and defence.

    PubMed

    Pauchet, Y; Wilkinson, P; Vogel, H; Nelson, D R; Reynolds, S E; Heckel, D G; ffrench-Constant, R H

    2010-02-01

    The tobacco hornworm Manduca sexta is an important model for insect physiology but genomic and transcriptomic data are currently lacking. Following a recent pyrosequencing study generating immune related expressed sequence tags (ESTs), here we use this new technology to define the M. sexta larval midgut transcriptome. We generated over 387,000 midgut ESTs, using a combination of Sanger and 454 sequencing, and classified predicted proteins into those involved in digestion, detoxification and immunity. In many cases the depth of 454 pyrosequencing coverage allowed us to define the entire cDNA sequence of a particular gene. Many new M. sexta genes are described including up to 36 new cytochrome P450s, some of which have been implicated in the metabolism of host plant-derived nicotine. New lepidopteran gene families such as the beta-fructofuranosidases, previously thought to be restricted to Bombyx mori, are also described. An unexpectedly high number of ESTs were involved in immunity, for example 39 contigs encoding serpins, and the increasingly appreciated role of the midgut in insect immunity is discussed. Similar studies of other tissues will allow for a tissue by tissue description of the M. sexta transcriptome and will form an essential complimentary step on the road to genome sequencing and annotation.

  10. Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles

    PubMed Central

    2011-01-01

    Background Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation. Results Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes. Conclusions The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics. PMID:21943375

  11. De novo Transcriptome Analysis of Portunus trituberculatus Ovary and Testis by RNA-Seq: Identification of Genes Involved in Gonadal Development

    PubMed Central

    Meng, Xian-liang; Liu, Ping; Jia, Fu-long; Li, Jian; Gao, Bao-Quan

    2015-01-01

    The swimming crab Portunus trituberculatus is a commercially important crab species in East Asia countries. Gonadal development is a physiological process of great significance to the reproduction as well as commercial seed production for P. trituberculatus. However, little is currently known about the molecular mechanisms governing the developmental processes of gonads in this species. To open avenues of molecular research on P. trituberculatus gonadal development, Illumina paired-end sequencing technology was employed to develop deep-coverage transcriptome sequencing data for its gonads. Illumina sequencing generated 58,429,148 and 70,474,978 high-quality reads from the ovary and testis cDNA library, respectively. All these reads were assembled into 54,960 unigenes with an average sequence length of 879 bp, of which 12,340 unigenes (22.45% of the total) matched sequences in GenBank non-redundant database. Based on our transcriptome analysis as well as published literature, a number of candidate genes potentially involved in the regulation of gonadal development of P. trituberculatus were identified, such as FAOMeT, mPRγ, PGMRC1, PGDS, PGER4, 3β-HSD and 17β-HSDs. Differential expression analysis generated 5,919 differentially expressed genes between ovary and testis, among which many genes related to gametogenesis and several genes previously reported to be critical in differentiation and development of gonads were found, including Foxl2, Wnt4, Fst, Fem-1 and Sox9. Furthermore, 28,534 SSRs and 111,646 high-quality SNPs were identified in this transcriptome dataset. This work represents the first transcriptome analysis of P. trituberculatus gonads using the next generation sequencing technology and provides a valuable dataset for understanding molecular mechanisms controlling development of gonads and facilitating future investigation of reproductive biology in this species. The molecular markers obtained in this study will provide a fundamental basis for population genetics and functional genomics in P. trituberculatus and other closely related species. PMID:26042806

  12. TCW: Transcriptome Computational Workbench

    PubMed Central

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R.

    2013-01-01

    Background The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. Methodology The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. Conclusion It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw. PMID:23874959

  13. TCW: transcriptome computational workbench.

    PubMed

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R

    2013-01-01

    The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw.

  14. Transcriptome complexity in cardiac development and diseases--an expanding universe between genome and phenome.

    PubMed

    Gao, Chen; Wang, Yibin

    2014-01-01

    With the advancement of transcriptome profiling by micro-arrays and high-throughput RNA-sequencing, transcriptome complexity and its dynamics are revealed at different levels in cardiovascular development and diseases. In this review, we will highlight the recent progress in our knowledge of cardiovascular transcriptome complexity contributed by RNA splicing, RNA editing and noncoding RNAs. The emerging importance of many of these previously under-explored aspects of gene regulation in cardiovascular development and pathology will be discussed.

  15. Comparative transcriptome analysis of microsclerotia development in Nomuraea rileyi.

    PubMed

    Song, Zhangyong; Yin, Youping; Jiang, Shasha; Liu, Juanjuan; Chen, Huan; Wang, Zhongkang

    2013-06-19

    Nomuraea rileyi is used as an environmental-friendly biopesticide. However, mass production and commercialization of this organism are limited due to its fastidious growth and sporulation requirements. When cultured in amended medium, we found that N. rileyi could produce microsclerotia bodies, replacing conidiophores as the infectious agent. However, little is known about the genes involved in microsclerotia development. In the present study, the transcriptomes were analyzed using next-generation sequencing technology to find the genes involved in microsclerotia development. A total of 4.69 Gb of clean nucleotides comprising 32,061 sequences was obtained, and 20,919 sequences were annotated (about 65%). Among the annotated sequences, only 5928 were annotated with 34 gene ontology (GO) functional categories, and 12,778 sequences were mapped to 165 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) database. Furthermore, we assessed the transcriptomic differences between cultures grown in minimal and amended medium. In total, 4808 sequences were found to be differentially expressed; 719 differentially expressed unigenes were assigned to 25 GO classes and 1888 differentially expressed unigenes were assigned to 161 KEGG pathways, including 25 enrichment pathways. Subsequently, we examined the up-regulation or uniquely expressed genes following amended medium treatment, which were also expressed on the enrichment pathway, and found that most of them participated in mediating oxidative stress homeostasis. To elucidate the role of oxidative stress in microsclerotia development, we analyzed the diversification of unigenes using quantitative reverse transcription-PCR (RT-qPCR). Our findings suggest that oxidative stress occurs during microsclerotia development, along with a broad metabolic activity change. Our data provide the most comprehensive sequence resource available for the study of N. rileyi. We believe that the transcriptome datasets will serve as an important public information platform to accelerate studies on N. rileyi microsclerotia.

  16. Sequence comparison of prefrontal cortical brain transcriptome from a tame and an aggressive silver fox (Vulpes vulpes).

    PubMed

    Kukekova, Anna V; Johnson, Jennifer L; Teiling, Clotilde; Li, Lewyn; Oskina, Irina N; Kharlamova, Anastasiya V; Gulevich, Rimma G; Padte, Ravee; Dubreuil, Michael M; Vladimirova, Anastasiya V; Shepeleva, Darya V; Shikhevich, Svetlana G; Sun, Qi; Ponnala, Lalit; Temnykh, Svetlana V; Trut, Lyudmila N; Acland, Gregory M

    2011-10-03

    Two strains of the silver fox (Vulpes vulpes), with markedly different behavioral phenotypes, have been developed by long-term selection for behavior. Foxes from the tame strain exhibit friendly behavior towards humans, paralleling the sociability of canine puppies, whereas foxes from the aggressive strain are defensive and exhibit aggression to humans. To understand the genetic differences underlying these behavioral phenotypes fox-specific genomic resources are needed. cDNA from mRNA from pre-frontal cortex of a tame and an aggressive fox was sequenced using the Roche 454 FLX Titanium platform (> 2.5 million reads & 0.9 Gbase of tame fox sequence; >3.3 million reads & 1.2 Gbase of aggressive fox sequence). Over 80% of the fox reads were assembled into contigs. Mapping fox reads against the fox transcriptome assembly and the dog genome identified over 30,000 high confidence fox-specific SNPs. Fox transcripts for approximately 14,000 genes were identified using SwissProt and the dog RefSeq databases. An at least 2-fold expression difference between the two samples (p < 0.05) was observed for 335 genes, fewer than 3% of the total number of genes identified in the fox transcriptome. Transcriptome sequencing significantly expanded genomic resources available for the fox, a species without a sequenced genome. In a very cost efficient manner this yielded a large number of fox-specific SNP markers for genetic studies and provided significant insights into the gene expression profile of the fox pre-frontal cortex; expression differences between the two fox samples; and a catalogue of potentially important gene-specific sequence variants. This result demonstrates the utility of this approach for developing genomic resources in species with limited genomic information.

  17. Sequence comparison of prefrontal cortical brain transcriptome from a tame and an aggressive silver fox (Vulpes vulpes)

    PubMed Central

    2011-01-01

    Background Two strains of the silver fox (Vulpes vulpes), with markedly different behavioral phenotypes, have been developed by long-term selection for behavior. Foxes from the tame strain exhibit friendly behavior towards humans, paralleling the sociability of canine puppies, whereas foxes from the aggressive strain are defensive and exhibit aggression to humans. To understand the genetic differences underlying these behavioral phenotypes fox-specific genomic resources are needed. Results cDNA from mRNA from pre-frontal cortex of a tame and an aggressive fox was sequenced using the Roche 454 FLX Titanium platform (> 2.5 million reads & 0.9 Gbase of tame fox sequence; >3.3 million reads & 1.2 Gbase of aggressive fox sequence). Over 80% of the fox reads were assembled into contigs. Mapping fox reads against the fox transcriptome assembly and the dog genome identified over 30,000 high confidence fox-specific SNPs. Fox transcripts for approximately 14,000 genes were identified using SwissProt and the dog RefSeq databases. An at least 2-fold expression difference between the two samples (p < 0.05) was observed for 335 genes, fewer than 3% of the total number of genes identified in the fox transcriptome. Conclusions Transcriptome sequencing significantly expanded genomic resources available for the fox, a species without a sequenced genome. In a very cost efficient manner this yielded a large number of fox-specific SNP markers for genetic studies and provided significant insights into the gene expression profile of the fox pre-frontal cortex; expression differences between the two fox samples; and a catalogue of potentially important gene-specific sequence variants. This result demonstrates the utility of this approach for developing genomic resources in species with limited genomic information. PMID:21967120

  18. Sequencing and de novo assembly of visceral mass transcriptome of the critically endangered land snail Satsuma myomphala: Annotation and SSR discovery.

    PubMed

    Kang, Se Won; Patnaik, Bharat Bhusan; Hwang, Hee-Ju; Park, So Young; Chung, Jong Min; Song, Dae Kwon; Patnaik, Hongray Howrelia; Lee, Jae Bong; Kim, Changmu; Kim, Soonok; Park, Hong Seog; Park, Seung-Hwan; Park, Young-Su; Han, Yeon Soo; Lee, Jun Sang; Lee, Yong Seok

    2017-03-01

    Satsuma myomphala is critically endangered through loss of natural habitats, predation by natural enemies, and indiscriminate collection. It is a protected species in Korea but lacks genomic resources for an understanding of varied functional processes attributable to evolutionary success under natural habitats. For assessing the genetic information of S. myomphala, we performed for the first time, de novo transcriptome sequencing and functional annotation of expressed sequences using Illumina Next-Generation Sequencing (NGS) platform and bioinformatics analysis. We identified 103,774 unigenes of which 37,959, 12,890, and 17,699 were annotated in the PANM (Protostome DB), Unigene, and COG (Clusters of Orthologous Groups) databases, respectively. In addition, 14,451 unigenes were predicted under Gene Ontology functional categories, with 4581 assigned to a single category. Furthermore, 3369 sequences with 646 having Enzyme Commission (EC) numbers were mapped to 122 pathways in the Kyoto Encyclopedia of Genes and Genomes Pathway database. The prominent protein domains included the Zinc finger (C2H2-like), Reverse Transcriptase, Thioredoxin-like fold, and RNA recognition motif domain. Many unigenes with homology to immunity, defense, and reproduction-related genes were screened in the transcriptome. We also detected 3120 putative simple sequence repeats (SSRs) encompassing dinucleotide to hexanucleotide repeat motifs from >1kb unigene sequences. A list of PCR primers of SSR loci have been identified to study the genetic polymorphisms. The transcriptome data represents a valuable resource for further investigations on the species genome structure and biology. The unigenes information and microsatellites would provide an indispensable tool for conservation of the species in natural and adaptive environments. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Single-molecule, full-length transcript sequencing provides insight into the extreme metabolism of the ruby-throated hummingbird Archilochus colubris.

    PubMed

    Workman, Rachael E; Myrka, Alexander M; Wong, G William; Tseng, Elizabeth; Welch, Kenneth C; Timp, Winston

    2018-03-01

    Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as during the night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids that are derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, liver is the principle location of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and whole organism moderates energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding 8.6Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism.

  20. Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling.

    PubMed

    Łabaj, Paweł P; Leparc, Germán G; Linggi, Bryan E; Markillie, Lye Meng; Wiley, H Steven; Kreil, David P

    2011-07-01

    Measurement precision determines the power of any analysis to reliably identify significant signals, such as in screens for differential expression, independent of whether the experimental design incorporates replicates or not. With the compilation of large-scale RNA-Seq datasets with technical replicate samples, however, we can now, for the first time, perform a systematic analysis of the precision of expression level estimates from massively parallel sequencing technology. This then allows considerations for its improvement by computational or experimental means. We report on a comprehensive study of target identification and measurement precision, including their dependence on transcript expression levels, read depth and other parameters. In particular, an impressive recall of 84% of the estimated true transcript population could be achieved with 331 million 50 bp reads, with diminishing returns from longer read lengths and even less gains from increased sequencing depths. Most of the measurement power (75%) is spent on only 7% of the known transcriptome, however, making less strongly expressed transcripts harder to measure. Consequently, <30% of all transcripts could be quantified reliably with a relative error<20%. Based on established tools, we then introduce a new approach for mapping and analysing sequencing reads that yields substantially improved performance in gene expression profiling, increasing the number of transcripts that can reliably be quantified to over 40%. Extrapolations to higher sequencing depths highlight the need for efficient complementary steps. In discussion we outline possible experimental and computational strategies for further improvements in quantification precision. rnaseq10@boku.ac.at

  1. Computational Prediction of miRNA Genes from Small RNA Sequencing Data

    PubMed Central

    Kang, Wenjing; Friedländer, Marc R.

    2015-01-01

    Next-generation sequencing now for the first time allows researchers to gage the depth and variation of entire transcriptomes. However, now as rare transcripts can be detected that are present in cells at single copies, more advanced computational tools are needed to accurately annotate and profile them. microRNAs (miRNAs) are 22 nucleotide small RNAs (sRNAs) that post-transcriptionally reduce the output of protein coding genes. They have established roles in numerous biological processes, including cancers and other diseases. During miRNA biogenesis, the sRNAs are sequentially cleaved from precursor molecules that have a characteristic hairpin RNA structure. The vast majority of new miRNA genes that are discovered are mined from small RNA sequencing (sRNA-seq), which can detect more than a billion RNAs in a single run. However, given that many of the detected RNAs are degradation products from all types of transcripts, the accurate identification of miRNAs remain a non-trivial computational problem. Here, we review the tools available to predict animal miRNAs from sRNA sequencing data. We present tools for generalist and specialist use cases, including prediction from massively pooled data or in species without reference genome. We also present wet-lab methods used to validate predicted miRNAs, and approaches to computationally benchmark prediction accuracy. For each tool, we reference validation experiments and benchmarking efforts. Last, we discuss the future of the field. PMID:25674563

  2. In Silico Comparative Transcriptome Analysis of Two Color Morphs of the Common Coral Trout (Plectropomus Leopardus)

    PubMed Central

    Wang, Le; Yu, Cuiping; Guo, Liang; Lin, Haoran; Meng, Zining

    2015-01-01

    The common coral trout is one species of major importance in commercial fisheries and aquaculture. Recently, two different color morphs of Plectropomus leopardus were discovered and the biological importance of the color difference is unknown. Since coral trout species are poorly characterized at the molecular level, we undertook the transcriptomic characterization of the two color morphs, one black and one red coral trout, using Illumina next generation sequencing technologies. The study produced 55162966 and 54588952 paired-end reads, for black and red trout, respectively. De novo transcriptome assembly generated 95367 and 99424 unique sequences in black and red trout, respectively, with 88813 sequences shared between them. Approximately 50% of both trancriptomes were functionally annotated by BLAST searches against protein databases. The two trancriptomes were enriched into 25 functional categories and showed similar profiles of Gene Ontology category compositions. 34110 unigenes were grouped into 259 KEGG pathways. Moreover, we identified 14649 simple sequence repeats (SSRs) and designed primers for potential application. We also discovered 130524 putative single nucleotide polymorphisms (SNPs) in the two transcriptomes, supplying potential genomic resources for the coral trout species. In addition, we identified 936 fast-evolving genes and 165 candidate genes under positive selection between the two color morphs. Finally, 38 candidate genes underlying the mechanism of color and pigmentation were also isolated. This study presents the first transcriptome resources for the common coral trout and provides basic information for the development of genomic tools for the identification, conservation, and understanding of the speciation and local adaptation of coral reef fish species. PMID:26713756

  3. Reconstructing evolutionary trees in parallel for massive sequences.

    PubMed

    Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

    2017-12-14

    Building the evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time/space consuming. Hadoop and Spark are developed recently, which bring spring light for the classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can deal with big DNA sequence files quickly. It works well on the >1GB files, and gets better performance than other evolutionary reconstruction tools. Users could use HPTree for reonstructing evolutioanry trees on the computer clusters or cloud platform (eg. Amazon Cloud). HPTree could help on population evolution research and metagenomics analysis. In this paper, we employ the Hadoop and Spark platform and design an evolutionary tree reconstruction software tool for unaligned massive DNA sequences. Clustering and multiple sequence alignment are done in parallel. Neighbour-joining model was employed for the evolutionary tree building. We opened our software together with source codes via http://lab.malab.cn/soft/HPtree/ .

  4. Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants.

    PubMed

    Taheri, Sima; Lee Abdullah, Thohirah; Yusop, Mohd Rafii; Hanafi, Mohamed Musa; Sahebi, Mahbod; Azizi, Parisa; Shamshiri, Redmond Ramin

    2018-02-13

    Microsatellites, or simple sequence repeats (SSRs), are one of the most informative and multi-purpose genetic markers exploited in plant functional genomics. However, the discovery of SSRs and development using traditional methods are laborious, time-consuming, and costly. Recently, the availability of high-throughput sequencing technologies has enabled researchers to identify a substantial number of microsatellites at less cost and effort than traditional approaches. Illumina is a noteworthy transcriptome sequencing technology that is currently used in SSR marker development. Although 454 pyrosequencing datasets can be used for SSR development, this type of sequencing is no longer supported. This review aims to present an overview of the next generation sequencing, with a focus on the efficient use of de novo transcriptome sequencing (RNA-Seq) and related tools for mining and development of microsatellites in plants.

  5. Advanced Applications of Next-Generation Sequencing Technologies to Orchid Biology.

    PubMed

    Yeh, Chuan-Ming; Liu, Zhong-Jian; Tsai, Wen-Chieh

    2018-01-01

    Next-generation sequencing technologies are revolutionizing biology by permitting, transcriptome sequencing, whole-genome sequencing and resequencing, and genome-wide single nucleotide polymorphism profiling. Orchid research has benefited from this breakthrough, and a few orchid genomes are now available; new biological questions can be approached and new breeding strategies can be designed. The first part of this review describes the unique features of orchid biology. The second part provides an overview of the current next-generation sequencing platforms, many of which are already used in plant laboratories. The third part summarizes the state of orchid transcriptome and genome sequencing and illustrates current achievements. The genetic sequences currently obtained will not only provide a broad scope for the study of orchid biology, but also serves as a starting point for uncovering the mystery of orchid evolution.

  6. Transcriptome Wide Annotation of Eukaryotic RNase III Reactivity and Degradation Signals

    PubMed Central

    Gagnon, Jules; Lavoie, Mathieu; Catala, Mathieu; Malenfant, Francis; Elela, Sherif Abou

    2015-01-01

    Detection and validation of the RNA degradation signals controlling transcriptome stability are essential steps for understanding how cells regulate gene expression. Here we present complete genomic and biochemical annotations of the signals required for RNA degradation by the dsRNA specific ribonuclease III (Rnt1p) and examine its impact on transcriptome expression. Rnt1p cleavage signals are randomly distributed in the yeast genome, and encompass a wide variety of sequences, indicating that transcriptome stability is not determined by the recurrence of a fixed cleavage motif. Instead, RNA reactivity is defined by the sequence and structural context in which the cleavage sites are located. Reactive signals are often associated with transiently expressed genes, and their impact on RNA expression is linked to growth conditions. Together, the data suggest that Rnt1p reactivity is triggered by malleable RNA degradation signals that permit dynamic response to changes in growth conditions. PMID:25680180

  7. The de novo Transcriptome and Its Analysis in the Worldwide Vegetable Pest, Delia antiqua (Diptera: Anthomyiidae)

    PubMed Central

    Zhang, Yu-Juan; Hao, Youjin; Si, Fengling; Ren, Shuang; Hu, Ganyu; Shen, Li; Chen, Bin

    2014-01-01

    The onion maggot Delia antiqua is a major insect pest of cultivated vegetables, especially the onion, and a good model to investigate the molecular mechanisms of diapause. To better understand the biology and diapause mechanism of the insect pest species, D. antiqua, the transcriptome was sequenced using Illumina paired-end sequencing technology. Approximately 54 million reads were obtained, trimmed, and assembled into 29,659 unigenes, with an average length of 607 bp and an N50 of 818 bp. Among these unigenes, 21,605 (72.8%) were annotated in the public databases. All unigenes were then compared against Drosophila melanogaster and Anopheles gambiae. Codon usage bias was analyzed and 332 simple sequence repeats (SSRs) were detected in this organism. These data represent the most comprehensive transcriptomic resource currently available for D. antiqua and will facilitate the study of genetics, genomics, diapause, and further pest control of D. antiqua. PMID:24615268

  8. Development of single-copy nuclear intron markers for species-level phylogenetics: Case study with Paullinieae (Sapindaceae).

    PubMed

    Chery, Joyce G; Sass, Chodon; Specht, Chelsea D

    2017-09-01

    We developed a bioinformatic pipeline that leverages a publicly available genome and published transcriptomes to design primers in conserved coding sequences flanking targeted introns of single-copy nuclear loci. Paullinieae (Sapindaceae) is used to demonstrate the pipeline. Transcriptome reads phylogenetically closer to the lineage of interest are aligned to the closest genome. Single-nucleotide polymorphisms are called, generating a "pseudoreference" closer to the lineage of interest. Several filters are applied to meet the criteria of single-copy nuclear loci with introns of a desired size. Primers are designed in conserved coding sequences flanking introns. Using this pipeline, we developed nine single-copy nuclear intron markers for Paullinieae. This pipeline is highly flexible and can be used for any group with available genomic and transcriptomic resources. This pipeline led to the development of nine variable markers for phylogenetic study without generating sequence data de novo.

  9. Transcriptome sequencing and annotation of the halophytic microalga Dunaliella salina * #

    PubMed Central

    Hong, Ling; Liu, Jun-li; Midoun, Samira Z.; Miller, Philip C.

    2017-01-01

    The unicellular green alga Dunaliella salina is well adapted to salt stress and contains compounds (including β-carotene and vitamins) with potential commercial value. A large transcriptome database of D. salina during the adjustment, exponential and stationary growth phases was generated using a high throughput sequencing platform. We characterized the metabolic processes in D. salina with a focus on valuable metabolites, with the aim of manipulating D. salina to achieve greater economic value in large-scale production through a bioengineering strategy. Gene expression profiles under salt stress verified using quantitative polymerase chain reaction (qPCR) implied that salt can regulate the expression of key genes. This study generated a substantial fraction of D. salina transcriptional sequences for the entire growth cycle, providing a basis for the discovery of novel genes. This first full-scale transcriptome study of D. salina establishes a foundation for further comparative genomic studies. PMID:28990374

  10. De novo transcriptome analysis of an imminent biofuel crop, Camelina sativa L. using Illumina GAIIX sequencing platform and identification of SSR markers.

    PubMed

    Mudalkar, Shalini; Golla, Ramesh; Ghatty, Sreenivas; Reddy, Attipalli Ramachandra

    2014-01-01

    Camelina sativa L. is an emerging biofuel crop with potential applications in industry, medicine, cosmetics and human nutrition. The crop is unexploited owing to very limited availability of transcriptome and genomic data. In order to analyse the various metabolic pathways, we performed de novo assembly of the transcriptome on Illumina GAIIX platform with paired end sequencing for obtaining short reads. The sequencing output generated a FastQ file size of 2.97 GB with 10.83 million reads having a maximum read length of 101 nucleotides. The number of contigs generated was 53,854 with maximum and minimum lengths of 10,086 and 200 nucleotides respectively. These trancripts were annotated using BLAST search against the Aracyc, Swiss-Prot, TrEMBL, gene ontology and clusters of orthologous groups (KOG) databases. The genes involved in lipid metabolism were studied and the transcription factors were identified. Sequence similarity studies of Camelina with the other related organisms indicated the close relatedness of Camelina with Arabidopsis. In addition, bioinformatics analysis revealed the presence of a total of 19,379 simple sequence repeats. This is the first report on Camelina sativa L., where the transcriptome of the entire plant, including seedlings, seed, root, leaves and stem was done. Our data established an excellent resource for gene discovery and provide useful information for functional and comparative genomic studies in this promising biofuel crop.

  11. Poly A- transcripts expressed in HeLa cells.

    PubMed

    Wu, Qingfa; Kim, Yeong C; Lu, Jian; Xuan, Zhenyu; Chen, Jun; Zheng, Yonglan; Zhou, Tom; Zhang, Michael Q; Wu, Chung-I; Wang, San Ming

    2008-07-30

    Transcripts expressed in eukaryotes are classified as poly A+ transcripts or poly A- transcripts based on the presence or absence of the 3' poly A tail. Most transcripts identified so far are poly A+ transcripts, whereas the poly A- transcripts remain largely unknown. We developed the TRD (Total RNA Detection) system for transcript identification. The system detects the transcripts through the following steps: 1) depleting the abundant ribosomal and small-size transcripts; 2) synthesizing cDNA without regard to the status of the 3' poly A tail; 3) applying the 454 sequencing technology for massive 3' EST collection from the cDNA; and 4) determining the genome origins of the detected transcripts by mapping the sequences to the human genome reference sequences. Using this system, we characterized the cytoplasmic transcripts from HeLa cells. Of the 13,467 distinct 3' ESTs analyzed, 24% are poly A-, 36% are poly A+, and 40% are bimorphic with poly A+ features but without the 3' poly A tail. Most of the poly A- 3' ESTs do not match known transcript sequences; they have a similar distribution pattern in the genome as the poly A+ and bimorphic 3' ESTs, and their mapped intergenic regions are evolutionarily conserved. Experiments confirmed the authenticity of the detected poly A- transcripts. Our study provides the first large-scale sequence evidence for the presence of poly A- transcripts in eukaryotes. The abundance of the poly A- transcripts highlights the need for comprehensive identification of these transcripts for decoding the transcriptome, annotating the genome and studying biological relevance of the poly A- transcripts.

  12. Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

    PubMed

    Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

    2012-01-01

    Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST-and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to characterize putative polypeptide translational products and associate them with specific genes and protein functions.

  13. De novo Assembly of the Indo-Pacific Humpback Dolphin Leucocyte Transcriptome to Identify Putative Genes Involved in the Aquatic Adaptation and Immune Response

    PubMed Central

    Xia, Jia; Yang, Lili; Chen, Jialin; Wu, Yuping; Yi, Meisheng

    2013-01-01

    Background The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabited in the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at molecular level due to little sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in the species with un-sequenced genomes. Principal Findings We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high quality sequences from 47,840,388 paired-end reads were generated, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value<10−5), respectively. In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. Conclusion This study represented the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers. PMID:24015242

  14. De novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome to identify putative genes involved in the aquatic adaptation and immune response.

    PubMed

    Gui, Duan; Jia, Kuntong; Xia, Jia; Yang, Lili; Chen, Jialin; Wu, Yuping; Yi, Meisheng

    2013-01-01

    The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabited in the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at molecular level due to little sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in the species with un-sequenced genomes. We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high quality sequences from 47,840,388 paired-end reads were generated, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value<10(-5)), respectively. In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. This study represented the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers.

  15. Next-Generation Genomics Facility at C-CAMP: Accelerating Genomic Research in India

    PubMed Central

    S, Chandana; Russiachand, Heikham; H, Pradeep; S, Shilpa; M, Ashwini; S, Sahana; B, Jayanth; Atla, Goutham; Jain, Smita; Arunkumar, Nandini; Gowda, Malali

    2014-01-01

    Next-Generation Sequencing (NGS; http://www.genome.gov/12513162) is a recent life-sciences technological revolution that allows scientists to decode genomes or transcriptomes at a much faster rate with a lower cost. Genomic-based studies are in a relatively slow pace in India due to the non-availability of genomics experts, trained personnel and dedicated service providers. Using NGS there is a lot of potential to study India's national diversity (of all kinds). We at the Centre for Cellular and Molecular Platforms (C-CAMP) have launched the Next Generation Genomics Facility (NGGF) to provide genomics service to scientists, to train researchers and also work on national and international genomic projects. We have HiSeq1000 from Illumina and GS-FLX Plus from Roche454. The long reads from GS FLX Plus, and high sequence depth from HiSeq1000, are the best and ideal hybrid approaches for de novo and re-sequencing of genomes and transcriptomes. At our facility, we have sequenced around 70 different organisms comprising of more than 388 genomes and 615 transcriptomes – prokaryotes and eukaryotes (fungi, plants and animals). In addition we have optimized other unique applications such as small RNA (miRNA, siRNA etc), long Mate-pair sequencing (2 to 20 Kb), Coding sequences (Exome), Methylome (ChIP-Seq), Restriction Mapping (RAD-Seq), Human Leukocyte Antigen (HLA) typing, mixed genomes (metagenomes) and target amplicons, etc. Translating DNA sequence data from NGS sequencer into meaningful information is an important exercise. Under NGGF, we have bioinformatics experts and high-end computing resources to dissect NGS data such as genome assembly and annotation, gene expression, target enrichment, variant calling (SSR or SNP), comparative analysis etc. Our services (sequencing and bioinformatics) have been utilized by more than 45 organizations (academia and industry) both within India and outside, resulting several publications in peer-reviewed journals and several genomic/transcriptomic data is available at NCBI.

  16. De novo Transcriptome Assembly of a Chinese Locoweed (Oxytropis ochrocephala) Species Provides Insights into Genes Associated with Drought, Salinity, and Cold Tolerance

    PubMed Central

    He, Wei; Zhuang, Huihui; Fu, Yanping; Guo, Linwei; Guo, Bin; Guo, Lizhu; Zhang, Xiuhong; Wei, Yahui

    2015-01-01

    Background: Locoweeds (toxic Oxytropis and Astraglus species), containing the toxic agent swainsonine, pose serious threats to animal husbandry on grasslands in both China and the US. Some locoweeds have evolved adaptations in order to resist various stress conditions such as drought, salt and cold. As a result they replace other plants in their communities and become an ecological problem. Currently very limited genetic information of locoweeds is available and this hinders our understanding in the molecular basis of their environmental plasticity, and the interaction between locoweeds and their symbiotic swainsonine producing endophytes. Next-generation sequencing provides a means of obtaining transcriptomic sequences in a timely manner, which is particularly useful for non-model plants. In this study, we performed transcriptome sequencing of Oxytropis ochrocephala plants followed by a de nove assembly. Our primary aim was to provide an enriched pool of genetic sequences of an Oxytropis sp. for further locoweed research. Results: Transcriptomes of four different O. ochrocephala samples, from control (CK) plants, and those that had experienced either drought (20% PEG), salt (150 mM NaCl) or cold (4°C) stress were sequenced using an Illumina Hiseq 2000 platform. From 232,209,506 clean reads 23,220,950,600 (~23 G nucleotides), 182,430 transcripts and 88,942 unigenes were retrieved, with an N50 value of 1237. Differential expression analysis revealed putative genes encoding heat shock proteins (HSPs) and late embryogenesis abundant (LEA) proteins, enzymes in secondary metabolite and plant hormone biosyntheses, and transcription factors which are involved in stress tolerance in O. ochrocephala. In order to validate our sequencing results, we further analyzed the expression profiles of nine genes by quantitative real-time PCR. Finally, we discuss the possible mechanism of O. ochrocephala's adaptations to stress environment. Conclusion: Our transcriptome sequencing data present useful genetic information of a locoweed species. This genetic information will underpin further research in elucidating the environmental acclimation mechanism in locoweeds and the endophyte-plant association. PMID:26697040

  17. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh].

    PubMed

    Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K

    2011-01-20

    Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.

  18. Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh

    PubMed Central

    2011-01-01

    Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263

  19. Transcriptome of intraperitoneal organs of starry flounder Platichthys stellatus challenged by Edwardsiella ictaluri JCM1680

    NASA Astrophysics Data System (ADS)

    Tong, Yanli; Sun, Xiuqin; Wang, Bo; Wang, Ling; Li, Yan; Tian, Jinhu; Zheng, Fengrong; Zheng, Minggang

    2015-01-01

    Platichthys stellatus is an economically important marine bony fish species that is cultured in China on a large scale. However, very little is known about its immune-related genes. In this study, the transcriptome of the immune organs of P. stellatus that were intraperitoneally challenged with the pathogen E dwardsiella ictaluri JCM1680 is analyzed. Total RNA from four tissues (spleen, kidney, liver, and intestine) was mixed equally and then sequenced on an Illumina HiSeq 2000 platform. Overall, 28 465 813 quality reads were generated and assembled into 43 061 unigenes. Similarity searches against public protein sequence databases were used to annotate 28 291 unigenes (65.7% of the total), 368 of which were associated with immunoregulation, including 188 related to immunity response. Additionally, the transcript levels of immunity response unigenes annotated as related to tumor necrosis factor (TNF), TNF receptor, chemokine, major histocompatibility complex, and interleukin-6 were investigated in the different tissues of normal and infected P. stellatus by real-time quantitative PCR. The results confirmed that the unigenes identified in the transcriptome database were indeed expressed and up-regulated in infected P. stellatus. To our knowledge, this is the first report of the sequencing and analysis of the transcriptome of P. stellatus. These findings provide insights into the transcriptomics and immunogenetics of bony fish.

  20. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, CY; Yang, H; Wei, CL

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly (A){sup +} RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled intomore » 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis.« less

  1. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    PubMed Central

    2011-01-01

    Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and quantitative real time PCR (qRT-PCR). Conclusions An extensive transcriptome dataset has been obtained from the deep sequencing of tea plant. The coverage of the transcriptome is comprehensive enough to discover all known genes of several major metabolic pathways. This transcriptome dataset can serve as an important public information platform for gene expression, genomics, and functional genomic studies in C. sinensis. PMID:21356090

  2. Predictable transcriptome evolution in the convergent and complex bioluminescent organs of squid

    PubMed Central

    Pankey, M. Sabrina; Minin, Vladimir N.; Imholte, Greg C.; Suchard, Marc A.; Oakley, Todd H.

    2014-01-01

    Despite contingency in life’s history, the similarity of evolutionarily convergent traits may represent predictable solutions to common conditions. However, the extent to which overall gene expression levels (transcriptomes) underlying convergent traits are themselves convergent remains largely unexplored. Here, we show strong statistical support for convergent evolutionary origins and massively parallel evolution of the entire transcriptomes in symbiotic bioluminescent organs (bacterial photophores) from two divergent squid species. The gene expression similarities are so strong that regression models of one species’ photophore can predict organ identity of a distantly related photophore from gene expression levels alone. Our results point to widespread parallel changes in gene expression evolution associated with convergent origins of complex organs. Therefore, predictable solutions may drive not only the evolution of novel, complex organs but also the evolution of overall gene expression levels that underlie them. PMID:25336755

  3. A Pipeline for High-Throughput Concentration Response Modeling of Gene Expression for Toxicogenomics

    PubMed Central

    House, John S.; Grimm, Fabian A.; Jima, Dereje D.; Zhou, Yi-Hui; Rusyn, Ivan; Wright, Fred A.

    2017-01-01

    Cell-based assays are an attractive option to measure gene expression response to exposure, but the cost of whole-transcriptome RNA sequencing has been a barrier to the use of gene expression profiling for in vitro toxicity screening. In addition, standard RNA sequencing adds variability due to variable transcript length and amplification. Targeted probe-sequencing technologies such as TempO-Seq, with transcriptomic representation that can vary from hundreds of genes to the entire transcriptome, may reduce some components of variation. Analyses of high-throughput toxicogenomics data require renewed attention to read-calling algorithms and simplified dose–response modeling for datasets with relatively few samples. Using data from induced pluripotent stem cell-derived cardiomyocytes treated with chemicals at varying concentrations, we describe here and make available a pipeline for handling expression data generated by TempO-Seq to align reads, clean and normalize raw count data, identify differentially expressed genes, and calculate transcriptomic concentration–response points of departure. The methods are extensible to other forms of concentration–response gene-expression data, and we discuss the utility of the methods for assessing variation in susceptibility and the diseased cellular state. PMID:29163636

  4. Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano.

    PubMed

    Wasik, Kaja; Gurtowski, James; Zhou, Xin; Ramos, Olivia Mendivil; Delás, M Joaquina; Battistoni, Giorgia; El Demerdash, Osama; Falciatori, Ilaria; Vizoso, Dita B; Smith, Andrew D; Ladurner, Peter; Schärer, Lukas; McCombie, W Richard; Hannon, Gregory J; Schatz, Michael

    2015-10-06

    The free-living flatworm, Macrostomum lignano has an impressive regenerative capacity. Following injury, it can regenerate almost an entirely new organism because of the presence of an abundant somatic stem cell population, the neoblasts. This set of unique properties makes many flatworms attractive organisms for studying the evolution of pathways involved in tissue self-renewal, cell-fate specification, and regeneration. The use of these organisms as models, however, is hampered by the lack of a well-assembled and annotated genome sequences, fundamental to modern genetic and molecular studies. Here we report the genomic sequence of M. lignano and an accompanying characterization of its transcriptome. The genome structure of M. lignano is remarkably complex, with ∼75% of its sequence being comprised of simple repeats and transposon sequences. This has made high-quality assembly from Illumina reads alone impossible (N50=222 bp). We therefore generated 130× coverage by long sequencing reads from the Pacific Biosciences platform to create a substantially improved assembly with an N50 of 64 Kbp. We complemented the reference genome with an assembled and annotated transcriptome, and used both of these datasets in combination to probe gene-expression patterns during regeneration, examining pathways important to stem cell function.

  5. Assembly of the Lactuca sativa, L. cv. Tizian draft genome sequence reveals differences within major resistance complex 1 as compared to the cv. Salinas reference genome.

    PubMed

    Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2018-02-10

    Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.

  6. A Novel Pathway for Triacylglycerol Biosynthesis Is Responsible for the Accumulation of Massive Quantities of Glycerolipids in the Surface Wax of Bayberry (Myrica pensylvanica) Fruit[OPEN

    PubMed Central

    Ohlrogge, John B.

    2016-01-01

    Bayberry (Myrica pensylvanica) fruits synthesize an extremely thick and unusual layer of crystalline surface wax that accumulates to 32% of fruit dry weight, the highest reported surface lipid accumulation in plants. The composition is also striking, consisting of completely saturated triacylglycerol, diacylglycerol, and monoacylglycerol with palmitate and myristate acyl chains. To gain insight into the unique properties of Bayberry wax synthesis, we examined the chemical and morphological development of the wax layer, monitored wax biosynthesis through [14C]-radiolabeling, and sequenced the transcriptome. Radiolabeling identified sn-2 monoacylglycerol as an initial glycerolipid intermediate. The kinetics of [14C]-DAG and [14C]-TAG accumulation and the regiospecificity of their [14C]-acyl chains indicated distinct pools of acyl donors and that final TAG assembly occurs outside of cells. The most highly expressed lipid-related genes were associated with production of cutin, whereas transcripts for conventional TAG synthesis were >50-fold less abundant. The biochemical and expression data together indicate that Bayberry surface glycerolipids are synthesized by a pathway for TAG synthesis that is related to cutin biosynthesis. The combination of a unique surface wax and massive accumulation may aid understanding of how plants produce and secrete non-membrane glycerolipids and also how to engineer alternative pathways for lipid production in non-seeds. PMID:26744217

  7. Transcriptome sequencing and annotation for the Jamaican fruit bat (Artibeus jamaicensis).

    PubMed

    Shaw, Timothy I; Srivastava, Anuj; Chou, Wen-Chi; Liu, Liang; Hawkinson, Ann; Glenn, Travis C; Adams, Rick; Schountz, Tony

    2012-01-01

    The Jamaican fruit bat (Artibeus jamaicensis) is one of the most common bats in the tropical Americas. It is thought to be a potential reservoir host of Tacaribe virus, an arenavirus closely related to the South American hemorrhagic fever viruses. We performed transcriptome sequencing and annotation from lung, kidney and spleen tissues using 454 and Illumina platforms to develop this species as an animal model. More than 100,000 contigs were assembled, with 25,000 genes that were functionally annotated. Of the remaining unannotated contigs, 80% were found within bat genomes or transcriptomes. Annotated genes are involved in a broad range of activities ranging from cellular metabolism to genome regulation through ncRNAs. Reciprocal BLAST best hits yielded 8,785 sequences that are orthologous to mouse, rat, cattle, horse and human. Species tree analysis of sequences from 2,378 loci was used to achieve 95% bootstrap support for the placement of bat as sister to the clade containing horse, dog, and cattle. Through substitution rate estimation between bat and human, 32 genes were identified with evidence for positive selection. We also identified 466 immune-related genes, which may be useful for studying Tacaribe virus infection of this species. The Jamaican fruit bat transcriptome dataset is a resource that should provide additional candidate markers for studying bat evolution and ecology, and tools for analysis of the host response and pathology of disease.

  8. Profiling the venom gland transcriptomes of Costa Rican snakes by 454 pyrosequencing

    PubMed Central

    2011-01-01

    Background A long term research goal of venomics, of applied importance for improving current antivenom therapy, but also for drug discovery, is to understand the pharmacological potential of venoms. Individually or combined, proteomic and transcriptomic studies have demonstrated their feasibility to explore in depth the molecular diversity of venoms. In the absence of genome sequence, transcriptomes represent also valuable searchable databases for proteomic projects. Results The venom gland transcriptomes of 8 Costa Rican taxa from 5 genera (Crotalus, Bothrops, Atropoides, Cerrophidion, and Bothriechis) of pitvipers were investigated using high-throughput 454 pyrosequencing. 100,394 out of 330,010 masked reads produced significant hits in the available databases. 5.165,220 nucleotides (8.27%) were masked by RepeatMasker, the vast majority of which corresponding to class I (retroelements) and class II (DNA transposons) mobile elements. BLAST hits included 79,991 matches to entries of the taxonomic suborder Serpentes, of which 62,433 displayed similarity to documented venom proteins. Strong discrepancies between the transcriptome-computed and the proteome-gathered toxin compositions were obvious at first sight. Although the reasons underlaying this discrepancy are elusive, since no clear trend within or between species is apparent, the data indicate that individual mRNA species may be translationally controlled in a species-dependent manner. The minimum number of genes from each toxin family transcribed into the venom gland transcriptome of each species was calculated from multiple alignments of reads matched to a full-length reference sequence of each toxin family. Reads encoding ORF regions of Kazal-type inhibitor-like proteins were uniquely found in Bothriechis schlegelii and B. lateralis transcriptomes, suggesting a genus-specific recruitment event during the early-Middle Miocene. A transcriptome-based cladogram supports the large divergence between A. mexicanus and A. picadoi, and a closer kinship between A. mexicanus and C. godmani. Conclusions Our comparative next-generation sequencing (NGS) analysis reveals taxon-specific trends governing the formulation of the venom arsenal. Knowledge of the venom proteome provides hints on the translation efficiency of toxin-coding transcripts, contributing thereby to a more accurate interpretation of the transcriptome. The application of NGS to the analysis of snake venom transcriptomes, may represent the tool for opening the door to systems venomics. PMID:21605378

  9. A comparative view of early development in the corals Favia lizardensis, Ctenactis echinata, and Acropora millepora - morphology, transcriptome, and developmental gene expression.

    PubMed

    Okubo, Nami; Hayward, David C; Forêt, Sylvain; Ball, Eldon E

    2016-02-29

    Research into various aspects of coral biology has greatly increased in recent years due to anthropogenic threats to coral health including pollution, ocean warming and acidification. However, knowledge of coral early development has lagged. The present paper describes the embryonic development of two previously uncharacterized robust corals, Favia lizardensis (a massive brain coral) and Ctenactis echinata (a solitary coral) and compares it to that of the previously characterized complex coral, Acropora millepora, both morphologically and in terms of the expression of a set of key developmental genes. Illumina sequencing of mixed age embryos was carried out, resulting in embryonic transcriptomes consisting of 40605 contigs for C.echinata (N50 = 1080 bp) and 48536 contigs for F.lizardensis (N50 = 1496 bp). The transcriptomes have been annotated against Swiss-Prot and were sufficiently complete to enable the identification of orthologs of many key genes controlling development in bilaterians. Developmental series of images of whole mounts and sections reveal that the early stages of both species contain a blastocoel, consistent with their membership of the robust clade. In situ hybridization was used to examine the expression of the developmentally important genes brachyury, chordin and forkhead. The expression of brachyury and forkhead was consistent with that previously reported for Acropora and allowed us to confirm that the pseudo-blastopore sometimes seen in robust corals such as Favia spp. is not directly associated with gastrulation. C.echinata chordin expression, however, differed from that seen in the other two corals. Embryonic transcriptomes were assembled for the brain coral Favia lizardensis and the solitary coral Ctenactis echinata. Both species have a blastocoel in their early developmental stages, consistent with their phylogenetic position as members of the robust clade. Expression of the key developmental genes brachyury, chordin and forkhead was investigated, allowing comparison to that of their orthologs in Acropora, Nematostella and bilaterians and demonstrating that even within the Anthozoa there are significant differences in expression patterns.

  10. Transcriptome analysis of eyestalk and hemocytes in the ridgetail white prawn Exopalaemon carinicauda: assembly, annotation and marker discovery.

    PubMed

    Li, Jitao; Li, Jian; Chen, Ping; Liu, Ping; He, Yuying

    2015-01-01

    The ridgetail white prawn Exopalaemon carinicauda is one of major economic mariculture species in eastern China. The deficiency of genomic and transcriptomic data is becoming the bottleneck of further researches on its good traits. In the present study, 454 pyrosequencing was undertaken to investigate the transcriptome profiles of E. carinicauda. A collection of 1,028,710 sequence reads (459.59 Mb) obtained from cDNA prepared from eyestalk and hemocytes was assembled into 162,056 expressed sequence tags (ESTs). Of these, 29.88 % of 48,428 contigs and 70.12 % of 113,628 singlets possessed high similarities to sequences in the GenBank non-redundant database, with most significant (E value <1e(-10)) unigenes matches occurring with crustacean and insect sequences. KEGG analysis of unigenes identified putative members of biological pathways related to growth and immunity. In addition, we obtained a total of putative 125,112 SNPs and 13,467 microsatellites. These results will contribute to the understanding of the genome makeup and provide useful information for future functional genomic research in E. carinicauda.

  11. Rare Cell Detection by Single-Cell RNA Sequencing as Guided by Single-Molecule RNA FISH.

    PubMed

    Torre, Eduardo; Dueck, Hannah; Shaffer, Sydney; Gospocic, Janko; Gupte, Rohit; Bonasio, Roberto; Kim, Junhyong; Murray, John; Raj, Arjun

    2018-02-28

    Although single-cell RNA sequencing can reliably detect large-scale transcriptional programs, it is unclear whether it accurately captures the behavior of individual genes, especially those that express only in rare cells. Here, we use single-molecule RNA fluorescence in situ hybridization as a gold standard to assess trade-offs in single-cell RNA-sequencing data for detecting rare cell expression variability. We quantified the gene expression distribution for 26 genes that range from ubiquitous to rarely expressed and found that the correspondence between estimates across platforms improved with both transcriptome coverage and increased number of cells analyzed. Further, by characterizing the trade-off between transcriptome coverage and number of cells analyzed, we show that when the number of genes required to answer a given biological question is small, then greater transcriptome coverage is more important than analyzing large numbers of cells. More generally, our report provides guidelines for selecting quality thresholds for single-cell RNA-sequencing experiments aimed at rare cell analyses. Copyright © 2018 Elsevier Inc. All rights reserved.

  12. Identification of Putative Nuclear Receptors and Steroidogenic Enzymes in Murray-Darling Rainbowfish (Melanotaenia fluviatilis) Using RNA-Seq and De Novo Transcriptome Assembly.

    PubMed

    Bain, Peter A; Papanicolaou, Alexie; Kumar, Anupama

    2015-01-01

    Murray-Darling rainbowfish (Melanotaenia fluviatilis [Castelnau, 1878]; Atheriniformes: Melanotaeniidae) is a small-bodied teleost currently under development in Australasia as a test species for aquatic toxicological studies. To date, efforts towards the development of molecular biomarkers of contaminant exposure have been hindered by the lack of available sequence data. To address this, we sequenced messenger RNA from brain, liver and gonads of mature male and female fish and generated a high-quality draft transcriptome using a de novo assembly approach. 149,742 clusters of putative transcripts were obtained, encompassing 43,841 non-redundant protein-coding regions. Deduced amino acid sequences were annotated by functional inference based on similarity with sequences from manually curated protein sequence databases. The draft assembly contained protein-coding regions homologous to 95.7% of the complete cohort of predicted proteins from the taxonomically related species, Oryzias latipes (Japanese medaka). The mean length of rainbowfish protein-coding sequences relative to their medaka homologues was 92.1%, indicating that despite the limited number of tissues sampled a large proportion of the total expected number of protein-coding genes was captured in the study. Because of our interest in the effects of environmental contaminants on endocrine pathways, we manually curated subsets of coding regions for putative nuclear receptors and steroidogenic enzymes in the rainbowfish transcriptome, revealing 61 candidate nuclear receptors encompassing all known subfamilies, and 41 putative steroidogenic enzymes representing all major steroidogenic enzymes occurring in teleosts. The transcriptome presented here will be a valuable resource for researchers interested in biomarker development, protein structure and function, and contaminant-response genomics in Murray-Darling rainbowfish.

  13. Amalga-like virus infecting Antonospora locustae, a microsporidian pathogen of grasshoppers, plus related viruses associated with other arthropods.

    PubMed

    Pyle, Jesse D; Keeling, Patrick J; Nibert, Max L

    2017-04-02

    A previously reported Expressed Sequence Tag (EST) library from spores of microsporidian Antonospora locustae includes a number of clones with sequence similarities to plant amalgaviruses. Reexamining the sequence accessions from that library, we found additional such clones, contributing to a 3247-nt contig that approximates the length of an amalga-like virus genome. Using A. locustae spores stored from that previous study, and new ones obtained from the same source, we newly visualized the putative dsRNA genome of this virus and obtained amplicons yielding a 3387-nt complete genome sequence. Phylogenetic analyses suggested it as prototype strain of a new genus in family Amalgaviridae. The genome contains two partially overlapping long ORFs, with downstream ORF2 in the +1 frame relative to ORF1 and a proposed motif for +1 ribosomal frameshifting in the region of overlap. Subsequent database searches using the predicted fusion protein sequence of this new amalga-like virus identified related sequences in the transcriptome of a basal hexapod, the springtail species Tetrodontophora bielanensis. We speculate that this second new amalga-like virus (contig length, 3475 nt) likely also derived from a microsporidian, or related organism, which was associated with the springtail specimens at the time of sampling for transcriptome analysis. Other findings of interest include evidence that the ORF1 translation products of these two new amalga-like viruses contain a central region of predicted α-helical coiled coil, as recently reported for plant amalgaviruses, and transcriptome-based evidence for another new amalga-like virus in the transcriptome of another basal hexapod, the two-pronged bristletail species Campodea augens. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Single-molecule, full-length transcript sequencing provides insight into the extreme metabolism of the ruby-throated hummingbird Archilochus colubris

    PubMed Central

    Workman, Rachael E; Myrka, Alexander M; Wong, G William; Tseng, Elizabeth

    2018-01-01

    Abstract Background Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as during the night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids that are derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, liver is the principle location of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and whole organism moderates energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. Findings We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding 8.6Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. Conclusions We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism. PMID:29618047

  15. A high-throughput venom-gland transcriptome for the Eastern Diamondback Rattlesnake (Crotalus adamanteus) and evidence for pervasive positive selection across toxin classes.

    PubMed

    Rokyta, Darin R; Wray, Kenneth P; Lemmon, Alan R; Lemmon, Emily Moriarty; Caudle, S Brian

    2011-04-01

    Despite causing considerable human mortality and morbidity, animal toxins represent a valuable source of pharmacologically active macromolecules, a unique system for studying molecular adaptation, and a powerful framework for examining structure-function relationships in proteins. Snake venoms are particularly useful in the latter regard as they consist primarily of a moderate number of proteins and peptides that have been found to belong to just a handful of protein families. As these proteins and peptides are produced in dedicated glands, transcriptome sequencing has proven to be an effective approach to identifying the expressed toxin genes. We generated a venom-gland transcriptome for the Eastern Diamondback Rattlesnake (Crotalus adamanteus) using Roche 454 sequencing technology. In the current work, we focus on transcripts encoding toxins. We identified 40 unique toxin transcripts, 30 of which have full-length coding sequences, and 10 have only partial coding sequences. These toxins account for 24% of the total sequencing reads. We found toxins from 11 previously described families of snake-venom toxins and have discovered two putative, previously undescribed toxin classes. The most diverse and highly expressed toxin classes in the C. adamanteus venom-gland transcriptome are the serine proteinases, metalloproteinases, and C-type lectins. The serine proteinases are the most abundant class, accounting for 35% of the toxin sequencing reads. Metalloproteinases are the most diverse; 11 different forms have been identified. Using our sequences and those available in public databases, we detected positive selection in seven of the eight toxin families for which sufficient sequences were available for the analysis. We find that the vast majority of the genes that contribute directly to this vertebrate trait show evidence for a role for positive selection in their evolutionary history. Copyright © 2011 Elsevier Ltd. All rights reserved.

  16. In Silico Identification of Protein Disulfide Isomerase Gene Families in the De Novo Assembled Transcriptomes of Four Different Species of the Genus Conus.

    PubMed

    Figueroa-Montiel, Andrea; Ramos, Marco A; Mares, Rosa E; Dueñas, Salvador; Pimienta, Genaro; Ortiz, Ernesto; Possani, Lourival D; Licea-Navarro, Alexei F

    2016-01-01

    Small peptides isolated from the venom of the marine snails belonging to the genus Conus have been largely studied because of their therapeutic value. These peptides can be classified in two groups. The largest one is composed by peptides rich in disulfide bonds, and referred to as conotoxins. Despite the importance of conotoxins given their pharmacology value, little is known about the protein disulfide isomerase (PDI) enzymes that are required to catalyze their correct folding. To discover the PDIs that may participate in the folding and structural maturation of conotoxins, the transcriptomes of the venom duct of four different species of Conus from the peninsula of Baja California (Mexico) were assembled. Complementary DNA (cDNA) libraries were constructed for each species and sequenced using a Genome Analyzer Illumina platform. The raw RNA-seq data was converted into transcript sequences using Trinity, a de novo assembler that allows the grouping of reads into contigs without a reference genome. An N50 value of 605 was established as a reference for future assemblies of Conus transcriptomes using this software. Transdecoder was used to extract likely coding sequences from Trinity transcripts, and PDI-specific sequence motif "APWCGHCK" was used to capture potential PDIs. An in silico analysis was performed to characterize the group of PDI protein sequences encoded by the duct-transcriptome of each species. The computational approach entailed a structural homology characterization, based on the presence of functional Thioredoxin-like domains. Four different PDI families were characterized, which are constituted by a total of 41 different gene sequences. The sequences had an average of 65% identity with other PDIs. Using MODELLER 9.14, the homology-based three-dimensional structure prediction of a subset of the sequences reported, showed the expected thioredoxin fold which was confirmed by a "simulated annealing" method.

  17. Comparison of the Nodule vs. Root Transcriptome of the Actinorhizal Plant Datisca glomerata: Actinorhizal Nodules Contain a Specific Class of Defensins

    PubMed Central

    Santos, Patricia; Plaszczyca, Marian; Pawlowski, Katharina

    2013-01-01

    Actinorhizal root nodule symbioses are very diverse, and the symbiosis of Datisca glomerata has previously been shown to have many unusual aspects. In order to gain molecular information on the infection mechanism, nodule development and nodule metabolism, we compared the transcriptomes of D. glomerata roots and nodules. Root and nodule libraries representing the 3′-ends of cDNAs were subjected to high-throughput parallel 454 sequencing. To identify the corresponding genes and to improve the assembly, Illumina sequencing of the nodule transcriptome was performed as well. The evaluation revealed 406 differentially regulated genes, 295 of which (72.7%) could be assigned a function based on homology. Analysis of the nodule transcriptome showed that genes encoding components of the common symbiosis signaling pathway were present in nodules of D. glomerata, which in combination with the previously established function of SymRK in D. glomerata nodulation suggests that this pathway is also active in actinorhizal Cucurbitales. Furthermore, comparison of the D. glomerata nodule transcriptome with nodule transcriptomes from actinorhizal Fagales revealed a new subgroup of nodule-specific defensins that might play a role specific to actinorhizal symbioses. The D. glomerata members of this defensin subgroup contain an acidic C-terminal domain that was never found in plant defensins before. PMID:24009681

  18. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms

    PubMed Central

    2012-01-01

    Background A detailed knowledge about spatial and temporal gene expression is important for understanding both the function of genes and their evolution. For the vast majority of species, transcriptomes are still largely uncharacterized and even in those where substantial information is available it is often in the form of partially sequenced transcriptomes. With the development of next generation sequencing, a single experiment can now simultaneously identify the transcribed part of a species genome and estimate levels of gene expression. Results mRNA from actively growing needles of Norway spruce (Picea abies) was sequenced using next generation sequencing technology. In total, close to 70 million fragments with a length of 76 bp were sequenced resulting in 5 Gbp of raw data. A de novo assembly of these reads, together with publicly available expressed sequence tag (EST) data from Norway spruce, was used to create a reference transcriptome. Of the 38,419 PUTs (putative unique transcripts) longer than 150 bp in this reference assembly, 83.5% show similarity to ESTs from other spruce species and of the remaining PUTs, 3,704 show similarity to protein sequences from other plant species, leaving 4,167 PUTs with limited similarity to currently available plant proteins. By predicting coding frames and comparing not only the Norway spruce PUTs, but also PUTs from the close relatives Picea glauca and Picea sitchensis to both Pinus taeda and Taxus mairei, we obtained estimates of synonymous and non-synonymous divergence among conifer species. In addition, we detected close to 15,000 SNPs of high quality and estimated gene expression differences between samples collected under dark and light conditions. Conclusions Our study yielded a large number of single nucleotide polymorphisms as well as estimates of gene expression on transcriptome scale. In agreement with a recent study we find that the synonymous substitution rate per year (0.6 × 10−09 and 1.1 × 10−09) is an order of magnitude smaller than values reported for angiosperm herbs. However, if one takes generation time into account, most of this difference disappears. The estimates of the dN/dS ratio (non-synonymous over synonymous divergence) reported here are in general much lower than 1 and only a few genes showed a ratio larger than 1. PMID:23122049

  19. Transcriptome landscape of Lactococcus lactis reveals many novel RNAs including a small regulatory RNA involved in carbon uptake and metabolism.

    PubMed

    van der Meulen, Sjoerd B; de Jong, Anne; Kok, Jan

    2016-01-01

    RNA sequencing has revolutionized genome-wide transcriptome analyses, and the identification of non-coding regulatory RNAs in bacteria has thus increased concurrently. Here we reveal the transcriptome map of the lactic acid bacterial paradigm Lactococcus lactis MG1363 by employing differential RNA sequencing (dRNA-seq) and a combination of manual and automated transcriptome mining. This resulted in a high-resolution genome annotation of L. lactis and the identification of 60 cis-encoded antisense RNAs (asRNAs), 186 trans-encoded putative regulatory RNAs (sRNAs) and 134 novel small ORFs. Based on the putative targets of asRNAs, a novel classification is proposed. Several transcription factor DNA binding motifs were identified in the promoter sequences of (a)sRNAs, providing insight in the interplay between lactococcal regulatory RNAs and transcription factors. The presence and lengths of 14 putative sRNAs were experimentally confirmed by differential Northern hybridization, including the abundant RNA 6S that is differentially expressed depending on the available carbon source. For another sRNA, LLMGnc_147, functional analysis revealed that it is involved in carbon uptake and metabolism. L. lactis contains 13% leaderless mRNAs (lmRNAs) that, from an analysis of overrepresentation in GO classes, seem predominantly involved in nucleotide metabolism and DNA/RNA binding. Moreover, an A-rich sequence motif immediately following the start codon was uncovered, which could provide novel insight in the translation of lmRNAs. Altogether, this first experimental genome-wide assessment of the transcriptome landscape of L. lactis and subsequent sRNA studies provide an extensive basis for the investigation of regulatory RNAs in L. lactis and related lactococcal species.

  20. Transcriptome and proteomic analysis of mango (Mangifera indica Linn) fruits.

    PubMed

    Wu, Hong-xia; Jia, Hui-min; Ma, Xiao-wei; Wang, Song-biao; Yao, Quan-sheng; Xu, Wen-tian; Zhou, Yi-gang; Gao, Zhong-shan; Zhan, Ru-lin

    2014-06-13

    Here we used Illumina RNA-seq technology for transcriptome sequencing of a mixed fruit sample from 'Zill' mango (Mangifera indica Linn) fruit pericarp and pulp during the development and ripening stages. RNA-seq generated 68,419,722 sequence reads that were assembled into 54,207 transcripts with a mean length of 858bp, including 26,413 clusters and 27,794 singletons. A total of 42,515(78.43%) transcripts were annotated using public protein databases, with a cut-off E-value above 10(-5), of which 35,198 and 14,619 transcripts were assigned to gene ontology terms and clusters of orthologous groups respectively. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes database identified 23,741(43.79%) transcripts which were mapped to 128 pathways. These pathways revealed many previously unknown transcripts. We also applied mass spectrometry-based transcriptome data to characterize the proteome of ripe fruit. LC-MS/MS analysis of the mango fruit proteome was using tandem mass spectrometry (MS/MS) in an LTQ Orbitrap Velos (Thermo) coupled online to the HPLC. This approach enabled the identification of 7536 peptides that matched 2754 proteins. Our study provides a comprehensive sequence for a systemic view of transcriptome during mango fruit development and the most comprehensive fruit proteome to date, which are useful for further genomics research and proteomic studies. Our study provides a comprehensive sequence for a systemic view of both the transcriptome and proteome of mango fruit, and a valuable reference for further research on gene expression and protein identification. This article is part of a Special Issue entitled: Proteomics of non-model organisms. Copyright © 2014 Elsevier B.V. All rights reserved.

  1. De novo Assembly of Leaf Transcriptome in the Medicinal Plant Andrographis paniculata

    PubMed Central

    Cherukupalli, Neeraja; Divate, Mayur; Mittapelli, Suresh R.; Khareedu, Venkateswara R.; Vudem, Dashavantha R.

    2016-01-01

    Andrographis paniculata is an important medicinal plant containing various bioactive terpenoids and flavonoids. Despite its importance in herbal medicine, no ready-to-use transcript sequence information of this plant is made available in the public data base, this study mainly deals with the sequencing of RNA from A. paniculata leaf using Illumina HiSeq™ 2000 platform followed by the de novo transcriptome assembly. A total of 189.22 million high quality paired reads were generated and 1,70,724 transcripts were predicted in the primary assembly. Secondary assembly generated a transcriptome size of ~88 Mb with 83,800 clustered transcripts. Based on the similarity searches against plant non-redundant protein database, gene ontology, and eukaryotic orthologous groups, 49,363 transcripts were annotated constituting upto 58.91% of the identified unigenes. Annotation of transcripts—using kyoto encyclopedia of genes and genomes database—revealed 5606 transcripts plausibly involved in 140 pathways including biosynthesis of terpenoids and other secondary metabolites. Transcription factor analysis showed 6767 unique transcripts belonging to 97 different transcription factor families. A total number of 124 CYP450 transcripts belonging to seven divergent clans have been identified. Transcriptome revealed 146 different transcripts coding for enzymes involved in the biosynthesis of terpenoids of which 35 contained terpene synthase motifs. This study also revealed 32,341 simple sequence repeats (SSRs) in 23,168 transcripts. Assembled sequences of transcriptome of A. paniculata generated in this study are made available, for the first time, in the TSA database, which provides useful information for functional and comparative genomic analysis besides identification of key enzymes involved in the various pathways of secondary metabolism. PMID:27582746

  2. Transcriptomic analysis of Petunia hybrida in response to salt stress using high throughput RNA sequencing.

    PubMed

    Villarino, Gonzalo H; Bombarely, Aureliano; Giovannoni, James J; Scanlon, Michael J; Mattson, Neil S

    2014-01-01

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments.

  3. Detailed transcriptome description of the neglected cestode Taenia multiceps.

    PubMed

    Wu, Xuhang; Fu, Yan; Yang, Deying; Zhang, Runhui; Zheng, Wanpeng; Nie, Huaming; Xie, Yue; Yan, Ning; Hao, Guiying; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yang, Guangyou

    2012-01-01

    The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS) of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. We obtained a total of 31,282 unigenes (mean length 920 bp) using Illumina paired-end sequencing technology and a new Trinity de novo assembler without a referenced genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniprotKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam). We identified 26,110 (83.47%) unigenes and inferred 20,896 (66.8%) coding sequences (CDS). Further comparative transcripts analysis with other cestodes (Taenia pisiformis, Taenia solium, Echincoccus granulosus and Echincoccus multilocularis) and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum) showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival, involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve to be further studied. This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of the biology of T. multiceps, and helps in the identification of drug targets and parasite-host interaction studies.

  4. A Single Transcriptome of a Green Toad (Bufo viridis) Yields Candidate Genes for Sex Determination and -Differentiation and Non-Anonymous Population Genetic Markers

    PubMed Central

    Gerchen, Jörn F.; Reichert, Samuel J.; Röhr, Johannes T.; Dieterich, Christoph; Kloas, Werner

    2016-01-01

    Large genome size, including immense repetitive and non-coding fractions, still present challenges for capacity, bioinformatics and thus affordability of whole genome sequencing in most amphibians. Here, we test the performance of a single transcriptome to understand whether it can provide a cost-efficient resource for species with large unknown genomes. Using RNA from six different tissues from a single Palearctic green toad (Bufo viridis) specimen and Hiseq2000, we obtained 22,5 Mio reads and publish >100,000 unigene sequences. To evaluate efficacy and quality, we first use this data to identify green toad specific candidate genes, known from other vertebrates for their role in sex determination and differentiation. Of a list of 37 genes, the transcriptome yielded 32 (87%), many of which providing the first such data for this non-model anuran species. However, for many of these genes, only fragments could be retrieved. In order to allow also applications to population genetics, we further used the transcriptome for the targeted development of 21 non-anonymous microsatellites and tested them in genetic families and backcrosses. Eleven markers were specifically developed to be located on the B. viridis sex chromosomes; for eight markers we can indeed demonstrate sex-specific transmission in genetic families. Depending on phylogenetic distance, several markers, which are sex-linked in green toads, show high cross-amplification success across the anuran phylogeny, involving nine systematic anuran families. Our data support the view that single transcriptome sequencing (based on multiple tissues) provides a reliable genomic resource and cost-efficient method for non-model amphibian species with large genome size and, despite limitations, should be considered as long as genome sequencing remains unaffordable for most species. PMID:27232626

  5. Transcriptomic Analysis of Petunia hybrida in Response to Salt Stress Using High Throughput RNA Sequencing

    PubMed Central

    Villarino, Gonzalo H.; Bombarely, Aureliano; Giovannoni, James J.; Scanlon, Michael J.; Mattson, Neil S.

    2014-01-01

    Salinity and drought stress are the primary cause of crop losses worldwide. In sodic saline soils sodium chloride (NaCl) disrupts normal plant growth and development. The complex interactions of plant systems with abiotic stress have made RNA sequencing a more holistic and appealing approach to study transcriptome level responses in a single cell and/or tissue. In this work, we determined the Petunia transcriptome response to NaCl stress by sequencing leaf samples and assembling 196 million Illumina reads with Trinity software. Using our reference transcriptome we identified more than 7,000 genes that were differentially expressed within 24 h of acute NaCl stress. The proposed transcriptome can also be used as an excellent tool for biological and bioinformatics in the absence of an available Petunia genome and it is available at the SOL Genomics Network (SGN) http://solgenomics.net. Genes related to regulation of reactive oxygen species, transport, and signal transductions as well as novel and undescribed transcripts were among those differentially expressed in response to salt stress. The candidate genes identified in this study can be applied as markers for breeding or to genetically engineer plants to enhance salt tolerance. Gene Ontology analyses indicated that most of the NaCl damage happened at 24 h inducing genotoxicity, affecting transport and organelles due to the high concentration of Na+ ions. Finally, we report a modification to the library preparation protocol whereby cDNA samples were bar-coded with non-HPLC purified primers, without affecting the quality and quantity of the RNA-seq data. The methodological improvement presented here could substantially reduce the cost of sample preparation for future high-throughput RNA sequencing experiments. PMID:24722556

  6. Transcriptomic Analysis of the Salivary Glands of an Invasive Whitefly

    PubMed Central

    Su, Yun-Lin; Li, Jun-Min; Li, Meng; Luan, Jun-Bo; Ye, Xiao-Dong; Wang, Xiao-Wei; Liu, Shu-Sheng

    2012-01-01

    Background Some species of the whitefly Bemisia tabaci complex cause tremendous losses to crops worldwide through feeding directly and virus transmission indirectly. The primary salivary glands of whiteflies are critical for their feeding and virus transmission. However, partly due to their tiny size, research on whitefly salivary glands is limited and our knowledge on these glands is scarce. Methodology/Principal Findings We sequenced the transcriptome of the primary salivary glands of the Mediterranean species of B. tabaci complex using an effective cDNA amplification method in combination with short read sequencing (Illumina). In a single run, we obtained 13,615 unigenes. The quantity of the unigenes obtained from the salivary glands of the whitefly is at least four folds of the salivary gland genes from other plant-sucking insects. To reveal the functions of the primary glands, sequence similarity search and comparisons with the whole transcriptome of the whitefly were performed. The results demonstrated that the genes related to metabolism and transport were significantly enriched in the primary salivary glands. Furthermore, we found that a number of highly expressed genes in the salivary glands might be involved in secretory protein processing, secretion and virus transmission. To identify potential proteins of whitefly saliva, the translated unigenes were put into secretory protein prediction. Finally, 295 genes were predicted to encode secretory proteins and some of them might play important roles in whitefly feeding. Conclusions/Significance: The combined method of cDNA amplification, Illumina sequencing and de novo assembly is suitable for transcriptomic analysis of tiny organs in insects. Through analysis of the transcriptome, genomic features of the primary salivary glands were dissected and biologically important proteins, especially secreted proteins, were predicted. Our findings provide substantial sequence information for the primary salivary glands of whiteflies and will be the basis for future studies on whitefly-plant interactions and virus transmission. PMID:22745728

  7. Girardia dorotocephala transcriptome sequence, assembly, and validation through characterization of piwi homologs and stem cell progeny markers.

    PubMed

    Almazan, Eugene Matthew P; Lesko, Sydney L; Markey, Michael P; Rouhana, Labib

    2018-01-15

    Planarian flatworms are popular models for the study of regeneration and stem cell biology in vivo. Technical advances and increased availability of genetic information have fueled the discovery of molecules responsible for stem cell pluripotency and regeneration in flatworms. Unfortunately, most of the planarian research performed worldwide utilizes species that are not natural habitants of North America, which limits their availability to newcomer laboratories and impedes their distribution for educational activities. In order to circumvent these limitations and increase the genetic information available for comparative studies, we sequenced the transcriptome of Girardia dorotocephala, a planarian species pandemic and commercially available in North America. A total of 254,802,670 paired sequence reads were obtained from RNA extracted from intact individuals, regenerating fragments, as well as freshly excised auricles of a clonal line of G. dorotocephala (MA-C2), and used for de novo assembly of its transcriptome. The resulting transcriptome draft was validated through functional analysis of genetic markers of stem cells and their progeny in G. dorotocephala. Akin to orthologs in other planarian species, G. dorotocephala Piwi1 (GdPiwi1) was found to be a robust marker of the planarian stem cell population and GdPiwi2 an essential component for stem cell-driven regeneration. Identification of G. dorotocephala homologs of the early stem cell descendent marker PROG-1 revealed a family of lysine-rich proteins expressed during epithelial cell differentiation. Sequences from the MA-C2 transcriptome were found to be 98-99% identical to nucleotide sequences from G. dorotocephala populations with different chromosomal number, demonstrating strong conservation regardless of karyotype evolution. Altogether, this work establishes G. dorotocephala as a viable and accessible option for analysis of gene function in North America. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Characterizing differential gene expression in polyploid grasses lacking a reference transcriptome

    USDA-ARS?s Scientific Manuscript database

    Basal transcriptome characterization and differential gene expression in response to varying conditions are often addressed through next generation sequencing (NGS) and data analysis techniques. While these strategies are commonly used, there are countless tools, pipelines, data analysis methods an...

  9. Analysis, annotation, and profiling of the oat seed transcriptome

    USDA-ARS?s Scientific Manuscript database

    Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...

  10. Evaluation of Sequencing Approaches for High-Throughput Transcriptomics - (BOSC)

    EPA Science Inventory

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. The generation of high-throughput global gene expression...

  11. Sequencing and Characterization of the Invasive Sycamore Lace Bug Corythucha ciliata (Hemiptera: Tingidae) Transcriptome

    PubMed Central

    Qu, Cheng; Fu, Ningning; Xu, Yihua

    2016-01-01

    The sycamore lace bug, Corythucha ciliata (Hemiptera: Tingidae), is an invasive forestry pest rapidly expanding in many countries. This pest poses a considerable threat to the urban forestry ecosystem, especially to Platanus spp. However, its molecular biology and biochemistry are poorly understood. This study reports the first C. ciliata transcriptome, encompassing three different life stages (Nymphs, adults female (AF) and adults male (AM)). In total, 26.53 GB of clean data and 60,879 unigenes were obtained from three RNA-seq libraries. These unigenes were annotated and classified by Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (A manually annotated and reviewed protein sequence database), and KO (KEGG Ortholog database). After all pairwise comparisons between these three different samples, a large number of differentially expressed genes were revealed. The dramatic differences in global gene expression profiles were found between distinct life stages (nymphs and AF, nymphs and AM) and sex difference (AF and AM), with some of the significantly differentially expressed genes (DEGs) being related to metamorphosis, digestion, immune and sex difference. The different express of unigenes were validated through quantitative Real-Time PCR (qRT-PCR) for 16 randomly selected unigenes. In addition, 17,462 potential simple sequence repeat molecular markers were identified in these transcriptome resources. These comprehensive C. ciliata transcriptomic information can be utilized to promote the development of environmentally friendly methodologies to disrupt the processes of metamorphosis, digestion, immune and sex differences. PMID:27494615

  12. A New Omics Data Resource of Pleurocybella porrigens for Gene Discovery

    PubMed Central

    Dohra, Hideo; Someya, Takumi; Takano, Tomoyuki; Harada, Kiyonori; Omae, Saori; Hirai, Hirofumi; Yano, Kentaro; Kawagishi, Hirokazu

    2013-01-01

    Background Pleurocybella porrigens is a mushroom-forming fungus, which has been consumed as a traditional food in Japan. In 2004, 55 people were poisoned by eating the mushroom and 17 people among them died of acute encephalopathy. Since then, the Japanese government has been alerting Japanese people to take precautions against eating the P . porrigens mushroom. Unfortunately, despite efforts, the molecular mechanism of the encephalopathy remains elusive. The genome and transcriptome sequence data of P . porrigens and the related species, however, are not stored in the public database. To gain the omics data in P . porrigens , we sequenced genome and transcriptome of its fruiting bodies and mycelia by next generation sequencing. Methodology/Principal Findings Short read sequences of genomic DNAs and mRNAs in P . porrigens were generated by Illumina Genome Analyzer. Genome short reads were de novo assembled into scaffolds using Velvet. Comparisons of genome signatures among Agaricales showed that P . porrigens has a unique genome signature. Transcriptome sequences were assembled into contigs (unigenes). Biological functions of unigenes were predicted by Gene Ontology and KEGG pathway analyses. The majority of unigenes would be novel genes without significant counterparts in the public omics databases. Conclusions Functional analyses of unigenes present the existence of numerous novel genes in the basidiomycetes division. The results mean that the omics information such as genome, transcriptome and metabolome in basidiomycetes is short in the current databases. The large-scale omics information on P . porrigens , provided from this research, will give a new data resource for gene discovery in basidiomycetes. PMID:23936076

  13. Genetic Characterization of the Fish Piaractus brachypomus by Microsatellites Derived from Transcriptome Sequencing.

    PubMed

    Jorge, Paulo H; Mastrochirico-Filho, Vito A; Hata, Milene E; Mendes, Natália J; Ariede, Raquel B; de Freitas, Milena Vieira; Vera, Manuel; Porto-Foresti, Fábio; Hashimoto, Diogo T

    2018-01-01

    The pirapitinga, Piaractus brachypomus (Characiformes, Serrasalmidae), is a fish from the Amazon basin and is considered to be one of the main native species used in aquaculture production in South America. The objectives of this study were: (1) to perform liver transcriptome sequencing of pirapitinga through NGS and then validate a set of microsatellite markers for this species; and (2) to use polymorphic microsatellites for analysis of genetic variability in farmed stocks. The transcriptome sequencing was carried out through the Roche/454 technology, which resulted in 3,696 non-redundant contigs. Of this total, 2,568 contigs had similarity in the non-redundant (nr) protein database (Genbank) and 2,075 sequences were characterized in the categories of Gene Ontology (GO). After the validation process of 30 microsatellite loci, eight markers showed polymorphism. The analysis of these polymorphic markers in farmed stocks revealed that fish farms from North Brazil had a higher genetic diversity than fish farms from Southeast Brazil. AMOVA demonstrated that the highest proportion of variation was presented within the populations. However, when comparing different groups (1: Wild; 2: North fish farms; 3: Southeast fish farms), a considerable variation between the groups was observed. The F ST values showed the occurrence of genetic structure among the broodstocks from different regions of Brazil. The transcriptome sequencing in pirapitinga provided important genetic resources for biological studies in this non-model species, and microsatellite data can be used as the framework for the genetic management of breeding stocks in Brazil, which might provide a basis for a genetic pre-breeding programme.

  14. Whole transcriptome analysis of the poultry red mite Dermanyssus gallinae (De Geer, 1778).

    PubMed

    Schicht, Sabine; Qi, Weihong; Poveda, Lucy; Strube, Christina

    2014-03-01

    SUMMARY Although the poultry red mite Dermanyssus gallinae (De Geer, 1778) is the major parasitic pest in poultry farming causing substantial economic losses every year, nucleotide data are rare in the public databases. Therefore, de novo sequencing covering the transcriptome of D. gallinae was carried out resulting in a dataset of 232 097 singletons and 42 130 contiguous sequences (contigs) which were subsequently clustered into 24 140 isogroups consisting of 35 788 isotigs. After removal of sequences possibly originating from bacteria or the chicken host, 267 464 sequences (231 657 singletons, 56 contigs and 35 751 isotigs) remained, of which 10·3% showed homology to proteins derived from other organisms. The most significant Blast top-hit species was the mite Metaseiulus occidentalis followed by the tick Ixodes scapularis. To gain functional knowledge of D. gallinae transcripts, sequences were mapped to Gene Ontology terms, Kyoto Encyclopedia of Gene and Genomes (KEGG) pathways and parsed to InterProScan. The transcriptome dataset provides new insights in general mite genetics and lays a foundation for future studies on stage-specific transcriptomics as well as genomic, proteomic, and metabolomic explorations and might provide new perspectives to control this parasitic mite by identifying possible drug targets or vaccine candidates. It is also worth noting that in different tested species of the class Arachnida no 28S rRNA was detectable in the rRNA profile, indicating that 28S rRNA might consists of two separate, hydrogen-bonded fragments, whose (heat-induced) disruption may led to co-migration with 18S rRNA.

  15. Transcriptomic Modification in the Cerebral Cortex following Noninvasive Brain Stimulation: RNA-Sequencing Approach

    DTIC Science & Technology

    2017-04-20

    was attached to the skull in order to anchor the acrylic and maintain the integrity of the head cap. 2.3. Whole Transcriptome RNA-Sequencing...no. 12, article 550, 2014. [24] D. W. Huang, B. T. Sherman, and R. A. Lempicki, “Systematic and integrative analysis of large gene lists using DAVID...BMC Bioinformatics, vol. 9, article 559, 2008. [29] Z. Hu, E. S. Snitkin, and C. DeLisi, “VisANT: an integrative framework for networks in systems

  16. Comparative transcriptome resources of two Dysosma species (Berberidaceae) and molecular evolution of the CYP719A gene in Podophylloideae.

    PubMed

    Mao, Yunrui; Zhang, Yonghua; Xu, Chuan; Qiu, Yingxiong

    2016-01-01

    Dysosma species (Berberidaceae, Podophylloideae) are of great medicinal pharmacogenetic importance and used as model systems to study the drivers and mechanisms of species diversification of temperate plants in East Asia. Recently, we have sequenced the transcriptome of the low-elevation D. versipellis. In this study, we sequenced the transcriptome of the high-elevation D. aurantiocaulis and used comparative genomic approaches to investigate the transcriptome evolution of the two species. We retrieved 53,929 unigenes from D. aurantiocaulis by de novo transcriptome assemblies using the Illumina HiSeq 2000 platform. Comparing the transcriptomes of both species, we identified 4593 orthologs. Estimation of Ka/Ks ratios for 3126 orthologs revealed that none had a Ka/Ks significantly greater than 1, whereas 1273 (Ka/Ks < 0.5, P < 0.05) were inferred to be under purifying selection. A total of 51 primer pairs were successfully designed from 461 EST-SSRs contained in 4593 orthologs. Marker validation assay revealed that 26 (51%) and 41 (80.4%) produced clear fragments with the expected sizes in all Podophylloideae species. Specifically, 19 different sequences of CYP719A were identified from PCR-amplified genomic DNA of all 12 species of Podophylloideae using primers designed from the assembled transcripts. The data further indicated that CYP719A was likely subject to strong selective constraints maintaining only one copy per genome. In Dysosma, there was relaxed purifying selection or more positive selection for high-elevation species. Overall, this study has generated a wealth of molecular resources potentially useful for pharmacogenetic and evolutionary studies in Dysosma and allied taxa. © 2015 John Wiley & Sons Ltd.

  17. De novo transcriptomic analysis and development of EST-SSR markers in the Siberian tiger (Panthera tigris altaica).

    PubMed

    Lu, Taofeng; Sun, Yujiao; Ma, Qin; Zhu, Minghao; Liu, Dan; Ma, Jianzhang; Ma, Yuehui; Chen, Hongyan; Guan, Weijun

    2016-12-01

    The Siberian tiger, Panthera tigris altaica, is an endangered species, and much more work is needed to protect this species, which is still vulnerable to extinction. Conservation efforts may be supported by the genetic assessment of wild populations, for which highly specific microsatellite markers are required. However, only a limited amount of genetic sequence data is available for this species. To identify the genes involved in the lung transcriptome and to develop additional simple sequence repeat (SSR) markers for the Siberian tiger, we used high-throughput RNA-Seq to characterize the Siberian tiger transcriptome in lung tissue (designated 'PTA-lung') and a pooled tissue sample (designated 'PTA'). Approximately 47.5 % (33,187/69,836) of the lung transcriptome was annotated in four public databases (Nr, Swiss-Prot, KEGG, and COG). The annotated genes formed a potential pool for gene identification in the tiger. An analysis of the genes differentially expressed in the PTA lung, and PTA samples revealed that the tiger may have suffered a series of diseases before death. In total, 1062 non-redundant SSRs were identified in the Siberian tiger transcriptome. Forty-three primer pairs were randomly selected for amplification reactions, and 26 of the 43 pairs were also used to evaluate the levels of genetic polymorphism. Fourteen primer pairs (32.56 %) amplified products that were polymorphic in size in P. tigris altaica. In conclusion, the transcriptome sequences will provide a valuable genomic resource for genetic research, and these new SSR markers comprise a reasonable number of loci for the genetic analysis of wild and captive populations of P. tigris altaica.

  18. Transcriptome analysis and gene expression profiling of abortive and developing ovules during fruit development in hazelnut.

    PubMed

    Cheng, Yunqing; Liu, Jianfeng; Zhang, Huidi; Wang, Ju; Zhao, Yixin; Geng, Wanting

    2015-01-01

    A high ratio of blank fruit in hazelnut (Corylus heterophylla Fisch) is a very common phenomenon that causes serious yield losses in northeast China. The development of blank fruit in the Corylus genus is known to be associated with embryo abortion. However, little is known about the molecular mechanisms responsible for embryo abortion during the nut development stage. Genomic information for C. heterophylla Fisch is not available; therefore, data related to transcriptome and gene expression profiling of developing and abortive ovules are needed. In this study, de novo transcriptome sequencing and RNA-seq analysis were conducted using short-read sequencing technology (Illumina HiSeq 2000). The results of the transcriptome assembly analysis revealed genetic information that was associated with the fruit development stage. Two digital gene expression libraries were constructed, one for a full (normally developing) ovule and one for an empty (abortive) ovule. Transcriptome sequencing and assembly results revealed 55,353 unigenes, including 18,751 clusters and 36,602 singletons. These results were annotated using the public databases NR, NT, Swiss-Prot, KEGG, COG, and GO. Using digital gene expression profiling, gene expression differences in developing and abortive ovules were identified. A total of 1,637 and 715 unigenes were significantly upregulated and downregulated, respectively, in abortive ovules, compared with developing ovules. Quantitative real-time polymerase chain reaction analysis was used in order to verify the differential expression of some genes. The transcriptome and digital gene expression profiling data of normally developing and abortive ovules in hazelnut provide exhaustive information that will improve our understanding of the molecular mechanisms of abortive ovule formation in hazelnut.

  19. ReprOlive: a database with linked data for the olive tree (Olea europaea L.) reproductive transcriptome

    PubMed Central

    Carmona, Rosario; Zafra, Adoración; Seoane, Pedro; Castro, Antonio J.; Guerrero-Fernández, Darío; Castillo-Castillo, Trinidad; Medina-García, Ana; Cánovas, Francisco M.; Aldana-Montes, José F.; Navas-Delgado, Ismael; Alché, Juan de Dios; Claros, M. Gonzalo

    2015-01-01

    Plant reproductive transcriptomes have been analyzed in different species due to the agronomical and biotechnological importance of plant reproduction. Here we presented an olive tree reproductive transcriptome database with samples from pollen and pistil at different developmental stages, and leaf and root as control vegetative tissues http://reprolive.eez.csic.es). It was developed from 2,077,309 raw reads to 1,549 Sanger sequences. Using a pre-defined workflow based on open-source tools, sequences were pre-processed, assembled, mapped, and annotated with expression data, descriptions, GO terms, InterPro signatures, EC numbers, KEGG pathways, ORFs, and SSRs. Tentative transcripts (TTs) were also annotated with the corresponding orthologs in Arabidopsis thaliana from TAIR and RefSeq databases to enable Linked Data integration. It results in a reproductive transcriptome comprising 72,846 contigs with average length of 686 bp, of which 63,965 (87.8%) included at least one functional annotation, and 55,356 (75.9%) had an ortholog. A minimum of 23,568 different TTs was identified and 5,835 of them contain a complete ORF. The representative reproductive transcriptome can be reduced to 28,972 TTs for further gene expression studies. Partial transcriptomes from pollen, pistil, and vegetative tissues as control were also constructed. ReprOlive provides free access and download capability to these results. Retrieval mechanisms for sequences and transcript annotations are provided. Graphical localization of annotated enzymes into KEGG pathways is also possible. Finally, ReprOlive has included a semantic conceptualisation by means of a Resource Description Framework (RDF) allowing a Linked Data search for extracting the most updated information related to enzymes, interactions, allergens, structures, and reactive oxygen species. PMID:26322066

  20. Identification of Immune-Related Genes and Development of SSR/SNP Markers from the Spleen Transcriptome of Schizothorax prenanti.

    PubMed

    Luo, Hui; Xiao, Shijun; Ye, Hua; Zhang, Zhengshi; Lv, Changhuan; Zheng, Shuming; Wang, Zhiyong; Wang, Xiaoqing

    2016-01-01

    Schizothorax prenanti (S. prenanti) is mainly distributed in the upstream regions of the Yangtze River and its tributaries in China. This species is indigenous and commercially important. However, in recent years, wild populations and aquacultures have faced the serious challenges of germplasm variation loss and an increased susceptibility to a range of pathogens. Currently, the genetics and immune mechanisms of S. prenanti are unknown, partly due to a lack of genome and transcriptome information. Here, we sought to identify genes related to immune functions and to identify molecular markers to study the function of these genes and for trait mapping. To this end, the transcriptome from spleen tissues of S. prenanti was analyzed and sequenced. Using paired-end reads from the Illumina Hiseq2500 platform, 48,517 transcripts were isolated from the spleen transcriptome. These transcripts could be clustered into 37,785 unigenes with an N50 length of 2,539 bp. The majority of the unigenes (35,653, 94.4%) were successfully annotated using non-redundant nucleotide sequence analysis (nt), and the non-redundant protein (nr), Swiss-Prot, Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. KEGG pathway assignment identified more than 500 immune-related genes. Furthermore, 7,545 putative simple sequence repeats (SSRs), 857,535 single nucleotide polymorphisms (SNPs), and 53,481 insertion/deletion (InDels) were detected from the transcriptome. This is the first reported high-throughput transcriptome analysis of S. prenanti, and it provides valuable genetic resources for the investigation of immune mechanisms, conservation of germplasm, and molecular marker-assisted breeding of S. prenanti.

  1. Optimized Probe Masking for Comparative Transcriptomics of Closely Related Species

    PubMed Central

    Poeschl, Yvonne; Delker, Carolin; Trenner, Jana; Ullrich, Kristian Karsten; Quint, Marcel; Grosse, Ivo

    2013-01-01

    Microarrays are commonly applied to study the transcriptome of specific species. However, many available microarrays are restricted to model organisms, and the design of custom microarrays for other species is often not feasible. Hence, transcriptomics approaches of non-model organisms as well as comparative transcriptomics studies among two or more species often make use of cost-intensive RNAseq studies or, alternatively, by hybridizing transcripts of a query species to a microarray of a closely related species. When analyzing these cross-species microarray expression data, differences in the transcriptome of the query species can cause problems, such as the following: (i) lower hybridization accuracy of probes due to mismatches or deletions, (ii) probes binding multiple transcripts of different genes, and (iii) probes binding transcripts of non-orthologous genes. So far, methods for (i) exist, but these neglect (ii) and (iii). Here, we propose an approach for comparative transcriptomics addressing problems (i) to (iii), which retains only transcript-specific probes binding transcripts of orthologous genes. We apply this approach to an Arabidopsis lyrata expression data set measured on a microarray designed for Arabidopsis thaliana, and compare it to two alternative approaches, a sequence-based approach and a genomic DNA hybridization-based approach. We investigate the number of retained probe sets, and we validate the resulting expression responses by qRT-PCR. We find that the proposed approach combines the benefit of sequence-based stringency and accuracy while allowing the expression analysis of much more genes than the alternative sequence-based approach. As an added benefit, the proposed approach requires probes to detect transcripts of orthologous genes only, which provides a superior base for biological interpretation of the measured expression responses. PMID:24260119

  2. Exploring viral infection using single-cell sequencing.

    PubMed

    Rato, Sylvie; Golumbeanu, Monica; Telenti, Amalio; Ciuffi, Angela

    2017-07-15

    Single-cell sequencing (SCS) has emerged as a valuable tool to study cellular heterogeneity in diverse fields, including virology. By studying the viral and cellular genome and/or transcriptome, the dynamics of viral infection can be investigated at single cell level. Most studies have explored the impact of cell-to-cell variation on the viral life cycle from the point of view of the virus, by analyzing viral sequences, and from the point of view of the cell, mainly by analyzing the cellular host transcriptome. In this review, we will focus on recent studies that use single-cell sequencing to explore viral diversity and cell variability in response to viral replication. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  3. An extension of the plant ontology project supporting wood anatomy and development research

    Treesearch

    Federic Lens; Laurel Cooper; Maria Alejandra Gandolfo; Andrew Groover; Pankaj Jaiswal; Barbara Lachenbruch; Rachel Spicer; Margaret E. Staton; Dennis W. Stevenson; Ramona L. Walls; Jill Wegrzyn

    2012-01-01

    A wealth of information on plant anatomy and morphology is available in the current and historical literature, and molecular biologists are producing massive amounts of transcriptome and genome data that can be used to gain better insights into the development, evolution, ecology, and physiological function of plant anatomical attributes. Integrating anatomical and...

  4. Genetic signatures of adaptation revealed from transcriptome sequencing of Arctic and red foxes.

    PubMed

    Kumar, Vikas; Kutschera, Verena E; Nilsson, Maria A; Janke, Axel

    2015-08-07

    The genus Vulpes (true foxes) comprises numerous species that inhabit a wide range of habitats and climatic conditions, including one species, the Arctic fox (Vulpes lagopus) which is adapted to the arctic region. A close relative to the Arctic fox, the red fox (Vulpes vulpes), occurs in subarctic to subtropical habitats. To study the genetic basis of their adaptations to different environments, transcriptome sequences from two Arctic foxes and one red fox individual were generated and analyzed for signatures of positive selection. In addition, the data allowed for a phylogenetic analysis and divergence time estimate between the two fox species. The de novo assembly of reads resulted in more than 160,000 contigs/transcripts per individual. Approximately 17,000 homologous genes were identified using human and the non-redundant databases. Positive selection analyses revealed several genes involved in various metabolic and molecular processes such as energy metabolism, cardiac gene regulation, apoptosis and blood coagulation to be under positive selection in foxes. Branch site tests identified four genes to be under positive selection in the Arctic fox transcriptome, two of which are fat metabolism genes. In the red fox transcriptome eight genes are under positive selection, including molecular process genes, notably genes involved in ATP metabolism. Analysis of the three transcriptomes and five Sanger re-sequenced genes in additional individuals identified a lower genetic variability within Arctic foxes compared to red foxes, which is consistent with distribution range differences and demographic responses to past climatic fluctuations. A phylogenomic analysis estimated that the Arctic and red fox lineages diverged about three million years ago. Transcriptome data are an economic way to generate genomic resources for evolutionary studies. Despite not representing an entire genome, this transcriptome analysis identified numerous genes that are relevant to arctic adaptation in foxes. Similar to polar bears, fat metabolism seems to play a central role in adaptation of Arctic foxes to the cold climate, as has been identified in the polar bear, another arctic specialist.

  5. Genome sequence of a distinct watermelon mosaic virus identified from ginseng (Panax ginseng) transcriptome.

    PubMed

    Park, D; Kim, H; Hahn, Y

    Watermelon mosaic virus (WMV) is a member of the genus Potyvirus, which is the largest genus of plant viruses. WMV is a significant pathogen of crop plants, including Cucurbitaceae species. A WMV strain, designated as WMV-Pg, was identified in transcriptome data collected from ginseng (Panax ginseng) root. WMV-Pg showed 84% nucleotide sequence identity and 91% amino acid sequence identity with its closest related virus, WMV-Fr. A phylogenetic analysis of WMV-Pg with other WMVs and soybean mosaic viruses (SMVs) indicated that WMV-Pg is a distinct subtype of the WMV/SMV group of the genus Potyvirus in the family Potyviridae.

  6. Characterization of the transcriptome of an ecologically important avian species, the Vinous-throated Parrotbill Paradoxornis webbianus bulomachus (Paradoxornithidae; Aves)

    PubMed Central

    2012-01-01

    Background Adaptive divergence driven by environmental heterogeneity has long been a fascinating topic in ecology and evolutionary biology. The study of the genetic basis of adaptive divergence has, however, been greatly hampered by a lack of genomic information. The recent development of transcriptome sequencing provides an unprecedented opportunity to generate large amounts of genomic data for detailed investigations of the genetics of adaptive divergence in non-model organisms. Herein, we used the Illumina sequencing platform to sequence the transcriptome of brain and liver tissues from a single individual of the Vinous-throated Parrotbill, Paradoxornis webbianus bulomachus, an ecologically important avian species in Taiwan with a wide elevational range of sea level to 3100 m. Results Our 10.1 Gbp of sequences were first assembled based on Zebra Finch (Taeniopygia guttata) and chicken (Gallus gallus) RNA references. The remaining reads were then de novo assembled. After filtering out contigs with low coverage (<10X), we retained 67,791 of 487,336 contigs, which covered approximately 5.3% of the P. w. bulomachus genome. Of 7,779 contigs retained for a top-hit species distribution analysis, the majority (about 86%) were matched to known Zebra Finch and chicken transcripts. We also annotated 6,365 contigs to gene ontology (GO) terms: in total, 122 GO-slim terms were assigned, including biological process (41%), molecular function (32%), and cellular component (27%). Many potential genetic markers for future adaptive genomic studies were also identified: 8,589 single nucleotide polymorphisms, 1,344 simple sequence repeats and 109 candidate genes that might be involved in elevational or climate adaptation. Conclusions Our study shows that transcriptome data can serve as a rich genetic resource, even for a single run of short-read sequencing from a single individual of a non-model species. This is the first study providing transcriptomic information for species in the avian superfamily Sylvioidea, which comprises more than 1,000 species. Our data can be used to study adaptive divergence in heterogeneous environments and investigate other important ecological and evolutionary questions in parrotbills from different populations and even in other species in the Sylvioidea. PMID:22530590

  7. Transcriptome analysis of pecan seeds at different developing stages and identification of key genes involved in lipid metabolism

    PubMed Central

    Shah, Faheem Afzal; Wang, Qiaojian; Wang, Zhaocheng; Wu, Lifang

    2018-01-01

    Pecan is an economically important nut crop tree due to its unique texture and flavor properties. The pecan seed is rich of unsaturated fatty acid and protein. However, little is known about the molecular mechanisms of the biosynthesis of fatty acids in the developing seeds. In this study, transcriptome sequencing of the developing seeds was performed using Illumina sequencing technology. Pecan seed embryos at different developmental stages were collected and sequenced. The transcriptomes of pecan seeds at two key developing stages (PA, the initial stage and PS, the fast oil accumulation stage) were also compared. A total of 82,155 unigenes, with an average length of 1,198 bp from seven independent libraries were generated. After functional annotations, we detected approximately 55,854 CDS, among which, 2,807 were Transcription Factor (TF) coding unigenes. Further, there were 13,325 unigenes that showed a 2-fold or greater expression difference between the two groups of libraries (two developmental stages). After transcriptome analysis, we identified abundant unigenes that could be involved in fatty acid biosynthesis, degradation and some other aspects of seed development in pecan. This study presents a comprehensive dataset of transcriptomic changes during the seed development of pecan. It provides insights in understanding the molecular mechanisms responsible for fatty acid biosynthesis in the seed development. The identification of functional genes will also be useful for the molecular breeding work of pecan. PMID:29694395

  8. Transcriptome analysis of pecan seeds at different developing stages and identification of key genes involved in lipid metabolism.

    PubMed

    Xu, Zheng; Ni, Jun; Shah, Faheem Afzal; Wang, Qiaojian; Wang, Zhaocheng; Wu, Lifang; Fu, Songling

    2018-01-01

    Pecan is an economically important nut crop tree due to its unique texture and flavor properties. The pecan seed is rich of unsaturated fatty acid and protein. However, little is known about the molecular mechanisms of the biosynthesis of fatty acids in the developing seeds. In this study, transcriptome sequencing of the developing seeds was performed using Illumina sequencing technology. Pecan seed embryos at different developmental stages were collected and sequenced. The transcriptomes of pecan seeds at two key developing stages (PA, the initial stage and PS, the fast oil accumulation stage) were also compared. A total of 82,155 unigenes, with an average length of 1,198 bp from seven independent libraries were generated. After functional annotations, we detected approximately 55,854 CDS, among which, 2,807 were Transcription Factor (TF) coding unigenes. Further, there were 13,325 unigenes that showed a 2-fold or greater expression difference between the two groups of libraries (two developmental stages). After transcriptome analysis, we identified abundant unigenes that could be involved in fatty acid biosynthesis, degradation and some other aspects of seed development in pecan. This study presents a comprehensive dataset of transcriptomic changes during the seed development of pecan. It provides insights in understanding the molecular mechanisms responsible for fatty acid biosynthesis in the seed development. The identification of functional genes will also be useful for the molecular breeding work of pecan.

  9. Transcriptome sequencing reveals high isoform diversity in the ant Formica exsecta

    PubMed Central

    Paviala, Jenni; Morandin, Claire; Wheat, Christopher; Sundström, Liselotte; Helanterä, Heikki

    2017-01-01

    Transcriptome resources for social insects have the potential to provide new insight into polyphenism, i.e., how divergent phenotypes arise from the same genome. Here we present a transcriptome based on paired-end RNA sequencing data for the ant Formica exsecta (Formicidae, Hymenoptera). The RNA sequencing libraries were constructed from samples of several life stages of both sexes and female castes of queens and workers, in order to maximize representation of expressed genes. We first compare the performance of common assembly and scaffolding software (Trinity, Velvet-Oases, and SOAPdenovo-trans), in producing de novo assemblies. Second, we annotate the resulting expressed contigs to the currently published genomes of ants, and other insects, including the honeybee, to filter genes that have annotation evidence of being true genes. Our pipeline resulted in a final assembly of altogether 39,262 mRNA transcripts, with an average coverage of >300X, belonging to 17,496 unique genes with annotation in the related ant species. From these genes, 536 genes were unique to one caste or sex only, highlighting the importance of comprehensive sampling. Our final assembly also showed expression of several splice variants in 6,975 genes, and we show that accounting for splice variants affects the outcome of downstream analyses such as gene ontologies. Our transcriptome provides an outstanding resource for future genetic studies on F. exsecta and other ant species, and the presented transcriptome assembly can be adapted to any non-model species that has genomic resources available from a related taxon. PMID:29177112

  10. Transcriptome and Expression Profile Analysis of Highly Resistant and Susceptible Banana Roots Challenged with Fusarium oxysporum f. sp. cubense Tropical Race 4

    PubMed Central

    Bai, Ting-Ting; Xie, Wan-Bin; Zhou, Ping-Ping; Wu, Zi-Lin; Xiao, Wen-Chao; Zhou, Ling; Sun, Jie; Ruan, Xiao-Lei; Li, Hua-Ping

    2013-01-01

    Banana wilt disease, caused by the fungal pathogen Fusarium oxysporum f. sp. cubense 4 (Foc4), is regarded as one of the most devastating diseases worldwide. Cavendish cultivar ‘Yueyoukang 1’ was shown to have significantly lower disease severity and incidence compared with susceptible cultivar ‘Brazilian’ in greenhouse and field trials. De novo sequencing technology was previously performed to investigate defense mechanism in middle resistant ‘Nongke No 1’ banana, but not in highly resistant cultivar ‘Yueyoukang 1’. To gain more insights into the resistance mechanism in banana against Foc4, Illumina Solexa sequencing technology was utilized to perform transcriptome sequencing of ‘Yueyoukang 1’ and ‘Brazilian’ and characterize gene expression profile changes in the both two cultivars at days 0.5, 1, 3, 5 and 10 after infection with Foc4. The results showed that more massive transcriptional reprogramming occurs due to Foc4 treatment in ‘Yueyoukang 1’ than ‘Brazilian’, especially at the first three time points, which suggested that ‘Yueyoukang 1’ had much faster defense response against Foc4 infection than ‘Brazilian’. Expression patterns of genes involved in ‘Plant-pathogen interaction’ and ‘Plant hormone signal transduction’ pathways were analyzed and compared between the two cultivars. Defense genes associated with CEBiP, BAK1, NB-LRR proteins, PR proteins, transcription factor and cell wall lignification were expressed stronger in ‘Yueyoukang 1’ than ‘Brazilian’, indicating that these genes play important roles in banana against Foc4 infection. However, genes related to hypersensitive reaction (HR) and senescence were up-regulated in ‘Brazilian’ but down-regulated in ‘Yueyoukang 1’, which suggested that HR and senescence may contribute to Foc4 infection. In addition, the resistance mechanism in highly resistant ‘Yueyoukang 1’ was found to differ from that in middle resistant ‘Nongke No 1’ banana. These results explain the resistance in the highly resistant cultivar and provide more insights in understanding the compatible and incompatible interactions between banana and Foc4. PMID:24086302

  11. An Integrated SNP Mining and Utilization (ISMU) Pipeline for Next Generation Sequencing Data

    PubMed Central

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M.; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A. V. S. K.; Varshney, Rajeev K.

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly requires working knowledge of command line interface, massive computational resources and expertise which is a daunting task for biologists. Further, the SNP information generated may not be readily used for downstream processes such as genotyping. Hence, a comprehensive pipeline has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface called Integrated SNP Mining and Utilization (ISMU) for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for plant genetics and breeding community with no computational expertise in order to discover SNPs and utilize in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone free software. PMID:25003610

  12. Adult Mouse Cortical Cell Taxonomy by Single Cell Transcriptomics

    PubMed Central

    Tasic, Bosiljka; Menon, Vilas; Nguyen, Thuc Nghi; Kim, Tae Kyung; Jarsky, Tim; Yao, Zizhen; Levi, Boaz; Gray, Lucas T.; Sorensen, Staci A.; Dolbeare, Tim; Bertagnolli, Darren; Goldy, Jeff; Shapovalova, Nadiya; Parry, Sheana; Lee, Changkyu; Smith, Kimberly; Bernard, Amy; Madisen, Linda; Sunkin, Susan M.; Hawrylycz, Michael; Koch, Christof; Zeng, Hongkui

    2016-01-01

    Nervous systems are composed of various cell types, but the extent of cell type diversity is poorly understood. Here, we construct a cellular taxonomy of one cortical region, primary visual cortex, in adult mice based on single cell RNA-sequencing. We identify 49 transcriptomic cell types including 23 GABAergic, 19 glutamatergic and seven non-neuronal types. We also analyze cell-type specific mRNA processing and characterize genetic access to these transcriptomic types by many transgenic Cre lines. Finally, we show that some of our transcriptomic cell types display specific and differential electrophysiological and axon projection properties, thereby confirming that the single cell transcriptomic signatures can be associated with specific cellular properties. PMID:26727548

  13. Chloroplast microsatellite markers for Artocarpus (Moraceae) developed from transcriptome sequences

    USDA-ARS?s Scientific Manuscript database

    Premise of the study: Chloroplast microsatellite loci were characterized from transcriptomes of Artocarpus (A.) altilis (breadfruit) and A. camansi (breadnut). They were tested in A. odoratissimus (terap) and A. altilis and evaluated in silico for two congeners. Methods and Results: 15 simple seque...

  14. Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons

    PubMed Central

    Krishnaswami, Suguna Rani; Grindberg, Rashel V; Novotny, Mark; Venepally, Pratap; Lacar, Benjamin; Bhutani, Kunal; Linker, Sara B; Pham, Son; Erwin, Jennifer A; Miller, Jeremy A; Hodge, Rebecca; McCarthy, James K; Kelder, Martin; McCorrison, Jamison; Aevermann, Brian D; Fuertes, Francisco Diez; Scheuermann, Richard H; Lee, Jun; Lein, Ed S; Schork, Nicholas; McConnell, Michael J; Gage, Fred H; Lasken, Roger S

    2016-01-01

    A protocol is described for sequencing the transcriptome of a cell nucleus. Nuclei are isolated from specimens and sorted by FACS, cDNA libraries are constructed and RNA-seq is performed, followed by data analysis. Some steps follow published methods (Smart-seq2 for cDNA synthesis and Nextera XT barcoded library preparation) and are not described in detail here. Previous single-cell approaches for RNA-seq from tissues include cell dissociation using protease treatment at 30 °C, which is known to alter the transcriptome. We isolate nuclei at 4 °C from tissue homogenates, which cause minimal damage. Nuclear transcriptomes can be obtained from postmortem human brain tissue stored at −80 °C, making brain archives accessible for RNA-seq from individual neurons. The method also allows investigation of biological features unique to nuclei, such as enrichment of certain transcripts and precursors of some noncoding RNAs. By following this procedure, it takes about 4 d to construct cDNA libraries that are ready for sequencing. PMID:26890679

  15. Examination of Triacylglycerol Biosynthetic Pathways via De Novo Transcriptomic and Proteomic Analyses in an Unsequenced Microalga

    PubMed Central

    Guarnieri, Michael T.; Nag, Ambarish; Smolinski, Sharon L.; Darzins, Al; Seibert, Michael; Pienkos, Philip T.

    2011-01-01

    Biofuels derived from algal lipids represent an opportunity to dramatically impact the global energy demand for transportation fuels. Systems biology analyses of oleaginous algae could greatly accelerate the commercialization of algal-derived biofuels by elucidating the key components involved in lipid productivity and leading to the initiation of hypothesis-driven strain-improvement strategies. However, higher-level systems biology analyses, such as transcriptomics and proteomics, are highly dependent upon available genomic sequence data, and the lack of these data has hindered the pursuit of such analyses for many oleaginous microalgae. In order to examine the triacylglycerol biosynthetic pathway in the unsequenced oleaginous microalga, Chlorella vulgaris, we have established a strategy with which to bypass the necessity for genomic sequence information by using the transcriptome as a guide. Our results indicate an upregulation of both fatty acid and triacylglycerol biosynthetic machinery under oil-accumulating conditions, and demonstrate the utility of a de novo assembled transcriptome as a search model for proteomic analysis of an unsequenced microalga. PMID:22043295

  16. A reference gene set for sex pheromone biosynthesis and degradation genes from the diamondback moth, Plutella xylostella, based on genome and transcriptome digital gene expression analyses.

    PubMed

    He, Peng; Zhang, Yun-Fei; Hong, Duan-Yang; Wang, Jun; Wang, Xing-Liang; Zuo, Ling-Hua; Tang, Xian-Fu; Xu, Wei-Ming; He, Ming

    2017-03-01

    Female moths synthesize species-specific sex pheromone components and release them to attract male moths, which depend on precise sex pheromone chemosensory system to locate females. Two types of genes involved in the sex pheromone biosynthesis and degradation pathways play essential roles in this important moth behavior. To understand the function of genes in the sex pheromone pathway, this study investigated the genome-wide and digital gene expression of sex pheromone biosynthesis and degradation genes in various adult tissues in the diamondback moth (DBM), Plutella xylostella, which is a notorious vegetable pest worldwide. A massive transcriptome data (at least 39.04 Gb) was generated by sequencing 6 adult tissues including male antennae, female antennae, heads, legs, abdomen and female pheromone glands from DBM by using Illumina 4000 next-generation sequencing and mapping to a published DBM genome. Bioinformatics analysis yielded a total of 89,332 unigenes among which 87 transcripts were putatively related to seven gene families in the sex pheromone biosynthesis pathway. Among these, seven [two desaturases (DES), three fatty acyl-CoA reductases (FAR) one acetyltransferase (ACT) and one alcohol dehydrogenase (AD)] were mainly expressed in the pheromone glands with likely function in the three essential sex pheromone biosynthesis steps: desaturation, reduction, and esterification. We also identified 210 odorant-degradation related genes (including sex pheromone-degradation related genes) from seven major enzyme groups. Among these genes, 100 genes are new identified and two aldehyde oxidases (AOXs), one aldehyde dehydrogenase (ALDH), five carboxyl/cholinesterases (CCEs), five UDP-glycosyltransferases (UGTs), eight cytochrome P450 (CYP) and three glutathione S-transferases (GSTs) displayed more robust expression in the antennae, and thus are proposed to participate in the degradation of sex pheromone components and plant volatiles. To date, this is the most comprehensive gene data set of sex pheromone biosynthesis and degradation enzyme related genes in DBM created by genome- and transcriptome-wide identification, characterization and expression profiling. Our findings provide a basis to better understand the function of genes with tissue enriched expression. The results also provide information on the genes involved in sex pheromone biosynthesis and degradation, and may be useful to identify potential gene targets for pest control strategies by disrupting the insect-insect communication using pheromone-based behavioral antagonists.

  17. Development of EST-SSR markers for Taxillus nigrans (Loranthaceae) in southwestern China using next-generation sequencing1

    PubMed Central

    Miao, Ning; Zhang, Lei; Li, Maoping; Fan, Liqiang; Mao, Kangshan

    2017-01-01

    Premise of the study: We developed transcriptome microsatellite markers (simple sequence repeats) for Taxillus nigrans (Loranthaceae) to survey the genetic diversity and population structure of this species. Methods and Results: We used Illumina HiSeq data to reconstruct the transcriptome of T. nigrans by de novo assembly and used the transcriptome to develop a set of simple sequence repeat markers. Overall, 40 primer pairs were designed and tested; 19 of them amplified successfully and demonstrated polymorphisms. Two loci that detected null alleles were eliminated, and the remaining 17, which were subjected to further analyses, yielded two to 21 alleles per locus. Conclusions: The markers will serve as a basis for studies to assess the extent and pattern of distribution of genetic variation in T. nigrans, and they may also be useful in conservation genetic, ecological, and evolutionary studies of the genus Taxillus, a group of plant species of importance in Chinese traditional medicine. PMID:28924510

  18. The de novo transcriptome and its analysis in the worldwide vegetable pest, Delia antiqua (Diptera: Anthomyiidae).

    PubMed

    Zhang, Yu-Juan; Hao, Youjin; Si, Fengling; Ren, Shuang; Hu, Ganyu; Shen, Li; Chen, Bin

    2014-03-10

    The onion maggot Delia antiqua is a major insect pest of cultivated vegetables, especially the onion, and a good model to investigate the molecular mechanisms of diapause. To better understand the biology and diapause mechanism of the insect pest species, D. antiqua, the transcriptome was sequenced using Illumina paired-end sequencing technology. Approximately 54 million reads were obtained, trimmed, and assembled into 29,659 unigenes, with an average length of 607 bp and an N50 of 818 bp. Among these unigenes, 21,605 (72.8%) were annotated in the public databases. All unigenes were then compared against Drosophila melanogaster and Anopheles gambiae. Codon usage bias was analyzed and 332 simple sequence repeats (SSRs) were detected in this organism. These data represent the most comprehensive transcriptomic resource currently available for D. antiqua and will facilitate the study of genetics, genomics, diapause, and further pest control of D. antiqua. Copyright © 2014 Zhang et al.

  19. Local adaptation of Gymnocypris przewalskii (Cyprinidae) on the Tibetan Plateau

    PubMed Central

    Zhang, Renyi; Ludwig, Arne; Zhang, Cunfang; Tong, Chao; Li, Guogang; Tang, Yongtao; Peng, Zuogang; Zhao, Kai

    2015-01-01

    Divergent selection among environments affects species distributions and can lead to speciation. In this article, we investigated the transcriptomes of two ecotypes of scaleless carp (Gymnocypris przewalskii przewalskii and G. p. ganzihonensis) from the Tibetan Plateau. We used a transcriptome sequencing approach to screen approximately 250,000 expressed sequence tags (ESTs) from the gill and kidney tissues of twelve individuals from the Ganzi River and Lake Qinghai to understand how this freshwater fish has adapted to an ecological niche shift from saline to freshwater. We identified 9,429 loci in the gill transcriptome and 12,034 loci in the kidney transcriptome with significant differences in their expression, of which 242 protein-coding genes exhibited strong positive selection (Ka/Ks > 1). Many of the genes are involved in ion channel functions (e.g., Ca2+-binding proteins), immune responses (e.g., nephrosin) or cellular water absorption functions (e.g., aquaporins). These results have potentially broad importance in understanding shifts from saline to freshwater habitats. Furthermore, this study provides the first transcriptome of G. przewalskii, which will facilitate future ecological genomics studies and aid in the identification of genes underlying adaptation and incipient ecological speciation. PMID:25944748

  20. Transcriptome of the Caribbean stony coral Porites astreoides from three developmental stages.

    PubMed

    Mansour, Tamer A; Rosenthal, Joshua J C; Brown, C Titus; Roberson, Loretta M

    2016-08-02

    Porites astreoides is a ubiquitous species of coral on modern Caribbean reefs that is resistant to increasing temperatures, overfishing, and other anthropogenic impacts that have threatened most other coral species. We assembled and annotated a transcriptome from this coral using Illumina sequences from three different developmental stages collected over several years: free-swimming larvae, newly settled larvae, and adults (>10 cm in diameter). This resource will aid understanding of coral calcification, larval settlement, and host-symbiont interactions. A de novo transcriptome for the P. astreoides holobiont (coral plus algal symbiont) was assembled using 594 Mbp of raw Illumina sequencing data generated from five age-specific cDNA libraries. The new transcriptome consists of 867 255 transcript elements with an average length of 685 bases. The isolated P. astreoides assembly consists of 129 718 transcript elements with an average length of 811 bases, and the isolated Symbiodinium sp. assembly had 186 177 transcript elements with an average length of 1105 bases. This contribution to coral transcriptome data provides a valuable resource for researchers studying the ontogeny of gene expression patterns within both the coral and its dinoflagellate symbiont.

  1. Comparative transcriptome analysis of microsclerotia development in Nomuraea rileyi

    PubMed Central

    2013-01-01

    Background Nomuraea rileyi is used as an environmental-friendly biopesticide. However, mass production and commercialization of this organism are limited due to its fastidious growth and sporulation requirements. When cultured in amended medium, we found that N. rileyi could produce microsclerotia bodies, replacing conidiophores as the infectious agent. However, little is known about the genes involved in microsclerotia development. In the present study, the transcriptomes were analyzed using next-generation sequencing technology to find the genes involved in microsclerotia development. Results A total of 4.69 Gb of clean nucleotides comprising 32,061 sequences was obtained, and 20,919 sequences were annotated (about 65%). Among the annotated sequences, only 5928 were annotated with 34 gene ontology (GO) functional categories, and 12,778 sequences were mapped to 165 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) database. Furthermore, we assessed the transcriptomic differences between cultures grown in minimal and amended medium. In total, 4808 sequences were found to be differentially expressed; 719 differentially expressed unigenes were assigned to 25 GO classes and 1888 differentially expressed unigenes were assigned to 161 KEGG pathways, including 25 enrichment pathways. Subsequently, we examined the up-regulation or uniquely expressed genes following amended medium treatment, which were also expressed on the enrichment pathway, and found that most of them participated in mediating oxidative stress homeostasis. To elucidate the role of oxidative stress in microsclerotia development, we analyzed the diversification of unigenes using quantitative reverse transcription-PCR (RT-qPCR). Conclusion Our findings suggest that oxidative stress occurs during microsclerotia development, along with a broad metabolic activity change. Our data provide the most comprehensive sequence resource available for the study of N. rileyi. We believe that the transcriptome datasets will serve as an important public information platform to accelerate studies on N. rileyi microsclerotia. PMID:23777366

  2. De novo transcriptome sequencing reveals a considerable bias in the incidence of simple sequence repeats towards the downstream of 'Pre-miRNAs' of black pepper.

    PubMed

    Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan

    2013-01-01

    Next generation sequencing has an advantageon transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using illuminaHiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using Trinity program generated 2,23,386 contigs and 1,28,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated, for the in silico prediction and detection of '43 pre-miRNA candidates bearing different types of SSR motifs'. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of thetranscriptome. About 0.033% of the transcriptome constituted 'pre-miRNA candidates bearing SSRs'. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted 'pre-miRNA candidates'. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of 'tandem repeats' in miRNAs.

  3. Rapid transcriptome sequencing of an invasive pest, the brown marmorated stink bug Halyomorpha halys.

    PubMed

    Ioannidis, Panagiotis; Lu, Yong; Kumar, Nikhil; Creasy, Todd; Daugherty, Sean; Chibucos, Marcus C; Orvis, Joshua; Shetty, Amol; Ott, Sandra; Flowers, Melissa; Sengamalay, Naomi; Tallon, Luke J; Pick, Leslie; Dunning Hotopp, Julie C

    2014-08-29

    Halyomorpha halys (Stål) (Insecta:Hemiptera;Pentatomidae), commonly known as the Brown Marmorated Stink Bug (BMSB), is an invasive pest of the mid-Atlantic region of the United States, causing economically important damage to a wide range of crops. Native to Asia, BMSB was first observed in Allentown, PA, USA, in 1996, and this pest is now well-established throughout the US mid-Atlantic region and beyond. In addition to the serious threat BMSB poses to agriculture, BMSB has become a nuisance to homeowners, invading home gardens and congregating in large numbers in human-made structures, including homes, to overwinter. Despite its significance as an agricultural pest with limited control options, only 100 bp of BMSB sequence data was available in public databases when this project began. Transcriptome sequencing was undertaken to provide a molecular resource to the research community to inform the development of pest control strategies and to provide molecular data for population genetics studies of BMSB. Using normalized, strand-specific libraries, we sequenced pools of all BMSB life stages on the Illumina HiSeq. Trinity was used to assemble 200,000 putative transcripts in >100,000 components. A novel bioinformatic method that analyzed the strand-specificity of the data reduced this to 53,071 putative transcripts from 18,573 components. By integrating multiple other data types, we narrowed this further to 13,211 representative transcripts. Bacterial endosymbiont genes were identified in this dataset, some of which have a copy number consistent with being lateral gene transfers between endosymbiont genomes and Hemiptera, including ankyrin-repeat related proteins, lysozyme, and mannanase. Such genes and endosymbionts may provide novel targets for BMSB-specific biocontrol. This study demonstrates the utility of strand-specific sequencing in generating shotgun transcriptomes and that rapid sequencing shotgun transcriptomes is possible without the need for extensive inbreeding to generate homozygous lines. Such sequencing can provide a rapid response to pest invasions similar to that already described for disease epidemiology.

  4. De novo Transcriptome Sequencing Reveals a Considerable Bias in the Incidence of Simple Sequence Repeats towards the Downstream of ‘Pre-miRNAs’ of Black Pepper

    PubMed Central

    Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan

    2013-01-01

    Next generation sequencing has an advantageon transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using illuminaHiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using Trinity program generated 2,23,386 contigs and 1,28,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated, for the in silico prediction and detection of ‘43 pre-miRNA candidates bearing different types of SSR motifs’. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of thetranscriptome. About 0.033% of the transcriptome constituted ‘pre-miRNA candidates bearing SSRs’. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted ‘pre-miRNA candidates’. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of ‘tandem repeats’ in miRNAs. PMID:23469176

  5. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and analyses of adaptive molecular evolution. Since not all genes in the floral transcriptome will be associated with flowering, these EST resources will also be of interest to plant scientists working on other functions, such as photosynthesis, signal transduction, and metabolic pathways. PMID:15799777

  6. De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses.

    PubMed

    Meena, Seema; Kumar, Sarma R; Venkata Rao, D K; Dwivedi, Varun; Shilpashree, H B; Rastogi, Shubhra; Shasany, Ajit K; Nagegowda, Dinesh A

    2016-01-01

    Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition.

  7. Workflow and web application for annotating NCBI BioProject transcriptome data

    PubMed Central

    Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A.; Barrero, Luz S.; Landsman, David

    2017-01-01

    Abstract The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/ PMID:28605765

  8. Comparative transcriptome sequencing and de novo analysis of Vaccinium corymbosum during fruit and color development.

    PubMed

    Li, Lingli; Zhang, Hehua; Liu, Zhongshuai; Cui, Xiaoyue; Zhang, Tong; Li, Yanfang; Zhang, Lingyun

    2016-10-12

    Blueberry is an economically important fruit crop in Ericaceae family. The substantial quantities of flavonoids in blueberry have been implicated in a broad range of health benefits. However, the information regarding fruit development and flavonoid metabolites based on the transcriptome level is still limited. In the present study, the transcriptome and gene expression profiling over berry development, especially during color development were initiated. A total of approximately 13.67 Gbp of data were obtained and assembled into 186,962 transcripts and 80,836 unigenes from three stages of blueberry fruit and color development. A large number of simple sequence repeats (SSRs) and candidate genes, which are potentially involved in plant development, metabolic and hormone pathways, were identified. A total of 6429 sequences containing 8796 SSRs were characterized from 15,457 unigenes and 1763 unigenes contained more than one SSR. The expression profiles of key genes involved in anthocyanin biosynthesis were also studied. In addition, a comparison between our dataset and other published results was carried out. Our high quality reads produced in this study are an important advancement and provide a new resource for the interpretation of high-throughput data for blueberry species whether regarding sequencing data depth or species extension. The use of this transcriptome data will serve as a valuable public information database for the studies of blueberry genome and would greatly boost the research of fruit and color development, flavonoid metabolisms and regulation and breeding of more healthful blueberries.

  9. De Novo Sequencing and Analysis of Lemongrass Transcriptome Provide First Insights into the Essential Oil Biosynthesis of Aromatic Grasses

    PubMed Central

    Meena, Seema; Kumar, Sarma R.; Venkata Rao, D. K.; Dwivedi, Varun; Shilpashree, H. B.; Rastogi, Shubhra; Shasany, Ajit K.; Nagegowda, Dinesh A.

    2016-01-01

    Aromatic grasses of the genus Cymbopogon (Poaceae family) represent unique group of plants that produce diverse composition of monoterpene rich essential oils, which have great value in flavor, fragrance, cosmetic, and aromatherapy industries. Despite the commercial importance of these natural aromatic oils, their biosynthesis at the molecular level remains unexplored. As the first step toward understanding the essential oil biosynthesis, we performed de novo transcriptome assembly and analysis of C. flexuosus (lemongrass) by employing Illumina sequencing. Mining of transcriptome data and subsequent phylogenetic analysis led to identification of terpene synthases, pyrophosphatases, alcohol dehydrogenases, aldo-keto reductases, carotenoid cleavage dioxygenases, alcohol acetyltransferases, and aldehyde dehydrogenases, which are potentially involved in essential oil biosynthesis. Comparative essential oil profiling and mRNA expression analysis in three Cymbopogon species (C. flexuosus, aldehyde type; C. martinii, alcohol type; and C. winterianus, intermediate type) with varying essential oil composition indicated the involvement of identified candidate genes in the formation of alcohols, aldehydes, and acetates. Molecular modeling and docking further supported the role of identified protein sequences in aroma formation in Cymbopogon. Also, simple sequence repeats were found in the transcriptome with many linked to terpene pathway genes including the genes potentially involved in aroma biosynthesis. This work provides the first insights into the essential oil biosynthesis of aromatic grasses, and the identified candidate genes and markers can be a great resource for biotechnological and molecular breeding approaches to modulate the essential oil composition. PMID:27516768

  10. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity

    PubMed Central

    Yassour, Moran; Grabherr, Manfred; Blood, Philip D.; Bowden, Joshua; Couger, Matthew Brian; Eccles, David; Li, Bo; Lieber, Matthias; MacManes, Matthew D.; Ott, Michael; Orvis, Joshua; Pochet, Nathalie; Strozzi, Francesco; Weeks, Nathan; Westerman, Rick; William, Thomas; Dewey, Colin N.; Henschel, Robert; LeDuc, Richard D.; Friedman, Nir; Regev, Aviv

    2013-01-01

    De novo assembly of RNA-Seq data allows us to study transcriptomes without the need for a genome sequence, such as in non-model organisms of ecological and evolutionary importance, cancer samples, or the microbiome. In this protocol, we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms. We also present Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples, and approaches to identify protein coding genes. In an included tutorial we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sf.net. PMID:23845962

  11. Revealing the transcriptomic complexity of switchgrass by PacBio long-read sequencing.

    PubMed

    Zuo, Chunman; Blow, Matthew; Sreedasyam, Avinash; Kuo, Rita C; Ramamoorthy, Govindarajan Kunde; Torres-Jerez, Ivone; Li, Guifen; Wang, Mei; Dilworth, David; Barry, Kerrie; Udvardi, Michael; Schmutz, Jeremy; Tang, Yuhong; Xu, Ying

    2018-01-01

    Switchgrass ( Panicum virgatum L.) is an important bioenergy crop widely used for lignocellulosic research. While extensive transcriptomic analyses have been conducted on this species using short read-based sequencing techniques, very little has been reliably derived regarding alternatively spliced (AS) transcripts. We present an analysis of transcriptomes of six switchgrass tissue types pooled together, sequenced using Pacific Biosciences (PacBio) single-molecular long-read technology. Our analysis identified 105,419 unique transcripts covering 43,570 known genes and 8795 previously unknown genes. 45,168 are novel transcripts of known genes. A total of 60,096 AS transcripts are identified, 45,628 being novel. We have also predicted 1549 transcripts of genes involved in cell wall construction and remodeling, 639 being novel transcripts of known cell wall genes. Most of the predicted transcripts are validated against Illumina-based short reads. Specifically, 96% of the splice junction sites in all the unique transcripts are validated by at least five Illumina reads. Comparisons between genes derived from our identified transcripts and the current genome annotation revealed that among the gene set predicted by both analyses, 16,640 have different exon-intron structures. Overall, substantial amount of new information is derived from the PacBio RNA data regarding both the transcriptome and the genome of switchgrass.

  12. Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa.

    PubMed

    Shahin, Arwa; van Kaauwen, Martijn; Esselink, Danny; Bargsten, Joachim W; van Tuyl, Jaap M; Visser, Richard G F; Arens, Paul

    2012-11-20

    Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies.

  13. Transcriptome survey of the anhydrobiotic tardigrade Milnesium tardigradum in comparison with Hypsibius dujardini and Richtersius coronifer

    PubMed Central

    2010-01-01

    Background The phenomenon of desiccation tolerance, also called anhydrobiosis, involves the ability of an organism to survive the loss of almost all cellular water without sustaining irreversible damage. Although there are several physiological, morphological and ecological studies on tardigrades, only limited DNA sequence information is available. Therefore, we explored the transcriptome in the active and anhydrobiotic state of the tardigrade Milnesium tardigradum which has extraordinary tolerance to desiccation and freezing. In this study, we present the first overview of the transcriptome of M. tardigradum and its response to desiccation and discuss potential parallels to stress responses in other organisms. Results We sequenced a total of 9984 expressed sequence tags (ESTs) from two cDNA libraries from the eutardigrade M. tardigradum in its active and inactive, anhydrobiotic (tun) stage. Assembly of these ESTs resulted in 3283 putative unique transcripts, whereof ~50% showed significant sequence similarity to known genes. The resulting unigenes were functionally annotated using the Gene Ontology (GO) vocabulary. A GO term enrichment analysis revealed several GOs that were significantly underrepresented in the inactive stage. Furthermore we compared the putative unigenes of M. tardigradum with ESTs from two other eutardigrade species that are available from public sequence databases, namely Richtersius coronifer and Hypsibius dujardini. The processed sequences of the three tardigrade species revealed similar functional content and the M. tardigradum dataset contained additional sequences from tardigrades not present in the other two. Conclusions This study describes novel sequence data from the tardigrade M. tardigradum, which significantly contributes to the available tardigrade sequence data and will help to establish this extraordinary tardigrade as a model for studying anhydrobiosis. Functional comparison of active and anhydrobiotic tardigrades revealed a differential distribution of Gene Ontology terms associated with chromatin structure and the translation machinery, which are underrepresented in the inactive animals. These findings imply a widespread metabolic response of the animals on dehydration. The collective tardigrade transcriptome data will serve as a reference for further studies and support the identification and characterization of genes involved in the anhydrobiotic response. PMID:20226016

  14. Transcriptome survey of the anhydrobiotic tardigrade Milnesium tardigradum in comparison with Hypsibius dujardini and Richtersius coronifer.

    PubMed

    Mali, Brahim; Grohme, Markus A; Förster, Frank; Dandekar, Thomas; Schnölzer, Martina; Reuter, Dirk; Wełnicz, Weronika; Schill, Ralph O; Frohme, Marcus

    2010-03-12

    The phenomenon of desiccation tolerance, also called anhydrobiosis, involves the ability of an organism to survive the loss of almost all cellular water without sustaining irreversible damage. Although there are several physiological, morphological and ecological studies on tardigrades, only limited DNA sequence information is available. Therefore, we explored the transcriptome in the active and anhydrobiotic state of the tardigrade Milnesium tardigradum which has extraordinary tolerance to desiccation and freezing. In this study, we present the first overview of the transcriptome of M. tardigradum and its response to desiccation and discuss potential parallels to stress responses in other organisms. We sequenced a total of 9984 expressed sequence tags (ESTs) from two cDNA libraries from the eutardigrade M. tardigradum in its active and inactive, anhydrobiotic (tun) stage. Assembly of these ESTs resulted in 3283 putative unique transcripts, whereof approximately 50% showed significant sequence similarity to known genes. The resulting unigenes were functionally annotated using the Gene Ontology (GO) vocabulary. A GO term enrichment analysis revealed several GOs that were significantly underrepresented in the inactive stage. Furthermore we compared the putative unigenes of M. tardigradum with ESTs from two other eutardigrade species that are available from public sequence databases, namely Richtersius coronifer and Hypsibius dujardini. The processed sequences of the three tardigrade species revealed similar functional content and the M. tardigradum dataset contained additional sequences from tardigrades not present in the other two. This study describes novel sequence data from the tardigrade M. tardigradum, which significantly contributes to the available tardigrade sequence data and will help to establish this extraordinary tardigrade as a model for studying anhydrobiosis. Functional comparison of active and anhydrobiotic tardigrades revealed a differential distribution of Gene Ontology terms associated with chromatin structure and the translation machinery, which are underrepresented in the inactive animals. These findings imply a widespread metabolic response of the animals on dehydration. The collective tardigrade transcriptome data will serve as a reference for further studies and support the identification and characterization of genes involved in the anhydrobiotic response.

  15. Large-scale transcriptome sequencing and gene analyses in the crab-eating macaque (Macaca fascicularis) for biomedical research

    PubMed Central

    2012-01-01

    Background As a human replacement, the crab-eating macaque (Macaca fascicularis) is an invaluable non-human primate model for biomedical research, but the lack of genetic information on this primate has represented a significant obstacle for its broader use. Results Here, we sequenced the transcriptome of 16 tissues originated from two individuals of crab-eating macaque (male and female), and identified genes to resolve the main obstacles for understanding the biological response of the crab-eating macaque. From 4 million reads with 1.4 billion base sequences, 31,786 isotigs containing genes similar to those of humans, 12,672 novel isotigs, and 348,160 singletons were identified using the GS FLX sequencing method. Approximately 86% of human genes were represented among the genes sequenced in this study. Additionally, 175 tissue-specific transcripts were identified, 81 of which were experimentally validated. In total, 4,314 alternative splicing (AS) events were identified and analyzed. Intriguingly, 10.4% of AS events were associated with transposable element (TE) insertions. Finally, investigation of TE exonization events and evolutionary analysis were conducted, revealing interesting phenomena of human-specific amplified trends in TE exonization events. Conclusions This report represents the first large-scale transcriptome sequencing and genetic analyses of M. fascicularis and could contribute to its utility for biomedical research and basic biology. PMID:22554259

  16. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruggles, Kelly V.; Tang, Zuojian; Wang, Xuya

    Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we therefore describe a proteogenomic data integration tool (QUILTS) and illustrate its application to whole genome, transcriptome and global MS peptide sequence datasets generated from a pair of luminal and basal-like breast cancer patient derived xenografts (PDX). The sensitivity of proteogenomic analysis for singe nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS process replicates. Despite over thirty sample replicates, only about 10% of all SNV (somatic andmore » germline) were detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNV without a detectable mRNA transcript were also observed demonstrating the transcriptome coverage was also incomplete (~80%). In contrast to germ-line variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than the luminal tumor raising the possibility of differential translation or protein degradation effects. In conclusion, the QUILTS program integrates DNA, RNA and peptide sequencing to assess the degree to which somatic mutations are translated and therefore biologically active. By identifying gaps in sequence coverage QUILTS benchmarks current technology and assesses progress towards whole cancer proteome and transcriptome analysis.« less

  17. Poly A- Transcripts Expressed in HeLa Cells

    PubMed Central

    Lu, Jian; Xuan, Zhenyu; Chen, Jun; Zheng, Yonglan; Zhou, Tom; Zhang, Michael Q.; Wu, Chung-I; Wang, San Ming

    2008-01-01

    Background Transcripts expressed in eukaryotes are classified as poly A+ transcripts or poly A- transcripts based on the presence or absence of the 3′ poly A tail. Most transcripts identified so far are poly A+ transcripts, whereas the poly A- transcripts remain largely unknown. Methodology/Principal Findings We developed the TRD (Total RNA Detection) system for transcript identification. The system detects the transcripts through the following steps: 1) depleting the abundant ribosomal and small-size transcripts; 2) synthesizing cDNA without regard to the status of the 3′ poly A tail; 3) applying the 454 sequencing technology for massive 3′ EST collection from the cDNA; and 4) determining the genome origins of the detected transcripts by mapping the sequences to the human genome reference sequences. Using this system, we characterized the cytoplasmic transcripts from HeLa cells. Of the 13,467 distinct 3′ ESTs analyzed, 24% are poly A-, 36% are poly A+, and 40% are bimorphic with poly A+ features but without the 3′ poly A tail. Most of the poly A- 3′ ESTs do not match known transcript sequences; they have a similar distribution pattern in the genome as the poly A+ and bimorphic 3′ ESTs, and their mapped intergenic regions are evolutionarily conserved. Experiments confirmed the authenticity of the detected poly A- transcripts. Conclusion/Significance Our study provides the first large-scale sequence evidence for the presence of poly A- transcripts in eukaryotes. The abundance of the poly A- transcripts highlights the need for comprehensive identification of these transcripts for decoding the transcriptome, annotating the genome and studying biological relevance of the poly A- transcripts. PMID:18665230

  18. Transcriptomic analysis of grain amaranth (Amaranthus hypochondriacus) using 454 pyrosequencing: comparison with A. tuberculatus, expression profiling in stems and in response to biotic and abiotic stress

    PubMed Central

    2011-01-01

    Background Amaranthus hypochondriacus, a grain amaranth, is a C4 plant noted by its ability to tolerate stressful conditions and produce highly nutritious seeds. These possess an optimal amino acid balance and constitute a rich source of health-promoting peptides. Although several recent studies, mostly involving subtractive hybridization strategies, have contributed to increase the relatively low number of grain amaranth expressed sequence tags (ESTs), transcriptomic information of this species remains limited, particularly regarding tissue-specific and biotic stress-related genes. Thus, a large scale transcriptome analysis was performed to generate stem- and (a)biotic stress-responsive gene expression profiles in grain amaranth. Results A total of 2,700,168 raw reads were obtained from six 454 pyrosequencing runs, which were assembled into 21,207 high quality sequences (20,408 isotigs + 799 contigs). The average sequence length was 1,064 bp and 930 bp for isotigs and contigs, respectively. Only 5,113 singletons were recovered after quality control. Contigs/isotigs were further incorporated into 15,667 isogroups. All unique sequences were queried against the nr, TAIR, UniRef100, UniRef50 and Amaranthaceae EST databases for annotation. Functional GO annotation was performed with all contigs/isotigs that produced significant hits with the TAIR database. Only 8,260 sequences were found to be homologous when the transcriptomes of A. tuberculatus and A. hypochondriacus were compared, most of which were associated with basic house-keeping processes. Digital expression analysis identified 1,971 differentially expressed genes in response to at least one of four stress treatments tested. These included several multiple-stress-inducible genes that could represent potential candidates for use in the engineering of stress-resistant plants. The transcriptomic data generated from pigmented stems shared similarity with findings reported in developing stems of Arabidopsis and black cottonwood (Populus trichocarpa). Conclusions This study represents the first large-scale transcriptomic analysis of A. hypochondriacus, considered to be a highly nutritious and stress-tolerant crop. Numerous genes were found to be induced in response to (a)biotic stress, many of which could further the understanding of the mechanisms that contribute to multiple stress-resistance in plants, a trait that has potential biotechnological applications in agriculture. PMID:21752295

  19. Transcriptome assembly, profiling and differential gene expression analysis of the halophyte Suaeda fruticosa provides insights into salt tolerance.

    PubMed

    Diray-Arce, Joann; Clement, Mark; Gul, Bilquees; Khan, M Ajmal; Nielsen, Brent L

    2015-05-06

    Improvement of crop production is needed to feed the growing world population as the amount and quality of agricultural land decreases and soil salinity increases. This has stimulated research on salt tolerance in plants. Most crops tolerate a limited amount of salt to survive and produce biomass, while halophytes (salt-tolerant plants) have the ability to grow with saline water utilizing specific biochemical mechanisms. However, little is known about the genes involved in salt tolerance. We have characterized the transcriptome of Suaeda fruticosa, a halophyte that has the ability to sequester salts in its leaves. Suaeda fruticosa is an annual shrub in the family Chenopodiaceae found in coastal and inland regions of Pakistan and Mediterranean shores. This plant is an obligate halophyte that grows optimally from 200-400 mM NaCl and can grow at up to 1000 mM NaCl. High throughput sequencing technology was performed to provide understanding of genes involved in the salt tolerance mechanism. De novo assembly of the transcriptome and analysis has allowed identification of differentially expressed and unique genes present in this non-conventional crop. Twelve sequencing libraries prepared from control (0 mM NaCl treated) and optimum (300 mM NaCl treated) plants were sequenced using Illumina Hiseq 2000 to investigate differential gene expression between shoots and roots of Suaeda fruticosa. The transcriptome was assembled de novo using Velvet and Oases k-45 and clustered using CDHIT-EST. There are 54,526 unigenes; among these 475 genes are downregulated and 44 are upregulated when samples from plants grown under optimal salt are compared with those grown without salt. BLAST analysis identified the differentially expressed genes, which were categorized in gene ontology terms and their pathways. This work has identified potential genes involved in salt tolerance in Suaeda fruticosa, and has provided an outline of tools to use for de novo transcriptome analysis. The assemblies that were used provide coverage of a considerable proportion of the transcriptome, which allows analysis of differential gene expression and identification of genes that may be involved in salt tolerance. The transcriptome may serve as a reference sequence for study of other succulent halophytes.

  20. Characterization of the heart transcriptome of the white shark (Carcharodon carcharias)

    PubMed Central

    2013-01-01

    Background The white shark (Carcharodon carcharias) is a globally distributed, apex predator possessing physical, physiological, and behavioral traits that have garnered it significant public attention. In addition to interest in the genetic basis of its form and function, as a representative of the oldest extant jawed vertebrate lineage, white sharks are also of conservation concern due to their small population size and threat from overfishing. Despite this, surprisingly little is known about the biology of white sharks, and genomic resources are unavailable. To address this deficit, we combined Roche-454 and Illumina sequencing technologies to characterize the first transciptome of any tissue for this species. Results From white shark heart cDNA we generated 665,399 Roche 454 reads (median length 387-bp) that were assembled into 141,626 contigs (mean length 503-bp). We also generated 78,566,588 Illumina reads, which we aligned to the 454 contigs producing 105,014 454/Illumina consensus sequences. To these, we added 3,432 non-singleton 454 contigs. By comparing these sequences to the UniProtKB/Swiss-Prot database we were able to annotate 21,019 translated open reading frames (ORFs) of ≥ 20 amino acids. Of these, 19,277 were additionally assigned Gene Ontology (GO) functional annotations. While acknowledging the limitations of our single tissue transcriptome, Fisher tests showed the white shark transcriptome to be significantly enriched for numerous metabolic GO terms compared to the zebra fish and human transcriptomes, with white shark showing more similarity to human than to zebra fish (i.e. fewer terms were significantly different). We also compared the transcriptome to other available elasmobranch sequences, for signatures of positive selection and identified several genes of putative adaptive significance on the white shark lineage. The white shark transcriptome also contained 8,404 microsatellites (dinucleotide, trinucleotide, or tetranucleotide motifs ≥ five perfect repeats). Detailed characterization of these microsatellites showed that ORFs with trinucleotide repeats, were significantly enriched for transcription regulatory roles and that trinucleotide frequency within ORFs was lower than for a wide range of taxonomic groups including other vertebrates. Conclusion The white shark heart transcriptome represents a valuable resource for future elasmobranch functional and comparative genomic studies, as well as for population and other biological studies vital for effective conservation of this globally vulnerable species. PMID:24112713

  1. Characterization of the heart transcriptome of the white shark (Carcharodon carcharias).

    PubMed

    Richards, Vincent P; Suzuki, Haruo; Stanhope, Michael J; Shivji, Mahmood S

    2013-10-11

    The white shark (Carcharodon carcharias) is a globally distributed, apex predator possessing physical, physiological, and behavioral traits that have garnered it significant public attention. In addition to interest in the genetic basis of its form and function, as a representative of the oldest extant jawed vertebrate lineage, white sharks are also of conservation concern due to their small population size and threat from overfishing. Despite this, surprisingly little is known about the biology of white sharks, and genomic resources are unavailable. To address this deficit, we combined Roche-454 and Illumina sequencing technologies to characterize the first transciptome of any tissue for this species. From white shark heart cDNA we generated 665,399 Roche 454 reads (median length 387-bp) that were assembled into 141,626 contigs (mean length 503-bp). We also generated 78,566,588 Illumina reads, which we aligned to the 454 contigs producing 105,014 454/Illumina consensus sequences. To these, we added 3,432 non-singleton 454 contigs. By comparing these sequences to the UniProtKB/Swiss-Prot database we were able to annotate 21,019 translated open reading frames (ORFs) of ≥ 20 amino acids. Of these, 19,277 were additionally assigned Gene Ontology (GO) functional annotations. While acknowledging the limitations of our single tissue transcriptome, Fisher tests showed the white shark transcriptome to be significantly enriched for numerous metabolic GO terms compared to the zebra fish and human transcriptomes, with white shark showing more similarity to human than to zebra fish (i.e. fewer terms were significantly different). We also compared the transcriptome to other available elasmobranch sequences, for signatures of positive selection and identified several genes of putative adaptive significance on the white shark lineage. The white shark transcriptome also contained 8,404 microsatellites (dinucleotide, trinucleotide, or tetranucleotide motifs ≥ five perfect repeats). Detailed characterization of these microsatellites showed that ORFs with trinucleotide repeats, were significantly enriched for transcription regulatory roles and that trinucleotide frequency within ORFs was lower than for a wide range of taxonomic groups including other vertebrates. The white shark heart transcriptome represents a valuable resource for future elasmobranch functional and comparative genomic studies, as well as for population and other biological studies vital for effective conservation of this globally vulnerable species.

  2. Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian

    PubMed Central

    2014-01-01

    Background The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. Description We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215–364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs and revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. Conclusions The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a “non-model system.” PMID:24467778

  3. Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian.

    PubMed

    Stefanik, Derek J; Lubinski, Tristan J; Granger, Brian R; Byrd, Allyson L; Reitzel, Adam M; DeFilippo, Lukas; Lorenc, Allison; Finnerty, John R

    2014-01-28

    The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215-364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs and revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In addition, these data from E. lineata will aid in the interpretation of evolutionary novelties in gene sequence or structure that have been reported for the model cnidarian N. vectensis (e.g., the split NF-κB locus). Finally, we include custom computational tools to facilitate the annotation of a transcriptome based on high-throughput sequencing data obtained from a "non-model system."

  4. Integrated analysis of 454 and Illumina transcriptomic sequencing characterizes carbon flux and energy source for fatty acid synthesis in developing Lindera glauca fruits for woody biodiesel.

    PubMed

    Lin, Zixin; An, Jiyong; Wang, Jia; Niu, Jun; Ma, Chao; Wang, Libing; Yuan, Guanshen; Shi, Lingling; Liu, Lili; Zhang, Jinsong; Zhang, Zhixiang; Qi, Ji; Lin, Shanzhi

    2017-01-01

    Lindera glauca fruit with high quality and quantity of oil has emerged as a novel potential source of biodiesel in China, but the molecular regulatory mechanism of carbon flux and energy source for oil biosynthesis in developing fruits is still unknown. To better develop fruit oils of L. glauca as woody biodiesel, a combination of two different sequencing platforms (454 and Illumina) and qRT-PCR analysis was used to define a minimal reference transcriptome of developing L. glauca fruits, and to construct carbon and energy metabolic model for regulation of carbon partitioning and energy supply for FA biosynthesis and oil accumulation. We first analyzed the dynamic patterns of growth tendency, oil content, FA compositions, biodiesel properties, and the contents of ATP and pyridine nucleotide of L. glauca fruits from seven different developing stages. Comprehensive characterization of transcriptome of the developing L. glauca fruit was performed using a combination of two different next-generation sequencing platforms, of which three representative fruit samples (50, 125, and 150 DAF) and one mixed sample from seven developing stages were selected for Illumina and 454 sequencing, respectively. The unigenes separately obtained from long and short reads (201, and 259, respectively, in total) were reconciled using TGICL software, resulting in a total of 60,031 unigenes (mean length = 1061.95 bp) to describe a transcriptome for developing L. glauca fruits. Notably, 198 genes were annotated for photosynthesis, sucrose cleavage, carbon allocation, metabolite transport, acetyl-CoA formation, oil synthesis, and energy metabolism, among which some specific transporters, transcription factors, and enzymes were identified to be implicated in carbon partitioning and energy source for oil synthesis by an integrated analysis of transcriptomic sequencing and qRT-PCR. Importantly, the carbon and energy metabolic model was well established for oil biosynthesis of developing L. glauca fruits, which could help to reveal the molecular regulatory mechanism of the increased oil production in developing fruits. This study presents for the first time the application of an integrated two different sequencing analyses (Illumina and 454) and qRT-PCR detection to define a minimal reference transcriptome for developing L. glauca fruits, and to elucidate the molecular regulatory mechanism of carbon flux control and energy provision for oil synthesis. Our results will provide a valuable resource for future fundamental and applied research on the woody biodiesel plants.

  5. ST Spot Detector: a web-based application for automatic spot and tissue detection for spatial Transcriptomics image datasets.

    PubMed

    Wong, Kim; Navarro, José Fernández; Bergenstråhle, Ludvig; Ståhl, Patrik L; Lundeberg, Joakim

    2018-06-01

    Spatial Transcriptomics (ST) is a method which combines high resolution tissue imaging with high troughput transcriptome sequencing data. This data must be aligned with the images for correct visualization, a process that involves several manual steps. Here we present ST Spot Detector, a web tool that automates and facilitates this alignment through a user friendly interface. jose.fernandez.navarro@scilifelab.se. Supplementary data are available at Bioinformatics online.

  6. CGDV: a webtool for circular visualization of genomics and transcriptomics data.

    PubMed

    Jha, Vineet; Singh, Gulzar; Kumar, Shiva; Sonawane, Amol; Jere, Abhay; Anamika, Krishanpal

    2017-10-24

    Interpretation of large-scale data is very challenging and currently there is scarcity of web tools which support automated visualization of a variety of high throughput genomics and transcriptomics data and for a wide variety of model organisms along with user defined karyotypes. Circular plot provides holistic visualization of high throughput large scale data but it is very complex and challenging to generate as most of the available tools need informatics expertise to install and run them. We have developed CGDV (Circos for Genomics and Transcriptomics Data Visualization), a webtool based on Circos, for seamless and automated visualization of a variety of large scale genomics and transcriptomics data. CGDV takes output of analyzed genomics or transcriptomics data of different formats, such as vcf, bed, xls, tab limited matrix text file, CNVnator raw output and Gene fusion raw output, to plot circular view of the sample data. CGDV take cares of generating intermediate files required for circos. CGDV is freely available at https://cgdv-upload.persistent.co.in/cgdv/ . The circular plot for each data type is tailored to gain best biological insights into the data. The inter-relationship between data points, homologous sequences, genes involved in fusion events, differential expression pattern, sequencing depth, types and size of variations and enrichment of DNA binding proteins can be seen using CGDV. CGDV thus helps biologists and bioinformaticians to visualize a variety of genomics and transcriptomics data seamlessly.

  7. The utility of transcriptomics in fish conservation.

    PubMed

    Connon, Richard E; Jeffries, Ken M; Komoroske, Lisa M; Todgham, Anne E; Fangue, Nann A

    2018-01-29

    There is growing recognition of the need to understand the mechanisms underlying organismal resilience (i.e. tolerance, acclimatization) to environmental change to support the conservation management of sensitive and economically important species. Here, we discuss how functional genomics can be used in conservation biology to provide a cellular-level understanding of organismal responses to environmental conditions. In particular, the integration of transcriptomics with physiological and ecological research is increasingly playing an important role in identifying functional physiological thresholds predictive of compensatory responses and detrimental outcomes, transforming the way we can study issues in conservation biology. Notably, with technological advances in RNA sequencing, transcriptome-wide approaches can now be applied to species where no prior genomic sequence information is available to develop species-specific tools and investigate sublethal impacts that can contribute to population declines over generations and undermine prospects for long-term conservation success. Here, we examine the use of transcriptomics as a means of determining organismal responses to environmental stressors and use key study examples of conservation concern in fishes to highlight the added value of transcriptome-wide data to the identification of functional response pathways. Finally, we discuss the gaps between the core science and policy frameworks and how thresholds identified through transcriptomic evaluations provide evidence that can be more readily used by resource managers. © 2018. Published by The Company of Biologists Ltd.

  8. Fish-T1K (Transcriptomes of 1,000 Fishes) Project: large-scale transcriptome data for fish evolution studies.

    PubMed

    Sun, Ying; Huang, Yu; Li, Xiaofeng; Baldwin, Carole C; Zhou, Zhuocheng; Yan, Zhixiang; Crandall, Keith A; Zhang, Yong; Zhao, Xiaomeng; Wang, Min; Wong, Alex; Fang, Chao; Zhang, Xinhui; Huang, Hai; Lopez, Jose V; Kilfoyle, Kirk; Zhang, Yong; Ortí, Guillermo; Venkatesh, Byrappa; Shi, Qiong

    2016-01-01

    Ray-finned fishes (Actinopterygii) represent more than 50 % of extant vertebrates and are of great evolutionary, ecologic and economic significance, but they are relatively underrepresented in 'omics studies. Increased availability of transcriptome data for these species will allow researchers to better understand changes in gene expression, and to carry out functional analyses. An international project known as the "Transcriptomes of 1,000 Fishes" (Fish-T1K) project has been established to generate RNA-seq transcriptome sequences for 1,000 diverse species of ray-finned fishes. The first phase of this project has produced transcriptomes from more than 180 ray-finned fishes, representing 142 species and covering 51 orders and 109 families. Here we provide an overview of the goals of this project and the work done so far.

  9. Transcriptome Sequence and Plasmid Copy Number Analysis of the Brewery Isolate Pediococcus claussenii ATCC BAA-344T during Growth in Beer

    PubMed Central

    Pittet, Vanessa; Phister, Trevor G.; Ziola, Barry

    2013-01-01

    Growth of specific lactic acid bacteria in beer leads to spoiled product and economic loss for the brewing industry. Microbial growth is typically inhibited by the combined stresses found in beer (e.g., ethanol, hops, low pH, minimal nutrients); however, certain bacteria have adapted to grow in this harsh environment. Considering little is known about the mechanisms used by bacteria to grow in and spoil beer, transcriptome sequencing was performed on a variant of the beer-spoilage organism Pediococcus claussenii ATCC BAA-344T (Pc344-358). Illumina sequencing was used to compare the transcript levels in Pc344-358 growing mid-exponentially in beer to those in nutrient-rich MRS broth. Various operons demonstrated high gene expression in beer, several of which are involved in nutrient acquisition and overcoming the inhibitory effects of hop compounds. As well, genes functioning in cell membrane modification and biosynthesis demonstrated significantly higher transcript levels in Pc344-358 growing in beer. Three plasmids had the majority of their genes showing increased transcript levels in beer, whereas the two cryptic plasmids showed slightly decreased gene expression. Follow-up analysis of plasmid copy number in both growth environments revealed similar trends, where more copies of the three non-cryptic plasmids were found in Pc344-358 growing in beer. Transcriptome sequencing also enabled the addition of several genes to the P . claussenii ATCC BAA-344T genome annotation, some of which are putatively transcribed as non-coding RNAs. The sequencing results not only provide the first transcriptome description of a beer-spoilage organism while growing in beer, but they also highlight several targets for future exploration, including genes that may have a role in the general stress response of lactic acid bacteria. PMID:24040005

  10. Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences

    PubMed Central

    2013-01-01

    Background Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information. Results We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes. Conclusions The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes. PMID:23537002

  11. Deep mRNA Sequencing of the Tritonia diomedea Brain Transcriptome Provides Access to Gene Homologues for Neuronal Excitability, Synaptic Transmission and Peptidergic Signalling

    PubMed Central

    Senatore, Adriano; Edirisinghe, Neranjan; Katz, Paul S.

    2015-01-01

    Background The sea slug Tritonia diomedea (Mollusca, Gastropoda, Nudibranchia), has a simple and highly accessible nervous system, making it useful for studying neuronal and synaptic mechanisms underlying behavior. Although many important contributions have been made using Tritonia, until now, a lack of genetic information has impeded exploration at the molecular level. Results We performed Illumina sequencing of central nervous system mRNAs from Tritonia, generating 133.1 million 100 base pair, paired-end reads. De novo reconstruction of the RNA-Seq data yielded a total of 185,546 contigs, which partitioned into 123,154 non-redundant gene clusters (unigenes). BLAST comparison with RefSeq and Swiss-Prot protein databases, as well as mRNA data from other invertebrates (gastropod molluscs: Aplysia californica, Lymnaea stagnalis and Biomphalaria glabrata; cnidarian: Nematostella vectensis) revealed that up to 76,292 unigenes in the Tritonia transcriptome have putative homologues in other databases, 18,246 of which are below a more stringent E-value cut-off of 1x10-6. In silico prediction of secreted proteins from the Tritonia transcriptome shotgun assembly (TSA) produced a database of 579 unique sequences of secreted proteins, which also exhibited markedly higher expression levels compared to other genes in the TSA. Conclusions Our efforts greatly expand the availability of gene sequences available for Tritonia diomedea. We were able to extract full length protein sequences for most queried genes, including those involved in electrical excitability, synaptic vesicle release and neurotransmission, thus confirming that the transcriptome will serve as a useful tool for probing the molecular correlates of behavior in this species. We also generated a neurosecretome database that will serve as a useful tool for probing peptidergic signalling systems in the Tritonia brain. PMID:25719197

  12. Analysis of Litopenaeus vannamei Transcriptome Using the Next-Generation DNA Sequencing Technique

    PubMed Central

    Li, Chaozheng; Weng, Shaoping; Chen, Yonggui; Yu, Xiaoqiang; Lü, Ling; Zhang, Haiqing; He, Jianguo; Xu, Xiaopeng

    2012-01-01

    Background Pacific white shrimp (Litopenaeus vannamei), the major species of farmed shrimps in the world, has been attracting extensive studies, which require more and more genome background knowledge. The now available transcriptome data of L. vannamei are insufficient for research requirements, and have not been adequately assembled and annotated. Methodology/Principal Findings This is the first study that used a next-generation high-throughput DNA sequencing technique, the Solexa/Illumina GA II method, to analyze the transcriptome from whole bodies of L. vannamei larvae. More than 2.4 Gb of raw data were generated, and 109,169 unigenes with a mean length of 396 bp were assembled using the SOAP denovo software. 73,505 unigenes (>200 bp) with good quality sequences were selected and subjected to annotation analysis, among which 37.80% can be matched in NCBI Nr database, 37.3% matched in Swissprot, and 44.1% matched in TrEMBL. Using BLAST and BLAST2Go softwares, 11,153 unigenes were classified into 25 Clusters of Orthologous Groups of proteins (COG) categories, 8171 unigenes were assigned into 51 Gene ontology (GO) functional groups, and 18,154 unigenes were divided into 220 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. To primarily verify part of the results of assembly and annotations, 12 assembled unigenes that are homologous to many embryo development-related genes were chosen and subjected to RT-PCR for electrophoresis and Sanger sequencing analyses, and to real-time PCR for expression profile analyses during embryo development. Conclusions/Significance The L. vannamei transcriptome analyzed using the next-generation sequencing technique enriches the information of L. vannamei genes, which will facilitate our understanding of the genome background of crustaceans, and promote the studies on L. vannamei. PMID:23071809

  13. A Deep Insight Into the Sialotranscriptome of the Chagas Disease Vector, Panstrongylus megistus (Hemiptera: Heteroptera)

    PubMed Central

    Ribeiro, José M. C.; Schwarz, Alexandra; Francischetti, Ivo M. B.

    2015-01-01

    Saliva of blood-sucking arthropods contains a complex cocktail of pharmacologically active compounds that assists feeding by counteracting their hosts’ hemostatic and inflammatory reactions. Panstrongylus megistus (Burmeister) is an important vector of Chagas disease in South America, but despite its importance there is only one salivary protein sequence publicly deposited in GenBank. In the present work, we used Illumina technology to disclose and publicly deposit 3,703 coding sequences obtained from the assembly of >70 million reads. These sequences should assist proteomic experiments aimed at identifying pharmacologically active proteins and immunological markers of vector exposure. A supplemental file of the transcriptome and deducted protein sequences can be obtained from http://exon.niaid.nih.gov/transcriptome/P_megistus/Pmeg-web.xlsx. PMID:26334808

  14. Novel Detection of Insecticide Resistance Related P450 Genes and Transcriptome Analysis of the Hemimetabolous Pest Erthesina fullo (Thunberg) (Hemiptera: Heteroptera).

    PubMed

    Liu, Yang; Wu, Haoyang; Xie, Qiang; Bu, Wenjun

    2015-01-01

    Erthesina fullo (Thunberg, 1783) is an economically important heteropteran species in China. Since only three nucleotide sequences of this species (COI, 16S rRNA, and 18S rRNA) appear in the GenBank database so far, no analysis of the molecular mechanisms underlying E. fullo's resistance to insecticide and environmental stress has been accomplished. We reported a de novo assembled and annotated transcriptome for adult E. fullo using the Illumina sequence system. A total of 53,359,458 clean reads of 4.8 billion nucleotides (nt) were assembled into 27,488 unigenes with an average length of 750 bp, of which 17,743 (64.55%) were annotated. In the present study, we identified 88 putative cytochrome P450 sequences and analyzed the evolution of cytochrome P450 superfamilies, genes of the CYP3 clan related to metabolizing xenobiotics and plant natural compounds, in E. fullo, increasing the candidate genes for the molecular mechanisms of insecticide resistance in P450. The sequenced transcriptome greatly expands the available genomic information and could allow a better understanding of the mechanisms of insecticide resistance at the systems biology level.

  15. Intra-isolate genome variation in arbuscular mycorrhizal fungi persists in the transcriptome.

    PubMed

    Boon, E; Zimmerman, E; Lang, B F; Hijri, M

    2010-07-01

    Arbuscular mycorrhizal fungi (AMF) are heterokaryotes with an unusual genetic makeup. Substantial genetic variation occurs among nuclei within a single mycelium or isolate. AMF reproduce through spores that contain varying fractions of this heterogeneous population of nuclei. It is not clear whether this genetic variation on the genome level actually contributes to the AMF phenotype. To investigate the extent to which polymorphisms in nuclear genes are transcribed, we analysed the intra-isolate genomic and cDNA sequence variation of two genes, the large subunit ribosomal RNA (LSU rDNA) of Glomus sp. DAOM-197198 (previously known as G. intraradices) and the POL1-like sequence (PLS) of Glomus etunicatum. For both genes, we find high sequence variation at the genome and transcriptome level. Reconstruction of LSU rDNA secondary structure shows that all variants are functional. Patterns of PLS sequence polymorphism indicate that there is one functional gene copy, PLS2, which is preferentially transcribed, and one gene copy, PLS1, which is a pseudogene. This is the first study that investigates AMF intra-isolate variation at the transcriptome level. In conclusion, it is possible that, in AMF, multiple nuclear genomes contribute to a single phenotype.

  16. Transcriptome Analysis and Development of SSR Molecular Markers in Glycyrrhiza uralensis Fisch.

    PubMed Central

    Liu, Yaling; Zhang, Pengfei; Song, Meiling; Hou, Junling; Qing, Mei; Wang, Wenquan; Liu, Chunsheng

    2015-01-01

    Licorice is an important traditional Chinese medicine with clinical and industrial applications. Genetic resources of licorice are insufficient for analysis of molecular biology and genetic functions; as such, transcriptome sequencing must be conducted for functional characterization and development of molecular markers. In this study, transcriptome sequencing on the Illumina HiSeq 2500 sequencing platform generated a total of 5.41 Gb clean data. De novo assembly yielded a total of 46,641 unigenes. Comparison analysis using BLAST showed that the annotations of 29,614 unigenes were conserved. Further study revealed 773 genes related to biosynthesis of secondary metabolites of licorice, 40 genes involved in biosynthesis of the terpenoid backbone, and 16 genes associated with biosynthesis of glycyrrhizic acid. Analysis of unigenes larger than 1 Kb with a length of 11,702 nt presented 7,032 simple sequence repeats (SSR). Sixty-four of 69 randomly designed and synthesized SSR pairs were successfully amplified, 33 pairs of primers were polymorphism in in Glycyrrhiza uralensis Fisch., Glycyrrhiza inflata Bat., Glycyrrhiza glabra L. and Glycyrrhiza pallidiflora Maxim. This study not only presents the molecular biology data of licorice but also provides a basis for genetic diversity research and molecular marker-assisted breeding of licorice. PMID:26571372

  17. Spliced synthetic genes as internal controls in RNA sequencing experiments.

    PubMed

    Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R

    2016-09-01

    RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.

  18. Transcriptome analysis of Pseudomonas syringae identifies new genes, ncRNAs, and antisense activity

    USDA-ARS?s Scientific Manuscript database

    To fully understand how bacteria respond to their environment, it is essential to assess genome-wide transcriptional activity. New high throughput sequencing technologies make it possible to query the transcriptome of an organism in an efficient unbiased manner. We applied a strand-specific method t...

  19. Identification of Genes Related to Learning and Memory in the Brain Transcriptome of the Mollusc, "Hermissenda Crassicornis"

    ERIC Educational Resources Information Center

    Tamvacakis, Arianna N.; Senatore, Adriano; Katz, Paul S.

    2015-01-01

    The sea slug "Hermissenda crassicornis" (Mollusca, Gastropoda, Nudibranchia) has been studied extensively in associative learning paradigms. However, lack of genetic information previously hindered molecular-level investigations. Here, the "Hermissenda" brain transcriptome was sequenced and assembled de novo, producing 165,743…

  20. Transcriptome characterization for genome annotation and functional genomics in Theobroma cacao

    USDA-ARS?s Scientific Manuscript database

    Evidence from leaf transcriptome sequencing using two technology platforms, in combination with protein homology and trained ab initio predictions, previously enabled us to build 35,000 gene models in T. cacao (www.cacaogenomedb.org). Here we review the contribution of each data type to cacao gene a...

  1. Comparative transcriptome analysis of Aspergillus flavus isolates under different oxidative stresses and culture media

    USDA-ARS?s Scientific Manuscript database

    Aspergillus flavus and aflatoxin contamination in the field are known to be influenced by numerous stress factors, particularly drought and heat stress. However, the purpose of aflatoxin production is unknown. Here, we report transcriptome analyses comprised of 282.6 Gb of sequencing data describing...

  2. Identification and characterization of large DNA deletions affecting oil quality traits in soybean seeds through transcriptome sequencing analysis

    USDA-ARS?s Scientific Manuscript database

    Understanding the molecular and genetic mechanisms underlying variation in seed composition and contents among different genotypes is important for soybean oil quality improvement. We designed a bioinformatics approach to compare seed transcriptomes of 9 soybean genotypes varying in oil composition ...

  3. Generation and analysis of ESTs from strawberry (Fragaria xananassa) fruits and evaluation of their utility in genetic and molecular studies

    PubMed Central

    2010-01-01

    Background Cultivated strawberry is a hybrid octoploid species (Fragaria xananassa Duchesne ex. Rozier) whose fruit is highly appreciated due to its organoleptic properties and health benefits. Despite recent studies on the control of its growth and ripening processes, information about the role played by different hormones on these processes remains elusive. Further advancement of this knowledge is hampered by the limited sequence information on genes from this species, despite the abundant information available on genes from the wild diploid relative Fragaria vesca. However, the diploid species, or one ancestor, only partially contributes to the genome of the cultivated octoploid. We have produced a collection of expressed sequence tags (ESTs) from different cDNA libraries prepared from different fruit parts and developmental stages. The collection has been analysed and the sequence information used to explore the involvement of different hormones in fruit developmental processes, and for the comparison of transcripts in the receptacle of ripe fruits of diploid and octoploid species. The study is particularly important since the commercial fruit is indeed an enlarged flower receptacle with the true fruits, the achenes, on the surface and connected through a network of vascular vessels to the central pith. Results We have sequenced over 4,500 ESTs from Fragaria xananassa, thus doubling the number of ESTs available in the GenBank of this species. We then assembled this information together with that available from F. xananassa resulting a total of 7,096 unigenes. The identification of SSRs and SNPs in many of the ESTs allowed their conversion into functional molecular markers. The availability of libraries prepared from green growing fruits has allowed the cloning of cDNAs encoding for genes of auxin, ethylene and brassinosteroid signalling processes, followed by expression studies in selected fruit parts and developmental stages. In addition, the sequence information generated in the project, jointly with previous information on sequences from both F. xananassa and F. vesca, has allowed designing an oligo-based microarray that has been used to compare the transcriptome of the ripe receptacle of the diploid and octoploid species. Comparison of the transcriptomes, grouping the genes by biological processes, points to differences being quantitative rather than qualitative. Conclusions The present study generates essential knowledge and molecular tools that will be useful in improving investigations at the molecular level in cultivated strawberry (F. xananassa). This knowledge is likely to provide useful resources in the ongoing breeding programs. The sequence information has already allowed the development of molecular markers that have been applied to germplasm characterization and could be eventually used in QTL analysis. Massive transcription analysis can be of utility to target specific genes to be further studied, by their involvement in the different plant developmental processes. PMID:20849591

  4. 454 pyrosequencing based transcriptome analysis of Zygaena filipendulae with focus on genes involved in biosynthesis of cyanogenic glucosides.

    PubMed

    Zagrobelny, Mika; Scheibye-Alsing, Karsten; Jensen, Niels Bjerg; Møller, Birger Lindberg; Gorodkin, Jan; Bak, Søren

    2009-12-02

    An essential driving component in the co-evolution of plants and insects is the ability to produce and handle bioactive compounds. Plants produce bioactive natural products for defense, but some insects detoxify and/or sequester the compounds, opening up for new niches with fewer competitors. To study the molecular mechanism behind the co-adaption in plant-insect interactions, we have investigated the interactions between Lotus corniculatus and Zygaena filipendulae. They both contain cyanogenic glucosides which liberate toxic hydrogen cyanide upon breakdown. Moths belonging to the Zygaena family are the only insects known, able to carry out both de novo biosynthesis and sequestration of the same cyanogenic glucosides as those from their feed plants. The biosynthetic pathway for cyanogenic glucoside biosynthesis in Z. filipendulae proceeds using the same intermediates as in the well known pathway from plants, but none of the enzymes responsible have been identified. A genomics strategy founded on 454 pyrosequencing of the Z. filipendulae transcriptome was undertaken to identify some of these enzymes in Z. filipendulae. Comparisons of the Z. filipendulae transcriptome with the sequenced genomes of Bombyx mori, Drosophila melanogaster, Tribolium castaneum, Apis mellifera and Anopheles gambiae indicate a high coverage of the Z. filipendulae transcriptome. 11% of the Z. filipendulae transcriptome sequences were assigned to Gene Ontology categories. Candidate genes for enzymes functioning in the biosynthesis of cyanogenic glucosides (cytochrome P450 and family 1 glycosyltransferases) were identified based on sequence length, number of copies and presence/absence of close homologs in D. melanogaster, B. mori and the cyanogenic butterfly Heliconius. Examination of biased codon usage, GC content and selection on gene candidates support the notion of cyanogenesis as an "old" trait within Ditrysia, as well as its origins being convergent between plants and insects. Pyrosequencing is an attractive approach to gain access to genes in the biosynthesis of bio-active natural products from insects and other organisms, for which the genome sequence is not known. Based on analysis of the Z. filipendulae transcriptome, promising gene candidates for biosynthesis of cyanogenic glucosides was identified, and the suitability of Z. filipendulae as a model system for cyanogenesis in insects is evident.

  5. The technology and biology of single-cell RNA sequencing.

    PubMed

    Kolodziejczyk, Aleksandra A; Kim, Jong Kyoung; Svensson, Valentine; Marioni, John C; Teichmann, Sarah A

    2015-05-21

    The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Fragmentation of whole-transcriptome RNA using E. coli RNase III.

    PubMed

    Ares, Manuel

    2013-05-01

    High-throughput sequencing (HTS) methods can provide short sequence reads for many millions of individual molecules in a sample, allowing the use of sequencing to measure the abundance of RNA molecules. To quantify the amount of a particular sequence in a sample of large RNAs (e.g., mRNAs), it is important to fragment the RNA into short pieces that can be ligated to oligonucleotides that allow polymerase chain reaction (PCR) amplification and sequencing. The most desired end structure of RNA for such ligation steps is a 5' phosphate and a 3' OH. Thus, enzymes that leave these groups after cleavage are of particular utility, avoiding the need to dephosphorylate the 3' end with phosphatases or phosphorylate the 5' end with kinase before proceeding. One such enzyme, RNase III, is widely available. Although it primarily cuts duplex RNA, this specificity is salt- and concentration-dependent, and many RNAs that lack strong extended duplexes are nonetheless susceptible to cleavage at many spots. RNA fragmentation by RNase III does not seem to grossly affect the distribution of RNA sequencing reads. Thus, it has become a standard method for creating nominally representative pools of transcriptome sequences with 5' phosphates and 3' OH for library construction. Three steps in preparing fragmented transcriptome RNA for sequencing library construction are described here: (1) fragmenting the RNA with RNase III to the extent that ~60-100-nucleotide fragments are created, (2) purifying the RNA from the RNase III reaction, and (3) analyzing the digestion products for their suitability in library production.

  7. Biosynthesis of the active compounds of Isatis indigotica based on transcriptome sequencing and metabolites profiling

    PubMed Central

    2013-01-01

    Backgroud Isatis indigotica is a widely used herb for the clinical treatment of colds, fever, and influenza in Traditional Chinese Medicine (TCM). Various structural classes of compounds have been identified as effective ingredients. However, little is known at genetics level about these active metabolites. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive dataset of I. indigotica. Results A database of 36,367 unigenes (average length = 1,115.67 bases) was generated by performing transcriptome sequencing. Based on the gene annotation of the transcriptome, 104 unigenes were identified covering most of the catalytic steps in the general biosynthetic pathways of indole, terpenoid, and phenylpropanoid. Subsequently, the organ-specific expression patterns of the genes involved in these pathways, and their responses to methyl jasmonate (MeJA) induction, were investigated. Metabolites profile of effective phenylpropanoid showed accumulation pattern of secondary metabolites were mostly correlated with the transcription of their biosynthetic genes. According to the analysis of UDP-dependent glycosyltransferases (UGT) family, several flavonoids were indicated to exist in I. indigotica and further identified by metabolic profile using UPLC/Q-TOF. Moreover, applying transcriptome co-expression analysis, nine new, putative UGTs were suggested as flavonol glycosyltransferases and lignan glycosyltransferases. Conclusions This database provides a pool of candidate genes involved in biosynthesis of effective metabolites in I. indigotica. Furthermore, the comprehensive analysis and characterization of the significant pathways are expected to give a better insight regarding the diversity of chemical composition, synthetic characteristics, and the regulatory mechanism which operate in this medical herb. PMID:24308360

  8. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome.

    PubMed

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-06-15

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development.

  9. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome

    PubMed Central

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-01-01

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development. PMID:22508961

  10. Re-evaluating microglia expression profiles using RiboTag and cell isolation strategies.

    PubMed

    Haimon, Zhana; Volaski, Alon; Orthgiess, Johannes; Boura-Halfon, Sigalit; Varol, Diana; Shemer, Anat; Yona, Simon; Zuckerman, Binyamin; David, Eyal; Chappell-Maor, Louise; Bechmann, Ingo; Gericke, Martin; Ulitsky, Igor; Jung, Steffen

    2018-06-01

    Transcriptome profiling is widely used to infer functional states of specific cell types, as well as their responses to stimuli, to define contributions to physiology and pathophysiology. Focusing on microglia, the brain's macrophages, we report here a side-by-side comparison of classical cell-sorting-based transcriptome sequencing and the 'RiboTag' method, which avoids cell retrieval from tissue context and yields translatome sequencing information. Conventional whole-cell microglial transcriptomes were found to be significantly tainted by artifacts introduced by tissue dissociation, cargo contamination and transcripts sequestered from ribosomes. Conversely, our data highlight the added value of RiboTag profiling for assessing the lineage accuracy of Cre recombinase expression in transgenic mice. Collectively, this study indicates method-based biases, reveals observer effects and establishes RiboTag-based translatome profiling as a valuable complement to standard sorting-based profiling strategies.

  11. Comparative Genomic and Transcriptomic Characterization of the Toxigenic Marine Dinoflagellate Alexandrium ostenfeldii

    PubMed Central

    Jaeckisch, Nina; Yang, Ines; Wohlrab, Sylke; Glöckner, Gernot; Kroymann, Juergen; Vogel, Heiko; Cembella, Allan; John, Uwe

    2011-01-01

    Many dinoflagellate species are notorious for the toxins they produce and ecological and human health consequences associated with harmful algal blooms (HABs). Dinoflagellates are particularly refractory to genomic analysis due to the enormous genome size, lack of knowledge about their DNA composition and structure, and peculiarities of gene regulation, such as spliced leader (SL) trans-splicing and mRNA transposition mechanisms. Alexandrium ostenfeldii is known to produce macrocyclic imine toxins, described as spirolides. We characterized the genome of A. ostenfeldii using a combination of transcriptomic data and random genomic clones for comparison with other dinoflagellates, particularly Alexandrium species. Examination of SL sequences revealed similar features as in other dinoflagellates, including Alexandrium species. SL sequences in decay indicate frequent retro-transposition of mRNA species. This probably contributes to overall genome complexity by generating additional gene copies. Sequencing of several thousand fosmid and bacterial artificial chromosome (BAC) ends yielded a wealth of simple repeats and tandemly repeated longer sequence stretches which we estimated to comprise more than half of the whole genome. Surprisingly, the repeats comprise a very limited set of 79–97 bp sequences; in part the genome is thus a relatively uniform sequence space interrupted by coding sequences. Our genomic sequence survey (GSS) represents the largest genomic data set of a dinoflagellate to date. Alexandrium ostenfeldii is a typical dinoflagellate with respect to its transcriptome and mRNA transposition but demonstrates Alexandrium-like stop codon usage. The large portion of repetitive sequences and the organization within the genome is in agreement with several other studies on dinoflagellates using different approaches. It remains to be determined whether this unusual composition is directly correlated to the exceptionally genome organization of dinoflagellates with a low amount of histones and histone-like proteins. PMID:22164224

  12. Transcriptome analysis of Houttuynia cordata Thunb. by Illumina paired-end RNA sequencing and SSR marker discovery.

    PubMed

    Wei, Lin; Li, Shenghua; Liu, Shenggui; He, Anna; Wang, Dan; Wang, Jie; Tang, Yulian; Wu, Xianjin

    2014-01-01

    Houttuynia cordata Thunb. is an important traditional medical herb in China and other Asian countries, with high medicinal and economic value. However, a lack of available genomic information has become a limitation for research on this species. Thus, we carried out high-throughput transcriptomic sequencing of H. cordata to generate an enormous transcriptome sequence dataset for gene discovery and molecular marker development. Illumina paired-end sequencing technology produced over 56 million sequencing reads from H. cordata mRNA. Subsequent de novo assembly yielded 63,954 unigenes, 39,982 (62.52%) and 26,122 (40.84%) of which had significant similarity to proteins in the NCBI nonredundant protein and Swiss-Prot databases (E-value <10(-5)), respectively. Of these annotated unigenes, 30,131 and 15,363 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In addition, 24,434 (38.21%) unigenes were mapped onto 128 pathways using the KEGG pathway database and 17,964 (44.93%) unigenes showed homology to Vitis vinifera (Vitaceae) genes in BLASTx analysis. Furthermore, 4,800 cDNA SSRs were identified as potential molecular markers. Fifty primer pairs were randomly selected to detect polymorphism among 30 samples of H. cordata; 43 (86%) produced fragments of expected size, suggesting that the unigenes were suitable for specific primer design and of high quality, and the SSR marker could be widely used in marker-assisted selection and molecular breeding of H. cordata in the future. This is the first application of Illumina paired-end sequencing technology to investigate the whole transcriptome of H. cordata and to assemble RNA-seq reads without a reference genome. These data should help researchers investigating the evolution and biological processes of this species. The SSR markers developed can be used for construction of high-resolution genetic linkage maps and for gene-based association analyses in H. cordata. This work will enable future functional genomic research and research into the distinctive active constituents of this genus.

  13. RNA sequencing analysis to capture the transcriptome landscape during skin ulceration syndrome progression in sea cucumber Apostichopus japonicus.

    PubMed

    Yang, Aifu; Zhou, Zunchun; Pan, Yongjia; Jiang, Jingwei; Dong, Ying; Guan, Xiaoyan; Sun, Hongjuan; Gao, Shan; Chen, Zhong

    2016-06-14

    Sea cucumber Apostichopus japonicus is an important economic species in China, which is affected by various diseases; skin ulceration syndrome (SUS) is the most serious. In this study, we characterized the transcriptomes in A. japonicus challenged with Vibrio splendidus to elucidate the changes in gene expression throughout the three stages of SUS progression. RNA sequencing of 21 cDNA libraries from various tissues and developmental stages of SUS-affected A. japonicus yielded 553 million raw reads, of which 542 million high-quality reads were generated by deep-sequencing using the Illumina HiSeq™ 2000 platform. The reference transcriptome comprised a combination of the Illumina reads, 454 sequencing data and Sanger sequences obtained from the public database to generate 93,163 unigenes (average length, 1,052 bp; N50 = 1,575 bp); 33,860 were annotated. Transcriptome comparisons between healthy and SUS-affected A. japonicus revealed greater differences in gene expression profiles in the body walls (BW) than in the intestines (Int), respiratory trees (RT) and coelomocytes (C). Clustering of expression models revealed stable up-regulation as the main pattern occurring in the BW throughout the three stages of SUS progression. Significantly affected pathways were associated with signal transduction, immune system, cellular processes, development and metabolism. Ninety-two differentially expressed genes (DEGs) were divided into four functional categories: attachment/pathogen recognition (17), inflammatory reactions (38), oxidative stress response (7) and apoptosis (30). Using quantitative real-time PCR, twenty representative DEGs were selected to validate the sequencing results. The Pearson's correlation coefficient (R) of the 20 DEGs ranged from 0.811 to 0.999, which confirmed the consistency and accuracy between these two approaches. Dynamic changes in global gene expression occur during SUS progression in A. japonicus. Elucidation of these changes is important in clarifying the molecular mechanisms associated with the development of SUS in sea cucumber.

  14. Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.

    PubMed

    Davidson, Nadia M; Oshlack, Alicia

    2018-05-01

    RNA sequencing (RNA-seq) analyses can benefit from performing a genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating an assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Necklace allows a comprehensive transcriptome to be built from a combination of assembled and annotated transcripts, which results in a more comprehensive transcriptome for the majority of organisms. In addition RNA-seq data are mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods.

  15. Deep sequencing-based transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus reveals insight into the immune-relevant genes in marine fish

    PubMed Central

    2010-01-01

    Background Systematic research on fish immunogenetics is indispensable in understanding the origin and evolution of immune systems. This has long been a challenging task because of the limited number of deep sequencing technologies and genome backgrounds of non-model fish available. The newly developed Solexa/Illumina RNA-seq and Digital gene expression (DGE) are high-throughput sequencing approaches and are powerful tools for genomic studies at the transcriptome level. This study reports the transcriptome profiling analysis of bacteria-challenged Lateolabrax japonicus using RNA-seq and DGE in an attempt to gain insights into the immunogenetics of marine fish. Results RNA-seq analysis generated 169,950 non-redundant consensus sequences, among which 48,987 functional transcripts with complete or various length encoding regions were identified. More than 52% of these transcripts are possibly involved in approximately 219 known metabolic or signalling pathways, while 2,673 transcripts were associated with immune-relevant genes. In addition, approximately 8% of the transcripts appeared to be fish-specific genes that have never been described before. DGE analysis revealed that the host transcriptome profile of Vibrio harveyi-challenged L. japonicus is considerably altered, as indicated by the significant up- or down-regulation of 1,224 strong infection-responsive transcripts. Results indicated an overall conservation of the components and transcriptome alterations underlying innate and adaptive immunity in fish and other vertebrate models. Analysis suggested the acquisition of numerous fish-specific immune system components during early vertebrate evolution. Conclusion This study provided a global survey of host defence gene activities against bacterial challenge in a non-model marine fish. Results can contribute to the in-depth study of candidate genes in marine fish immunity, and help improve current understanding of host-pathogen interactions and evolutionary history of immunogenetics from fish to mammals. PMID:20707909

  16. Comparing de novo assemblers for 454 transcriptome data

    PubMed Central

    2010-01-01

    Background Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Results Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Conclusions Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs however gave a more credible final product, and this strategy is recommended. PMID:20950480

  17. Detailed Transcriptome Description of the Neglected Cestode Taenia multiceps

    PubMed Central

    Wu, Xuhang; Fu, Yan; Yang, Deying; Zhang, Runhui; Zheng, Wanpeng; Nie, Huaming; Xie, Yue; Yan, Ning; Hao, Guiying; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yang, Guangyou

    2012-01-01

    Background The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS) of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. Methodology/Principal Findings We obtained a total of 31,282 unigenes (mean length 920 bp) using Illumina paired-end sequencing technology and a new Trinity de novo assembler without a referenced genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniprotKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam). We identified 26,110 (83.47%) unigenes and inferred 20,896 (66.8%) coding sequences (CDS). Further comparative transcripts analysis with other cestodes (Taenia pisiformis, Taenia solium, Echincoccus granulosus and Echincoccus multilocularis) and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum) showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival, involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. Conclusions/Significance This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve to be further studied. This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of the biology of T. multiceps, and helps in the identification of drug targets and parasite-host interaction studies. PMID:23049872

  18. Increasing the reach of forensic genetics with massively parallel sequencing.

    PubMed

    Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R

    2017-09-01

    The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. More so, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The features of this technology offer increased number and types of genetic markers that can be analyzed, higher throughput of samples, and the capability of targeting different organisms, all by one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have or are soon to arrive and demonstrate the continued expansion the field of forensic genetics and its service in the investigation of legal matters.

  19. Wolf-Rayet stars as starting points or as endpoints of the evolution of massive stars?

    NASA Technical Reports Server (NTRS)

    Lamers, H. J. G. L. M.; Maeder, A.; Schmutz, W.; Cassinelli, J. P.

    1991-01-01

    The paper investigates the evidence for the two interpretations of Wolf-Rayet stars suggested in the literature: (1) massive premain-sequence stars with disks and (2) massive stars which have lost most of their H-rich layers in a stellar wind is investigated. The abundance determinations which are done in two different ways and which lead to different conclusions are discussed. The composition is solar, which would suggest interpretation (1), or the CNO abundances are strongly anomalous, which would suggest interpretation (2). Results from evolutionary calculations, stellar statistics, the existence of Ofpe/WN9 transition stars and W-R stars with evolved companions show overwhelming evidence that W-R stars are not premain-sequence stars but that they are in a late stage of evolution. Moreover, the fact that W-R stars are usually in clear regions of space, whereas massive premain-sequence stars are embedded in ultracompact H II regions also shows that W-R stars are not young premain-sequence stars.

  20. Deep sequencing of the Mexican avocado transcriptome, an ancient angiosperm with a high content of fatty acids.

    PubMed

    Ibarra-Laclette, Enrique; Méndez-Bravo, Alfonso; Pérez-Torres, Claudia Anahí; Albert, Victor A; Mockaitis, Keithanne; Kilaru, Aruna; López-Gómez, Rodolfo; Cervantes-Luevano, Jacob Israel; Herrera-Estrella, Luis

    2015-08-13

    Avocado (Persea americana) is an economically important tropical fruit considered to be a good source of fatty acids. Despite its importance, the molecular and cellular characterization of biochemical and developmental processes in avocado is limited due to the lack of transcriptome and genomic information. The transcriptomes of seeds, roots, stems, leaves, aerial buds and flowers were determined using different sequencing platforms. Additionally, the transcriptomes of three different stages of fruit ripening (pre-climacteric, climacteric and post-climacteric) were also analyzed. The analysis of the RNAseqatlas presented here reveals strong differences in gene expression patterns between different organs, especially between root and flower, but also reveals similarities among the gene expression patterns in other organs, such as stem, leaves and aerial buds (vegetative organs) or seed and fruit (storage organs). Important regulators, functional categories, and differentially expressed genes involved in avocado fruit ripening were identified. Additionally, to demonstrate the utility of the avocado gene expression atlas, we investigated the expression patterns of genes implicated in fatty acid metabolism and fruit ripening. A description of transcriptomic changes occurring during fruit ripening was obtained in Mexican avocado, contributing to a dynamic view of the expression patterns of genes involved in fatty acid biosynthesis and the fruit ripening process.

  1. De novo transcriptome assembly of drought tolerant CAM plants, Agave deserti and Agave tequilana.

    PubMed

    Gross, Stephen M; Martin, Jeffrey A; Simpson, June; Abraham-Juarez, María Jazmín; Wang, Zhong; Visel, Axel

    2013-08-19

    Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis. Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and their improvement for bioenergy development.

  2. De novo transcriptome assembly of drought tolerant CAM plants, Agave deserti and Agave tequilana

    PubMed Central

    2013-01-01

    Background Agaves are succulent monocotyledonous plants native to xeric environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Results Here, we present comprehensive, high quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, built from short-read RNA-seq data. Our analyses support completeness and accuracy of the de novo transcriptome assemblies, with each species having a minimum of approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, a focus on the transcriptomics of the A. deserti juvenile leaf confirms evolutionary conservation of monocotyledonous leaf physiology and development along the proximal-distal axis. Conclusions Our work presents a comprehensive transcriptome resource for two Agave species and provides insight into their biology and physiology. These resources are a foundation for further investigation of agave biology and their improvement for bioenergy development. PMID:23957668

  3. A transcriptome resource for the Antarctic pteropod Limacina helicina antarctica.

    PubMed

    Johnson, Kevin M; Hofmann, Gretchen E

    2016-08-01

    The pteropod Limacina helicina antarctica is a dominant member of the zooplankton assemblage in the Antarctic marine ecosystem, and is part of a relatively simple food web in nearshore marine Antarctic waters. As a shelled pteropod, Limacina has been suggested as a candidate sentinel organism for the impacts of ocean acidification, due to the potential for shell dissolution in undersaturated waters. In this study, our goal was to develop a transcriptomic resource for Limacina that would support mechanistic studies to explore the physiological response of Limacina to abiotic stressors such as ocean acidification and ocean warming. To this end, RNA sequencing libraries were prepared from Limacina that had been exposed to a range of pH levels and an elevated temperature to maximize the diversity of expressed genes. RNA sequencing (RNA-seq) was conducted on an Illumina NextSeq500 which produced 339,000,000 150bp paired-end reads. The de novo transcriptome was produced using Trinity and annotation of the assembled transcriptome resulted in the identification of 81,229 transcripts in 137 KEGG pathways. This RNA-seq effort resulted in a transcriptome for the Antarctic pteropod, Limacina helicina antarctica, that is a major resource for an international marine science research community studying these pelagic molluscs in a global change context. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Novel Insights into the Transcriptome of Dirofilaria immitis

    PubMed Central

    Zhang, Zhihe; Hou, Rong; Wu, Xuhang; Yang, Deying; Zhang, Runhui; Zheng, Wanpeng; Nie, Huaming; Xie, Yue; Yan, Ning; Yang, Zhi; Wang, Chengdong; Luo, Li; Liu, Li; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yang, Guangyou

    2012-01-01

    Background The heartworm Dirofilaria immitis is the causal agent of cardiopulmonary dirofilariosis in dogs and cats, and also infects a wide range of wild mammals as well as humans. One bottleneck for the design of fundamentally new intervention and management strategies against D. immitis may be the currently limited knowledge of fundamental molecular aspects of D. immitis. Methodology/Principal Findings A next-generation sequencing platform combining computational approaches was employed to assess a global view of the heartworm transcriptome. A total of 20,810 unigenes (mean length  = 1,270 bp) were assembled from 22.3 million clean reads. From these, 15,698 coding sequences (CDS) were inferred, and about 85% of the unigenes had orthologs/homologs in public databases. Comparative transcriptomic study uncovered 4,157 filarial-specific genes as well as 3,795 genes potentially involved in filarial-Wolbachia symbiosis. In addition, the potential intestine transcriptome of D. immitis (1,101 genes) was mined for the first time, which might help to discover ‘hidden antigens’. Conclusions/Significance This study provides novel insights into the transcriptome of D. immitis and sheds light on its molecular processes and survival mechanisms. Furthermore, it provides a platform to discover new vaccine candidates and potential targets for new drugs against dirofilariosis. PMID:22911833

  5. Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia

    PubMed Central

    Carninci, Piero; Waki, Kazunori; Shiraki, Toshiyuki; Konno, Hideaki; Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Arakawa, Takahiro; Ishii, Yoshiyuki; Sasaki, Daisuke; Bono, Hidemasa; Kondo, Shinji; Sugahara, Yuichi; Saito, Rintaro; Osato, Naoki; Fukuda, Shiro; Sato, Kenjiro; Watahiki, Akira; Hirozane-Kishikawa, Tomoko; Nakamura, Mari; Shibata, Yuko; Yasunishi, Ayako; Kikuchi, Noriko; Yoshiki, Atsushi; Kusakabe, Moriaki; Gustincich, Stefano; Beisel, Kirk; Pavan, William; Aidinis, Vassilis; Nakagawara, Akira; Held, William A.; Iwata, Hiroo; Kono, Tomohiro; Nakauchi, Hiromitsu; Lyons, Paul; Wells, Christine; Hume, David A.; Fagiolini, Michela; Hensch, Takao K.; Brinkmeier, Michelle; Camper, Sally; Hirota, Junji; Mombaerts, Peter; Muramatsu, Masami; Okazaki, Yasushi; Kawai, Jun; Hayashizaki, Yoshihide

    2003-01-01

    We report the construction of the mouse full-length cDNA encyclopedia,the most extensive view of a complex transcriptome,on the basis of preparing and sequencing 246 libraries. Before cloning,cDNAs were enriched in full-length by Cap-Trapper,and in most cases,aggressively subtracted/normalized. We have produced 1,442,236 successful 3′-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5′ end reads,which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU),which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC),which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project,which also include non-protein-coding RNAs,and the lower gene number estimation of genome annotations. Altogether,5′-end clusters identify regions that are potential promoters for 8637 known genes and 5′-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete. PMID:12819125

  6. Integrated whole-genome and transcriptome sequence analysis reveals the genetic characteristics of a riboflavin-overproducing Bacillus subtilis.

    PubMed

    Wang, Guanglu; Shi, Ting; Chen, Tao; Wang, Xiaoyue; Wang, Yongcheng; Liu, Dingyu; Guo, Jiaxin; Fu, Jing; Feng, Lili; Wang, Zhiwen; Zhao, Xueming

    2018-06-02

    Commercial riboflavin production with Bacillus subtilis has been developed by combining rational and classical strain development for almost two decades, but how an improved riboflavin producer can be created rationally is still not completely understood. In this study, we demonstrate the combined use of integrated genomic and transcriptomic analysis of the genetic basis for riboflavin over-production in B. subtilis. This methodology succeeded in discerning the positive mutations in the mutagenesis derived riboflavin producer B. subtilis 24/pMX45 through whole-genome sequencing and transcriptome sequencing. These included RibC (G199D), ribD + (G+39A), PurA (P242L), CcpN(A44S), YvrH (R222Q) and two nonsense mutations YhcF (R90*) and YwaA (Q68*). Reintroducing these specific mutations into the wild-type strain recovered the riboflavin overproduction phenotype and subsequent metabolic engineering greatly improved riboflavin production, achieving an up to 3.4-fold increase of the riboflavin titer over the sequenced producer. A novel mutation, YvrH (R222Q), involved in a typical two-component regulatory system deregulated the purine de novo synthesis pathway and increased the pool of intracellular purine metabolites, which in turn increased riboflavin production. Taken together, we present a case study of combining genome and transcriptome analysis to elucidate the genetic underpinnings of a complex cellular property, which enabled the transfer of beneficial mutations to engineer a reference strain into an overproducer. Copyright © 2018 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.

  7. A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly

    PubMed Central

    2013-01-01

    Background The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes. Results We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters. Conclusions These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies. PMID:23496952

  8. A comparison across non-model animals suggests an optimal sequencing depth for de novo transcriptome assembly.

    PubMed

    Francis, Warren R; Christianson, Lynne M; Kiko, Rainer; Powers, Meghan L; Shaner, Nathan C; Haddock, Steven H D

    2013-03-12

    The lack of genomic resources can present challenges for studies of non-model organisms. Transcriptome sequencing offers an attractive method to gather information about genes and gene expression without the need for a reference genome. However, it is unclear what sequencing depth is adequate to assemble the transcriptome de novo for these purposes. We assembled transcriptomes of animals from six different phyla (Annelids, Arthropods, Chordates, Cnidarians, Ctenophores, and Molluscs) at regular increments of reads using Velvet/Oases and Trinity to determine how read count affects the assembly. This included an assembly of mouse heart reads because we could compare those against the reference genome that is available. We found qualitative differences in the assemblies of whole-animals versus tissues. With increasing reads, whole-animal assemblies show rapid increase of transcripts and discovery of conserved genes, while single-tissue assemblies show a slower discovery of conserved genes though the assembled transcripts were often longer. A deeper examination of the mouse assemblies shows that with more reads, assembly errors become more frequent but such errors can be mitigated with more stringent assembly parameters. These assembly trends suggest that representative assemblies are generated with as few as 20 million reads for tissue samples and 30 million reads for whole-animals for RNA-level coverage. These depths provide a good balance between coverage and noise. Beyond 60 million reads, the discovery of new genes is low and sequencing errors of highly-expressed genes are likely to accumulate. Finally, siphonophores (polymorphic Cnidarians) are an exception and possibly require alternate assembly strategies.

  9. Workflow and web application for annotating NCBI BioProject transcriptome data.

    PubMed

    Vera Alvarez, Roberto; Medeiros Vidal, Newton; Garzón-Martínez, Gina A; Barrero, Luz S; Landsman, David; Mariño-Ramírez, Leonardo

    2017-01-01

    The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. URL: http://www.ncbi.nlm.nih.gov/projects/physalis/. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  10. RNA-seq analysis and de novo transcriptome assembly of Jerusalem artichoke (Helianthus tuberosus Linne).

    PubMed

    Jung, Won Yong; Lee, Sang Sook; Kim, Chul Wook; Kim, Hyun-Soon; Min, Sung Ran; Moon, Jae Sun; Kwon, Suk-Yoon; Jeon, Jae-Heung; Cho, Hye Sun

    2014-01-01

    Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these were confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Exsiting genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.

  11. Assembled contigs of the Haller's organ transcriptome from an Australian population of the cattle tick, Rhipicephalus microplus

    USDA-ARS?s Scientific Manuscript database

    In a collaboration with National Center for Genome Resources and University of Texas at El Paso researchers, we sequenced and assembled the transcriptome of the Haller's organ of an Australian strain (NRFS) of the cattle tick Rhipicephalus microplus (recently reclassified as Rhipicephalus australis...

  12. RNASeq-based genome annotation and identification of long-noncoding RNAs in the grapevine cultivar 'Riesling'

    USDA-ARS?s Scientific Manuscript database

    The technological advances of RNA-seq and de novo transcriptome assembly have enabled genome annotation and transcriptome profiling in heterozygous species. This is a promising approach to improving the annotation of the reference genome sequence of grapevine (Vitis vinifera L.), a species of high-l...

  13. Transcriptomic analysis reveals numerous diverse protein kinases and transcription factors involved in desiccation tolerance in the resurrection plant Myrothamnus flabellifolia

    USDA-ARS?s Scientific Manuscript database

    The woody resurrection plant Myrothamnus flabellifolia has remarkable tolerance to desiccation. Pyro-sequencing technology permitted us to analyze the transcriptome of M. flabellifolia during both dehydration and rehydration. We identified a total of 8287 and 8542 differentially transcribed genes du...

  14. DNA sequences of Pima (Gossypium barbadense L.) cotton leaf for examining transcriptome diversity and SNP biomarker discovery

    USDA-ARS?s Scientific Manuscript database

    As an initial step to explore the transcriptome genetic diversity and to discover single nucleotide polymorphic (SNP)-biomarkers for marker assisted breeding within Pima (Gossypium barbadense L.) cotton, leaves from 25 day plants of three diverse genotypes were used to develop cDNA libraries. Using ...

  15. Comprehensive transcriptome profiling reveals long noncoding RNA expression and alternative splicing regulation during fruit development and ripening in kiwifruit (Actinidia chinensis)

    USDA-ARS?s Scientific Manuscript database

    Genomic and transcriptomic data on kiwifruit (Actinidia chinensis) in public databases are very limited despite its nutritional and economic value. Previously, we have constructed and sequenced nine fruit RNA-Seq libraries of A. chinensis cv. 'Hongyang' at immature, mature, and postharvest ripening...

  16. Comparative Transcriptomes and EVO-DEVO Studies Depending on Next Generation Sequencing.

    PubMed

    Liu, Tiancheng; Yu, Lin; Liu, Lei; Li, Hong; Li, Yixue

    2015-01-01

    High throughput technology has prompted the progressive omics studies, including genomics and transcriptomics. We have reviewed the improvement of comparative omic studies, which are attributed to the high throughput measurement of next generation sequencing technology. Comparative genomics have been successfully applied to evolution analysis while comparative transcriptomics are adopted in comparison of expression profile from two subjects by differential expression or differential coexpression, which enables their application in evolutionary developmental biology (EVO-DEVO) studies. EVO-DEVO studies focus on the evolutionary pressure affecting the morphogenesis of development and previous works have been conducted to illustrate the most conserved stages during embryonic development. Old measurements of these studies are based on the morphological similarity from macro view and new technology enables the micro detection of similarity in molecular mechanism. Evolutionary model of embryo development, which includes the "funnel-like" model and the "hourglass" model, has been evaluated by combination of these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has promoted the EVO-DEVO studies into a new era, technological and material limitation still exist and further investigations require more subtle study design and procedure.

  17. Integrated sequencing of exome and mRNA of large-sized single cells.

    PubMed

    Wang, Lily Yan; Guo, Jiajie; Cao, Wei; Zhang, Meng; He, Jiankui; Li, Zhoufang

    2018-01-10

    Current approaches of single cell DNA-RNA integrated sequencing are difficult to call SNPs, because a large amount of DNA and RNA is lost during DNA-RNA separation. Here, we performed simultaneous single-cell exome and transcriptome sequencing on individual mouse oocytes. Using microinjection, we kept the nuclei intact to avoid DNA loss, while retaining the cytoplasm inside the cell membrane, to maximize the amount of DNA and RNA captured from the single cell. We then conducted exome-sequencing on the isolated nuclei and mRNA-sequencing on the enucleated cytoplasm. For single oocytes, exome-seq can cover up to 92% of exome region with an average sequencing depth of 10+, while mRNA-sequencing reveals more than 10,000 expressed genes in enucleated cytoplasm, with similar performance for intact oocytes. This approach provides unprecedented opportunities to study DNA-RNA regulation, such as RNA editing at single nucleotide level in oocytes. In future, this method can also be applied to other large cells, including neurons, large dendritic cells and large tumour cells for integrated exome and transcriptome sequencing.

  18. A cost effective 5΄ selective single cell transcriptome profiling approach with improved UMI design

    PubMed Central

    Arguel, Marie-Jeanne; LeBrigand, Kevin; Paquet, Agnès; Ruiz García, Sandra; Zaragosi, Laure-Emmanuelle; Waldmann, Rainer

    2017-01-01

    Abstract Single cell RNA sequencing approaches are instrumental in studies of cell-to-cell variability. 5΄ selective transcriptome profiling approaches allow simultaneous definition of the transcription start size and have advantages over 3΄ selective approaches which just provide internal sequences close to the 3΄ end. The only currently existing 5΄ selective approach requires costly and labor intensive fragmentation and cell barcoding after cDNA amplification. We developed an optimized 5΄ selective workflow where all the cell indexing is done prior to fragmentation. With our protocol, cell indexing can be performed in the Fluidigm C1 microfluidic device, resulting in a significant reduction of cost and labor. We also designed optimized unique molecular identifiers that show less sequence bias and vulnerability towards sequencing errors resulting in an improved accuracy of molecule counting. We provide comprehensive experimental workflows for Illumina and Ion Proton sequencers that allow single cell sequencing in a cost range comparable to qPCR assays. PMID:27940562

  19. Construction of a robust microarray from a non-model species (largemouth bass) using pyrosequencing technology

    PubMed Central

    Garcia-Reyero, Natàlia; Griffitt, Robert J.; Liu, Li; Kroll, Kevin J.; Farmerie, William G.; Barber, David S.; Denslow, Nancy D.

    2009-01-01

    A novel custom microarray for largemouth bass (Micropterus salmoides) was designed with sequences obtained from a normalized cDNA library using the 454 Life Sciences GS-20 pyrosequencer. This approach yielded in excess of 58 million bases of high-quality sequence. The sequence information was combined with 2,616 reads obtained by traditional suppressive subtractive hybridizations to derive a total of 31,391 unique sequences. Annotation and coding sequences were predicted for these transcripts where possible. 16,350 annotated transcripts were selected as target sequences for the design of the custom largemouth bass oligonucleotide microarray. The microarray was validated by examining the transcriptomic response in male largemouth bass exposed to 17β-œstradiol. Transcriptomic responses were assessed in liver and gonad, and indicated gene expression profiles typical of exposure to œstradiol. The results demonstrate the potential to rapidly create the tools necessary to assess large scale transcriptional responses in non-model species, paving the way for expanded impact of toxicogenomics in ecotoxicology. PMID:19936325

  20. Hybrid error correction and de novo assembly of single-molecule sequencing reads

    PubMed Central

    Koren, Sergey; Schatz, Michael C.; Walenz, Brian P.; Martin, Jeffrey; Howard, Jason; Ganapathy, Ganeshkumar; Wang, Zhong; Rasko, David A.; McCombie, W. Richard; Jarvis, Erich D.; Phillippy, Adam M.

    2012-01-01

    Emerging single-molecule sequencing instruments can generate multi-kilobase sequences with the potential to dramatically improve genome and transcriptome assembly. However, the high error rate of single-molecule reads is challenging, and has limited their use to resequencing bacteria. To address this limitation, we introduce a novel correction algorithm and assembly strategy that utilizes shorter, high-identity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on Pacbio RS reads of phage, prokaryotic, and eukaryotic whole genomes, including the novel genome of the parrot Melopsittacus undulatus, as well as for RNA-seq reads of the corn (Zea mays) transcriptome. Our approach achieves over 99.9% read correction accuracy and produces substantially better assemblies than current sequencing strategies: in the best example, quintupling the median contig size relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly. PMID:22750884

  1. Transcriptome analysis of Cymbidium sinense and its application to the identification of genes associated with floral development

    PubMed Central

    2013-01-01

    Background Cymbidium sinense belongs to the Orchidaceae, which is one of the most abundant angiosperm families. C. sinense, a high-grade traditional potted flower, is most prevalent in China and some Southeast Asian countries. The control of flowering time is a major bottleneck in the industrialized development of C. sinense. Little is known about the mechanisms responsible for floral development in this orchid. Moreover, genome references for entire transcriptome sequences do not currently exist for C. sinense. Thus, transcriptome and expression profiling data for this species are needed as an important resource to identify genes and to better understand the biological mechanisms of floral development in C. sinense. Results In this study, de novo transcriptome assembly and gene expression analysis using Illumina sequencing technology were performed. Transcriptome analysis assembles gene-related information related to vegetative and reproductive growth of C. sinense. Illumina sequencing generated 54,248,006 high quality reads that were assembled into 83,580 unigenes with an average sequence length of 612 base pairs, including 13,315 clusters and 70,265 singletons. A total of 41,687 (49.88%) unique sequences were annotated, 23,092 of which were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with metabolic and cellular processes, cell and cell parts, catalytic activity and binding. Furthermore, 120 flowering-associated unigenes, 73 MADS-box unigenes and 28 CONSTANS-LIKE (COL) unigenes were identified from our collection. In addition, three digital gene expression (DGE) libraries were constructed for the vegetative phase (VP), floral differentiation phase (FDP) and reproductive phase (RP). The specific expression of many genes in the three development phases was also identified. 32 genes among three sub-libraries with high differential expression were selected as candidates connected with flower development. Conclusion RNA-seq and DGE profiling data provided comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms of floral development at three development phases of C. sinense. This data could be used as an important resource for investigating the genetics of the flowering pathway and various biological mechanisms in this orchid. PMID:23617896

  2. Transcriptome analysis of Cymbidium sinense and its application to the identification of genes associated with floral development.

    PubMed

    Zhang, Jianxia; Wu, Kunlin; Zeng, Songjun; Teixeira da Silva, Jaime A; Zhao, Xiaolan; Tian, Chang-En; Xia, Haoqiang; Duan, Jun

    2013-04-24

    Cymbidium sinense belongs to the Orchidaceae, which is one of the most abundant angiosperm families. C. sinense, a high-grade traditional potted flower, is most prevalent in China and some Southeast Asian countries. The control of flowering time is a major bottleneck in the industrialized development of C. sinense. Little is known about the mechanisms responsible for floral development in this orchid. Moreover, genome references for entire transcriptome sequences do not currently exist for C. sinense. Thus, transcriptome and expression profiling data for this species are needed as an important resource to identify genes and to better understand the biological mechanisms of floral development in C. sinense. In this study, de novo transcriptome assembly and gene expression analysis using Illumina sequencing technology were performed. Transcriptome analysis assembles gene-related information related to vegetative and reproductive growth of C. sinense. Illumina sequencing generated 54,248,006 high quality reads that were assembled into 83,580 unigenes with an average sequence length of 612 base pairs, including 13,315 clusters and 70,265 singletons. A total of 41,687 (49.88%) unique sequences were annotated, 23,092 of which were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with metabolic and cellular processes, cell and cell parts, catalytic activity and binding. Furthermore, 120 flowering-associated unigenes, 73 MADS-box unigenes and 28 CONSTANS-LIKE (COL) unigenes were identified from our collection. In addition, three digital gene expression (DGE) libraries were constructed for the vegetative phase (VP), floral differentiation phase (FDP) and reproductive phase (RP). The specific expression of many genes in the three development phases was also identified. 32 genes among three sub-libraries with high differential expression were selected as candidates connected with flower development. RNA-seq and DGE profiling data provided comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms of floral development at three development phases of C. sinense. This data could be used as an important resource for investigating the genetics of the flowering pathway and various biological mechanisms in this orchid.

  3. Identification of novel RNA viruses in alfalfa (Medicago sativa): an Alphapartitivirus, a Deltapartitivirus, and a Marafivirus.

    PubMed

    Kim, Hyein; Park, Dongbin; Hahn, Yoonsoo

    2018-01-05

    Genomic RNA molecules of plant RNA viruses are often co-isolated with the host RNAs, and their sequences can be detected in plant transcriptome datasets. Here, an alfalfa (Medicago sativa) transcriptome dataset was analyzed and three new RNA viruses were identified, which were named Medicago sativa alphapartitivirus 1 (MsAPV1), Medicago sativa deltapartitivirus 1 (MsDPV1), and Medicago sativa marafivirus 1 (MsMV1). The RNA-dependent RNA polymerases of MsAPV1, MsDPV1, and MsMV1 showed about 68%, 58%, and 46% amino acid sequence identity, respectively, with their closest virus species. Sequence similarity and phylogenetic analyses indicated that MsAPV1, MsDPV1, and MsMV1 were novel RNA virus species that belong to the genus Alphapartitivirus of the family Partitiviridae, the genus Deltapartitivirus of the family Partitiviridae, and the genus Marafivirus of the family Tymoviridae, respectively. The bioinformatics procedure applied in this study may facilitate the identification of novel RNA viruses from plant transcriptome data. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Brain transcriptome sequencing and assembly of three songbird model systems for the study of social behavior

    PubMed Central

    Mukai, Motoko; Gonser, Rusty A.; Wingfield, John C.; London, Sarah E.; Tuttle, Elaina M.; Clayton, David F.

    2014-01-01

    Emberizid sparrows (emberizidae) have played a prominent role in the study of avian vocal communication and social behavior. We present here brain transcriptomes for three emberizid model systems, song sparrow Melospiza melodia, white-throated sparrow Zonotrichia albicollis, and Gambel’s white-crowned sparrow Zonotrichia leucophrys gambelii. Each of the assemblies covered fully or in part, over 89% of the previously annotated protein coding genes in the zebra finch Taeniopygia guttata, with 16,846, 15,805, and 16,646 unique BLAST hits in song, white-throated and white-crowned sparrows, respectively. As in previous studies, we find tissue of origin (auditory forebrain versus hypothalamus and whole brain) as an important determinant of overall expression profile. We also demonstrate the successful isolation of RNA and RNA-sequencing from post-mortem samples from building strikes and suggest that such an approach could be useful when traditional sampling opportunities are limited. These transcriptomes will be an important resource for the study of social behavior in birds and for data driven annotation of forthcoming whole genome sequences for these and other bird species. PMID:24883256

  5. The Transcriptomics of Secondary Growth and Wood Formation in Conifers

    PubMed Central

    Carvalho, Ana; Paiva, Jorge; Louzada, José; Lima-Brito, José

    2013-01-01

    In the last years, forestry scientists have adapted genomics and next-generation sequencing (NGS) technologies to the search for candidate genes related to the transcriptomics of secondary growth and wood formation in several tree species. Gymnosperms, in particular, the conifers, are ecologically and economically important, namely, for the production of wood and other forestry end products. Until very recently, no whole genome sequencing of a conifer genome was available. Due to the gradual improvement of the NGS technologies and inherent bioinformatics tools, two draft assemblies of the whole genomes sequence of Picea abies and Picea glauca arose in the current year. These draft genome assemblies will bring new insights about the structure, content, and evolution of the conifer genomes. Furthermore, new directions in the forestry, breeding and research of conifers will be discussed in the following. The identification of genes associated with the xylem transcriptome and the knowledge of their regulatory mechanisms will provide less time-consuming breeding cycles and a high accuracy for the selection of traits related to wood production and quality. PMID:24288610

  6. Insights into the Melipona scutellaris (Hymenoptera, Apidae, Meliponini) fat body transcriptome.

    PubMed

    de Sousa, Cristina Soares; Serrão, José Eduardo; Bonetti, Ana Maria; Amaral, Isabel Marques Rodrigues; Kerr, Warwick Estevam; Maranhão, Andréa Queiroz; Ueira-Vieira, Carlos

    2013-07-01

    The insect fat body is a multifunctional organ analogous to the vertebrate liver. The fat body is involved in the metabolism of juvenile hormone, regulation of environmental stress, production of immunity regulator-like proteins in cells and protein storage. However, very little is known about the molecular mechanisms involved in fat body physiology in stingless bees. In this study, we analyzed the transcriptome of the fat body from the stingless bee Melipona scutellaris. In silico analysis of a set of cDNA library sequences yielded 1728 expressed sequence tags (ESTs) and 997 high-quality sequences that were assembled into 29 contigs and 117 singlets. The BLAST X tool showed that 86% of the ESTs shared similarity with Apis mellifera (honeybee) genes. The M. scutellaris fat body ESTs encoded proteins with roles in numerous physiological processes, including anti-oxidation, phosphorylation, metabolism, detoxification, transmembrane transport, intracellular transport, cell proliferation, protein hydrolysis and protein synthesis. This is the first report to describe a transcriptomic analysis of specific organs of M. scutellaris. Our findings provide new insights into the physiological role of the fat body in stingless bees.

  7. Insights into the Melipona scutellaris (Hymenoptera, Apidae, Meliponini) fat body transcriptome

    PubMed Central

    de Sousa, Cristina Soares; Serrão, José Eduardo; Bonetti, Ana Maria; Amaral, Isabel Marques Rodrigues; Kerr, Warwick Estevam; Maranhão, Andréa Queiroz; Ueira-Vieira, Carlos

    2013-01-01

    The insect fat body is a multifunctional organ analogous to the vertebrate liver. The fat body is involved in the metabolism of juvenile hormone, regulation of environmental stress, production of immunity regulator-like proteins in cells and protein storage. However, very little is known about the molecular mechanisms involved in fat body physiology in stingless bees. In this study, we analyzed the transcriptome of the fat body from the stingless bee Melipona scutellaris. In silico analysis of a set of cDNA library sequences yielded 1728 expressed sequence tags (ESTs) and 997 high-quality sequences that were assembled into 29 contigs and 117 singlets. The BLAST X tool showed that 86% of the ESTs shared similarity with Apis mellifera (honeybee) genes. The M. scutellaris fat body ESTs encoded proteins with roles in numerous physiological processes, including anti-oxidation, phosphorylation, metabolism, detoxification, transmembrane transport, intracellular transport, cell proliferation, protein hydrolysis and protein synthesis. This is the first report to describe a transcriptomic analysis of specific organs of M. scutellaris. Our findings provide new insights into the physiological role of the fat body in stingless bees. PMID:23885214

  8. Differential expression of genes in the alate and apterous morphs of the brown citrus aphid, Toxoptera citricida

    PubMed Central

    Shang, Feng; Ding, Bi-Yue; Xiong, Ying; Dou, Wei; Wei, Dong; Jiang, Hong-Bo; Wei, Dan-Dan; Wang, Jin-Jun

    2016-01-01

    Winged and wingless morphs in insects represent a trade-off between dispersal ability and reproduction. We studied key genes associated with apterous and alate morphs in Toxoptera citricida (Kirkaldy) using RNAseq, digital gene expression (DGE) profiling, and RNA interference. The de novo assembly of the transcriptome was obtained through Illumina short-read sequencing technology. A total of 44,199 unigenes were generated and 27,640 were annotated. The transcriptomic differences between alate and apterous adults indicated that 279 unigenes were highly expressed in alate adults, whereas 5,470 were expressed at low levels. Expression patterns of the top 10 highly expressed genes in alate adults agreed with wing bud development trends. Silencing of the lipid synthesis and degradation gene (3-ketoacyl-CoA thiolase, mitochondrial-like) and glycogen genes (Phosphoenolpyruvate carboxykinase [GTP]-like and Glycogen phosphorylase-like isoform 2) resulted in underdeveloped wings. This suggests that both lipid and glycogen metabolism provide energy for aphid wing development. The large number of sequences and expression data produced from the transcriptome and DGE sequencing, respectively, increases our understanding of wing development mechanisms. PMID:27577531

  9. Characterization of Chiton Ischnochiton hakodadensis Foot Based on Transcriptome Sequencing

    NASA Astrophysics Data System (ADS)

    Dou, Huaiqian; Miao, Yan; Li, Yuli; Li, Yangping; Dai, Xiaoting; Zhang, Xiaokang; Liang, Pengyu; Liu, Weizhi; Wang, Shi; Bao, Zhenmin

    2018-06-01

    Chiton ( Ischnochiton hakodadensis) is one of marine mollusks well known for its eight separate shell plates. I. hakodadensis is important, which plays a vital role in the ecosystems it inhabits. So far, the genetic studies on the chiton are scarce due in part to insufficient genomic resources available for this species. In this study, we investigated the transcriptome of the chiton foot using Illumina sequencing technology. The reads were assembled and clustered into 256461 unigenes, of which 42247 were divided into diverse functional categories by Gene Ontology (GO) annotation terms, and 17256 mapped onto 365 pathways by KEGG pathway mapping. Meanwhile, a set of differentially expressed genes (DEGs) between distal and proximal muscles were identified as the foot adhesive locomotion associated, thus were useful for our future studies. Moreover, up to 679384 high-quality single nucleotide polymorphisms (SNPs) and 19814 simple sequence repeats (SSRs) were identified in this study, which are valuable for subsequent studies on genetic diversity and variation. The transcriptomic resource obtained in this study should aid to future genetic and genomic studies of chiton.

  10. FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery

    PubMed Central

    Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

    2012-01-01

    Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. in all the cases our tool was invariably able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408

  11. FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery.

    PubMed

    Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

    2012-09-01

    Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. in all the cases our tool was invariably able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2-ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.

  12. RNA-seq Transcriptome Analysis of Panax japonicus, and Its Comparison with Other Panax Species to Identify Potential Genes Involved in the Saponins Biosynthesis

    PubMed Central

    Rai, Amit; Yamazaki, Mami; Takahashi, Hiroki; Nakamura, Michimi; Kojoma, Mareshige; Suzuki, Hideyuki; Saito, Kazuki

    2016-01-01

    The Panax genus has been a source of natural medicine, benefitting human health over the ages, among which the Panax japonicus represents an important species. Our understanding of several key pathways and enzymes involved in the biosynthesis of ginsenosides, a pharmacologically active class of metabolites and a major chemical constituents of the rhizome extracts from the Panax species, are limited. Limited genomic information, and lack of studies on comparative transcriptomics across the Panax species have restricted our understanding of the biosynthetic mechanisms of these and many other important classes of phytochemicals. Herein, we describe Illumina based RNA sequencing analysis to characterize the transcriptome and expression profiles of genes expressed in the five tissues of P. japonicus, and its comparison with other Panax species. RNA sequencing and de novo transcriptome assembly for P. japonicus resulted in a total of 135,235 unigenes with 78,794 (58.24%) unigenes being annotated using NCBI-nr database. Transcriptome profiling, and gene ontology enrichment analysis for five tissues of P. japonicus showed that although overall processes were evenly conserved across all tissues. However, each tissue was characterized by several unique unigenes with the leaves showing the most unique unigenes among the tissues studied. A comparative analysis of the P. japonicus transcriptome assembly with publically available transcripts from other Panax species, namely, P. ginseng, P. notoginseng, and P. quinquefolius also displayed high sequence similarity across all Panax species, with P. japonicus showing highest similarity with P. ginseng. Annotation of P. japonicus transcriptome resulted in the identification of putative genes encoding all enzymes from the triterpene backbone biosynthetic pathways, and identified 24 and 48 unigenes annotated as cytochrome P450 (CYP) and glycosyltransferases (GT), respectively. These CYPs and GTs annotated unigenes were conserved across all Panax species and co-expressed with other the transcripts involved in the triterpenoid backbone biosynthesis pathways. Unigenes identified in this study represent strong candidates for being involved in the triterpenoid saponins biosynthesis, and can serve as a basis for future validation studies. PMID:27148308

  13. Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation.

    PubMed

    Chana-Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene; Kristiansen, Rune; Jensen, Jan K; Andreasen, Peter A; Bendixen, Christian; Panitz, Frank

    2017-01-01

    The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism is scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly was performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads were assembled with the Trinity de novo assembler both within each tissue and across all tissues combined resulting in 362,690 transcripts in the combined assembly which represent 289,515 Trinity genes. BUSCO analysis determined a level of 87% completeness for the combined transcriptome. In total, 123,110 proteins were predicted of which 78,679 and 83,164 had significant hits against the SwissProt and Uniref90 protein databases, respectively. Additionally, 61,215 proteins aligned to known protein domains, 7,208 carried a signal peptide and 15,971 possessed at least one transmembrane region. Based on the annotation, 81,582 transcripts were assigned to gene ontology terms and 42,078 belong to known clusters of orthologous groups (eggNOG). To demonstrate the value of our molecular resource, we show that the improved transcriptome data enhances the current possibilities of osmoregulation research in spiny dogfish by utilizing the novel gene and protein annotations to investigate a set of genes involved in urea synthesis and urea, ammonia and water transport, all of them crucial in osmoregulation. We describe the presence of different gene copies and isoforms of key enzymes involved in this process, including arginases and transporters of urea and ammonia, for which sequence information is currently absent in the databases for this model species. The transcriptome assemblies and the derived annotations generated in this study will support the ongoing research for this particular animal model and provides a new molecular tool to assist biological research in cartilaginous fishes.

  14. Multi-tissue RNA-seq and transcriptome characterisation of the spiny dogfish shark (Squalus acanthias) provides a molecular tool for biological research and reveals new genes involved in osmoregulation

    PubMed Central

    Chana-Munoz, Andres; Jendroszek, Agnieszka; Sønnichsen, Malene; Kristiansen, Rune; Jensen, Jan K.; Bendixen, Christian

    2017-01-01

    The spiny dogfish shark (Squalus acanthias) is one of the most commonly used cartilaginous fishes in biological research, especially in the fields of nitrogen metabolism, ion transporters and osmoregulation. Nonetheless, transcriptomic data for this organism is scarce. In the present study, a multi-tissue RNA-seq experiment and de novo transcriptome assembly was performed in four different spiny dogfish tissues (brain, liver, kidney and ovary), providing an annotated sequence resource. The characterization of the transcriptome greatly increases the scarce sequence information for shark species. Reads were assembled with the Trinity de novo assembler both within each tissue and across all tissues combined resulting in 362,690 transcripts in the combined assembly which represent 289,515 Trinity genes. BUSCO analysis determined a level of 87% completeness for the combined transcriptome. In total, 123,110 proteins were predicted of which 78,679 and 83,164 had significant hits against the SwissProt and Uniref90 protein databases, respectively. Additionally, 61,215 proteins aligned to known protein domains, 7,208 carried a signal peptide and 15,971 possessed at least one transmembrane region. Based on the annotation, 81,582 transcripts were assigned to gene ontology terms and 42,078 belong to known clusters of orthologous groups (eggNOG). To demonstrate the value of our molecular resource, we show that the improved transcriptome data enhances the current possibilities of osmoregulation research in spiny dogfish by utilizing the novel gene and protein annotations to investigate a set of genes involved in urea synthesis and urea, ammonia and water transport, all of them crucial in osmoregulation. We describe the presence of different gene copies and isoforms of key enzymes involved in this process, including arginases and transporters of urea and ammonia, for which sequence information is currently absent in the databases for this model species. The transcriptome assemblies and the derived annotations generated in this study will support the ongoing research for this particular animal model and provides a new molecular tool to assist biological research in cartilaginous fishes. PMID:28832628

  15. Transcriptome de novo assembly sequencing and analysis of the toxic dinoflagellate Alexandrium catenella using the Illumina platform.

    PubMed

    Zhang, Shu; Sui, Zhenghong; Chang, Lianpeng; Kang, Kyoungho; Ma, Jinhua; Kong, Fanna; Zhou, Wei; Wang, Jinguo; Guo, Liliang; Geng, Huili; Zhong, Jie; Ma, Qingxia

    2014-03-10

    In this article, high-throughput de novo transcriptomic sequencing was performed in Alexandrium catenella, which provided the first view of the gene repertoire in this dinoflagellate based on next-generation sequencing (NGS) technologies. A total of 118,304 unigenes were identified with an average length of 673bp (base pair). Of these unigenes, 77,936 (65.9%) were annotated with known proteins based on sequence similarities, among which 24,149 and 22,956 unigenes were assigned to gene ontology categories (GO) and clusters of orthologous groups (COGs), respectively. Furthermore, 16,467 unigenes were mapped onto 322 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG). We also detected 1143 simple sequence repeats (SSRs), in which the tri-nucleotide repeat motif (69.3%) was the most abundant. The genetic facts and significance derived from the transcriptome dataset were suggested and discussed. All four core nucleosomal histones and linker histones were detected, in addition to the unigenes involved in histone modifications.190 unigenes were identified as being involved in the endocytosis pathway, and clathrin-dependent endocytosis was suggested to play a role in the heterotrophy of A. catenella. A conserved 22-nt spliced leader (SL) was identified in 21 unigenes which suggested the existence of trans-splicing processing of mRNA in A. catenella. Crown Copyright © 2013. Published by Elsevier B.V. All rights reserved.

  16. Application of the whole-transcriptome shotgun sequencing approach to the study of Philadelphia-positive acute lymphoblastic leukemia

    PubMed Central

    Iacobucci, I; Ferrarini, A; Sazzini, M; Giacomelli, E; Lonetti, A; Xumerle, L; Ferrari, A; Papayannidis, C; Malerba, G; Luiselli, D; Boattini, A; Garagnani, P; Vitale, A; Soverini, S; Pane, F; Baccarani, M; Delledonne, M; Martinelli, G

    2012-01-01

    Although the pathogenesis of BCR–ABL1-positive acute lymphoblastic leukemia (ALL) is mainly related to the expression of the BCR–ABL1 fusion transcript, additional cooperating genetic lesions are supposed to be involved in its development and progression. Therefore, in an attempt to investigate the complex landscape of mutations, changes in expression profiles and alternative splicing (AS) events that can be observed in such disease, the leukemia transcriptome of a BCR–ABL1-positive ALL patient at diagnosis and at relapse was sequenced using a whole-transcriptome shotgun sequencing (RNA-Seq) approach. A total of 13.9 and 15.8 million sequence reads was generated from de novo and relapsed samples, respectively, and aligned to the human genome reference sequence. This led to the identification of five validated missense mutations in genes involved in metabolic processes (DPEP1, TMEM46), transport (MVP), cell cycle regulation (ABL1) and catalytic activity (CTSZ), two of which resulted in acquired relapse variants. In all, 6390 and 4671 putative AS events were also detected, as well as expression levels for 18 315 and 18 795 genes, 28% of which were differentially expressed in the two disease phases. These data demonstrate that RNA-Seq is a suitable approach for identifying a wide spectrum of genetic alterations potentially involved in ALL. PMID:22829256

  17. Comparative Transcriptomics to Identify Novel Genes and Pathways in Dinoflagellates

    NASA Astrophysics Data System (ADS)

    Ryan, D.

    2016-02-01

    The unarmored dinoflagellate Karenia brevis is among the most prominent harmful, bloom-forming phytoplankton species in the Gulf of Mexico. During blooms, the polyketides PbTx-1 and PbTx-2 (brevetoxins) are produced by K. brevis. Brevetoxins negatively impact human health and the Gulf shellfish harvest. However, the genes underlying brevetoxin synthesis are currently unknown. Because the K. brevis genome is extremely large ( 1 × 1011 base pairs long), and with a high proportion of repetitive, non-coding DNA, it has not been sequenced. In fact, large, repetitive genomes are common among the dinoflagellate group. High-throughput RNA sequencing technology enabled us to assemble Karenia transcriptomes de novo and investigate potential genes in the brevetoxin pathway through comparative transcriptomics. The brevetoxin profile varies among K. brevis clonal cultures. For example, well-documented Wilson-CCFWC268 typically produces 8-10 pg PbTx per cell, whereas SP1 produces < 2 pg PbTx/cell, and the mutant low-toxin Wilson clone produces undetectable to low (<0.05 pg/cell) amounts. Further, PbTx-2 has been measured in Karenia papilionacea but not Karenia mikimotoi. We compared the transcriptomes of four K. brevis clones (Wilson-CCFWC268, SP3, SP1, and mutant low-toxin Wilson) with K. papilionacea and K. mikimotoi to investigate nucleotide-level genetic variations and differences in gene expression. Of the 85,000 transcripts in the K. brevis transcriptome, 4,600 transcripts, including novel unannotated orthologs and putative polyketide synthases (PKSs), were only expressed by brevetoxin-producing K. brevis and K. papilionacea, not K. mikimotoi. Examination of gene expression between the typical- and low-toxin Wilson clones identified about 3,500 genes with significantly different expression levels, including 2 putative PKSs. One of the 2 PKSs was only found in the brevetoxin-producing Karenia species. These transcriptomes could not have been characterized without high-throughput RNA sequencing.

  18. Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis.

    PubMed

    Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich

    2015-12-16

    Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications led by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, Matthew's correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods demonstrate AmpliSeq as a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variations. Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.

  19. De novo sequencing and analysis of the transcriptome during the browning of fresh-cut Luffa cylindrica 'Fusi-3' fruits.

    PubMed

    Zhu, Haisheng; Liu, Jianting; Wen, Qingfang; Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng

    2017-01-01

    Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar 'Fusi-3'. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1-6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism.

  20. Omics approaches in food safety: fulfilling the promise?

    PubMed Central

    Bergholz, Teresa M.; Moreno Switt, Andrea I.; Wiedmann, Martin

    2014-01-01

    Genomics, transcriptomics, and proteomics are rapidly transforming our approaches to detection, prevention and treatment of foodborne pathogens. Microbial genome sequencing in particular has evolved from a research tool into an approach that can be used to characterize foodborne pathogen isolates as part of routine surveillance systems. Genome sequencing efforts will not only improve outbreak detection and source tracking, but will also create large amounts of foodborne pathogen genome sequence data, which will be available for data mining efforts that could facilitate better source attribution and provide new insights into foodborne pathogen biology and transmission. While practical uses and application of metagenomics, transcriptomics, and proteomics data and associated tools are less prominent, these tools are also starting to yield practical food safety solutions. PMID:24572764

  1. Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

    PubMed

    Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

    2017-07-12

    Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.

  2. Peroxidase gene discovery from the horseradish transcriptome.

    PubMed

    Näätsaari, Laura; Krainer, Florian W; Schubert, Michael; Glieder, Anton; Thallinger, Gerhard G

    2014-03-24

    Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group of isoenzymes.

  3. Peroxidase gene discovery from the horseradish transcriptome

    PubMed Central

    2014-01-01

    Background Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. Results In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. Conclusions This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group of isoenzymes. PMID:24666710

  4. Transcriptome Profiling of a Multiple Recurrent Muscle-Invasive Urothelial Carcinoma of the Bladder by Deep Sequencing

    PubMed Central

    Zhang, Shufang; Liu, Yanxuan; Liu, Zhenxiang; Zhang, Chong; Cao, Hui; Ye, Yongqing; Wang, Shunlan; Zhang, Ying'ai; Xiao, Sifang; Yang, Peng; Li, Jindong; Bai, Zhiming

    2014-01-01

    Urothelial carcinoma of the bladder (UCB) is one of the commonly diagnosed cancers in the world. The UCB has the highest rate of recurrence of any malignancy. A genome-wide screening of transcriptome dysregulation between cancer and normal tissue would provide insight into the molecular basis of UCB recurrence and is a key step to discovering biomarkers for diagnosis and therapeutic targets. Compared with microarray technology, which is commonly used to identify expression level changes, the recently developed RNA-seq technique has the ability to detect other abnormal regulations in the cancer transcriptome, such as alternative splicing. In this study, we performed high-throughput transcriptome sequencing at ∼50× coverage on a recurrent muscle-invasive cisplatin-resistance UCB tissue and the adjacent non-tumor tissue. The results revealed cancer-specific differentially expressed genes between the tumor and non-tumor tissue enriched in the cell adhesion molecules, focal adhesion and ECM-receptor interaction pathway. Five dysregulated genes, including CDH1, VEGFA, PTPRF, CLDN7, and MMP2 were confirmed by Real time qPCR in the sequencing samples and the additional eleven samples. Our data revealed that more than three hundred genes showed differential splicing patterns between tumor tissue and non-tumor tissue. Among these genes, we filtered 24 cancer-associated alternative splicing genes with differential exon usage. The findings from RNA-Seq were validated by Real time qPCR for CD44, PDGFA, NUMB, and LPHN2. This study provides a comprehensive survey of the UCB transcriptome, which provides better insight into the complexity of regulatory changes during recurrence and metastasis. PMID:24622401

  5. Improved annotation with de novo transcriptome assembly in four social amoeba species.

    PubMed

    Singh, Reema; Lawal, Hajara M; Schilde, Christina; Glöckner, Gernot; Barton, Geoffrey J; Schaap, Pauline; Cole, Christian

    2017-01-31

    Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.

  6. Development of Transcriptomic Resources for Interrogating the Biosynthesis of Monoterpene Indole Alkaloids in Medicinal Plant Species

    PubMed Central

    Góngora-Castillo, Elsa; Childs, Kevin L.; Fedewa, Greg; Hamilton, John P.; Liscombe, David K.; Magallanes-Lundback, Maria; Mandadi, Kranthi K.; Nims, Ezekiel; Runguphan, Weerawat; Vaillancourt, Brieanne; Varbanova-Herde, Marina; DellaPenna, Dean; McKnight, Thomas D.; O’Connor, Sarah; Buell, C. Robin

    2012-01-01

    The natural diversity of plant metabolism has long been a source for human medicines. One group of plant-derived compounds, the monoterpene indole alkaloids (MIAs), includes well-documented therapeutic agents used in the treatment of cancer (vinblastine, vincristine, camptothecin), hypertension (reserpine, ajmalicine), malaria (quinine), and as analgesics (7-hydroxymitragynine). Our understanding of the biochemical pathways that synthesize these commercially relevant compounds is incomplete due in part to a lack of molecular, genetic, and genomic resources for the identification of the genes involved in these specialized metabolic pathways. To address these limitations, we generated large-scale transcriptome sequence and expression profiles for three species of Asterids that produce medicinally important MIAs: Camptotheca acuminata, Catharanthus roseus, and Rauvolfia serpentina. Using next generation sequencing technology, we sampled the transcriptomes of these species across a diverse set of developmental tissues, and in the case of C. roseus, in cultured cells and roots following elicitor treatment. Through an iterative assembly process, we generated robust transcriptome assemblies for all three species with a substantial number of the assembled transcripts being full or near-full length. The majority of transcripts had a related sequence in either UniRef100, the Arabidopsis thaliana predicted proteome, or the Pfam protein domain database; however, we also identified transcripts that lacked similarity with entries in either database and thereby lack a known function. Representation of known genes within the MIA biosynthetic pathway was robust. As a diverse set of tissues and treatments were surveyed, expression abundances of transcripts in the three species could be estimated to reveal transcripts associated with development and response to elicitor treatment. Together, these transcriptomes and expression abundance matrices provide a rich resource for understanding plant specialized metabolism, and promotes realization of innovative production systems for plant-derived pharmaceuticals. PMID:23300689

  7. Draft genome and reference transcriptomic resources for the urticating pine defoliator Thaumetopoea pityocampa (Lepidoptera: Notodontidae).

    PubMed

    Gschloessl, B; Dorkeld, F; Berges, H; Beydon, G; Bouchez, O; Branco, M; Bretaudeau, A; Burban, C; Dubois, E; Gauthier, P; Lhuillier, E; Nichols, J; Nidelet, S; Rocha, S; Sauné, L; Streiff, R; Gautier, M; Kerdelhué, C

    2018-05-01

    The pine processionary moth Thaumetopoea pityocampa (Lepidoptera: Notodontidae) is the main pine defoliator in the Mediterranean region. Its urticating larvae cause severe human and animal health concerns in the invaded areas. This species shows a high phenotypic variability for various traits, such as phenology, fecundity and tolerance to extreme temperatures. This study presents the construction and analysis of extensive genomic and transcriptomic resources, which are an obligate prerequisite to understand their underlying genetic architecture. Using a well-studied population from Portugal with peculiar phenological characteristics, the karyotype was first determined and a first draft genome of 537 Mb total length was assembled into 68,292 scaffolds (N50 = 164 kb). From this genome assembly, 29,415 coding genes were predicted. To circumvent some limitations for fine-scale physical mapping of genomic regions of interest, a 3X coverage BAC library was also developed. In particular, 11 BACs from this library were individually sequenced to assess the assembly quality. Additionally, de novo transcriptomic resources were generated from various developmental stages sequenced with HiSeq and MiSeq Illumina technologies. The reads were de novo assembled into 62,376 and 63,175 transcripts, respectively. Then, a robust subset of the genome-predicted coding genes, the de novo transcriptome assemblies and previously published 454/Sanger data were clustered to obtain a high-quality and comprehensive reference transcriptome consisting of 29,701 bona fide unigenes. These sequences covered 99% of the cegma and 88% of the busco highly conserved eukaryotic genes and 84% of the busco arthropod gene set. Moreover, 90% of these transcripts could be localized on the draft genome. The described information is available via a genome annotation portal (http://bipaa.genouest.org/sp/thaumetopoea_pityocampa/). © 2018 John Wiley & Sons Ltd.

  8. Transcriptome and Proteome Exploration to Provide a Resource for the Study of Agrocybe aegerita

    PubMed Central

    Jiang, Shuai; Chen, Yijie; Yin, Yalin; Pan, Yongfu; Yu, Guojun; Li, Yamu; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui

    2013-01-01

    Background Agrocybe aegerita, the black poplar mushroom, has been highly valued as a functional food for its medicinal and nutritional benefits. Several bioactive extracts from A. aegerita have been found to exhibit antitumor and antioxidant activities. However, limited genetic resources for A. aegerita have hindered exploration of this species. Methodology/Principal Findings To facilitate the research on A. aegerita, we established a deep survey of the transcriptome and proteome of this mushroom. We applied high-throughput sequencing technology (Illumina) to sequence A. aegerita transcriptomes from mycelium and fruiting body. The raw clean reads were de novo assembled into a total of 36,134 expressed sequences tags (ESTs) with an average length of 663 bp. These ESTs were annotated and classified according to Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways. Gene expression profile analysis showed that 18,474 ESTs were differentially expressed, with 10,131 up-regulated in mycelium and 8,343 up-regulated in fruiting body. Putative genes involved in polysaccharide and steroid biosynthesis were identified from A. aegerita transcriptome, and these genes were differentially expressed at the two stages of A. aegerita. Based on one-dimensional gel electrophoresis (1-DGE) coupled with electrospray ionization liquid chromatography tandem MS (LC-ESI-MS/MS), we identified a total of 309 non-redundant proteins. And many metabolic enzymes involved in glycolysis were identified in the protein database. Conclusions/Significance This is the first study on transcriptome and proteome analyses of A. aegerita. The data in this study serve as a resource of A. aegerita transcripts and proteins, and offer clues to the applications of this mushroom in nutrition, pharmacy and industry. PMID:23418592

  9. Analysis of the Salivary Gland Transcriptome of Frankliniella occidentalis

    PubMed Central

    Stafford-Banks, Candice A.; Rotenberg, Dorith; Johnson, Brian R.; Whitfield, Anna E.; Ullman, Diane E.

    2014-01-01

    Saliva is known to play a crucial role in insect feeding behavior and virus transmission. Currently, little is known about the salivary glands and saliva of thrips, despite the fact that Frankliniella occidentalis (Pergande) (the western flower thrips) is a serious pest due to its destructive feeding, wide host range, and transmission of tospoviruses. As a first step towards characterizing thrips salivary gland functions, we sequenced the transcriptome of the primary salivary glands of F. occidentalis using short read sequencing (Illumina) technology. A de novo-assembled transcriptome revealed 31,392 high quality contigs with an average size of 605 bp. A total of 12,166 contigs had significant BLASTx or tBLASTx hits (E≤1.0E−6) to known proteins, whereas a high percentage (61.24%) of contigs had no apparent protein or nucleotide hits. Comparison of the F. occidentalis salivary gland transcriptome (sialotranscriptome) against a published F. occidentalis full body transcriptome assembled from Roche-454 reads revealed several contigs with putative annotations associated with salivary gland functions. KEGG pathway analysis of the sialotranscriptome revealed that the majority (18 out of the top 20 predicted KEGG pathways) of the salivary gland contig sequences match proteins involved in metabolism. We identified several genes likely to be involved in detoxification and inhibition of plant defense responses including aldehyde dehydrogenase, metalloprotease, glucose oxidase, glucose dehydrogenase, and regucalcin. We also identified several genes that may play a role in the extra-oral digestion of plant structural tissues including β-glucosidase and pectin lyase; and the extra-oral digestion of sugars, including α-amylase, maltase, sucrase, and α-glucosidase. This is the first analysis of a sialotranscriptome for any Thysanopteran species and it provides a foundational tool to further our understanding of how thrips interact with their plant hosts and the viruses they transmit. PMID:24736614

  10. Analysis of the salivary gland transcriptome of Frankliniella occidentalis.

    PubMed

    Stafford-Banks, Candice A; Rotenberg, Dorith; Johnson, Brian R; Whitfield, Anna E; Ullman, Diane E

    2014-01-01

    Saliva is known to play a crucial role in insect feeding behavior and virus transmission. Currently, little is known about the salivary glands and saliva of thrips, despite the fact that Frankliniella occidentalis (Pergande) (the western flower thrips) is a serious pest due to its destructive feeding, wide host range, and transmission of tospoviruses. As a first step towards characterizing thrips salivary gland functions, we sequenced the transcriptome of the primary salivary glands of F. occidentalis using short read sequencing (Illumina) technology. A de novo-assembled transcriptome revealed 31,392 high quality contigs with an average size of 605 bp. A total of 12,166 contigs had significant BLASTx or tBLASTx hits (E≤1.0E-6) to known proteins, whereas a high percentage (61.24%) of contigs had no apparent protein or nucleotide hits. Comparison of the F. occidentalis salivary gland transcriptome (sialotranscriptome) against a published F. occidentalis full body transcriptome assembled from Roche-454 reads revealed several contigs with putative annotations associated with salivary gland functions. KEGG pathway analysis of the sialotranscriptome revealed that the majority (18 out of the top 20 predicted KEGG pathways) of the salivary gland contig sequences match proteins involved in metabolism. We identified several genes likely to be involved in detoxification and inhibition of plant defense responses including aldehyde dehydrogenase, metalloprotease, glucose oxidase, glucose dehydrogenase, and regucalcin. We also identified several genes that may play a role in the extra-oral digestion of plant structural tissues including β-glucosidase and pectin lyase; and the extra-oral digestion of sugars, including α-amylase, maltase, sucrase, and α-glucosidase. This is the first analysis of a sialotranscriptome for any Thysanopteran species and it provides a foundational tool to further our understanding of how thrips interact with their plant hosts and the viruses they transmit.

  11. Deep Sequencing Reveals Uncharted Isoform Heterogeneity of the Protein-Coding Transcriptome in Cerebral Ischemia.

    PubMed

    Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh

    2018-06-03

    Gene expression in cerebral ischemia has been a subject of intense investigations for several years. Studies utilizing probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA-sequencing (RNA-seq) enables comprehensive examinations of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites, and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we conducted an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion using RNA-seq. We identified a total of 1007 transcripts at 6 h, 1878 transcripts at 12 h, and 1618 transcripts at 24 h of reperfusion that were significantly altered as compared to sham controls. With isoform-level resolution, we identified 23 splice variants arising from 23 genes that were novel mRNA isoforms. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time-points, we discovered multiple, distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth to provide several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.

  12. Transcriptome Assembly and Analysis of Tibetan Hulless Barley (Hordeum vulgare L. var. nudum) Developing Grains, with Emphasis on Quality Properties

    PubMed Central

    Chen, Xin; Long, Hai; Gao, Ping; Deng, Guangbing; Pan, Zhifen; Liang, Junjun; Tang, Yawei; Tashi, Nyima; Yu, Maoqun

    2014-01-01

    Background Hulless barley is attracting increasing attention due to its unique nutritional value and potential health benefits. However, the molecular biology of the barley grain development and nutrient storage are not well understood. Furthermore, the genetic potential of hulless barley has not been fully tapped for breeding. Methodology/Principal Findings In the present study, we investigated the transcriptome features during hulless barley grain development. Using Illumina paired-end RNA-Sequencing, we generated two data sets of the developing grain transcriptomes from two hulless barley landraces. A total of 13.1 and 12.9 million paired-end reads with lengths of 90 bp were generated from the two varieties and were assembled to 48,863 and 45,788 unigenes, respectively. A combined dataset of 46,485 All-Unigenes were generated from two transcriptomes with an average length of 542 bp, and 36,278 among were annotated with gene descriptions, conserved protein domains or gene ontology terms. Furthermore, sequences and expression levels of genes related to the biosynthesis of storage reserve compounds (starch, protein, and β-glucan) were analyzed, and their temporal and spatial patterns were deduced from the transcriptome data of cultivated barley Morex. Conclusions/Significance We established a sequences and functional annotation integrated database and examined the expression profiles of the developing grains of Tibetan hulless barley. The characterization of genes encoding storage proteins and enzymes of starch synthesis and (1–3;1–4)-β-D-glucan synthesis provided an overview of changes in gene expression associated with grain nutrition and health properties. Furthermore, the characterization of these genes provides a gene reservoir, which helps in quality improvement of hulless barley. PMID:24871534

  13. Transcriptome and proteome exploration to provide a resource for the study of Agrocybe aegerita.

    PubMed

    Wang, Man; Gu, Bianli; Huang, Jie; Jiang, Shuai; Chen, Yijie; Yin, Yalin; Pan, Yongfu; Yu, Guojun; Li, Yamu; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui

    2013-01-01

    Agrocybe aegerita, the black poplar mushroom, has been highly valued as a functional food for its medicinal and nutritional benefits. Several bioactive extracts from A. aegerita have been found to exhibit antitumor and antioxidant activities. However, limited genetic resources for A. aegerita have hindered exploration of this species. To facilitate the research on A. aegerita, we established a deep survey of the transcriptome and proteome of this mushroom. We applied high-throughput sequencing technology (Illumina) to sequence A. aegerita transcriptomes from mycelium and fruiting body. The raw clean reads were de novo assembled into a total of 36,134 expressed sequences tags (ESTs) with an average length of 663 bp. These ESTs were annotated and classified according to Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways. Gene expression profile analysis showed that 18,474 ESTs were differentially expressed, with 10,131 up-regulated in mycelium and 8,343 up-regulated in fruiting body. Putative genes involved in polysaccharide and steroid biosynthesis were identified from A. aegerita transcriptome, and these genes were differentially expressed at the two stages of A. aegerita. Based on one-dimensional gel electrophoresis (1-DGE) coupled with electrospray ionization liquid chromatography tandem MS (LC-ESI-MS/MS), we identified a total of 309 non-redundant proteins. And many metabolic enzymes involved in glycolysis were identified in the protein database. This is the first study on transcriptome and proteome analyses of A. aegerita. The data in this study serve as a resource of A. aegerita transcripts and proteins, and offer clues to the applications of this mushroom in nutrition, pharmacy and industry.

  14. Decoding genes with coexpression networks and metabolomics - 'majority report by precogs'.

    PubMed

    Saito, Kazuki; Hirai, Masami Y; Yonekura-Sakakibara, Keiko

    2008-01-01

    Following the sequencing of whole genomes of model plants, high-throughput decoding of gene function is a major challenge in modern plant biology. In view of remarkable technical advances in transcriptomics and metabolomics, integrated analysis of these 'omics' by data-mining informatics is an excellent tool for prediction and identification of gene function, particularly for genes involved in complicated metabolic pathways. The availability of Arabidopsis public transcriptome datasets containing data of >1000 microarrays reinforces the potential for prediction of gene function by transcriptome coexpression analysis. Here, we review the strategy of combining transcriptome and metabolome as a powerful technology for studying the functional genomics of model plants and also crop and medicinal plants.

  15. De Novo Assembly and Characterization of the Transcriptome of the Chinese Medicinal Herb, Gentiana rigescens

    PubMed Central

    Zhang, Xiaodong; Allan, Andrew C.; Li, Caixia; Wang, Yuanzhong; Yao, Qiuyang

    2015-01-01

    Gentiana rigescens is an important medicinal herb in China. The main validated medicinal component gentiopicroside is synthesized in shoots, but is mainly found in the plant’s roots. The gentiopicroside biosynthetic pathway and its regulatory control remain to be elucidated. Genome resources of gentian are limited. Next-generation sequencing (NGS) technologies can aid in supplying global gene expression profiles. In this study we present sequence and transcript abundance data for the root and leaf transcriptome of G. rigescens, obtained using the Illumina Hiseq2000. Over fifty million clean reads were obtained from leaf and root libraries. This yields 76,717 unigenes with an average length of 753 bp. Among these, 33,855 unigenes were identified as putative homologs of annotated sequences in public protein and nucleotide databases. Digital abundance analysis identified 3306 unigenes differentially enriched between leaf and root. Unigenes found in both tissues were categorized according to their putative functional categories. Of the differentially expressed genes, over 130 were annotated as related to terpenoid biosynthesis. This work is the first study of global transcriptome analyses in gentian. These sequences and putative functional data comprise a resource for future investigation of terpenoid biosynthesis in Gentianaceae species and annotation of the gentiopicroside biosynthetic pathway and its regulatory mechanisms. PMID:26006235

  16. Evaluation of the impact of RNA preservation methods of spiders for de novo transcriptome assembly.

    PubMed

    Kono, Nobuaki; Nakamura, Hiroyuki; Ito, Yusuke; Tomita, Masaru; Arakawa, Kazuharu

    2016-05-01

    With advances in high-throughput sequencing technologies, de novo transcriptome sequencing and assembly has become a cost-effective method to obtain comprehensive genetic information of a species of interest, especially in nonmodel species with large genomes such as spiders. However, high-quality RNA is essential for successful sequencing, and sample preservation conditions require careful consideration for the effective storage of field-collected samples. To this end, we report a streamlined feasibility study of various storage conditions and their effects on de novo transcriptome assembly results. The storage parameters considered include temperatures ranging from room temperature to -80°C; preservatives, including ethanol, RNAlater, TRIzol and RNAlater-ICE; and sample submersion states. As a result, intact RNA was extracted and assembly was successful when samples were preserved at low temperatures regardless of the type of preservative used. The assemblies as well as the gene expression profiles were shown to be robust to RNA degradation, when 30 million 150-bp paired-end reads are obtained. The parameters for sample storage, RNA extraction, library preparation, sequencing and in silico assembly considered in this work provide a guideline for the study of field-collected samples of spiders. © 2015 John Wiley & Sons Ltd.

  17. SolEST database: a "one-stop shop" approach to the study of Solanaceae transcriptomes.

    PubMed

    D'Agostino, Nunzio; Traini, Alessandra; Frusciante, Luigi; Chiusano, Maria Luisa

    2009-11-30

    Since no genome sequences of solanaceous plants have yet been completed, expressed sequence tag (EST) collections represent a reliable tool for broad sampling of Solanaceae transcriptomes, an attractive route for understanding Solanaceae genome functionality and a powerful reference for the structural annotation of emerging Solanaceae genome sequences. We describe the SolEST database http://biosrv.cab.unina.it/solestdb which integrates different EST datasets from both cultivated and wild Solanaceae species and from two species of the genus Coffea. Background as well as processed data contained in the database, extensively linked to external related resources, represent an invaluable source of information for these plant families. Two novel features differentiate SolEST from other resources: i) the option of accessing and then visualizing Solanaceae EST/TC alignments along the emerging tomato and potato genome sequences; ii) the opportunity to compare different Solanaceae assemblies generated by diverse research groups in the attempt to address a common complaint in the SOL community. Different databases have been established worldwide for collecting Solanaceae ESTs and are related in concept, content and utility to the one presented herein. However, the SolEST database has several distinguishing features that make it appealing for the research community and facilitates a "one-stop shop" for the study of Solanaceae transcriptomes.

  18. De novo assembly of the transcriptome of Aegiceras corniculatum, a mangrove species in the Indo-West Pacific region.

    PubMed

    Fang, Lu; Yang, Yuchen; Guo, Wuxia; Li, Jianfang; Zhong, Cairong; Huang, Yelin; Zhou, Renchao; Shi, Suhua

    2016-08-01

    Aegiceras corniculatum (L.) Blanco is one of the most salt tolerant mangrove species and can thrive in 3% salinity at the seaward edge of mangrove forests. Here we sequenced the transcriptome of A. corniculatum used Illumina GA platform to develop its genomic resources for ecological and evolutionary studies. We obtained about 50 million high-quality paired-end reads with 75bp in length. Using the short read assembler Velvet, we yielded 49,437 contigs with the average length of 625bp. A total of 32,744 (66.23%) contigs showed significant similarity to the GenBank non-redundant (NR) protein database. 30,911 and 18,004 of these sequences were assigned to Gene Ontology and eukaryotic orthologous groups of proteins (KOG). A total of 4942 transcripts from our assemblies had significant similarity with KEGG Orthologs and were involved in 144 KEGG pathways, while 9899 unigenes had enzyme commission (EC) numbers. In addition, 9792 transcriptome-derived SSRs were identified from 7342 sequences. With our strict criteria, 4165 candidate SNPs were also identified from 2058 contigs. Some of these SNPs were further validated by Sanger sequencing. Genomic resources generated in this study should be valuable in ecological, evolutionary, and functional genomics studies for this mangrove species. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Discovery of parvovirus-related sequences in an unexpected broad range of animals.

    PubMed

    François, S; Filloux, D; Roumagnac, P; Bigot, D; Gayral, P; Martin, D P; Froissart, R; Ogliastro, M

    2016-09-07

    Our knowledge of the genetic diversity and host ranges of viruses is fragmentary. This is particularly true for the Parvoviridae family. Genetic diversity studies of single stranded DNA viruses within this family have been largely focused on arthropod- and vertebrate-infecting species that cause diseases of humans and our domesticated animals: a focus that has biased our perception of parvovirus diversity. While metagenomics approaches could help rectify this bias, so too could transcriptomics studies. Large amounts of transcriptomic data are available for a diverse array of animal species and whenever this data has inadvertently been gathered from virus-infected individuals, it could contain detectable viral transcripts. We therefore performed a systematic search for parvovirus-related sequences (PRSs) within publicly available transcript, genome and protein databases and eleven new transcriptome datasets. This revealed 463 PRSs in the transcript databases of 118 animals. At least 41 of these PRSs are likely integrated within animal genomes in that they were also found within genomic sequence databases. Besides illuminating the ubiquity of parvoviruses, the number of parvoviral sequences discovered within public databases revealed numerous previously unknown parvovirus-host combinations; particularly in invertebrates. Our findings suggest that the host-ranges of extant parvoviruses might span the entire animal kingdom.

  20. De Novo Transcriptomes of a Mixotrophic and a Heterotrophic Ciliate from Marine Plankton

    PubMed Central

    Santoferrara, Luciana F.; Guida, Stephanie; Zhang, Huan; McManus, George B.

    2014-01-01

    Studying non-model organisms is crucial in the context of the current development of genomics and transcriptomics for both physiological experimentation and environmental characterization. We investigated the transcriptomes of two marine planktonic ciliates, the mixotrophic oligotrich Strombidium rassoulzadegani and the heterotrophic choreotrich Strombidinopsis sp., and their respective algal food using Illumina RNAseq. Our aim was to characterize the transcriptomes of these contrasting ciliates and to identify genes potentially involved in mixotrophy. We detected approximately 10,000 and 7,600 amino acid sequences for S. rassoulzadegani and Strombidinopsis sp., respectively. About half of these transcripts had significant BLASTP hits (E-value <10−6) against previously-characterized sequences, mostly from the model ciliate Oxytricha trifallax. Transcriptomes from both the mixotroph and the heterotroph species provided similar annotations for GO terms and KEGG pathways. Most of the identified genes were related to housekeeping activity and pathways such as the metabolism of carbohydrates, lipids, amino acids, nucleotides, and vitamins. Although S. rassoulzadegani can keep and use chloroplasts from its prey, we did not find genes clearly linked to chloroplast maintenance and functioning in the transcriptome of this ciliate. While chloroplasts are known sources of reactive oxygen species (ROS), we found the same complement of antioxidant pathways in both ciliates, except for one enzyme possibly linked to ascorbic acid recycling found exclusively in the mixotroph. Contrary to our expectations, we did not find qualitative differences in genes potentially related to mixotrophy. However, these transcriptomes will help to establish a basis for the evaluation of differential gene expression in oligotrichs and choreotrichs and experimental investigation of the costs and benefits of mixotrophy. PMID:24983246

  1. Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

    PubMed Central

    2012-01-01

    Background Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Results Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Conclusions Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies. PMID:23167289

  2. An Atlas of annotations of Hydra vulgaris transcriptome.

    PubMed

    Evangelista, Daniela; Tripathi, Kumar Parijat; Guarracino, Mario Rosario

    2016-09-22

    RNA sequencing takes advantage of the Next Generation Sequencing (NGS) technologies for analyzing RNA transcript counts with an excellent accuracy. Trying to interpret this huge amount of data in biological information is still a key issue, reason for which the creation of web-resources useful for their analysis is highly desiderable. Starting from a previous work, Transcriptator, we present the Atlas of Hydra's vulgaris, an extensible web tool in which its complete transcriptome is annotated. In order to provide to the users an advantageous resource that include the whole functional annotated transcriptome of Hydra vulgaris water polyp, we implemented the Atlas web-tool contains 31.988 accesible and downloadable transcripts of this non-reference model organism. Atlas, as a freely available resource, can be considered a valuable tool to rapidly retrieve functional annotation for transcripts differentially expressed in Hydra vulgaris exposed to the distinct experimental treatments. WEB RESOURCE URL: http://www-labgtp.na.icar.cnr.it/Atlas .

  3. Transcriptome analysis and related databases of Lactococcus lactis.

    PubMed

    Kuipers, Oscar P; de Jong, Anne; Baerends, Richard J S; van Hijum, Sacha A F T; Zomer, Aldert L; Karsens, Harma A; den Hengst, Chris D; Kramer, Naomi E; Buist, Girbe; Kok, Jan

    2002-08-01

    Several complete genome sequences of Lactococcus lactis and their annotations will become available in the near future, next to the already published genome sequence of L. lactis ssp. lactis IL 1403. This will allow intraspecies comparative genomics studies as well as functional genomics studies aimed at a better understanding of physiological processes and regulatory networks operating in lactococci. This paper describes the initial set-up of a DNA-microarray facility in our group, to enable transcriptome analysis of various Gram-positive bacteria, including a ssp. lactis and a ssp. cremoris strain of Lactococcus lactis. Moreover a global description will be given of the hardware and software requirements for such a set-up, highlighting the crucial integration of relevant bioinformatics tools and methods. This includes the development of MolGenIS, an information system for transcriptome data storage and retrieval, and LactococCye, a metabolic pathway/genome database of Lactococcus lactis.

  4. RNA-Seq Transcriptome Profiling of Upland Cotton (Gossypium hirsutum L.) Root Tissue under Water-Deficit Stress

    PubMed Central

    Bowman, Megan J.; Park, Wonkeun; Bauer, Philip J.; Udall, Joshua A.; Page, Justin T.; Raney, Joshua; Scheffler, Brian E.; Jones, Don. C.; Campbell, B. Todd

    2013-01-01

    An RNA-Seq experiment was performed using field grown well-watered and naturally rain fed cotton plants to identify differentially expressed transcripts under water-deficit stress. Our work constitutes the first application of the newly published diploid D5 Gossypium raimondii sequence in the study of tetraploid AD1 upland cotton RNA-seq transcriptome analysis. A total of 1,530 transcripts were differentially expressed between well-watered and water-deficit stressed root tissues, in patterns that confirm the accuracy of this technique for future studies in cotton genomics. Additionally, putative sequence based genome localization of differentially expressed transcripts detected A2 genome specific gene expression under water-deficit stress. These data will facilitate efforts to understand the complex responses governing transcriptomic regulatory mechanisms and to identify candidate genes that may benefit applied plant breeding programs. PMID:24324815

  5. Transcriptome profile of Trichoderma harzianum IOC-3844 induced by sugarcane bagasse.

    PubMed

    Horta, Maria Augusta Crivelente; Vicentini, Renato; Delabona, Priscila da Silva; Laborda, Prianda; Crucello, Aline; Freitas, Sindélia; Kuroshu, Reginaldo Massanobu; Polikarpov, Igor; Pradella, José Geraldo da Cruz; Souza, Anete Pereira

    2014-01-01

    Profiling the transcriptome that underlies biomass degradation by the fungus Trichoderma harzianum allows the identification of gene sequences with potential application in enzymatic hydrolysis processing. In the present study, the transcriptome of T. harzianum IOC-3844 was analyzed using RNA-seq technology. The sequencing generated 14.7 Gbp for downstream analyses. De novo assembly resulted in 32,396 contigs, which were submitted for identification and classified according to their identities. This analysis allowed us to define a principal set of T. harzianum genes that are involved in the degradation of cellulose and hemicellulose and the accessory genes that are involved in the depolymerization of biomass. An additional analysis of expression levels identified a set of carbohydrate-active enzymes that are upregulated under different conditions. The present study provides valuable information for future studies on biomass degradation and contributes to a better understanding of the role of the genes that are involved in this process.

  6. Solar Radiation Stress in Natural Acidophilic Biofilms of Euglena mutabilis Revealed by Metatranscriptomics and PAM Fluorometry.

    PubMed

    Puente-Sánchez, Fernando; Olsson, Sanna; Gómez-Rodriguez, Manuel; Souza-Egipsy, Virginia; Altamirano-Jeschke, Maria; Amils, Ricardo; Parro, Victor; Aguilera, Angeles

    2016-02-01

    The daily photosynthetic performance of a natural biofilm of the extreme acidophilic Euglena mutabilis from Río Tinto (SW, Spain) under full solar radiation was analyzed by means of pulse amplitude-modulated (PAM) fluorescence measurements and metatrascriptomic analysis. Natural E. mutabilis biofilms undergo large-scale transcriptomic reprogramming during midday due to a dynamic photoinhibition and solar radiation stress. Photoinhibition is due to UV radiation and not to light intensity, as revealed by PAM fluorometry analysis. In order to minimize the negative effects of solar radiation, our data supports the presence of a circadian rhythm in this euglenophyte that increases their opportunity to survive. Differential gene expression throughout the day (at 12:00, 20:00 and night) was monitored by massive Illumina parallel sequencing of metatranscriptomic libraries. The transcription pattern was altered in genes involved in Photosystem II stability and repair, UV damaged DNA repair, non-photochemical quenching and oxidative stress, supporting the photoinhibition detected by PAM fluorometry at midday. Copyright © 2016 Elsevier GmbH. All rights reserved.

  7. Dissecting Immune Circuits by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq.

    PubMed

    Jaitin, Diego Adhemar; Weiner, Assaf; Yofe, Ido; Lara-Astiaso, David; Keren-Shaul, Hadas; David, Eyal; Salame, Tomer Meir; Tanay, Amos; van Oudenaarden, Alexander; Amit, Ido

    2016-12-15

    In multicellular organisms, dedicated regulatory circuits control cell type diversity and responses. The crosstalk and redundancies within these circuits and substantial cellular heterogeneity pose a major research challenge. Here, we present CRISP-seq, an integrated method for massively parallel single-cell RNA sequencing (RNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-pooled screens. We show that profiling the genomic perturbation and transcriptome in the same cell enables us to simultaneously elucidate the function of multiple factors and their interactions. We applied CRISP-seq to probe regulatory circuits of innate immunity. By sampling tens of thousands of perturbed cells in vitro and in mice, we identified interactions and redundancies between developmental and signaling-dependent factors. These include opposing effects of Cebpb and Irf8 in regulating the monocyte/macrophage versus dendritic cell lineages and differential functions for Rela and Stat1/2 in monocyte versus dendritic cell responses to pathogens. This study establishes CRISP-seq as a broadly applicable, comprehensive, and unbiased approach for elucidating mammalian regulatory circuits. Copyright © 2016 Elsevier Inc. All rights reserved.

  8. Gene Identification of Pheromone Gland Genes Involved in Type II Sex Pheromone Biosynthesis and Transportation in Female Tea Pest Ectropis grisescens

    PubMed Central

    Li, Zhao-Qun; Ma, Long; Yin, Qian; Cai, Xiao-Ming; Luo, Zong-Xiu; Bian, Lei; Xin, Zhao-Jun; He, Peng; Chen, Zong-Mao

    2018-01-01

    Moths can biosynthesize sex pheromones in the female sex pheromone glands (PGs) and can distinguish species-specific sex pheromones using their antennae. However, the biosynthesis and transportation mechanism for Type II sex pheromone components has rarely been documented in moths. In this study, we constructed a massive PG transcriptome database (14.72 Gb) from a moth species, Ectropis grisescens, which uses type II sex pheromones and is a major tea pest in China. We further identified putative sex pheromone biosynthesis and transportation-related unigenes: 111 cytochrome P450 monooxygenases (CYPs), 25 odorant-binding proteins (OBPs), and 20 chemosensory proteins (CSPs). Tissue expression and phylogenetic tree analyses showed that one CYP (EgriCYP341-fragment3), one OBP (EgriOBP4), and one CSP (EgriCSP10) gene displayed an enriched expression in the PGs, and that EgriOBP2, 3, and 25 are clustered in the moth pheromone-binding protein clade. We considered these our candidate genes. Our results yielded large-scale PG sequence information for further functional studies. PMID:29317471

  9. Illumina sequencing of green stink bug nymph and adult cdna to identify potential rnai gene targets

    USDA-ARS?s Scientific Manuscript database

    Whole-body transcriptomes for nymphs and adults of the green stink bug, Acrosternum hilare (Say), were sequenced on an Illumina® Genome Analyzer IIx sequencer. The insects were collected from sites in North Carolina and Virginia, USA. The cDNA library for each sample was sequenced on one lane of an...

  10. Transcriptomics in cancer diagnostics: developments in technology, clinical research and commercialization.

    PubMed

    Sager, Monica; Yeat, Nai Chien; Pajaro-Van der Stadt, Stefan; Lin, Charlotte; Ren, Qiuyin; Lin, Jimmy

    2015-01-01

    Transcriptomic technologies are evolving to diagnose cancer earlier and more accurately to provide greater predictive and prognostic utility to oncologists and patients. Digital techniques such as RNA sequencing are replacing still-imaging techniques to provide more detailed analysis of the transcriptome and aberrant expression that causes oncogenesis, while companion diagnostics are developing to determine the likely effectiveness of targeted treatments. This article examines recent advancements in molecular profiling research and technology as applied to cancer diagnosis, clinical applications and predictions for the future of personalized medicine in oncology.

  11. Massively parallel pyrosequencing-based transcriptome analyses of small brown planthopper (Laodelphax striatellus), a vector insect transmitting rice stripe virus (RSV)

    PubMed Central

    2010-01-01

    Background The small brown planthopper (Laodelphax striatellus) is an important agricultural pest that not only damages rice plants by sap-sucking, but also acts as a vector that transmits rice stripe virus (RSV), which can cause even more serious yield loss. Despite being a model organism for studying entomology, population biology, plant protection, molecular interactions among plants, viruses and insects, only a few genomic sequences are available for this species. To investigate its transcriptome and determine the differences between viruliferous and naïve L. striatellus, we employed 454-FLX high-throughput pyrosequencing to generate EST databases of this insect. Results We obtained 201,281 and 218,681 high-quality reads from viruliferous and naïve L. striatellus, respectively, with an average read length as 230 bp. These reads were assembled into contigs and two EST databases were generated. When all reads were combined, 16,885 contigs and 24,607 singletons (a total of 41,492 unigenes) were obtained, which represents a transcriptome of the insect. BlastX search against the NCBI-NR database revealed that only 6,873 (16.6%) of these unigenes have significant matches. Comparison of the distribution of GO classification among viruliferous, naïve, and combined EST databases indicated that these libraries are broadly representative of the L. striatellus transcriptomes. Functionally diverse transcripts from RSV, endosymbiotic bacteria Wolbachia and yeast-like symbiotes were identified, which reflects the possible lifestyles of these microbial symbionts that live in the cells of the host insect. Comparative genomic analysis revealed that L. striatellus encodes similar innate immunity regulatory systems as other insects, such as RNA interference, JAK/STAT and partial Imd cascades, which might be involved in defense against viral infection. In addition, we determined the differences in gene expression between vector and naïve samples, which generated a list of candidate genes that are potentially involved in the symbiosis of L. striatellus and RSV. Conclusions To our knowledge, the present study is the first description of a genomic project for L. striatellus. The identification of transcripts from RSV, Wolbachia, yeast-like symbiotes and genes abundantly expressed in viruliferous insect, provided a starting-point for investigating the molecular basis of symbiosis among these organisms. PMID:20462456

  12. Construction of a medicinal leech transcriptome database and its application to the identification of leech homologs of neural and innate immune genes.

    PubMed

    Macagno, Eduardo R; Gaasterland, Terry; Edsall, Lee; Bafna, Vineet; Soares, Marcelo B; Scheetz, Todd; Casavant, Thomas; Da Silva, Corinne; Wincker, Patrick; Tasiemski, Aurélie; Salzet, Michel

    2010-06-25

    The medicinal leech, Hirudo medicinalis, is an important model system for the study of nervous system structure, function, development, regeneration and repair. It is also a unique species in being presently approved for use in medical procedures, such as clearing of pooled blood following certain surgical procedures. It is a current, and potentially also future, source of medically useful molecular factors, such as anticoagulants and antibacterial peptides, which may have evolved as a result of its parasitizing large mammals, including humans. Despite the broad focus of research on this system, little has been done at the genomic or transcriptomic levels and there is a paucity of openly available sequence data. To begin to address this problem, we constructed whole embryo and adult central nervous system (CNS) EST libraries and created a clustered sequence database of the Hirudo transcriptome that is available to the scientific community. A total of approximately 133,000 EST clones from two directionally-cloned cDNA libraries, one constructed from mRNA derived from whole embryos at several developmental stages and the other from adult CNS cords, were sequenced in one or both directions by three different groups: Genoscope (French National Sequencing Center), the University of Iowa Sequencing Facility and the DOE Joint Genome Institute. These were assembled using the phrap software package into 31,232 unique contigs and singletons, with an average length of 827 nt. The assembled transcripts were then translated in all six frames and compared to proteins in NCBI's non-redundant (NR) and to the Gene Ontology (GO) protein sequence databases, resulting in 15,565 matches to 11,236 proteins in NR and 13,935 matches to 8,073 proteins in GO. Searching the database for transcripts of genes homologous to those thought to be involved in the innate immune responses of vertebrates and other invertebrates yielded a set of nearly one hundred evolutionarily conserved sequences, representing all known pathways involved in these important functions. The sequences obtained for Hirudo transcripts represent the first major database of genes expressed in this important model system. Comparison of translated open reading frames (ORFs) with the other openly available leech datasets, the genome and transcriptome of Helobdella robusta, shows an average identity at the amino acid level of 58% in matched sequences. Interestingly, comparison with other available Lophotrochozoans shows similar high levels of amino acid identity, where sequences match, for example, 64% with Capitella capitata (a polychaete) and 56% with Aplysia californica (a mollusk), as well as 58% with Schistosoma mansoni (a platyhelminth). Phylogenetic comparisons of putative Hirudo innate immune response genes present within the Hirudo transcriptome database herein described show a strong resemblance to the corresponding mammalian genes, indicating that this important physiological response may have older origins than what has been previously proposed.

  13. Detection of canonical A-to-G editing events at 3′ UTRs and microRNA target sites in human lungs using next-generation sequencing

    PubMed Central

    Soundararajan, Ramani; Stearns, Timothy M.; Griswold, Anthony J.; Mehta, Arpit; Czachor, Alexander; Fukumoto, Jutaro; Lockey, Richard F.; King, Benjamin L.; Kolliputi, Narasaiah

    2015-01-01

    RNA editing is a post-transcriptional modification of RNA. The majority of these changes result from adenosine deaminase acting on RNA (ADARs) catalyzing the conversion of adenosine residues to inosine in double-stranded RNAs (dsRNAs). Massively parallel sequencing has enabled the identification of RNA editing sites in human transcriptomes. In this study, we sequenced DNA and RNA from human lungs and identified RNA editing sites with high confidence via a computational pipeline utilizing stringent analysis thresholds. We identified a total of 3,447 editing sites that overlapped in three human lung samples, and with 50% of these sites having canonical A-to-G base changes. Approximately 27% of the edited sites overlapped with Alu repeats, and showed A-to-G clustering (>3 clusters in 100 bp). The majority of edited sites mapped to either 3′ untranslated regions (UTRs) or introns close to splice sites; whereas, only few sites were in exons resulting in non-synonymous amino acid changes. Interestingly, we identified 652 A-to-G editing events in the 3′ UTR of 205 target genes that mapped to 932 potential miRNA target binding sites. Several of these miRNA edited sites were validated in silico. Additionally, we validated several A-to-G edited sites by Sanger sequencing. Altogether, our study suggests a role for RNA editing in miRNA-mediated gene regulation and splicing in human lungs. In this study, we have generated a RNA editome of human lung tissue that can be compared with other RNA editomes across different lung tissues to delineate a role for RNA editing in normal and diseased states. PMID:26486088

  14. Detection of canonical A-to-G editing events at 3' UTRs and microRNA target sites in human lungs using next-generation sequencing.

    PubMed

    Soundararajan, Ramani; Stearns, Timothy M; Griswold, Anthony L; Mehta, Arpit; Czachor, Alexander; Fukumoto, Jutaro; Lockey, Richard F; King, Benjamin L; Kolliputi, Narasaiah

    2015-11-03

    RNA editing is a post-transcriptional modification of RNA. The majority of these changes result from adenosine deaminase acting on RNA (ADARs) catalyzing the conversion of adenosine residues to inosine in double-stranded RNAs (dsRNAs). Massively parallel sequencing has enabled the identification of RNA editing sites in human transcriptomes. In this study, we sequenced DNA and RNA from human lungs and identified RNA editing sites with high confidence via a computational pipeline utilizing stringent analysis thresholds. We identified a total of 3,447 editing sites that overlapped in three human lung samples, and with 50% of these sites having canonical A-to-G base changes. Approximately 27% of the edited sites overlapped with Alu repeats, and showed A-to-G clustering (>3 clusters in 100 bp). The majority of edited sites mapped to either 3' untranslated regions (UTRs) or introns close to splice sites; whereas, only few sites were in exons resulting in non-synonymous amino acid changes. Interestingly, we identified 652 A-to-G editing events in the 3' UTR of 205 target genes that mapped to 932 potential miRNA target binding sites. Several of these miRNA edited sites were validated in silico. Additionally, we validated several A-to-G edited sites by Sanger sequencing. Altogether, our study suggests a role for RNA editing in miRNA-mediated gene regulation and splicing in human lungs. In this study, we have generated a RNA editome of human lung tissue that can be compared with other RNA editomes across different lung tissues to delineate a role for RNA editing in normal and diseased states.

  15. Digital gene expression analysis of the zebra finch genome

    PubMed Central

    2010-01-01

    Background In order to understand patterns of adaptation and molecular evolution it is important to quantify both variation in gene expression and nucleotide sequence divergence. Gene expression profiling in non-model organisms has recently been facilitated by the advent of massively parallel sequencing technology. Here we investigate tissue specific gene expression patterns in the zebra finch (Taeniopygia guttata) with special emphasis on the genes of the major histocompatibility complex (MHC). Results Almost 2 million 454-sequencing reads from cDNA of six different tissues were assembled and analysed. A total of 11,793 zebra finch transcripts were represented in this EST data, indicating a transcriptome coverage of about 65%. There was a positive correlation between the tissue specificity of gene expression and non-synonymous to synonymous nucleotide substitution ratio of genes, suggesting that genes with a specialised function are evolving at a higher rate (or with less constraint) than genes with a more general function. In line with this, there was also a negative correlation between overall expression levels and expression specificity of contigs. We found evidence for expression of 10 different genes related to the MHC. MHC genes showed relatively tissue specific expression levels and were in general primarily expressed in spleen. Several MHC genes, including MHC class I also showed expression in brain. Furthermore, for all genes with highest levels of expression in spleen there was an overrepresentation of several gene ontology terms related to immune function. Conclusions Our study highlights the usefulness of next-generation sequence data for quantifying gene expression in the genome as a whole as well as in specific candidate genes. Overall, the data show predicted patterns of gene expression profiles and molecular evolution in the zebra finch genome. Expression of MHC genes in particular, corresponds well with expression patterns in other vertebrates. PMID:20359325

  16. JANE: efficient mapping of prokaryotic ESTs and variable length sequence reads on related template genomes

    PubMed Central

    2009-01-01

    Background ESTs or variable sequence reads can be available in prokaryotic studies well before a complete genome is known. Use cases include (i) transcriptome studies or (ii) single cell sequencing of bacteria. Without suitable software their further analysis and mapping would have to await finalization of the corresponding genome. Results The tool JANE rapidly maps ESTs or variable sequence reads in prokaryotic sequencing and transcriptome efforts to related template genomes. It provides an easy-to-use graphics interface for information retrieval and a toolkit for EST or nucleotide sequence function prediction. Furthermore, we developed for rapid mapping an enhanced sequence alignment algorithm which reassembles and evaluates high scoring pairs provided from the BLAST algorithm. Rapid assembly on and replacement of the template genome by sequence reads or mapped ESTs is achieved. This is illustrated (i) by data from Staphylococci as well as from a Blattabacteria sequencing effort, (ii) mapping single cell sequencing reads is shown for poribacteria to sister phylum representative Rhodopirellula Baltica SH1. The algorithm has been implemented in a web-server accessible at http://jane.bioapps.biozentrum.uni-wuerzburg.de. Conclusion Rapid prokaryotic EST mapping or mapping of sequence reads is achieved applying JANE even without knowing the cognate genome sequence. PMID:19943962

  17. Massively Parallel Sequencing Detected a Mutation in the MFN2 Gene Missed by Sanger Sequencing Due to a Primer Mismatch on an SNP Site.

    PubMed

    Neupauerová, Jana; Grečmalová, Dagmar; Seeman, Pavel; Laššuthová, Petra

    2016-05-01

    We describe a patient with early onset severe axonal Charcot-Marie-Tooth disease (CMT2) with dominant inheritance, in whom Sanger sequencing failed to detect a mutation in the mitofusin 2 (MFN2) gene because of a single nucleotide polymorphism (rs2236057) under the PCR primer sequence. The severe early onset phenotype and the family history with severely affected mother (died after delivery) was very suggestive of CMT2A and this suspicion was finally confirmed by a MFN2 mutation. The mutation p.His361Tyr was later detected in the patient by massively parallel sequencing with a gene panel for hereditary neuropathies. According to this information, new primers for amplification and sequencing were designed which bind away from the polymorphic sites of the patient's DNA. Sanger sequencing with these new primers then confirmed the heterozygous mutation in the MFN2 gene in this patient. This case report shows that massively parallel sequencing may in some rare cases be more sensitive than Sanger sequencing and highlights the importance of accurate primer design which requires special attention. © 2016 John Wiley & Sons Ltd/University College London.

  18. Gene discovery in Boophilus microplus, the cattle tick: the transcriptomes of ovaries, salivary glands, and hemocytes.

    PubMed

    Santos, Isabel K F de Miranda; Valenzuela, Jesus G; Ribeiro, José Marcos C; de Castro, Marilia; Costa, Juliana Nardelli; Costa, Ana Maria; da Silva, Edson Ramiro; Neto, Olavo Bilac Rego; Rocha, Clarisse; Daffre, Sirlei; Ferreira, Beatriz R; da Silva, João Santana; Szabó, Matias Pablo; Bechara, Gervasio Henrique

    2004-10-01

    The quest for new control strategies for ticks can profit from high throughput genomics. In order to identify genes that are involved in oogenesis and development, in defense, and in hematophagy, the transcriptomes of ovaries, hemocytes, and salivary glands from rapidly ingurgitating females, and of salivary glands from males of Boophilus microplus were PCR amplified, and the expressed sequence tags (EST) of random clones were mass sequenced. So far, more than 1,344 EST have been generated for these tissues, with approximately 30% novelty, depending on the the tissue studied. To date approximately 760 nucleotide sequences from B. microplus are deposited in the NCBI database. Mass sequencing of partial cDNAs of parasite genes can build up this scant database and rapidly generate a large quantity of useful information about potential targets for immunobiological or chemical control.

  19. AmpuBase: a transcriptome database for eight species of apple snails (Gastropoda: Ampullariidae).

    PubMed

    Ip, Jack C H; Mu, Huawei; Chen, Qian; Sun, Jin; Ituarte, Santiago; Heras, Horacio; Van Bocxlaer, Bert; Ganmanee, Monthon; Huang, Xin; Qiu, Jian-Wen

    2018-03-05

    Gastropoda, with approximately 80,000 living species, is the largest class of Mollusca. Among gastropods, apple snails (family Ampullariidae) are globally distributed in tropical and subtropical freshwater ecosystems and many species are ecologically and economically important. Ampullariids exhibit various morphological and physiological adaptations to their respective habitats, which make them ideal candidates for studying adaptation, population divergence, speciation, and larger-scale patterns of diversity, including the biogeography of native and invasive populations. The limited availability of genomic data, however, hinders in-depth ecological and evolutionary studies of these non-model organisms. Using Illumina Hiseq platforms, we sequenced 1220 million reads for seven species of apple snails. Together with the previously published RNA-Seq data of two apple snails, we conducted de novo transcriptome assembly of eight species that belong to five genera of Ampullariidae, two of which represent Old World lineages and the other three New World lineages. There were 20,730 to 35,828 unigenes with predicted open reading frames for the eight species, with N50 (shortest sequence length at 50% of the unigenes) ranging from 1320 to 1803 bp. 69.7% to 80.2% of these unigenes were functionally annotated by searching against NCBI's non-redundant, Gene Ontology database and the Kyoto Encyclopaedia of Genes and Genomes. With these data we developed AmpuBase, a relational database that features online BLAST functionality for DNA/protein sequences, keyword searching for unigenes/functional terms, and download functions for sequences and whole transcriptomes. In summary, we have generated comprehensive transcriptome data for multiple ampullariid genera and species, and created a publicly accessible database with a user-friendly interface to facilitate future basic and applied studies on ampullariids, and comparative molecular studies with other invertebrates.

  20. Solexa-Sequencing Based Transcriptome Study of Plaice Skin Phenotype in Rex Rabbits (Oryctolagus cuniculus)

    PubMed Central

    Pan, Lei; Liu, Yan; Wei, Qiang; Xiao, Chenwen; Ji, Quanan; Bao, Guolian; Wu, Xinsheng

    2015-01-01

    Background Fur is an important genetically-determined characteristic of domestic rabbits; rabbit furs are of great economic value. We used the Solexa sequencing technology to assess gene expression in skin tissues from full-sib Rex rabbits of different phenotypes in order to explore the molecular mechanisms associated with fur determination. Methodology/Principal Findings Transcriptome analysis included de novo assembly, gene function identification, and gene function classification and enrichment. We obtained 74,032,912 and 71,126,891 short reads of 100 nt, which were assembled into 377,618 unique sequences by Trinity strategy (N50=680 nt). Based on BLAST results with known proteins, 50,228 sequences were identified at a cut-off E-value ≥ 10-5. Using Blast to Gene Ontology (GO), Clusters of Orthologous Groups (KOG) and Kyoto Encyclopedia of Genes and Genomes (KEGG), we obtained several genes with important protein functions. A total of 308 differentially expressed genes were obtained by transcriptome analysis of plaice and un-plaice phenotype animals; 209 additional differentially expressed genes were not found in any database. These genes included 49 that were only expressed in plaice skin rabbits. The novel genes may play important roles during skin growth and development. In addition, 99 known differentially expressed genes were assigned to PI3K-Akt signaling, focal adhesion, and ECM-receptor interactin, among others. Growth factors play a role in skin growth and development by regulating these signaling pathways. We confirmed the altered expression levels of seven target genes by qRT-PCR. And chosen a key gene for SNP to found the differentially between plaice and un-plaice phenotypes rabbit. Conclusions/Significance The rabbit transcriptome profiling data provide new insights in understanding the molecular mechanisms underlying rabbit skin growth and development. PMID:25955442

Top