Sample records for genome scale transcriptome

  1. Genome scale transcriptomics of baculovirus-insect interactions.

    PubMed

    Nguyen, Quan; Nielsen, Lars K; Reid, Steven

    2013-11-12

    Baculovirus-insect cell technologies are applied in the production of complex proteins, veterinary and human vaccines, gene delivery vectors' and biopesticides. Better understanding of how baculoviruses and insect cells interact would facilitate baculovirus-based production. While complete genomic sequences are available for over 58 baculovirus species, little insect genomic information is known. The release of the Bombyx mori and Plutella xylostella genomes, the accumulation of EST sequences for several Lepidopteran species, and especially the availability of two genome-scale analysis tools, namely oligonucleotide microarrays and next generation sequencing (NGS), have facilitated expression studies to generate a rich picture of insect gene responses to baculovirus infections. This review presents current knowledge on the interaction dynamics of the baculovirus-insect system' which is relatively well studied in relation to nucleocapsid transportation, apoptosis, and heat shock responses, but is still poorly understood regarding responses involved in pro-survival pathways, DNA damage pathways, protein degradation, translation, signaling pathways, RNAi pathways, and importantly metabolic pathways for energy, nucleotide and amino acid production. We discuss how the two genome-scale transcriptomic tools can be applied for studying such pathways and suggest that proteomics and metabolomics can produce complementary findings to transcriptomic studies.

  2. CGDV: a webtool for circular visualization of genomics and transcriptomics data.

    PubMed

    Jha, Vineet; Singh, Gulzar; Kumar, Shiva; Sonawane, Amol; Jere, Abhay; Anamika, Krishanpal

    2017-10-24

    Interpretation of large-scale data is very challenging and currently there is scarcity of web tools which support automated visualization of a variety of high throughput genomics and transcriptomics data and for a wide variety of model organisms along with user defined karyotypes. Circular plot provides holistic visualization of high throughput large scale data but it is very complex and challenging to generate as most of the available tools need informatics expertise to install and run them. We have developed CGDV (Circos for Genomics and Transcriptomics Data Visualization), a webtool based on Circos, for seamless and automated visualization of a variety of large scale genomics and transcriptomics data. CGDV takes output of analyzed genomics or transcriptomics data of different formats, such as vcf, bed, xls, tab limited matrix text file, CNVnator raw output and Gene fusion raw output, to plot circular view of the sample data. CGDV take cares of generating intermediate files required for circos. CGDV is freely available at https://cgdv-upload.persistent.co.in/cgdv/ . The circular plot for each data type is tailored to gain best biological insights into the data. The inter-relationship between data points, homologous sequences, genes involved in fusion events, differential expression pattern, sequencing depth, types and size of variations and enrichment of DNA binding proteins can be seen using CGDV. CGDV thus helps biologists and bioinformaticians to visualize a variety of genomics and transcriptomics data seamlessly.

  3. Reptilian Transcriptomes v2.0: An Extensive Resource for Sauropsida Genomics and Transcriptomics

    PubMed Central

    Tzika, Athanasia C.; Ullate-Agote, Asier; Grbic, Djordje; Milinkovitch, Michel C.

    2015-01-01

    Despite the availability of deep-sequencing techniques, genomic and transcriptomic data remain unevenly distributed across phylogenetic groups. For example, reptiles are poorly represented in sequence databases, hindering functional evolutionary and developmental studies in these lineages substantially more diverse than mammals. In addition, different studies use different assembly and annotation protocols, inhibiting meaningful comparisons. Here, we present the “Reptilian Transcriptomes Database 2.0,” which provides extensive annotation of transcriptomes and genomes from species covering the major reptilian lineages. To this end, we sequenced normalized complementary DNA libraries of multiple adult tissues and various embryonic stages of the leopard gecko and the corn snake and gathered published reptilian sequence data sets from representatives of the four extant orders of reptiles: Squamata (snakes and lizards), the tuatara, crocodiles, and turtles. The LANE runner 2.0 software was implemented to annotate all assemblies within a single integrated pipeline. We show that this approach increases the annotation completeness of the assembled transcriptomes/genomes. We then built large concatenated protein alignments of single-copy genes and inferred phylogenetic trees that support the positions of turtles and the tuatara as sister groups of Archosauria and Squamata, respectively. The Reptilian Transcriptomes Database 2.0 resource will be updated to include selected new data sets as they become available, thus making it a reference for differential expression studies, comparative genomics and transcriptomics, linkage mapping, molecular ecology, and phylogenomic analyses involving reptiles. The database is available at www.reptilian-transcriptomes.org and can be enquired using a wwwblast server installed at the University of Geneva. PMID:26133641

  4. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    PubMed

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes lead to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  5. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    2013-01-01

    Background Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes lead to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  6. Genome Annotation and Transcriptomics of Oil-Producing Algae

    DTIC Science & Technology

    2015-03-16

    AFRL-OSR-VA-TR-2015-0103 GENOME ANNOTATION AND TRANSCRIPTOMICS OF OIL-PRODUCING ALGAE Sabeeha Merchant UNIVERSITY OF CALIFORNIA LOS ANGELES Final...2010 To 12-31-2014 4. TITLE AND SUBTITLE GENOME ANNOTATION AND TRANSCRIPTOMICS OF OIL-PRODUCING ALGAE 5a. CONTRACT NUMBER FA9550-10-1-0095 5b...NOTES 14. ABSTRACT Most algae accumulate triacylglycerols (TAGs) when they are starved for essential nutrients like N, S, P (or Si in the case of some

  7. Plant genome and transcriptome annotations: from misconceptions to simple solutions

    PubMed Central

    Bolger, Marie E; Arsova, Borjana; Usadel, Björn

    2018-01-01

    Abstract Next-generation sequencing has triggered an explosion of available genomic and transcriptomic resources in the plant sciences. Although genome and transcriptome sequencing has become orders of magnitudes cheaper and more efficient, often the functional annotation process is lagging behind. This might be hampered by the lack of a comprehensive enumeration of simple-to-use tools available to the plant researcher. In this comprehensive review, we present (i) typical ontologies to be used in the plant sciences, (ii) useful databases and resources used for functional annotation, (iii) what to expect from an annotated plant genome, (iv) an automated annotation pipeline and (v) a recipe and reference chart outlining typical steps used to annotate plant genomes/transcriptomes using publicly available resources. PMID:28062412

  8. Genome-Scale Transcriptome Analysis in Response to Nitric Oxide in Birch Cells: Implications of the Triterpene Biosynthetic Pathway

    PubMed Central

    Zeng, Fansuo; Sun, Fengkun; Li, Leilei; Liu, Kun; Zhan, Yaguang

    2014-01-01

    Evidence supporting nitric oxide (NO) as a mediator of plant biochemistry continues to grow, but its functions at the molecular level remains poorly understood and, in some cases, controversial. To study the role of NO at the transcriptional level in Betula platyphylla cells, we conducted a genome-scale transcriptome analysis of these cells. The transcriptome of untreated birch cells and those treated by sodium nitroprusside (SNP) were analyzed using the Solexa sequencing. Data were collected by sequencing cDNA libraries of birch cells, which had a long period to adapt to the suspension culture conditions before SNP-treated cells and untreated cells were sampled. Among the 34,100 UniGenes detected, BLASTX search revealed that 20,631 genes showed significant (E-values≤10−5) sequence similarity with proteins from the NR-database. Numerous expressed sequence tags (i.e., 1374) were identified as differentially expressed between the 12 h SNP-treated cells and control cells samples: 403 up-regulated and 971 down-regulated. From this, we specifically examined a core set of NO-related transcripts. The altered expression levels of several transcripts, as determined by transcriptome analysis, was confirmed by qRT-PCR. The results of transcriptome analysis, gene expression quantification, the content of triterpenoid and activities of defensive enzymes elucidated NO has a significant effect on many processes including triterpenoid production, carbohydrate metabolism and cell wall biosynthesis. PMID:25551661

  9. Characterization of mango (Mangifera indica L.) transcriptome and chloroplast genome.

    PubMed

    Azim, M Kamran; Khan, Ishtaiq A; Zhang, Yong

    2014-05-01

    We characterized mango leaf transcriptome and chloroplast genome using next generation DNA sequencing. The RNA-seq output of mango transcriptome generated >12 million reads (total nucleotides sequenced >1 Gb). De novo transcriptome assembly generated 30,509 unigenes with lengths in the range of 300 to ≥3,000 nt and 67× depth of coverage. Blast searching against nonredundant nucleotide databases and several Viridiplantae genomic datasets annotated 24,593 mango unigenes (80% of total) and identified Citrus sinensis as closest neighbor of mango with 9,141 (37%) matched sequences. The annotation with gene ontology and Clusters of Orthologous Group terms categorized unigene sequences into 57 and 25 classes, respectively. More than 13,500 unigenes were assigned to 293 KEGG pathways. Besides major plant biology related pathways, KEGG based gene annotation pointed out active presence of an array of biochemical pathways involved in (a) biosynthesis of bioactive flavonoids, flavones and flavonols, (b) biosynthesis of terpenoids and lignins and (c) plant hormone signal transduction. The mango transcriptome sequences revealed 235 proteases belonging to five catalytic classes of proteolytic enzymes. The draft genome of mango chloroplast (cp) was obtained by a combination of Sanger and next generation sequencing. The draft mango cp genome size is 151,173 bp with a pair of inverted repeats of 27,093 bp separated by small and large single copy regions, respectively. Out of 139 genes in mango cp genome, 91 found to be protein coding. Sequence analysis revealed cp genome of C. sinensis as closest neighbor of mango. We found 51 short repeats in mango cp genome supposed to be associated with extensive rearrangements. This is the first report of transcriptome and chloroplast genome analysis of any Anacardiaceae family member.

  10. Construction of Pará rubber tree genome and multi-transcriptome database accelerates rubber researches.

    PubMed

    Makita, Yuko; Kawashima, Mika; Lau, Nyok Sean; Othman, Ahmad Sofiman; Matsui, Minami

    2018-01-19

    Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies brought draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publically available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers for rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .

  11. Draft genome and reference transcriptomic resources for the urticating pine defoliator Thaumetopoea pityocampa (Lepidoptera: Notodontidae).

    PubMed

    Gschloessl, B; Dorkeld, F; Berges, H; Beydon, G; Bouchez, O; Branco, M; Bretaudeau, A; Burban, C; Dubois, E; Gauthier, P; Lhuillier, E; Nichols, J; Nidelet, S; Rocha, S; Sauné, L; Streiff, R; Gautier, M; Kerdelhué, C

    2018-05-01

    The pine processionary moth Thaumetopoea pityocampa (Lepidoptera: Notodontidae) is the main pine defoliator in the Mediterranean region. Its urticating larvae cause severe human and animal health concerns in the invaded areas. This species shows a high phenotypic variability for various traits, such as phenology, fecundity and tolerance to extreme temperatures. This study presents the construction and analysis of extensive genomic and transcriptomic resources, which are an obligate prerequisite to understand their underlying genetic architecture. Using a well-studied population from Portugal with peculiar phenological characteristics, the karyotype was first determined and a first draft genome of 537 Mb total length was assembled into 68,292 scaffolds (N50 = 164 kb). From this genome assembly, 29,415 coding genes were predicted. To circumvent some limitations for fine-scale physical mapping of genomic regions of interest, a 3X coverage BAC library was also developed. In particular, 11 BACs from this library were individually sequenced to assess the assembly quality. Additionally, de novo transcriptomic resources were generated from various developmental stages sequenced with HiSeq and MiSeq Illumina technologies. The reads were de novo assembled into 62,376 and 63,175 transcripts, respectively. Then, a robust subset of the genome-predicted coding genes, the de novo transcriptome assemblies and previously published 454/Sanger data were clustered to obtain a high-quality and comprehensive reference transcriptome consisting of 29,701 bona fide unigenes. These sequences covered 99% of the cegma and 88% of the busco highly conserved eukaryotic genes and 84% of the busco arthropod gene set. Moreover, 90% of these transcripts could be localized on the draft genome. The described information is available via a genome annotation portal (http://bipaa.genouest.org/sp/thaumetopoea_pityocampa/). © 2018 John Wiley & Sons Ltd.

  12. Development of genome- and transcriptome-derived microsatellites in related species of snapping shrimps with highly duplicated genomes.

    PubMed

    Gaynor, Kaitlyn M; Solomon, Joseph W; Siller, Stefanie; Jessell, Linnet; Duffy, J Emmett; Rubenstein, Dustin R

    2017-11-01

    Molecular markers are powerful tools for studying patterns of relatedness and parentage within populations and for making inferences about social evolution. However, the development of molecular markers for simultaneous study of multiple species presents challenges, particularly when species exhibit genome duplication or polyploidy. We developed microsatellite markers for Synalpheus shrimp, a genus in which species exhibit not only great variation in social organization, but also interspecific variation in genome size and partial genome duplication. From the four primary clades within Synalpheus, we identified microsatellites in the genomes of four species and in the consensus transcriptome of two species. Ultimately, we designed and tested primers for 143 microsatellite markers across 25 species. Although the majority of markers were disomic, many markers were polysomic for certain species. Surprisingly, we found no relationship between genome size and the number of polysomic markers. As expected, markers developed for a given species amplified better for closely related species than for more distant relatives. Finally, the markers developed from the transcriptome were more likely to work successfully and to be disomic than those developed from the genome, suggesting that consensus transcriptomes are likely to be conserved across species. Our findings suggest that the transcriptome, particularly consensus sequences from multiple species, can be a valuable source of molecular markers for taxa with complex, duplicated genomes. © 2017 John Wiley & Sons Ltd.

  13. Consequences of Normalizing Transcriptomic and Genomic Libraries of Plant Genomes Using a Duplex-Specific Nuclease and Tetramethylammonium Chloride

    PubMed Central

    Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

    2013-01-01

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce. PMID:23409088

  14. Consequences of normalizing transcriptomic and genomic libraries of plant genomes using a duplex-specific nuclease and tetramethylammonium chloride.

    PubMed

    Matvienko, Marta; Kozik, Alexander; Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

    2013-01-01

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.

  15. Transcriptome analysis reveals the time of the fourth round of genome duplication in common carp (Cyprinus carpio)

    PubMed Central

    2012-01-01

    Background Common carp (Cyprinus carpio) is thought to have undergone one extra round of genome duplication compared to zebrafish. Transcriptome analysis has been used to study the existence and timing of genome duplication in species for which genome sequences are incomplete. Large-scale transcriptome data for the common carp genome should help reveal the timing of the additional duplication event. Results We have sequenced the transcriptome of common carp using 454 pyrosequencing. After assembling the 454 contigs and the published common carp sequences together, we obtained 49,669 contigs and identified genes using homology searches and an ab initio method. We identified 4,651 orthologous pairs between common carp and zebrafish and found 129,984 paralogous pairs within the common carp. An estimation of the synonymous substitution rate in the orthologous pairs indicated that common carp and zebrafish diverged 120 million years ago (MYA). We identified one round of genome duplication in common carp and estimated that it had occurred 5.6 to 11.3 MYA. In zebrafish, no genome duplication event after speciation was observed, suggesting that, compared to zebrafish, common carp had undergone an additional genome duplication event. We annotated the common carp contigs with Gene Ontology terms and KEGG pathways. Compared with zebrafish gene annotations, we found that a set of biological processes and pathways were enriched in common carp. Conclusions The assembled contigs helped us to estimate the time of the fourth-round of genome duplication in common carp. The resource that we have built as part of this study will help advance functional genomics and genome annotation studies in the future. PMID:22424280

  16. Comprehensive phylogeny of ray-finned fishes (Actinopterygii) based on transcriptomic and genomic data.

    PubMed

    Hughes, Lily C; Ortí, Guillermo; Huang, Yu; Sun, Ying; Baldwin, Carole C; Thompson, Andrew W; Arcila, Dahiana; Betancur-R, Ricardo; Li, Chenhong; Becker, Leandro; Bellora, Nicolás; Zhao, Xiaomeng; Li, Xiaofeng; Wang, Min; Fang, Chao; Xie, Bing; Zhou, Zhuocheng; Huang, Hai; Chen, Songlin; Venkatesh, Byrappa; Shi, Qiong

    2018-05-14

    Our understanding of phylogenetic relationships among bony fishes has been transformed by analysis of a small number of genes, but uncertainty remains around critical nodes. Genome-scale inferences so far have sampled a limited number of taxa and genes. Here we leveraged 144 genomes and 159 transcriptomes to investigate fish evolution with an unparalleled scale of data: >0.5 Mb from 1,105 orthologous exon sequences from 303 species, representing 66 out of 72 ray-finned fish orders. We apply phylogenetic tests designed to trace the effect of whole-genome duplication events on gene trees and find paralogy-free loci using a bioinformatics approach. Genome-wide data support the structure of the fish phylogeny, and hypothesis-testing procedures appropriate for phylogenomic datasets using explicit gene genealogy interrogation settle some long-standing uncertainties, such as the branching order at the base of the teleosts and among early euteleosts, and the sister lineage to the acanthomorph and percomorph radiations. Comprehensive fossil calibrations date the origin of all major fish lineages before the end of the Cretaceous.

  17. Genomics, transcriptomics and proteomics to elucidate the pathogenesis of rheumatoid arthritis.

    PubMed

    Song, Xinqiang; Lin, Qingsong

    2017-08-01

    Rheumatoid arthritis is an autoimmune disease that affects several organs and tissues, predominantly the synovial joints. The pathogenesis of this disease is not completely understood, which maybe involved in the genomic variations, gene expression, protein translation and post-translational modifications. These system variations in genomics, transcriptomics and proteomics are dynamic in nature and their crosstalk is overwhelmingly complex, thus analyzing them separately may not be very informative. However, various '-omics' techniques developed in recent years have opened up new possibilities for clarifying disease pathways and thereby facilitating early diagnosis and specific therapies. This review examines how recent advances in the fields of genomics, transcriptomics and proteomics have contributed to our understanding of rheumatoid arthritis.

  18. International Standards for Genomes, Transcriptomes, and Metagenomes

    PubMed Central

    Mason, Christopher E.; Afshinnekoo, Ebrahim; Tighe, Scott; Wu, Shixiu; Levy, Shawn

    2017-01-01

    Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine. PMID:28337071

  19. Strand-specific transcriptome profiling with directly labeled RNA on genomic tiling microarrays

    PubMed Central

    2011-01-01

    Background With lower manufacturing cost, high spot density, and flexible probe design, genomic tiling microarrays are ideal for comprehensive transcriptome studies. Typically, transcriptome profiling using microarrays involves reverse transcription, which converts RNA to cDNA. The cDNA is then labeled and hybridized to the probes on the arrays, thus the RNA signals are detected indirectly. Reverse transcription is known to generate artifactual cDNA, in particular the synthesis of second-strand cDNA, leading to false discovery of antisense RNA. To address this issue, we have developed an effective method using RNA that is directly labeled, thus by-passing the cDNA generation. This paper describes this method and its application to the mapping of transcriptome profiles. Results RNA extracted from laboratory cultures of Porphyromonas gingivalis was fluorescently labeled with an alkylation reagent and hybridized directly to probes on genomic tiling microarrays specifically designed for this periodontal pathogen. The generated transcriptome profile was strand-specific and produced signals close to background level in most antisense regions of the genome. In contrast, high levels of signal were detected in the antisense regions when the hybridization was done with cDNA. Five antisense areas were tested with independent strand-specific RT-PCR and none to negligible amplification was detected, indicating that the strong antisense cDNA signals were experimental artifacts. Conclusions An efficient method was developed for mapping transcriptome profiles specific to both coding strands of a bacterial genome. This method chemically labels and uses extracted RNA directly in microarray hybridization. The generated transcriptome profile was free of cDNA artifactual signals. In addition, this method requires fewer processing steps and is potentially more sensitive in detecting small amount of RNA compared to conventional end-labeling methods due to the incorporation of more

  20. Bacillus anthracis genome organization in light of whole transcriptome sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martin, Jeffrey; Zhu, Wenhan; Passalacqua, Karla D.

    2010-03-22

    Emerging knowledge of whole prokaryotic transcriptomes could validate a number of theoretical concepts introduced in the early days of genomics. What are the rules connecting gene expression levels with sequence determinants such as quantitative scores of promoters and terminators? Are translation efficiency measures, e.g. codon adaptation index and RBS score related to gene expression? We used the whole transcriptome shotgun sequencing of a bacterial pathogen Bacillus anthracis to assess correlation of gene expression level with promoter, terminator and RBS scores, codon adaptation index, as well as with a new measure of gene translational efficiency, average translation speed. We compared computationalmore » predictions of operon topologies with the transcript borders inferred from RNA-Seq reads. Transcriptome mapping may also improve existing gene annotation. Upon assessment of accuracy of current annotation of protein-coding genes in the B. anthracis genome we have shown that the transcriptome data indicate existence of more than a hundred genes missing in the annotation though predicted by an ab initio gene finder. Interestingly, we observed that many pseudogenes possess not only a sequence with detectable coding potential but also promoters that maintain transcriptional activity.« less

  1. Primary analysis of repeat elements of the Asian seabass (Lates calcarifer) transcriptome and genome

    PubMed Central

    Kuznetsova, Inna S.; Thevasagayam, Natascha M.; Sridatta, Prakki S. R.; Komissarov, Aleksey S.; Saju, Jolly M.; Ngoh, Si Y.; Jiang, Junhui; Shen, Xueyan; Orbán, László

    2014-01-01

    As part of our Asian seabass genome project, we are generating an inventory of repeat elements in the genome and transcriptome. The karyotype showed a diploid number of 2n = 24 chromosomes with a variable number of B-chromosomes. The transcriptome and genome of Asian seabass were searched for repetitive elements with experimental and bioinformatics tools. Six different types of repeats constituting 8–14% of the genome were characterized. Repetitive elements were clustered in the pericentromeric heterochromatin of all chromosomes, but some of them were preferentially accumulated in pretelomeric and pericentromeric regions of several chromosomes pairs and have chromosomes specific arrangement. From the dispersed class of fish-specific non-LTR retrotransposon elements Rex1 and MAUI-like repeats were analyzed. They were wide-spread both in the genome and transcriptome, accumulated on the pericentromeric and peritelomeric areas of all chromosomes. Every analyzed repeat was represented in the Asian seabass transcriptome, some showed differential expression between the gonads. The other group of repeats analyzed belongs to the rRNA multigene family. FISH signal for 5S rDNA was located on a single pair of chromosomes, whereas that for 18S rDNA was found on two pairs. A BAC-derived contig containing rDNA was sequenced and assembled into a scaffold containing incomplete fragments of 18S rDNA. Their assembly and chromosomal position revealed that this part of Asian seabass genome is extremely rich in repeats containing evolutionarily conserved and novel sequences. In summary, transcriptome assemblies and cDNA data are suitable for the identification of repetitive DNA from unknown genomes and for comparative investigation of conserved elements between teleosts and other vertebrates. PMID:25120555

  2. Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader.

    PubMed

    Richardson, Mark F; Sequeira, Fernando; Selechnik, Daniel; Carneiro, Miguel; Vallinoto, Marcelo; Reid, Jack G; West, Andrea J; Crossland, Michael R; Shine, Richard; Rollins, Lee A

    2018-01-01

    Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publically available in NCBI and GigaDB to serve as a resource for other researchers. © The Authors 2017. Published by Oxford University Press.

  3. Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader

    PubMed Central

    Reid, Jack G; Crossland, Michael R

    2018-01-01

    Abstract Background Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Findings Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. Conclusions This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publically available in NCBI and GigaDB to serve as a resource for other researchers. PMID:29186423

  4. The draft genome and transcriptome of Cannabis sativa.

    PubMed

    van Bakel, Harm; Stout, Jake M; Cote, Atina G; Tallon, Carling M; Sharpe, Andrew G; Hughes, Timothy R; Page, Jonathan E

    2011-10-20

    Cannabis sativa has been cultivated throughout human history as a source of fiber, oil and food, and for its medicinal and intoxicating properties. Selective breeding has produced cannabis plants for specific uses, including high-potency marijuana strains and hemp cultivars for fiber and seed production. The molecular biology underlying cannabinoid biosynthesis and other traits of interest is largely unexplored. We sequenced genomic DNA and RNA from the marijuana strain Purple Kush using shortread approaches. We report a draft haploid genome sequence of 534 Mb and a transcriptome of 30,000 genes. Comparison of the transcriptome of Purple Kush with that of the hemp cultivar 'Finola' revealed that many genes encoding proteins involved in cannabinoid and precursor pathways are more highly expressed in Purple Kush than in 'Finola'. The exclusive occurrence of Δ9-tetrahydrocannabinolic acid synthase in the Purple Kush transcriptome, and its replacement by cannabidiolic acid synthase in 'Finola', may explain why the psychoactive cannabinoid Δ9-tetrahydrocannabinol (THC) is produced in marijuana but not in hemp. Resequencing the hemp cultivars 'Finola' and 'USO-31' showed little difference in gene copy numbers of cannabinoid pathway enzymes. However, single nucleotide variant analysis uncovered a relatively high level of variation among four cannabis types, and supported a separation of marijuana and hemp. The availability of the Cannabis sativa genome enables the study of a multifunctional plant that occupies a unique role in human culture. Its availability will aid the development of therapeutic marijuana strains with tailored cannabinoid profiles and provide a basis for the breeding of hemp with improved agronomic characteristics.

  5. Genomic and transcriptomic predictors of triglyceride response to regular exercise

    PubMed Central

    Sarzynski, Mark A; Davidsen, Peter K; Sung, Yun Ju; Hesselink, Matthijs K C; Schrauwen, Patrick; Rice, Treva K; Rao, D C; Falciani, Francesco; Bouchard, Claude

    2015-01-01

    Aim We performed genome-wide and transcriptome-wide profiling to identify genes and single nucleotide polymorphisms (SNPs) associated with the response of triglycerides (TG) to exercise training. Methods Plasma TG levels were measured before and after a 20-week endurance training programme in 478 white participants from the HERITAGE Family Study. Illumina HumanCNV370-Quad v3.0 BeadChips were genotyped using the Illumina BeadStation 500GX platform. Affymetrix HG-U133+2 arrays were used to quantitate gene expression levels from baseline muscle biopsies of a subset of participants (N=52). Genome-wide association study (GWAS) analysis was performed using MERLIN, while transcriptomic predictor models were developed using the R-package GALGO. Results The GWAS results showed that eight SNPs were associated with TG training-response (ΔTG) at p<9.9×10−6, while another 31 SNPs showed p values <1×10−4. In multivariate regression models, the top 10 SNPs explained 32.0% of the variance in ΔTG, while conditional heritability analysis showed that four SNPs statistically accounted for all of the heritability of ΔTG. A molecular signature based on the baseline expression of 11 genes predicted 27% of ΔTG in HERITAGE, which was validated in an independent study. A composite SNP score based on the top four SNPs, each from the genomic and transcriptomic analyses, was the strongest predictor of ΔTG (R2=0.14, p=3.0×10−68). Conclusions Our results indicate that skeletal muscle transcript abundance at 11 genes and SNPs at a number of loci contribute to TG response to exercise training. Combining data from genomics and transcriptomics analyses identified a SNP-based gene signature that should be further tested in independent samples. PMID:26491034

  6. Unstable genomes elevate transcriptome dynamics

    PubMed Central

    Stevens, Joshua B.; Liu, Guo; Abdallah, Batoul Y.; Horne, Steven D.; Ye, Karen J.; Bremer, Steven W.; Ye, Christine J.; Krawetz, Stephen A.; Heng, Henry H.

    2015-01-01

    The challenge of identifying common expression signatures in cancer is well known, however the reason behind this is largely unclear. Traditionally variation in expression signatures has been attributed to technological problems, however recent evidence suggests that chromosome instability (CIN) and resultant karyotypic heterogeneity may be a large contributing factor. Using a well-defined model of immortalization, we systematically compared the pattern of genome alteration and expression dynamics during somatic evolution. Co-measurement of global gene expression and karyotypic alteration throughout the immortalization process reveals that karyotype changes influence gene expression as major structural and numerical karyotypic alterations result in large gene expression deviation. Replicate samples from stages with stable genomes are more similar to each other than are replicate samples with karyotypic heterogeneity. Karyotypic and gene expression change during immortalization is dynamic as each stage of progression has a unique expression pattern. This was further verified by comparing global expression in two replicates grown in one flask with known karyotypes. Replicates with higher karyotypic instability were found to be less similar than replicates with stable karyotypes. This data illustrates the karyotype, transcriptome, and transcriptome determined pathways are in constant flux during somatic cellular evolution (particularly during the macroevolutionary phase) and this flux is an inextricable feature of CIN and essential for cancer formation. The findings presented here underscore the importance of understanding the evolutionary process of cancer in order to design improved treatment modalities. PMID:24122714

  7. Comparative Genomics and Transcriptomics Analyses Reveal Divergent Lifestyle Features of Nematode Endoparasitic Fungus Hirsutella minnesotensis

    PubMed Central

    Lai, Yiling; Liu, Keke; Zhang, Xinyu; Zhang, Xiaoling; Li, Kuan; Wang, Niuniu; Shu, Chi; Wu, Yunpeng; Wang, Chengshu; Bushley, Kathryn E.; Xiang, Meichun; Liu, Xingzhong

    2014-01-01

    Hirsutella minnesotensis [Ophiocordycipitaceae (Hypocreales, Ascomycota)] is a dominant endoparasitic fungus by using conidia that adhere to and penetrate the secondary stage juveniles of soybean cyst nematode. Its genome was de novo sequenced and compared with five entomopathogenic fungi in the Hypocreales and three nematode-trapping fungi in the Orbiliales (Ascomycota). The genome of H. minnesotensis is 51.4 Mb and encodes 12,702 genes enriched with transposable elements up to 32%. Phylogenomic analysis revealed that H. minnesotensis was diverged from entomopathogenic fungi in Hypocreales. Genome of H. minnesotensis is similar to those of entomopathogenic fungi to have fewer genes encoding lectins for adhesion and glycoside hydrolases for cellulose degradation, but is different from those of nematode-trapping fungi to possess more genes for protein degradation, signal transduction, and secondary metabolism. Those results indicate that H. minnesotensis has evolved different mechanism for nematode endoparasitism compared with nematode-trapping fungi. Transcriptomics analyses for the time-scale parasitism revealed the upregulations of lectins, secreted proteases and the genes for biosynthesis of secondary metabolites that could be putatively involved in host surface adhesion, cuticle degradation, and host manipulation. Genome and transcriptome analyses provided comprehensive understanding of the evolution and lifestyle of nematode endoparasitism. PMID:25359922

  8. Computational analysis of conserved RNA secondary structure in transcriptomes and genomes.

    PubMed

    Eddy, Sean R

    2014-01-01

    Transcriptomics experiments and computational predictions both enable systematic discovery of new functional RNAs. However, many putative noncoding transcripts arise instead from artifacts and biological noise, and current computational prediction methods have high false positive rates. I discuss prospects for improving computational methods for analyzing and identifying functional RNAs, with a focus on detecting signatures of conserved RNA secondary structure. An interesting new front is the application of chemical and enzymatic experiments that probe RNA structure on a transcriptome-wide scale. I review several proposed approaches for incorporating structure probing data into the computational prediction of RNA secondary structure. Using probabilistic inference formalisms, I show how all these approaches can be unified in a well-principled framework, which in turn allows RNA probing data to be easily integrated into a wide range of analyses that depend on RNA secondary structure inference. Such analyses include homology search and genome-wide detection of new structural RNAs.

  9. De novo assembling and primary analysis of genome and transcriptome of gray whale Eschrichtius robustus.

    PubMed

    Moskalev, Alexey А; Kudryavtseva, Anna V; Graphodatsky, Alexander S; Beklemisheva, Violetta R; Serdyukova, Natalya A; Krutovsky, Konstantin V; Sharov, Vadim V; Kulakovskiy, Ivan V; Lando, Andrey S; Kasianov, Artem S; Kuzmin, Dmitry A; Putintseva, Yuliya A; Feranchuk, Sergey I; Shaposhnikov, Mikhail V; Fraifeld, Vadim E; Toren, Dmitri; Snezhkina, Anastasia V; Sitnik, Vasily V

    2017-12-28

    Gray whale, Eschrichtius robustus (E. robustus), is a single member of the family Eschrichtiidae, which is considered to be the most primitive in the class Cetacea. Gray whale is often described as a "living fossil". It is adapted to extreme marine conditions and has a high life expectancy (77 years). The assembly of a gray whale genome and transcriptome will allow to carry out further studies of whale evolution, longevity, and resistance to extreme environment. In this work, we report the first de novo assembly and primary analysis of the E. robustus genome and transcriptome based on kidney and liver samples. The presented draft genome assembly is complete by 55% in terms of a total genome length, but only by 24% in terms of the BUSCO complete gene groups, although 10,895 genes were identified. Transcriptome annotation and comparison with other whale species revealed robust expression of DNA repair and hypoxia-response genes, which is expected for whales. This preliminary study of the gray whale genome and transcriptome provides new data to better understand the whale evolution and the mechanisms of their adaptation to the hypoxic conditions.

  10. The draft genome and transcriptome of Cannabis sativa

    PubMed Central

    2011-01-01

    Background Cannabis sativa has been cultivated throughout human history as a source of fiber, oil and food, and for its medicinal and intoxicating properties. Selective breeding has produced cannabis plants for specific uses, including high-potency marijuana strains and hemp cultivars for fiber and seed production. The molecular biology underlying cannabinoid biosynthesis and other traits of interest is largely unexplored. Results We sequenced genomic DNA and RNA from the marijuana strain Purple Kush using shortread approaches. We report a draft haploid genome sequence of 534 Mb and a transcriptome of 30,000 genes. Comparison of the transcriptome of Purple Kush with that of the hemp cultivar 'Finola' revealed that many genes encoding proteins involved in cannabinoid and precursor pathways are more highly expressed in Purple Kush than in 'Finola'. The exclusive occurrence of Δ9-tetrahydrocannabinolic acid synthase in the Purple Kush transcriptome, and its replacement by cannabidiolic acid synthase in 'Finola', may explain why the psychoactive cannabinoid Δ9-tetrahydrocannabinol (THC) is produced in marijuana but not in hemp. Resequencing the hemp cultivars 'Finola' and 'USO-31' showed little difference in gene copy numbers of cannabinoid pathway enzymes. However, single nucleotide variant analysis uncovered a relatively high level of variation among four cannabis types, and supported a separation of marijuana and hemp. Conclusions The availability of the Cannabis sativa genome enables the study of a multifunctional plant that occupies a unique role in human culture. Its availability will aid the development of therapeutic marijuana strains with tailored cannabinoid profiles and provide a basis for the breeding of hemp with improved agronomic characteristics. PMID:22014239

  11. KONAGAbase: a genomic and transcriptomic database for the diamondback moth, Plutella xylostella.

    PubMed

    Jouraku, Akiya; Yamamoto, Kimiko; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Narukawa, Junko; Miyamoto, Kazuhisa; Kurita, Kanako; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Noda, Hiroaki

    2013-07-09

    The diamondback moth (DBM), Plutella xylostella, is one of the most harmful insect pests for crucifer crops worldwide. DBM has rapidly evolved high resistance to most conventional insecticides such as pyrethroids, organophosphates, fipronil, spinosad, Bacillus thuringiensis, and diamides. Therefore, it is important to develop genomic and transcriptomic DBM resources for analysis of genes related to insecticide resistance, both to clarify the mechanism of resistance of DBM and to facilitate the development of insecticides with a novel mode of action for more effective and environmentally less harmful insecticide rotation. To contribute to this goal, we developed KONAGAbase, a genomic and transcriptomic database for DBM (KONAGA is the Japanese word for DBM). KONAGAbase provides (1) transcriptomic sequences of 37,340 ESTs/mRNAs and 147,370 RNA-seq contigs which were clustered and assembled into 84,570 unigenes (30,695 contigs, 50,548 pseudo singletons, and 3,327 singletons); and (2) genomic sequences of 88,530 WGS contigs with 246,244 degenerate contigs and 106,455 singletons from which 6,310 de novo identified repeat sequences and 34,890 predicted gene-coding sequences were extracted. The unigenes and predicted gene-coding sequences were clustered and 32,800 representative sequences were extracted as a comprehensive putative gene set. These sequences were annotated with BLAST descriptions, Gene Ontology (GO) terms, and Pfam descriptions, respectively. KONAGAbase contains rich graphical user interface (GUI)-based web interfaces for easy and efficient searching, browsing, and downloading sequences and annotation data. Five useful search interfaces consisting of BLAST search, keyword search, BLAST result-based search, GO tree-based search, and genome browser are provided. KONAGAbase is publicly available from our website (http://dbm.dna.affrc.go.jp/px/) through standard web browsers. KONAGAbase provides DBM comprehensive transcriptomic and draft genomic sequences with

  12. A large-scale full-length cDNA analysis to explore the budding yeast transcriptome

    PubMed Central

    Miura, Fumihito; Kawaguchi, Noriko; Sese, Jun; Toyoda, Atsushi; Hattori, Masahira; Morishita, Shinichi; Ito, Takashi

    2006-01-01

    We performed a large-scale cDNA analysis to explore the transcriptome of the budding yeast Saccharomyces cerevisiae. We sequenced two cDNA libraries, one from the cells exponentially growing in a minimal medium and the other from meiotic cells. Both libraries were generated by using a vector-capping method that allows the accurate mapping of transcription start sites (TSSs). Consequently, we identified 11,575 TSSs associated with 3,638 annotated genomic features, including 3,599 ORFs, to suggest that most yeast genes have two or more TSSs. In addition, we identified 45 previously undescribed introns, including those affecting current ORF annotations and those spliced alternatively. Furthermore, the analysis revealed 667 transcription units in the intergenic regions and transcripts derived from antisense strands of 367 known features. We also found that 348 ORFs carry TSSs in their 3′-halves to generate sense transcripts starting from inside the ORFs. These results indicate that the budding yeast transcriptome is considerably more complex than previously thought, and it shares many recently revealed characteristics with the transcriptomes of mammals and other higher eukaryotes. Thus, the genome-wide active transcription that generates novel classes of transcripts appears to be an intrinsic feature of the eukaryotic cells. The budding yeast will serve as a versatile model for the studies on these aspects of transcriptome, and the full-length cDNA clones can function as an invaluable resource in such studies. PMID:17101987

  13. Swine transcriptome characterization by combined Iso-Seq and RNA-seq for annotating the emerging long read-based reference genome

    USDA-ARS?s Scientific Manuscript database

    PacBio long-read sequencing technology is increasingly popular in genome sequence assembly and transcriptome cataloguing. Recently, a new-generation pig reference genome was assembled based on long reads from this technology. To finely annotate this genome assembly, transcriptomes of nine tissues fr...

  14. High-confidence coding and noncoding transcriptome maps

    PubMed Central

    2017-01-01

    The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using the maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE, Human BodyMap 2.0, The Cancer Genome Atlas, and GTEx projects, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalog that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of noncoding genomes. PMID:28396519

  15. Genome and Transcriptome Sequencing of the Ostreid herpesvirus 1 From Tomales Bay, California

    NASA Astrophysics Data System (ADS)

    Burge, C. A.; Langevin, S.; Closek, C. J.; Roberts, S. B.; Friedman, C. S.

    2016-02-01

    Mass mortalities of larval and seed bivalve molluscs attributed to the Ostreid herpesvirus 1 (OsHV-1) occur globally. OsHV-1 was fully sequenced and characterized as a member of the Family Malacoherpesviridae. Multiple strains of OsHV-1 exist and may vary in virulence, i.e. OsHV-1 µvar. For most global variants of OsHV-1, sequence data is limited to PCR-based sequencing of segments, including two recent genomes. In the United States, OsHV-1 is limited to detection in adjacent embayments in California, Tomales and Drakes bays. Limited DNA sequence data of OsHV-1 infecting oysters in Tomales Bay indicates the virus detected in Tomales Bay is similar but not identical to any one global variant of OsHV-1. In order to better understand both strain variation and virulence of OsHV-1 infecting oysters in Tomales Bay, we used genomic and transcriptomic sequencing. Meta-genomic sequencing (Illumina MiSeq) was conducted from infected oysters (n=4 per year) collected in 2003, 2007, and 2014, where full OsHV-1 genome sequences and low overall microbial diversity were achieved from highly infected oysters. Increased microbial diversity was detected in three of four samples sequenced from 2003, where qPCR based genome copy numbers of OsHV-1 were lower. Expression analysis (SOLiD RNA sequencing) of OsHV-1 genes expressed in oyster larvae at 24 hours post exposure revealed a nearly complete transcriptome, with several highly expressed genes, which are similar to recent transcriptomic analyses of other OsHV-1 variants. Taken together, our results indicate that genome and transcriptome sequencing may be powerful tools in understanding both strain variation and virulence of non-culturable marine viruses.

  16. [Genomics and transcriptomics of the Chinese liver fluke Clonorchis sinensis (Opisthorchiidae, Trematoda)].

    PubMed

    Chelomina, G N

    2017-01-01

    The review summarizes the results of first genomic and transcriptomic investigations of the liver fluke Clonorchis sinensis (Opisthorchiidae, Trematoda). The studies mark the dawn of the genomic era for opisthorchiids, which cause severe hepatobiliary diseases in humans and animals. Their results aided in understanding the molecular mechanisms of adaptation to parasitism, parasite survival in mammalian biliary tracts, and genome dynamics in the individual development and the development of parasite-host relationships. Special attention is paid to the achievements in studying the codon usage bias and the roles of mobile genetic elements (MGEs) and small interfering RNAs (siRNAs). Interspecific comparisons at the genomic and transcriptomic levels revealed molecular differences, which may contribute to understanding the specialized niches and physiological needs of the respective species. The studies in C. sinensis provide a basis for further basic and applied research in liver flukes and, in particular, the development of efficient means to prevent, diagnose, and treat clonorchiasis.

  17. Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

    PubMed Central

    Seaver, Samuel M. D.; Bradbury, Louis M. T.; Frelin, Océane; Zarecki, Raphy; Ruppin, Eytan; Hanson, Andrew D.; Henry, Christopher S.

    2015-01-01

    There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions and possible generation of unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low expression genes to enable activity of a maximum number of reactions associated with high expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for analysis transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes. PMID:25806041

  18. Improved evidence-based genome-scale metabolic models for maize leaf, embryo, and endosperm

    DOE PAGES

    Seaver, Samuel M.D.; Bradbury, Louis M.T.; Frelin, Océane; ...

    2015-03-10

    There is a growing demand for genome-scale metabolic reconstructions for plants, fueled by the need to understand the metabolic basis of crop yield and by progress in genome and transcriptome sequencing. Methods are also required to enable the interpretation of plant transcriptome data to study how cellular metabolic activity varies under different growth conditions or even within different organs, tissues, and developmental stages. Such methods depend extensively on the accuracy with which genes have been mapped to the biochemical reactions in the plant metabolic pathways. Errors in these mappings lead to metabolic reconstructions with an inflated number of reactions andmore » possible generation of unreliable metabolic phenotype predictions. Here we introduce a new evidence-based genome-scale metabolic reconstruction of maize, with significant improvements in the quality of the gene-reaction associations included within our model. We also present a new approach for applying our model to predict active metabolic genes based on transcriptome data. This method includes a minimal set of reactions associated with low expression genes to enable activity of a maximum number of reactions associated with high expression genes. We apply this method to construct an organ-specific model for the maize leaf, and tissue specific models for maize embryo and endosperm cells. We validate our models using fluxomics data for the endosperm and embryo, demonstrating an improved capacity of our models to fit the available fluxomics data. All models are publicly available via the DOE Systems Biology Knowledgebase and PlantSEED, and our new method is generally applicable for analysis transcript profiles from any plant, paving the way for further in silico studies with a wide variety of plant genomes.« less

  19. The genome- and transcriptome-wide analysis of innate immunity in the brown planthopper, Nilaparvata lugens

    PubMed Central

    2013-01-01

    Background The brown planthopper (Nilaparvata lugens) is one of the most serious rice plant pests in Asia. N. lugens causes extensive rice damage by sucking rice phloem sap, which results in stunted plant growth and the transmission of plant viruses. Despite the importance of this insect pest, little is known about the immunological mechanisms occurring in this hemimetabolous insect species. Results In this study, we performed a genome- and transcriptome-wide analysis aiming at the immune-related genes. The transcriptome datasets include the N. lugens intestine, the developmental stage, wing formation, and sex-specific expression information that provided useful gene expression sequence data for the genome-wide analysis. As a result, we identified a large number of genes encoding N. lugens pattern recognition proteins, modulation proteins in the prophenoloxidase (proPO) activating cascade, immune effectors, and the signal transduction molecules involved in the immune pathways, including the Toll, Immune deficiency (Imd) and Janus kinase signal transducers and activators of transcription (JAK-STAT) pathways. The genome scale analysis revealed detailed information of the gene structure, distribution and transcription orientations in scaffolds. A comparison of the genome-available hemimetabolous and metabolous insect species indicate the differences in the immune-related gene constitution. We investigated the gene expression profiles with regards to how they responded to bacterial infections and tissue, as well as development and sex expression specificity. Conclusions The genome- and transcriptome-wide analysis of immune-related genes including pattern recognition and modulation molecules, immune effectors, and the signal transduction molecules involved in the immune pathways is an important step in determining the overall architecture and functional network of the immune components in N. lugens. Our findings provide the comprehensive gene sequence resource and

  20. Comprehensive analyses of genomes, transcriptomes and metabolites of neem tree

    PubMed Central

    Rangiah, Kannan; Mahesh, HB; Rajamani, Anantharamanan; Shirke, Meghana D.; Russiachand, Heikham; Loganathan, Ramya Malarini; Shankara Lingu, Chandana; Siddappa, Shilpa; Ramamurthy, Aishwarya; Sathyanarayana, BN

    2015-01-01

    Neem (Azadirachta indica A. Juss) is one of the most versatile tropical evergreen tree species known in India since the Vedic period (1500 BC–600 BC). Neem tree is a rich source of limonoids, having a wide spectrum of activity against insect pests and microbial pathogens. Complex tetranortriterpenoids such as azadirachtin, salanin and nimbin are the major active principles isolated from neem seed. Absolutely nothing is known about the biochemical pathways of these metabolites in neem tree. To identify genes and pathways in neem, we sequenced neem genomes and transcriptomes using next generation sequencing technologies. Assembly of Illumina and 454 sequencing reads resulted in 267 Mb, which accounts for 70% of estimated size of neem genome. We predicted 44,495 genes in the neem genome, of which 32,278 genes were expressed in neem tissues. Neem genome consists about 32.5% (87 Mb) of repetitive DNA elements. Neem tree is phylogenetically related to citrus, Citrus sinensis. Comparative analysis anchored 62% (161 Mb) of assembled neem genomic contigs onto citrus chromomes. Ultrahigh performance liquid chromatography-mass spectrometry-selected reaction monitoring (UHPLC-MS/SRM) method was used to quantify azadirachtin, nimbin, and salanin from neem tissues. Weighted Correlation Network Analysis (WCGNA) of expressed genes and metabolites resulted in identification of possible candidate genes involved in azadirachtin biosynthesis pathway. This study provides genomic, transcriptomic and quantity of top three neem metabolites resource, which will accelerate basic research in neem to understand biochemical pathways. PMID:26290780

  1. Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano.

    PubMed

    Wasik, Kaja; Gurtowski, James; Zhou, Xin; Ramos, Olivia Mendivil; Delás, M Joaquina; Battistoni, Giorgia; El Demerdash, Osama; Falciatori, Ilaria; Vizoso, Dita B; Smith, Andrew D; Ladurner, Peter; Schärer, Lukas; McCombie, W Richard; Hannon, Gregory J; Schatz, Michael

    2015-10-06

    The free-living flatworm, Macrostomum lignano has an impressive regenerative capacity. Following injury, it can regenerate almost an entirely new organism because of the presence of an abundant somatic stem cell population, the neoblasts. This set of unique properties makes many flatworms attractive organisms for studying the evolution of pathways involved in tissue self-renewal, cell-fate specification, and regeneration. The use of these organisms as models, however, is hampered by the lack of a well-assembled and annotated genome sequences, fundamental to modern genetic and molecular studies. Here we report the genomic sequence of M. lignano and an accompanying characterization of its transcriptome. The genome structure of M. lignano is remarkably complex, with ∼75% of its sequence being comprised of simple repeats and transposon sequences. This has made high-quality assembly from Illumina reads alone impossible (N50=222 bp). We therefore generated 130× coverage by long sequencing reads from the Pacific Biosciences platform to create a substantially improved assembly with an N50 of 64 Kbp. We complemented the reference genome with an assembled and annotated transcriptome, and used both of these datasets in combination to probe gene-expression patterns during regeneration, examining pathways important to stem cell function.

  2. Genome sequence and transcriptome analyses of the thermophilic zygomycete fungus Rhizomucor miehei.

    PubMed

    Zhou, Peng; Zhang, Guoqiang; Chen, Shangwu; Jiang, Zhengqiang; Tang, Yanbin; Henrissat, Bernard; Yan, Qiaojuan; Yang, Shaoqing; Chen, Chin-Fu; Zhang, Bing; Du, Zhenglin

    2014-04-21

    The zygomycete fungi like Rhizomucor miehei have been extensively exploited for the production of various enzymes. As a thermophilic fungus, R. miehei is capable of growing at temperatures that approach the upper limits for all eukaryotes. To date, over hundreds of fungal genomes are publicly available. However, Zygomycetes have been rarely investigated both genetically and genomically. Here, we report the genome of R. miehei CAU432 to explore the thermostable enzymatic repertoire of this fungus. The assembled genome size is 27.6-million-base (Mb) with 10,345 predicted protein-coding genes. Even being thermophilic, the G + C contents of fungal whole genome (43.8%) and coding genes (47.4%) are less than 50%. Phylogenetically, R. miehei is more closerly related to Phycomyces blakesleeanus than to Mucor circinelloides and Rhizopus oryzae. The genome of R. miehei harbors a large number of genes encoding secreted proteases, which is consistent with the characteristics of R. miehei being a rich producer of proteases. The transcriptome profile of R. miehei showed that the genes responsible for degrading starch, glucan, protein and lipid were highly expressed. The genome information of R. miehei will facilitate future studies to better understand the mechanisms of fungal thermophilic adaptation and the exploring of the potential of R. miehei in industrial-scale production of thermostable enzymes. Based on the existence of a large repertoire of amylolytic, proteolytic and lipolytic genes in the genome, R. miehei has potential in the production of a variety of such enzymes.

  3. Systems perspectives on erythromycin biosynthesis by comparative genomic and transcriptomic analyses of S. erythraea E3 and NRRL23338 strains

    PubMed Central

    2013-01-01

    Background S. erythraea is a Gram-positive filamentous bacterium used for the industrial-scale production of erythromycin A which is of high clinical importance. In this work, we sequenced the whole genome of a high-producing strain (E3) obtained by random mutagenesis and screening from the wild-type strain NRRL23338, and examined time-series expression profiles of both E3 and NRRL23338. Based on the genomic data and transcriptpmic data of these two strains, we carried out comparative analysis of high-producing strain and wild-type strain at both the genomic level and the transcriptomic level. Results We observed a large number of genetic variants including 60 insertions, 46 deletions and 584 single nucleotide variations (SNV) in E3 in comparison with NRRL23338, and the analysis of time series transcriptomic data indicated that the genes involved in erythromycin biosynthesis and feeder pathways were significantly up-regulated during the 60 hours time-course. According to our data, BldD, a previously identified ery cluster regulator, did not show any positive correlations with the expression of ery cluster, suggesting the existence of alternative regulation mechanisms of erythromycin synthesis in S. erythraea. Several potential regulators were then proposed by integration analysis of genomic and transcriptomic data. Conclusion This is a demonstration of the functional comparative genomics between an industrial S. erythraea strain and the wild-type strain. These findings help to understand the global regulation mechanisms of erythromycin biosynthesis in S. erythraea, providing useful clues for genetic and metabolic engineering in the future. PMID:23902230

  4. Comparative genomics reveals conservative evolution of the xylem transcriptome in vascular plants.

    PubMed

    Li, Xinguo; Wu, Harry X; Southerton, Simon G

    2010-06-21

    Wood is a valuable natural resource and a major carbon sink. Wood formation is an important developmental process in vascular plants which played a crucial role in plant evolution. Although genes involved in xylem formation have been investigated, the molecular mechanisms of xylem evolution are not well understood. We use comparative genomics to examine evolution of the xylem transcriptome to gain insights into xylem evolution. The xylem transcriptome is highly conserved in conifers, but considerably divergent in angiosperms. The functional domains of genes in the xylem transcriptome are moderately to highly conserved in vascular plants, suggesting the existence of a common ancestral xylem transcriptome. Compared to the total transcriptome derived from a range of tissues, the xylem transcriptome is relatively conserved in vascular plants. Of the xylem transcriptome, cell wall genes, ancestral xylem genes, known proteins and transcription factors are relatively more conserved in vascular plants. A total of 527 putative xylem orthologs were identified, which are unevenly distributed across the Arabidopsis chromosomes with eight hot spots observed. Phylogenetic analysis revealed that evolution of the xylem transcriptome has paralleled plant evolution. We also identified 274 conifer-specific xylem unigenes, all of which are of unknown function. These xylem orthologs and conifer-specific unigenes are likely to have played a crucial role in xylem evolution. Conifers have highly conserved xylem transcriptomes, while angiosperm xylem transcriptomes are relatively diversified. Vascular plants share a common ancestral xylem transcriptome. The xylem transcriptomes of vascular plants are more conserved than the total transcriptomes. Evolution of the xylem transcriptome has largely followed the trend of plant evolution.

  5. The Whole-Genome and Transcriptome of the Manila Clam (Ruditapes philippinarum).

    PubMed

    Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong

    2017-06-01

    The manila clam, Ruditapes philippinarum, is an important bivalve species in worldwide aquaculture including Korea. The aquaculture production of R. philippinarum is under threat from diverse environmental factors including viruses, microorganisms, parasites, and water conditions with subsequently declining production. In spite of its importance as a marine resource, the reference genome of R. philippinarum for comprehensive genetic studies is largely unexplored. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide the basic data for advanced studies in selective breeding and disease control in order to obtain successful aquaculture systems. An approximately 2.56 Gb high quality whole-genome was assembled with various library construction methods. A total of 108,034 protein coding gene models were predicted and repetitive elements including simple sequence repeats and noncoding RNAs were identified to further understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with the bivalve marine invertebrates uncover that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis with three different tissues in order to support genome annotation and then identified 41,275 transcripts which were annotated. The R. philippinarum genome resource will markedly advance a wide range of potential genetic studies, a reference genome for comparative analysis of bivalve species and unraveling mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Allele Identification for Transcriptome-Based Population Genomics in the Invasive Plant Centaurea solstitialis

    PubMed Central

    Dlugosch, Katrina M.; Lai, Zhao; Bonin, Aurélie; Hierro, José; Rieseberg, Loren H.

    2013-01-01

    Transcriptome sequences are becoming more broadly available for multiple individuals of the same species, providing opportunities to derive population genomic information from these datasets. Using the 454 Life Science Genome Sequencer FLX and FLX-Titanium next-generation platforms, we generated 11−430 Mbp of sequence for normalized cDNA for 40 wild genotypes of the invasive plant Centaurea solstitialis, yellow starthistle, from across its worldwide distribution. We examined the impact of sequencing effort on transcriptome recovery and overlap among individuals. To do this, we developed two novel publicly available software pipelines: SnoWhite for read cleaning before assembly, and AllelePipe for clustering of loci and allele identification in assembled datasets with or without a reference genome. AllelePipe is designed specifically for cases in which read depth information is not appropriate or available to assist with disentangling closely related paralogs from allelic variation, as in transcriptome or previously assembled libraries. We find that modest applications of sequencing effort recover most of the novel sequences present in the transcriptome of this species, including single-copy loci and a representative distribution of functional groups. In contrast, the coverage of variable sites, observation of heterozygosity, and overlap among different libraries are all highly dependent on sequencing effort. Nevertheless, the information gained from overlapping regions was informative regarding coarse population structure and variation across our small number of population samples, providing the first genetic evidence in support of hypothesized invasion scenarios. PMID:23390612

  7. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data.

    PubMed

    Peterson, Elena S; McCue, Lee Ann; Schrimpe-Rutledge, Alexandra C; Jensen, Jeffrey L; Walker, Hyunjoo; Kobold, Markus A; Webb, Samantha R; Payne, Samuel H; Ansong, Charles; Adkins, Joshua N; Cannon, William R; Webb-Robertson, Bobbie-Jo M

    2012-04-05

    The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates. VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic genomes. Data is evaluated via

  8. A draft of the genome and four transcriptomes of a medicinal and pesticidal angiosperm Azadirachta indica

    PubMed Central

    2012-01-01

    Background The Azadirachta indica (neem) tree is a source of a wide number of natural products, including the potent biopesticide azadirachtin. In spite of its widespread applications in agriculture and medicine, the molecular aspects of the biosynthesis of neem terpenoids remain largely unexplored. The current report describes the draft genome and four transcriptomes of A. indica and attempts to contextualise the sequence information in terms of its molecular phylogeny, transcript expression and terpenoid biosynthesis pathways. A. indica is the first member of the family Meliaceae to be sequenced using next generation sequencing approach. Results The genome and transcriptomes of A. indica were sequenced using multiple sequencing platforms and libraries. The A. indica genome is AT-rich, bears few repetitive DNA elements and comprises about 20,000 genes. The molecular phylogenetic analyses grouped A. indica together with Citrus sinensis from the Rutaceae family validating its conventional taxonomic classification. Comparative transcript expression analysis showed either exclusive or enhanced expression of known genes involved in neem terpenoid biosynthesis pathways compared to other sequenced angiosperms. Genome and transcriptome analyses in A. indica led to the identification of repeat elements, nucleotide composition and expression profiles of genes in various organs. Conclusions This study on A. indica genome and transcriptomes will provide a model for characterization of metabolic pathways involved in synthesis of bioactive compounds, comparative evolutionary studies among various Meliaceae family members and help annotate their genomes. A better understanding of molecular pathways involved in the azadirachtin synthesis in A. indica will pave ways for bulk production of environment friendly biopesticides. PMID:22958331

  9. Global survey of genomic imprinting by transcriptome sequencing.

    PubMed

    Babak, Tomas; Deveale, Brian; Armour, Christopher; Raymond, Christopher; Cleary, Michele A; van der Kooy, Derek; Johnson, Jason M; Lim, Lee P

    2008-11-25

    Genomic imprinting restricts gene expression to a paternal or maternal allele. To date, approximately 90 imprinted transcripts have been identified in mouse, of which the majority were detected after intense interrogation of clusters of imprinted genes identified by phenotype-driven assays in mice with uniparental disomies [1]. Here we use selective priming and parallel sequencing to measure allelic bias in whole transcriptomes. By distinguishing parent-of-origin bias from strain-specific bias in embryos derived from a reciprocal cross of mice, we constructed a genome-wide map of imprinted transcription. This map was able to objectively locate over 80% of known imprinted loci and allowed the detection and confirmation of six novel imprinted genes. Even in the intensely studied embryonic day 9.5 developmental stage that we analyzed, more than half of all imprinted single-nucleotide polymorphisms did not overlap previously discovered imprinted transcripts; a large fraction of these represent novel noncoding RNAs within known imprinted loci. For example, a previously unnoticed, maternally expressed antisense transcript was mapped within the Grb10 locus. This study demonstrates the feasibility of using transcriptome sequencing for mapping of imprinted gene expression in physiologically normal animals. Such an approach will allow researchers to study imprinting without restricting themselves to individual loci or specific transcripts.

  10. Comparative genomics reveals conservative evolution of the xylem transcriptome in vascular plants

    PubMed Central

    2010-01-01

    Background Wood is a valuable natural resource and a major carbon sink. Wood formation is an important developmental process in vascular plants which played a crucial role in plant evolution. Although genes involved in xylem formation have been investigated, the molecular mechanisms of xylem evolution are not well understood. We use comparative genomics to examine evolution of the xylem transcriptome to gain insights into xylem evolution. Results The xylem transcriptome is highly conserved in conifers, but considerably divergent in angiosperms. The functional domains of genes in the xylem transcriptome are moderately to highly conserved in vascular plants, suggesting the existence of a common ancestral xylem transcriptome. Compared to the total transcriptome derived from a range of tissues, the xylem transcriptome is relatively conserved in vascular plants. Of the xylem transcriptome, cell wall genes, ancestral xylem genes, known proteins and transcription factors are relatively more conserved in vascular plants. A total of 527 putative xylem orthologs were identified, which are unevenly distributed across the Arabidopsis chromosomes with eight hot spots observed. Phylogenetic analysis revealed that evolution of the xylem transcriptome has paralleled plant evolution. We also identified 274 conifer-specific xylem unigenes, all of which are of unknown function. These xylem orthologs and conifer-specific unigenes are likely to have played a crucial role in xylem evolution. Conclusions Conifers have highly conserved xylem transcriptomes, while angiosperm xylem transcriptomes are relatively diversified. Vascular plants share a common ancestral xylem transcriptome. The xylem transcriptomes of vascular plants are more conserved than the total transcriptomes. Evolution of the xylem transcriptome has largely followed the trend of plant evolution. PMID:20565927

  11. Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals.

    PubMed

    Taylor, Jeremy F; Whitacre, Lynsey K; Hoff, Jesse L; Tizioto, Polyana C; Kim, JaeWoo; Decker, Jared E; Schnabel, Robert D

    2016-08-17

    Decreasing sequencing costs and development of new protocols for characterizing global methylation, gene expression patterns and regulatory regions have stimulated the generation of large livestock datasets. Here, we discuss experiences in the analysis of whole-genome and transcriptome sequence data. We analyzed whole-genome sequence (WGS) data from 132 individuals from five canid species (Canis familiaris, C. latrans, C. dingo, C. aureus and C. lupus) and 61 breeds, three bison (Bison bison), 64 water buffalo (Bubalus bubalis) and 297 bovines from 17 breeds. By individual, data vary in extent of reference genome depth of coverage from 4.9X to 64.0X. We have also analyzed RNA-seq data for 580 samples representing 159 Bos taurus and Rattus norvegicus animals and 98 tissues. By aligning reads to a reference assembly and calling variants, we assessed effects of average depth of coverage on the actual coverage and on the number of called variants. We examined the identity of unmapped reads by assembling them and querying produced contigs against the non-redundant nucleic acids database. By imputing high-density single nucleotide polymorphism data on 4010 US registered Angus animals to WGS using Run4 of the 1000 Bull Genomes Project and assessing the accuracy of imputation, we identified misassembled reference sequence regions. We estimate that a 24X depth of coverage is required to achieve 99.5 % coverage of the reference assembly and identify 95 % of the variants within an individual's genome. Genomes sequenced to low average coverage (e.g., <10X) may fail to cover 10 % of the reference genome and identify <75 % of variants. About 10 % of genomic DNA or transcriptome sequence reads fail to align to the reference assembly. These reads include loci missing from the reference assembly and misassembled genes and interesting symbionts, commensal and pathogenic organisms. Assembly errors and a lack of annotation of functional elements significantly limit the utility of

  12. Genome wide transcriptome profiling reveals differential gene expression in secondary metabolite pathway of Cymbopogon winterianus.

    PubMed

    Devi, Kamalakshi; Mishra, Surajit K; Sahu, Jagajjit; Panda, Debashis; Modi, Mahendra K; Sen, Priyabrata

    2016-02-15

    Advances in transcriptome sequencing provide fast, cost-effective and reliable approach to generate large expression datasets especially suitable for non-model species to identify putative genes, key pathway and regulatory mechanism. Citronella (Cymbopogon winterianus) is an aromatic medicinal grass used for anti-tumoral, antibacterial, anti-fungal, antiviral, detoxifying and natural insect repellent properties. Despite of having number of utilities, the genes involved in terpenes biosynthetic pathway is not yet clearly elucidated. The present study is a pioneering attempt to generate an exhaustive molecular information of secondary metabolite pathway and to increase genomic resources in Citronella. Using high-throughput RNA-Seq technology, root and leaf transcriptome was analysed at an unprecedented depth (11.7 Gb). Targeted searches identified majority of the genes associated with metabolic pathway and other natural product pathway viz. antibiotics synthesis along with many novel genes. Terpenoid biosynthesis genes comparative expression results were validated for 15 unigenes by RT-PCR and qRT-PCR. Thus the coverage of these transcriptome is comprehensive enough to discover all known genes of major metabolic pathways. This transcriptome dataset can serve as important public information for gene expression, genomics and function genomics studies in Citronella and shall act as a benchmark for future improvement of the crop.

  13. Intra-isolate genome variation in arbuscular mycorrhizal fungi persists in the transcriptome.

    PubMed

    Boon, E; Zimmerman, E; Lang, B F; Hijri, M

    2010-07-01

    Arbuscular mycorrhizal fungi (AMF) are heterokaryotes with an unusual genetic makeup. Substantial genetic variation occurs among nuclei within a single mycelium or isolate. AMF reproduce through spores that contain varying fractions of this heterogeneous population of nuclei. It is not clear whether this genetic variation on the genome level actually contributes to the AMF phenotype. To investigate the extent to which polymorphisms in nuclear genes are transcribed, we analysed the intra-isolate genomic and cDNA sequence variation of two genes, the large subunit ribosomal RNA (LSU rDNA) of Glomus sp. DAOM-197198 (previously known as G. intraradices) and the POL1-like sequence (PLS) of Glomus etunicatum. For both genes, we find high sequence variation at the genome and transcriptome level. Reconstruction of LSU rDNA secondary structure shows that all variants are functional. Patterns of PLS sequence polymorphism indicate that there is one functional gene copy, PLS2, which is preferentially transcribed, and one gene copy, PLS1, which is a pseudogene. This is the first study that investigates AMF intra-isolate variation at the transcriptome level. In conclusion, it is possible that, in AMF, multiple nuclear genomes contribute to a single phenotype.

  14. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome.

    PubMed

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-06-15

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development.

  15. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome

    PubMed Central

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-01-01

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development. PMID:22508961

  16. Genome-wide transcriptome and expression profile analysis of Phalaenopsis during explant browning.

    PubMed

    Xu, Chuanjun; Zeng, Biyu; Huang, Junmei; Huang, Wen; Liu, Yumei

    2015-01-01

    Explant browning presents a major problem for in vitro culture, and can lead to the death of the explant and failure of regeneration. Considerable work has examined the physiological mechanisms underlying Phalaenopsis leaf explant browning, but the molecular mechanisms of browning remain elusive. In this study, we used whole genome RNA sequencing to examine Phalaenopsis leaf explant browning at genome-wide level. We first used Illumina high-throughput technology to sequence the transcriptome of Phalaenopsis and then performed de novo transcriptome assembly. We assembled 79,434,350 clean reads into 31,708 isogenes and generated 26,565 annotated unigenes. We assigned Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations, and potential Pfam domains to each transcript. Using the transcriptome data as a reference, we next analyzed the differential gene expression of explants cultured for 0, 3, and 6 d, respectively. We then identified differentially expressed genes (DEGs) before and after Phalaenopsis explant browning. We also performed GO, KEGG functional enrichment and Pfam analysis of all DEGs. Finally, we selected 11 genes for quantitative real-time PCR (qPCR) analysis to confirm the expression profile analysis. Here, we report the first comprehensive analysis of transcriptome and expression profiles during Phalaenopsis explant browning. Our results suggest that Phalaenopsis explant browning may be due in part to gene expression changes that affect the secondary metabolism, such as: phenylpropanoid pathway and flavonoid biosynthesis. Genes involved in photosynthesis and ATPase activity have been found to be changed at transcription level; these changes may perturb energy metabolism and thus lead to the decay of plant cells and tissues. This study provides comprehensive gene expression data for Phalaenopsis browning. Our data constitute an important resource for further functional studies to prevent explant browning.

  17. Genome-Wide Transcriptome and Expression Profile Analysis of Phalaenopsis during Explant Browning

    PubMed Central

    Xu, Chuanjun; Zeng, Biyu; Huang, Junmei; Huang, Wen; Liu, Yumei

    2015-01-01

    Background Explant browning presents a major problem for in vitro culture, and can lead to the death of the explant and failure of regeneration. Considerable work has examined the physiological mechanisms underlying Phalaenopsis leaf explant browning, but the molecular mechanisms of browning remain elusive. In this study, we used whole genome RNA sequencing to examine Phalaenopsis leaf explant browning at genome-wide level. Methodology/Principal Findings We first used Illumina high-throughput technology to sequence the transcriptome of Phalaenopsis and then performed de novo transcriptome assembly. We assembled 79,434,350 clean reads into 31,708 isogenes and generated 26,565 annotated unigenes. We assigned Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations, and potential Pfam domains to each transcript. Using the transcriptome data as a reference, we next analyzed the differential gene expression of explants cultured for 0, 3, and 6 d, respectively. We then identified differentially expressed genes (DEGs) before and after Phalaenopsis explant browning. We also performed GO, KEGG functional enrichment and Pfam analysis of all DEGs. Finally, we selected 11 genes for quantitative real-time PCR (qPCR) analysis to confirm the expression profile analysis. Conclusions/Significance Here, we report the first comprehensive analysis of transcriptome and expression profiles during Phalaenopsis explant browning. Our results suggest that Phalaenopsis explant browning may be due in part to gene expression changes that affect the secondary metabolism, such as: phenylpropanoid pathway and flavonoid biosynthesis. Genes involved in photosynthesis and ATPase activity have been found to be changed at transcription level; these changes may perturb energy metabolism and thus lead to the decay of plant cells and tissues. This study provides comprehensive gene expression data for Phalaenopsis browning. Our data constitute an important resource for further

  18. VESPA: software to facilitate genomic annotation of prokaryotic organisms through integration of proteomic and transcriptomic data

    PubMed Central

    2012-01-01

    Background The procedural aspects of genome sequencing and assembly have become relatively inexpensive, yet the full, accurate structural annotation of these genomes remains a challenge. Next-generation sequencing transcriptomics (RNA-Seq), global microarrays, and tandem mass spectrometry (MS/MS)-based proteomics have demonstrated immense value to genome curators as individual sources of information, however, integrating these data types to validate and improve structural annotation remains a major challenge. Current visual and statistical analytic tools are focused on a single data type, or existing software tools are retrofitted to analyze new data forms. We present Visual Exploration and Statistics to Promote Annotation (VESPA) is a new interactive visual analysis software tool focused on assisting scientists with the annotation of prokaryotic genomes though the integration of proteomics and transcriptomics data with current genome location coordinates. Results VESPA is a desktop Java™ application that integrates high-throughput proteomics data (peptide-centric) and transcriptomics (probe or RNA-Seq) data into a genomic context, all of which can be visualized at three levels of genomic resolution. Data is interrogated via searches linked to the genome visualizations to find regions with high likelihood of mis-annotation. Search results are linked to exports for further validation outside of VESPA or potential coding-regions can be analyzed concurrently with the software through interaction with BLAST. VESPA is demonstrated on two use cases (Yersinia pestis Pestoides F and Synechococcus sp. PCC 7002) to demonstrate the rapid manner in which mis-annotations can be found and explored in VESPA using either proteomics data alone, or in combination with transcriptomic data. Conclusions VESPA is an interactive visual analytics tool that integrates high-throughput data into a genomic context to facilitate the discovery of structural mis-annotations in prokaryotic

  19. Comparative Genomic and Transcriptomic Characterization of the Toxigenic Marine Dinoflagellate Alexandrium ostenfeldii

    PubMed Central

    Jaeckisch, Nina; Yang, Ines; Wohlrab, Sylke; Glöckner, Gernot; Kroymann, Juergen; Vogel, Heiko; Cembella, Allan; John, Uwe

    2011-01-01

    Many dinoflagellate species are notorious for the toxins they produce and ecological and human health consequences associated with harmful algal blooms (HABs). Dinoflagellates are particularly refractory to genomic analysis due to the enormous genome size, lack of knowledge about their DNA composition and structure, and peculiarities of gene regulation, such as spliced leader (SL) trans-splicing and mRNA transposition mechanisms. Alexandrium ostenfeldii is known to produce macrocyclic imine toxins, described as spirolides. We characterized the genome of A. ostenfeldii using a combination of transcriptomic data and random genomic clones for comparison with other dinoflagellates, particularly Alexandrium species. Examination of SL sequences revealed similar features as in other dinoflagellates, including Alexandrium species. SL sequences in decay indicate frequent retro-transposition of mRNA species. This probably contributes to overall genome complexity by generating additional gene copies. Sequencing of several thousand fosmid and bacterial artificial chromosome (BAC) ends yielded a wealth of simple repeats and tandemly repeated longer sequence stretches which we estimated to comprise more than half of the whole genome. Surprisingly, the repeats comprise a very limited set of 79–97 bp sequences; in part the genome is thus a relatively uniform sequence space interrupted by coding sequences. Our genomic sequence survey (GSS) represents the largest genomic data set of a dinoflagellate to date. Alexandrium ostenfeldii is a typical dinoflagellate with respect to its transcriptome and mRNA transposition but demonstrates Alexandrium-like stop codon usage. The large portion of repetitive sequences and the organization within the genome is in agreement with several other studies on dinoflagellates using different approaches. It remains to be determined whether this unusual composition is directly correlated to the exceptionally genome organization of dinoflagellates with

  20. RNA-seq analysis of Rubus idaeus cv. Nova: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

    PubMed

    Hyun, Tae Kyung; Lee, Sarah; Kumar, Dhinesh; Rim, Yeonggil; Kumar, Ritesh; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

    2014-10-01

    Using Illumina sequencing technology, we have generated the large-scale transcriptome sequencing data containing abundant information on genes involved in the metabolic pathways in R. idaeus cv. Nova fruits. Rubus idaeus (Red raspberry) is one of the important economical crops that possess numerous nutrients, micronutrients and phytochemicals with essential health benefits to human. The molecular mechanism underlying the ripening process and phytochemical biosynthesis in red raspberry is attributed to the changes in gene expression, but very limited transcriptomic and genomic information in public databases is available. To address this issue, we generated more than 51 million sequencing reads from R. idaeus cv. Nova fruit using Illumina RNA-Seq technology. After de novo assembly, we obtained 42,604 unigenes with an average length of 812 bp. At the protein level, Nova fruit transcriptome showed 77 and 68 % sequence similarities with Rubus coreanus and Fragaria versa, respectively, indicating the evolutionary relationship between them. In addition, 69 % of assembled unigenes were annotated using public databases including NCBI non-redundant, Cluster of Orthologous Groups and Gene ontology database, suggesting that our transcriptome dataset provides a valuable resource for investigating metabolic processes in red raspberry. To analyze the relationship between several novel transcripts and the amounts of metabolites such as γ-aminobutyric acid and anthocyanins, real-time PCR and target metabolite analysis were performed on two different ripening stages of Nova. This is the first attempt using Illumina sequencing platform for RNA sequencing and de novo assembly of Nova fruit without reference genome. Our data provide the most comprehensive transcriptome resource available for Rubus fruits, and will be useful for understanding the ripening process and for breeding R. idaeus cultivars with improved fruit quality.

  1. The duck genome and transcriptome provide insight into an avian influenza virus reservoir species

    PubMed Central

    Chen, Hualan; Zhang, Yong; Qian, Wubin; Kim, Heebal; Gan, Shangquan; Zhao, Yiqiang; Li, Jianwen; Yi, Kang; Feng, Huapeng; Zhu, Pengyang; Li, Bo; Liu, Qiuyue; Fairley, Suan; Magor, Katharine E; Du, Zhenlin; Hu, Xiaoxiang; Goodman, Laurie; Tafer, Hakim; Vignal, Alain; Lee, Taeheon; Kim, Kyu-Won; Sheng, Zheya; An, Yang; Searle, Steve; Herrero, Javier; Groenen, Martien A M; Crooijmans, Richard P M A; Faraut, Thomas; Cai, Qingle; Webster, Robert G; Aldridge, Jerry R; Warren, Wesley C; Bartschat, Sebastian; Kehr, Stephanie; Marz, Manja; Stadler, Peter F; Smith, Jacqueline; Kraus, Robert H S; Zhao, Yaofeng; Ren, Liming; Fei, Jing; Morisson, Mireille; Kaiser, Pete; Griffin, Darren K; Rao, Man; Pitel, Frederique; Wang, Jun; Li, Ning

    2014-01-01

    The duck (Anas platyrhynchos) is one of the principal natural hosts of influenza A viruses. We present the duck genome sequence and perform deep transcriptome analyses to investigate immune-related genes. Our data indicate that the duck possesses a contractive immune gene repertoire, as in chicken and zebra finch, and this repertoire has been shaped through lineage-specific duplications. We identify genes that are responsive to influenza A viruses using the lung transcriptomes of control ducks and ones that were infected with either a highly pathogenic (A/duck/Hubei/49/05) or a weakly pathogenic (A/goose/Hubei/65/05) H5N1 virus. Further, we show how the duck’s defense mechanisms against influenza infection have been optimized through the diversification of its β-defensin and butyrophilin-like repertoires. These analyses, in combination with the genomic and transcriptomic data, provide a resource for characterizing the interaction between host and influenza viruses. PMID:23749191

  2. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas

    PubMed Central

    Hou, Yu; Guo, Huahu; Cao, Chen; Li, Xianlong; Hu, Boqiang; Zhu, Ping; Wu, Xinglong; Wen, Lu; Tang, Fuchou; Huang, Yanyi; Peng, Jirun

    2016-01-01

    Single-cell genome, DNA methylome, and transcriptome sequencing methods have been separately developed. However, to accurately analyze the mechanism by which transcriptome, genome and DNA methylome regulate each other, these omic methods need to be performed in the same single cell. Here we demonstrate a single-cell triple omics sequencing technique, scTrio-seq, that can be used to simultaneously analyze the genomic copy-number variations (CNVs), DNA methylome, and transcriptome of an individual mammalian cell. We show that large-scale CNVs cause proportional changes in RNA expression of genes within the gained or lost genomic regions, whereas these CNVs generally do not affect DNA methylation in these regions. Furthermore, we applied scTrio-seq to 25 single cancer cells derived from a human hepatocellular carcinoma tissue sample. We identified two subpopulations within these cells based on CNVs, DNA methylome, or transcriptome of individual cells. Our work offers a new avenue of dissecting the complex contribution of genomic and epigenomic heterogeneities to the transcriptomic heterogeneity within a population of cells. PMID:26902283

  3. Bamboo Flowering from the Perspective of Comparative Genomics and Transcriptomics

    PubMed Central

    Biswas, Prasun; Chakraborty, Sukanya; Dutta, Smritikana; Pal, Amita; Das, Malay

    2016-01-01

    Bamboos are an important member of the subfamily Bambusoideae, family Poaceae. The plant group exhibits wide variation with respect to the timing (1–120 years) and nature (sporadic vs. gregarious) of flowering among species. Usually flowering in woody bamboos is synchronous across culms growing over a large area, known as gregarious flowering. In many monocarpic bamboos this is followed by mass death and seed setting. While in sporadic flowering an isolated wild clump may flower, set little or no seed and remain alive. Such wide variation in flowering time and extent means that the plant group serves as repositories for genes and expression patterns that are unique to bamboo. Due to the dearth of available genomic and transcriptomic resources, limited studies have been undertaken to identify the potential molecular players in bamboo flowering. The public release of the first bamboo genome sequence Phyllostachys heterocycla, availability of related genomes Brachypodium distachyon and Oryza sativa provide us the opportunity to study this long-standing biological problem in a comparative and functional genomics framework. We identified bamboo genes homologous to those of Oryza and Brachypodium that are involved in established pathways such as vernalization, photoperiod, autonomous, and hormonal regulation of flowering. Additionally, we investigated triggers like stress (drought), physiological maturity and micro RNAs that may play crucial roles in flowering. We also analyzed available transcriptome datasets of different bamboo species to identify genes and their involvement in bamboo flowering. Finally, we summarize potential research hurdles that need to be addressed in future research. PMID:28018419

  4. Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production.

    PubMed

    Roth, Melissa S; Cokus, Shawn J; Gallaher, Sean D; Walter, Andreas; Lopez, David; Erickson, Erika; Endelman, Benjamin; Westcott, Daniel; Larabell, Carolyn A; Merchant, Sabeeha S; Pellegrini, Matteo; Niyogi, Krishna K

    2017-05-23

    Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis , because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase ( BKT ), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.

  5. Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

    DOE PAGES

    Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.; ...

    2017-05-08

    Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. Here, to advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniformmore » gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. Finally, the high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.« less

  6. Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.

    Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. Here, to advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniformmore » gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. Finally, the high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.« less

  7. Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

    PubMed Central

    Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.; Walter, Andreas; Lopez, David; Erickson, Erika; Endelman, Benjamin; Westcott, Daniel; Larabell, Carolyn A.; Merchant, Sabeeha S.; Pellegrini, Matteo

    2017-01-01

    Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production. PMID:28484037

  8. The genome and developmental transcriptome of the strongylid nematode Haemonchus contortus

    PubMed Central

    2013-01-01

    Background The barber's pole worm, Haemonchus contortus, is one of the most economically important parasites of small ruminants worldwide. Although this parasite can be controlled using anthelmintic drugs, resistance against most drugs in common use has become a widespread problem. We provide a draft of the genome and the transcriptomes of all key developmental stages of H. contortus to support biological and biotechnological research areas of this and related parasites. Results The draft genome of H. contortus is 320 Mb in size and encodes 23,610 protein-coding genes. On a fundamental level, we elucidate transcriptional alterations taking place throughout the life cycle, characterize the parasite's gene silencing machinery, and explore molecules involved in development, reproduction, host-parasite interactions, immunity, and disease. The secretome of H. contortus is particularly rich in peptidases linked to blood-feeding activity and interactions with host tissues, and a diverse array of molecules is involved in complex immune responses. On an applied level, we predict drug targets and identify vaccine molecules. Conclusions The draft genome and developmental transcriptome of H. contortus provide a major resource to the scientific community for a wide range of genomic, genetic, proteomic, metabolomic, evolutionary, biological, ecological, and epidemiological investigations, and a solid foundation for biotechnological outcomes, including new anthelmintics, vaccines and diagnostic tests. This first draft genome of any strongylid nematode paves the way for a rapid acceleration in our understanding of a wide range of socioeconomically important parasites of one of the largest nematode orders. PMID:23985341

  9. Microbial genomics, transcriptomics and proteomics: new discoveries in decomposition research using complementary methods.

    PubMed

    Baldrian, Petr; López-Mondéjar, Rubén

    2014-02-01

    Molecular methods for the analysis of biomolecules have undergone rapid technological development in the last decade. The advent of next-generation sequencing methods and improvements in instrumental resolution enabled the analysis of complex transcriptome, proteome and metabolome data, as well as a detailed annotation of microbial genomes. The mechanisms of decomposition by model fungi have been described in unprecedented detail by the combination of genome sequencing, transcriptomics and proteomics. The increasing number of available genomes for fungi and bacteria shows that the genetic potential for decomposition of organic matter is widespread among taxonomically diverse microbial taxa, while expression studies document the importance of the regulation of expression in decomposition efficiency. Importantly, high-throughput methods of nucleic acid analysis used for the analysis of metagenomes and metatranscriptomes indicate the high diversity of decomposer communities in natural habitats and their taxonomic composition. Today, the metaproteomics of natural habitats is of interest. In combination with advanced analytical techniques to explore the products of decomposition and the accumulation of information on the genomes of environmentally relevant microorganisms, advanced methods in microbial ecophysiology should increase our understanding of the complex processes of organic matter transformation.

  10. Reliable transformation system for Microbotryum lychnidis-dioicae informed by genome and transcriptome project.

    PubMed

    Toh, Su San; Treves, David S; Barati, Michelle T; Perlin, Michael H

    2016-10-01

    Microbotryum lychnidis-dioicae is a member of a species complex infecting host plants in the Caryophyllaceae. It is used as a model system in many areas of research, but attempts to make this organism tractable for reverse genetic approaches have not been fruitful. Here, we exploited the recently obtained genome sequence and transcriptome analysis to inform our design of constructs for use in Agrobacterium-mediated transformation techniques currently available for other fungi. Reproducible transformation was demonstrated at the genomic, transcriptional and functional levels. Moreover, these initial proof-of-principle experiments provide evidence that supports the findings from initial global transcriptome analysis regarding expression from the respective promoters under different growth conditions of the fungus. The technique thus provides for the first time the ability to stably introduce transgenes and over-express target M. lychnidis-dioicae genes.

  11. The past, present, and future of Leishmania genomics and transcriptomics

    PubMed Central

    Cantacessi, Cinzia; Dantas-Torres, Filipe; Nolan, Matthew J.; Otranto, Domenico

    2015-01-01

    It has been nearly 10 years since the completion of the first entire genome sequence of a Leishmania parasite. Genomic and transcriptomic analyses have advanced our understanding of the biology of Leishmania, and shed new light on the complex interactions occurring within the parasite–host–vector triangle. Here, we review these advances and examine potential avenues for translation of these discoveries into treatment and control programs. In addition, we argue for a strong need to explore how disease in dogs relates to that in humans, and how an improved understanding in line with the ‘One Health’ concept may open new avenues for the control of these devastating diseases. PMID:25638444

  12. Transcriptome of interstitial cells of Cajal reveals unique and selective gene signatures

    PubMed Central

    Park, Paul J.; Fuchs, Robert; Wei, Lai; Jorgensen, Brian G.; Redelman, Doug; Ward, Sean M.; Sanders, Kenton M.

    2017-01-01

    Transcriptome-scale data can reveal essential clues into understanding the underlying molecular mechanisms behind specific cellular functions and biological processes. Transcriptomics is a continually growing field of research utilized in biomarker discovery. The transcriptomic profile of interstitial cells of Cajal (ICC), which serve as slow-wave electrical pacemakers for gastrointestinal (GI) smooth muscle, has yet to be uncovered. Using copGFP-labeled ICC mice and flow cytometry, we isolated ICC populations from the murine small intestine and colon and obtained their transcriptomes. In analyzing the transcriptome, we identified a unique set of ICC-restricted markers including transcription factors, epigenetic enzymes/regulators, growth factors, receptors, protein kinases/phosphatases, and ion channels/transporters. This analysis provides new and unique insights into the cellular and biological functions of ICC in GI physiology. Additionally, we constructed an interactive ICC genome browser (http://med.unr.edu/physio/transcriptome) based on the UCSC genome database. To our knowledge, this is the first online resource that provides a comprehensive library of all known genetic transcripts expressed in primary ICC. Our genome browser offers a new perspective into the alternative expression of genes in ICC and provides a valuable reference for future functional studies. PMID:28426719

  13. Comprehensive Annotation of the Parastagonospora nodorum Reference Genome Using Next-Generation Genomics, Transcriptomics and Proteogenomics

    PubMed Central

    Dodhia, Kejal; Stoll, Thomas; Hastie, Marcus; Furuki, Eiko; Ellwood, Simon R.; Williams, Angela H.; Tan, Yew-Foon; Testa, Alison C.; Gorman, Jeffrey J.; Oliver, Richard P.

    2016-01-01

    Parastagonospora nodorum, the causal agent of Septoria nodorum blotch (SNB), is an economically important pathogen of wheat (Triticum spp.), and a model for the study of necrotrophic pathology and genome evolution. The reference P. nodorum strain SN15 was the first Dothideomycete with a published genome sequence, and has been used as the basis for comparison within and between species. Here we present an updated reference genome assembly with corrections of SNP and indel errors in the underlying genome assembly from deep resequencing data as well as extensive manual annotation of gene models using transcriptomic and proteomic sources of evidence (https://github.com/robsyme/Parastagonospora_nodorum_SN15). The updated assembly and annotation includes 8,366 genes with modified protein sequence and 866 new genes. This study shows the benefits of using a wide variety of experimental methods allied to expert curation to generate a reliable set of gene models. PMID:26840125

  14. Trade-off between Transcriptome Plasticity and Genome Evolution in Cephalopods.

    PubMed

    Liscovitch-Brauer, Noa; Alon, Shahar; Porath, Hagit T; Elstein, Boaz; Unger, Ron; Ziv, Tamar; Admon, Arie; Levanon, Erez Y; Rosenthal, Joshua J C; Eisenberg, Eli

    2017-04-06

    RNA editing, a post-transcriptional process, allows the diversification of proteomes beyond the genomic blueprint; however it is infrequently used among animals for this purpose. Recent reports suggesting increased levels of RNA editing in squids thus raise the question of the nature and effects of these events. We here show that RNA editing is particularly common in behaviorally sophisticated coleoid cephalopods, with tens of thousands of evolutionarily conserved sites. Editing is enriched in the nervous system, affecting molecules pertinent for excitability and neuronal morphology. The genomic sequence flanking editing sites is highly conserved, suggesting that the process confers a selective advantage. Due to the large number of sites, the surrounding conservation greatly reduces the number of mutations and genomic polymorphisms in protein-coding regions. This trade-off between genome evolution and transcriptome plasticity highlights the importance of RNA recoding as a strategy for diversifying proteins, particularly those associated with neural function. PAPERCLIP. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Novel mouse model recapitulates genome and transcriptome alterations in human colorectal carcinomas.

    PubMed

    McNeil, Nicole E; Padilla-Nash, Hesed M; Buishand, Floryne O; Hue, Yue; Ried, Thomas

    2017-03-01

    Human colorectal carcinomas are defined by a nonrandom distribution of genomic imbalances that are characteristic for this disease. Often, these imbalances affect entire chromosomes. Understanding the role of these aneuploidies for carcinogenesis is of utmost importance. Currently, established transgenic mice do not recapitulate the pathognonomic genome aberration profile of human colorectal carcinomas. We have developed a novel model based on the spontaneous transformation of murine colon epithelial cells. During this process, cells progress through stages of pre-immortalization, immortalization and, finally, transformation, and result in tumors when injected into immunocompromised mice. We analyzed our model for genome and transcriptome alterations using ArrayCGH, spectral karyotyping (SKY), and array based gene expression profiling. ArrayCGH revealed a recurrent pattern of genomic imbalances. These results were confirmed by SKY. Comparing these imbalances with orthologous maps of human chromosomes revealed a remarkable overlap. We observed focal deletions of the tumor suppressor genes Trp53 and Cdkn2a/p16. High-level focal genomic amplification included the locus harboring the oncogene Mdm2, which was confirmed by FISH in the form of double minute chromosomes. Array-based global gene expression revealed distinct differences between the sequential steps of spontaneous transformation. Gene expression changes showed significant similarities with human colorectal carcinomas. Pathways most prominently affected included genes involved in chromosomal instability and in epithelial to mesenchymal transition. Our novel mouse model therefore recapitulates the most prominent genome and transcriptome alterations in human colorectal cancer, and might serve as a valuable tool for understanding the dynamic process of tumorigenesis, and for preclinical drug testing. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  16. Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns.

    PubMed

    Grusz, Amanda L; Rothfels, Carl J; Schuettpelz, Eric

    2016-08-30

    Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10 % of fern diversity and includes the enigmatic vittarioid ferns-an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.

  17. Role of genomics and transcriptomics in selection of reintroduction source populations.

    PubMed

    He, Xiaoping; Johansson, Mattias L; Heath, Daniel D

    2016-10-01

    The use and importance of reintroduction as a conservation tool to return a species to its historical range from which it has been extirpated will increase as climate change and human development accelerate habitat loss and population extinctions. Although the number of reintroduction attempts has increased rapidly over the past 2 decades, the success rate is generally low. As a result of population differences in fitness-related traits and divergent responses to environmental stresses, population performance upon reintroduction is highly variable, and it is generally agreed that selecting an appropriate source population is a critical component of a successful reintroduction. Conservation genomics is an emerging field that addresses long-standing challenges in conservation, and the potential for using novel molecular genetic approaches to inform and improve conservation efforts is high. Because the successful establishment and persistence of reintroduced populations is highly dependent on the functional genetic variation and environmental stress tolerance of the source population, we propose the application of conservation genomics and transcriptomics to guide reintroduction practices. Specifically, we propose using genome-wide functional loci to estimate genetic variation of source populations. This estimate can then be used to predict the potential for adaptation. We also propose using transcriptional profiling to measure the expression response of fitness-related genes to environmental stresses as a proxy for acclimation (tolerance) capacity. Appropriate application of conservation genomics and transcriptomics has the potential to dramatically enhance reintroduction success in a time of rapidly declining biodiversity and accelerating environmental change. © 2016 Society for Conservation Biology.

  18. Integrative approaches for large-scale transcriptome-wide association studies

    PubMed Central

    Gusev, Alexander; Ko, Arthur; Shi, Huwenbo; Bhatia, Gaurav; Chung, Wonil; Penninx, Brenda W J H; Jansen, Rick; de Geus, Eco JC; Boomsma, Dorret I; Wright, Fred A; Sullivan, Patrick F; Nikkola, Elina; Alvarez, Marcus; Civelek, Mete; Lusis, Aldons J.; Lehtimäki, Terho; Raitoharju, Emma; Kähönen, Mika; Seppälä, Ilkka; Raitakari, Olli T.; Kuusisto, Johanna; Laakso, Markku; Price, Alkes L.; Pajukanta, Päivi; Pasaniuc, Bogdan

    2016-01-01

    Many genetic variants influence complex traits by modulating gene expression, thus altering the abundance levels of one or multiple proteins. Here, we introduce a powerful strategy that integrates gene expression measurements with summary association statistics from large-scale genome-wide association studies (GWAS) to identify genes whose cis-regulated expression is associated to complex traits. We leverage expression imputation to perform a transcriptome wide association scan (TWAS) to identify significant expression-trait associations. We applied our approaches to expression data from blood and adipose tissue measured in ~3,000 individuals overall. We imputed gene expression into GWAS data from over 900,000 phenotype measurements to identify 69 novel genes significantly associated to obesity-related traits (BMI, lipids, and height). Many of the novel genes are associated with relevant phenotypes in the Hybrid Mouse Diversity Panel. Our results showcase the power of integrating genotype, gene expression and phenotype to gain insights into the genetic basis of complex traits. PMID:26854917

  19. Transcriptomics and molecular evolutionary rate analysis of the bladderwort (Utricularia), a carnivorous plant with a minimal genome

    PubMed Central

    2011-01-01

    Background The carnivorous plant Utricularia gibba (bladderwort) is remarkable in having a minute genome, which at ca. 80 megabases is approximately half that of Arabidopsis. Bladderworts show an incredible diversity of forms surrounding a defined theme: tiny, bladder-like suction traps on terrestrial, epiphytic, or aquatic plants with a diversity of unusual vegetative forms. Utricularia plants, which are rootless, are also anomalous in physiological features (respiration and carbon distribution), and highly enhanced molecular evolutionary rates in chloroplast, mitochondrial and nuclear ribosomal sequences. Despite great interest in the genus, no genomic resources exist for Utricularia, and the substitution rate increase has received limited study. Results Here we describe the sequencing and analysis of the Utricularia gibba transcriptome. Three different organs were surveyed, the traps, the vegetative shoot bodies, and the inflorescence stems. We also examined the bladderwort transcriptome under diverse stress conditions. We detail aspects of functional classification, tissue similarity, nitrogen and phosphorus metabolism, respiration, DNA repair, and detoxification of reactive oxygen species (ROS). Long contigs of plastid and mitochondrial genomes, as well as sequences for 100 individual nuclear genes, were compared with those of other plants to better establish information on molecular evolutionary rates. Conclusion The Utricularia transcriptome provides a detailed genomic window into processes occurring in a carnivorous plant. It contains a deep representation of the complex metabolic pathways that characterize a putative minimal plant genome, permitting its use as a source of genomic information to explore the structural, functional, and evolutionary diversity of the genus. Vegetative shoots and traps are the most similar organs by functional classification of their transcriptome, the traps expressing hydrolytic enzymes for prey digestion that were previously

  20. Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef).

    PubMed

    Cannarozzi, Gina; Plaza-Wüthrich, Sonia; Esfeld, Korinna; Larti, Stéphanie; Wilson, Yi Song; Girma, Dejene; de Castro, Edouard; Chanyalew, Solomon; Blösch, Regula; Farinelli, Laurent; Lyons, Eric; Schneider, Michel; Falquet, Laurent; Kuhlemeier, Cris; Assefa, Kebebew; Tadele, Zerihun

    2014-07-09

    Tef (Eragrostis tef), an indigenous cereal critical to food security in the Horn of Africa, is rich in minerals and protein, resistant to many biotic and abiotic stresses and safe for diabetics as well as sufferers of immune reactions to wheat gluten. We present the genome of tef, the first species in the grass subfamily Chloridoideae and the first allotetraploid assembled de novo. We sequenced the tef genome for marker-assisted breeding, to shed light on the molecular mechanisms conferring tef's desirable nutritional and agronomic properties, and to make its genome publicly available as a community resource. The draft genome contains 672 Mbp representing 87% of the genome size estimated from flow cytometry. We also sequenced two transcriptomes, one from a normalized RNA library and another from unnormalized RNASeq data. The normalized RNA library revealed around 38000 transcripts that were then annotated by the SwissProt group. The CoGe comparative genomics platform was used to compare the tef genome to other genomes, notably sorghum. Scaffolds comprising approximately half of the genome size were ordered by syntenic alignment to sorghum producing tef pseudo-chromosomes, which were sorted into A and B genomes as well as compared to the genetic map of tef. The draft genome was used to identify novel SSR markers, investigate target genes for abiotic stress resistance studies, and understand the evolution of the prolamin family of proteins that are responsible for the immune response to gluten. It is highly plausible that breeding targets previously identified in other cereal crops will also be valuable breeding targets in tef. The draft genome and transcriptome will be of great use for identifying these targets for genetic improvement of this orphan crop that is vital for feeding 50 million people in the Horn of Africa.

  1. Genome-Scale Transcriptomic Insights into Early-Stage Fruit Development in Woodland Strawberry Fragaria vesca[C][W

    PubMed Central

    Kang, Chunying; Darwish, Omar; Geretz, Aviva; Shahan, Rachel; Alkharouf, Nadim; Liu, Zhongchi

    2013-01-01

    Fragaria vesca, a diploid woodland strawberry with a small and sequenced genome, is an excellent model for studying fruit development. The strawberry fruit is unique in that the edible flesh is actually enlarged receptacle tissue. The true fruit are the numerous dry achenes dotting the receptacle’s surface. Auxin produced from the achene is essential for the receptacle fruit set, a paradigm for studying crosstalk between hormone signaling and development. To investigate the molecular mechanism underlying strawberry fruit set, next-generation sequencing was employed to profile early-stage fruit development with five fruit tissue types and five developmental stages from floral anthesis to enlarged fruits. This two-dimensional data set provides a systems-level view of molecular events with precise spatial and temporal resolution. The data suggest that the endosperm and seed coat may play a more prominent role than the embryo in auxin and gibberellin biosynthesis for fruit set. A model is proposed to illustrate how hormonal signals produced in the endosperm and seed coat coordinate seed, ovary wall, and receptacle fruit development. The comprehensive fruit transcriptome data set provides a wealth of genomic resources for the strawberry and Rosaceae communities as well as unprecedented molecular insight into fruit set and early stage fruit development. PMID:23898027

  2. Separation and parallel sequencing of the genomes and transcriptomes of single cells using G&T-seq.

    PubMed

    Macaulay, Iain C; Teng, Mabel J; Haerty, Wilfried; Kumar, Parveen; Ponting, Chris P; Voet, Thierry

    2016-11-01

    Parallel sequencing of a single cell's genome and transcriptome provides a powerful tool for dissecting genetic variation and its relationship with gene expression. Here we present a detailed protocol for G&T-seq, a method for separation and parallel sequencing of genomic DNA and full-length polyA(+) mRNA from single cells. We provide step-by-step instructions for the isolation and lysis of single cells; the physical separation of polyA(+) mRNA from genomic DNA using a modified oligo-dT bead capture and the respective whole-transcriptome and whole-genome amplifications; and library preparation and sequence analyses of these amplification products. The method allows the detection of thousands of transcripts in parallel with the genetic variants captured by the DNA-seq data from the same single cell. G&T-seq differs from other currently available methods for parallel DNA and RNA sequencing from single cells, as it involves physical separation of the DNA and RNA and does not require bespoke microfluidics platforms. The process can be implemented manually or through automation. When performed manually, paired genome and transcriptome sequencing libraries from eight single cells can be produced in ∼3 d by researchers experienced in molecular laboratory work. For users with experience in the programming and operation of liquid-handling robots, paired DNA and RNA libraries from 96 single cells can be produced in the same time frame. Sequence analysis and integration of single-cell G&T-seq DNA and RNA data requires a high level of bioinformatics expertise and familiarity with a wide range of informatics tools.

  3. Novel genomic resources for a climate change sensitive mammal: characterization of the American pika transcriptome.

    PubMed

    Lemay, Matthew A; Henry, Philippe; Lamb, Clayton T; Robson, Kelsey M; Russello, Michael A

    2013-05-10

    with the hemoglobin alpha chain from the plateau pika, a species restricted to high elevation steppes in Asia. Elevation-specific contigs may represent candidate regions subject to differential levels of gene expression along this elevation gradient. To our knowledge, this is the first broad-scale, transcriptome-level study conducted within the Ochotonidae, providing novel genomic resources for studying pika ecology, behaviour and population history.

  4. Deep analysis of wild Vitis flower transcriptome reveals unexplored genome regions associated with sex specification.

    PubMed

    Ramos, Miguel Jesus Nunes; Coito, João Lucas; Fino, Joana; Cunha, Jorge; Silva, Helena; de Almeida, Patrícia Gomes; Costa, Maria Manuela Ribeiro; Amâncio, Sara; Paulo, Octávio S; Rocheta, Margarida

    2017-01-01

    RNA-seq of Vitis during early stages of bud development, in male, female and hermaphrodite flowers, identified new loci outside of annotated gene models, suggesting their involvement in sex establishment. The molecular mechanisms responsible for flower sex specification remain unclear for most plant species. In the case of V. vinifera ssp. vinifera, it is not fully understood what determines hermaphroditism in the domesticated subspecies and male or female flowers in wild dioecious relatives (Vitis vinifera ssp. sylvestris). Here, we describe a de novo assembly of the transcriptome of three flower developmental stages from the three Vitis vinifera flower types. The validation of de novo assembly showed a correlation of 0.825. The main goals of this work were the identification of V. v. sylvestris exclusive transcripts and the characterization of differential gene expression during flower development. RNA from several flower developmental stages was used previously to generate Illumina sequence reads. Through a sequential de novo assembly strategy one comprehensive transcriptome comprising 95,516 non-redundant transcripts was assembled. From this dataset 81,064 transcripts were annotated to V. v. vinifera reference transcriptome and 11,084 were annotated against V. v. vinifera reference genome. Moreover, we found 3368 transcripts that could not be mapped to Vitis reference genome. From all the non-redundant transcripts that were assembled, bioinformatics analysis identified 133 specific of V. v. sylvestris and 516 transcripts differentially expressed among the three flower types. The detection of transcription from areas of the genome not currently annotated suggests active transcription of previously unannotated genomic loci during early stages of bud development.

  5. VESPA: Software to Facilitate Genomic Annotation of Prokaryotic Organisms Through Integration of Proteomic and Transcriptomic Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, Elena S.; McCue, Lee Ann; Rutledge, Alexandra C.

    2012-04-25

    Visual Exploration and Statistics to Promote Annotation (VESPA) is an interactive visual analysis software tool that facilitates the discovery of structural mis-annotations in prokaryotic genomes. VESPA integrates high-throughput peptide-centric proteomics data and oligo-centric or RNA-Seq transcriptomics data into a genomic context. The data may be interrogated via visual analysis across multiple levels of genomic resolution, linked searches, exports and interaction with BLAST to rapidly identify location of interest within the genome and evaluate potential mis-annotations.

  6. Draft Assembly of Elite Inbred Line PH207 Provides Insights into Genomic and Transcriptome Diversity in Maize

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hirsch, Candice N.; Hirsch, Cory D.; Brohammer, Alex B.

    Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft genome assembly of the inbred line PH207 to complement and compare with the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the production of temperate commercial maize and are combined to make heterotic hybrids. Comparison ofmore » these two assemblies revealed over 2500 genes present in only one of the two genotypes and 136 gene families that have undergone extensive expansion or contraction. Transcriptome profiling revealed extensive expression variation, with as many as 10,564 differentially expressed transcripts and 7128 transcripts expressed in only one of the two genotypes in a single tissue. Genotype-specific genes were more likely to have tissue/condition-specific expression and lower transcript abundance. The availability of a high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the breadth of natural genome and transcriptome variation in elite maize inbred lines across heterotic pools.« less

  7. Draft Assembly of Elite Inbred Line PH207 Provides Insights into Genomic and Transcriptome Diversity in Maize

    DOE PAGES

    Hirsch, Candice N.; Hirsch, Cory D.; Brohammer, Alex B.; ...

    2016-11-01

    Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft genome assembly of the inbred line PH207 to complement and compare with the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the production of temperate commercial maize and are combined to make heterotic hybrids. Comparison ofmore » these two assemblies revealed over 2500 genes present in only one of the two genotypes and 136 gene families that have undergone extensive expansion or contraction. Transcriptome profiling revealed extensive expression variation, with as many as 10,564 differentially expressed transcripts and 7128 transcripts expressed in only one of the two genotypes in a single tissue. Genotype-specific genes were more likely to have tissue/condition-specific expression and lower transcript abundance. The availability of a high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the breadth of natural genome and transcriptome variation in elite maize inbred lines across heterotic pools.« less

  8. Draft Assembly of Elite Inbred Line PH207 Provides Insights into Genomic and Transcriptome Diversity in Maize[OPEN

    PubMed Central

    Soifer, Ilya; Barad, Omer; Shem-Tov, Doron; Baruch, Kobi; Lu, Fei; Hernandez, Alvaro G.; Wright, Chris L.; Koehler, Klaus; Buell, C. Robin; de Leon, Natalia

    2016-01-01

    Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft genome assembly of the inbred line PH207 to complement and compare with the existing B73 reference sequence. B73 is a founder of the Stiff Stalk germplasm pool, while PH207 is a founder of Iodent germplasm, both of which have contributed substantially to the production of temperate commercial maize and are combined to make heterotic hybrids. Comparison of these two assemblies revealed over 2500 genes present in only one of the two genotypes and 136 gene families that have undergone extensive expansion or contraction. Transcriptome profiling revealed extensive expression variation, with as many as 10,564 differentially expressed transcripts and 7128 transcripts expressed in only one of the two genotypes in a single tissue. Genotype-specific genes were more likely to have tissue/condition-specific expression and lower transcript abundance. The availability of a high-quality genome assembly for the elite maize inbred PH207 expands our knowledge of the breadth of natural genome and transcriptome variation in elite maize inbred lines across heterotic pools. PMID:27803309

  9. The genome and transcriptome of perennial ryegrass mitochondria

    PubMed Central

    2013-01-01

    Background Perennial ryegrass (Lolium perenne L.) is one of the most important forage and turf grass species of temperate regions worldwide. Its mitochondrial genome is inherited maternally and contains genes that can influence traits of agricultural importance. Moreover, the DNA sequence of mitochondrial genomes has been established and compared for a large number of species in order to characterize evolutionary relationships. Therefore, it is crucial to understand the organization of the mitochondrial genome and how it varies between and within species. Here, we report the first de novo assembly and annotation of the complete mitochondrial genome from perennial ryegrass. Results Intact mitochondria from perennial ryegrass leaves were isolated and used for mtDNA extraction. The mitochondrial genome was sequenced to a 167-fold coverage using the Roche 454 GS-FLX Titanium platform, and assembled into a circular master molecule of 678,580 bp. A total of 34 proteins, 14 tRNAs and 3 rRNAs are encoded by the mitochondrial genome, giving a total gene space of 48,723 bp (7.2%). Moreover, we identified 149 open reading frames larger than 300 bp and covering 67,410 bp (9.93%), 250 SSRs, 29 tandem repeats, 5 pairs of large repeats, and 96 pairs of short inverted repeats. The genes encoding subunits of the respiratory complexes – nad1 to nad9, cob, cox1 to cox3 and atp1 to atp9 – all showed high expression levels both in absolute numbers and after normalization. Conclusions The circular master molecule of the mitochondrial genome from perennial ryegrass presented here constitutes an important tool for future attempts to compare mitochondrial genomes within and between grass species. Our results also demonstrate that mitochondria of perennial ryegrass contain genes crucial for energy production that are well conserved in the mitochondrial genome of monocotyledonous species. The expression analysis gave us first insights into the transcriptome of these mitochondrial

  10. Transcriptome-wide investigation of genomic imprinting in chicken

    PubMed Central

    Frésard, Laure; Leroux, Sophie; Servin, Bertrand; Gourichon, David; Dehais, Patrice; Cristobal, Magali San; Marsaud, Nathalie; Vignoles, Florence; Bed'hom, Bertrand; Coville, Jean-Luc; Hormozdiari, Farhad; Beaumont, Catherine; Zerjal, Tatiana; Vignal, Alain; Morisson, Mireille; Lagarrigue, Sandrine; Pitel, Frédérique

    2014-01-01

    Genomic imprinting is an epigenetic mechanism by which alleles of some specific genes are expressed in a parent-of-origin manner. It has been observed in mammals and marsupials, but not in birds. Until now, only a few genes orthologous to mammalian imprinted ones have been analyzed in chicken and did not demonstrate any evidence of imprinting in this species. However, several published observations such as imprinted-like QTL in poultry or reciprocal effects keep the question open. Our main objective was thus to screen the entire chicken genome for parental-allele-specific differential expression on whole embryonic transcriptomes, using high-throughput sequencing. To identify the parental origin of each observed haplotype, two chicken experimental populations were used, as inbred and as genetically distant as possible. Two families were produced from two reciprocal crosses. Transcripts from 20 embryos were sequenced using NGS technology, producing ∼200 Gb of sequences. This allowed the detection of 79 potentially imprinted SNPs, through an analysis method that we validated by detecting imprinting from mouse data already published. However, out of 23 candidates tested by pyrosequencing, none could be confirmed. These results come together, without a priori, with previous statements and phylogenetic considerations assessing the absence of genomic imprinting in chicken. PMID:24452801

  11. Transcriptome-wide investigation of genomic imprinting in chicken.

    PubMed

    Frésard, Laure; Leroux, Sophie; Servin, Bertrand; Gourichon, David; Dehais, Patrice; Cristobal, Magali San; Marsaud, Nathalie; Vignoles, Florence; Bed'hom, Bertrand; Coville, Jean-Luc; Hormozdiari, Farhad; Beaumont, Catherine; Zerjal, Tatiana; Vignal, Alain; Morisson, Mireille; Lagarrigue, Sandrine; Pitel, Frédérique

    2014-04-01

    Genomic imprinting is an epigenetic mechanism by which alleles of some specific genes are expressed in a parent-of-origin manner. It has been observed in mammals and marsupials, but not in birds. Until now, only a few genes orthologous to mammalian imprinted ones have been analyzed in chicken and did not demonstrate any evidence of imprinting in this species. However, several published observations such as imprinted-like QTL in poultry or reciprocal effects keep the question open. Our main objective was thus to screen the entire chicken genome for parental-allele-specific differential expression on whole embryonic transcriptomes, using high-throughput sequencing. To identify the parental origin of each observed haplotype, two chicken experimental populations were used, as inbred and as genetically distant as possible. Two families were produced from two reciprocal crosses. Transcripts from 20 embryos were sequenced using NGS technology, producing ∼200 Gb of sequences. This allowed the detection of 79 potentially imprinted SNPs, through an analysis method that we validated by detecting imprinting from mouse data already published. However, out of 23 candidates tested by pyrosequencing, none could be confirmed. These results come together, without a priori, with previous statements and phylogenetic considerations assessing the absence of genomic imprinting in chicken.

  12. Comparative whole genome transcriptome and metabolome analyses of five Klebsiella pneumonia strains.

    PubMed

    Lee, Soojin; Kim, Borim; Yang, Jeongmo; Jeong, Daun; Park, Soohyun; Shin, Sang Heum; Kook, Jun Ho; Yang, Kap-Seok; Lee, Jinwon

    2015-11-01

    The integration of transcriptomics and metabolomics can provide precise information on gene-to-metabolite networks for identifying the function of novel genes. The goal of this study was to identify novel gene functions involved in 2,3-butanediol (2,3-BDO) biosynthesis by a comprehensive analysis of the transcriptome and metabolome of five mutated Klebsiella pneumonia strains (∆wabG = SGSB100, ∆wabG∆budA = SGSB106, ∆wabG∆budB = SGSB107, ∆wabG∆budC = SGSB108, ∆wabG∆budABC = SGSB109). First, the transcriptomes of all five mutants were analyzed and the genes exhibiting reproducible changes in expression were determined. The transcriptome was well conserved among the five strains, and differences in gene expression occurred mainly in genes coding for 2,3-BDO biosynthesis (budA, budB, and budC) and the genes involved in the degradation of reactive oxygen, biosynthesis and transport of arginine, cysteine biosynthesis, sulfur metabolism, oxidoreductase reaction, and formate dehydrogenase reaction. Second, differences in the metabolome (estimated by carbon distribution, CO2 emission, and redox balance) among the five mutant strains due to gene alteration of the 2,3-BDO operon were detected. The functional genomics approach integrating metabolomics and transcriptomics in K. Pneumonia presented here provides an innovative means of identifying novel gene functions involved in 2,3-BDO biosynthesis metabolism and whole cell metabolism.

  13. Identification of hypertension-related genes through an integrated genomic-transcriptomic approach.

    PubMed

    Yagil, Chana; Hubner, Norbert; Monti, Jan; Schulz, Herbert; Sapojnikov, Marina; Luft, Friedrich C; Ganten, Detlev; Yagil, Yoram

    2005-04-01

    In search for the genetic basis of hypertension, we applied an integrated genomic-transcriptomic approach to identify genes involved in the pathogenesis of hypertension in the Sabra rat model of salt-susceptibility. In the genomic arm of the project, we previously detected in male rats two salt-susceptibility QTLs on chromosome 1, SS1a (D1Mgh2-D1Mit11; span 43.1 cM) and SS1b (D1Mit11-D1Mit4; span 18 cM). In the transcriptomic arm, we studied differential gene expression in kidneys of SBH/y and SBN/y rats that had been fed regular diet or salt-loaded. We used the Affymetrix Rat Genome RAE230 GeneChip and probed >30,000 transcripts. The research algorithm called for an initial genome-wide screen for differentially expressed transcripts between the study groups. This step was followed by cluster analysis based on 2x2 ANOVA to identify transcripts that were of relevance specifically to salt-sensitivity and hypertension and to salt-resistance. The two arms of the project were integrated by identifying those differentially expressed transcripts that showed an allele-specific hypertensive effect on salt-loading and that mapped within the defined boundaries of the salt-susceptibility QTLs on chromosome 1. The differentially expressed transcripts were confirmed by RT-PCR. Of the 2933 genes annotated to rat chromosome 1, 1102 genes were identified within the boundaries of the two blood pressure QTLs. The microarray identified 2470 transcripts that were differentially expressed between the study groups. Cluster analysis identified genome-wide 192 genes that were relevant to salt-susceptibility and/or hypertension, 19 of which mapped to chromosome 1. Eight of these genes mapped within the boundaries of QTLs SS1a and SS1b. RT-PCR confirmed 7 genes, leaving TcTex1, Myadm, Lisch7, Axl-like, Fah, PRC1-like, and Serpinh1. None of these genes has been implicated in hypertension before. These genes become henceforth targets for our continuing search for the genetic basis of

  14. Capturing the response of Clostridium acetobutylicum to chemical stressors using a regulated genome-scale metabolic model

    DOE PAGES

    Dash, Satyakam; Mueller, Thomas J.; Venkataramanan, Keerthi P.; ...

    2014-10-14

    Clostridia are anaerobic Gram-positive Firmicutes containing broad and flexible systems for substrate utilization, which have been used successfully to produce a range of industrial compounds. Clostridium acetobutylicum has been used to produce butanol on an industrial scale through acetone-butanol-ethanol (ABE) fermentation. A genome-scale metabolic (GSM) model is a powerful tool for understanding the metabolic capacities of an organism and developing metabolic engineering strategies for strain development. The integration of stress related specific transcriptomics information with the GSM model provides opportunities for elucidating the focal points of regulation.

  15. The Genomic and Transcriptomic Landscape of a HeLa Cell Line

    PubMed Central

    Landry, Jonathan J. M.; Pyl, Paul Theodor; Rausch, Tobias; Zichner, Thomas; Tekkedil, Manu M.; Stütz, Adrian M.; Jauch, Anna; Aiyar, Raeka S.; Pau, Gregoire; Delhomme, Nicolas; Gagneur, Julien; Korbel, Jan O.; Huber, Wolfgang; Steinmetz, Lars M.

    2013-01-01

    HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies performed using HeLa cells require accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. Some of the extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology. PMID:23550136

  16. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data.

    PubMed

    Chan, Kuang-Lim; Rosli, Rozana; Tatarinova, Tatiana V; Hogan, Michael; Firdaus-Raih, Mohd; Low, Eng-Ti Leslie

    2017-01-27

    Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines developed by various computing techniques are available for gene prediction. However, these systems have yet to accurately predict all or even most of the protein-coding regions. Furthermore, none of the currently available gene-finders has a universal Hidden Markov Model (HMM) that can perform gene prediction for all organisms equally well in an automatic fashion. We present an automated gene prediction pipeline, Seqping that uses self-training HMM models and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by MAKER2 program to combine predictions from the three tools in association with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline was able to identify at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping was able to generate better gene predictions compared to three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 nucleotide structure). Seqping provides researchers a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that the Seqping pipeline predictions are more accurate than gene predictions using the other three approaches with the default or available HMMs.

  17. The Genome and Development-Dependent Transcriptomes of Pyronema confluens: A Window into Fungal Evolution

    PubMed Central

    Traeger, Stefanie; Altegoer, Florian; Freitag, Michael; Gabaldon, Toni; Kempken, Frank; Kumar, Abhishek; Marcet-Houben, Marina; Pöggeler, Stefanie; Stajich, Jason E.; Nowrousian, Minou

    2013-01-01

    Fungi are a large group of eukaryotes found in nearly all ecosystems. More than 250 fungal genomes have already been sequenced, greatly improving our understanding of fungal evolution, physiology, and development. However, for the Pezizomycetes, an early-diverging lineage of filamentous ascomycetes, there is so far only one genome available, namely that of the black truffle, Tuber melanosporum, a mycorrhizal species with unusual subterranean fruiting bodies. To help close the sequence gap among basal filamentous ascomycetes, and to allow conclusions about the evolution of fungal development, we sequenced the genome and assayed transcriptomes during development of Pyronema confluens, a saprobic Pezizomycete with a typical apothecium as fruiting body. With a size of 50 Mb and ∼13,400 protein-coding genes, the genome is more characteristic of higher filamentous ascomycetes than the large, repeat-rich truffle genome; however, some typical features are different in the P. confluens lineage, e.g. the genomic environment of the mating type genes that is conserved in higher filamentous ascomycetes, but only partly conserved in P. confluens. On the other hand, P. confluens has a full complement of fungal photoreceptors, and expression studies indicate that light perception might be similar to distantly related ascomycetes and, thus, represent a basic feature of filamentous ascomycetes. Analysis of spliced RNA-seq sequence reads allowed the detection of natural antisense transcripts for 281 genes. The P. confluens genome contains an unusually high number of predicted orphan genes, many of which are upregulated during sexual development, consistent with the idea of rapid evolution of sex-associated genes. Comparative transcriptomics identified the transcription factor gene pro44 that is upregulated during development in P. confluens and the Sordariomycete Sordaria macrospora. The P. confluens pro44 gene (PCON_06721) was used to complement the S. macrospora pro44 deletion

  18. Plant Aquaporins: Genome-Wide Identification, Transcriptomics, Proteomics, and Advanced Analytical Tools.

    PubMed

    Deshmukh, Rupesh K; Sonah, Humira; Bélanger, Richard R

    2016-01-01

    Aquaporins (AQPs) are channel-forming integral membrane proteins that facilitate the movement of water and many other small molecules. Compared to animals, plants contain a much higher number of AQPs in their genome. Homology-based identification of AQPs in sequenced species is feasible because of the high level of conservation of protein sequences across plant species. Genome-wide characterization of AQPs has highlighted several important aspects such as distribution, genetic organization, evolution and conserved features governing solute specificity. From a functional point of view, the understanding of AQP transport system has expanded rapidly with the help of transcriptomics and proteomics data. The efficient analysis of enormous amounts of data generated through omic scale studies has been facilitated through computational advancements. Prediction of protein tertiary structures, pore architecture, cavities, phosphorylation sites, heterodimerization, and co-expression networks has become more sophisticated and accurate with increasing computational tools and pipelines. However, the effectiveness of computational approaches is based on the understanding of physiological and biochemical properties, transport kinetics, solute specificity, molecular interactions, sequence variations, phylogeny and evolution of aquaporins. For this purpose, tools like Xenopus oocyte assays, yeast expression systems, artificial proteoliposomes, and lipid membranes have been efficiently exploited to study the many facets that influence solute transport by AQPs. In the present review, we discuss genome-wide identification of AQPs in plants in relation with recent advancements in analytical tools, and their availability and technological challenges as they apply to AQPs. An exhaustive review of omics resources available for AQP research is also provided in order to optimize their efficient utilization. Finally, a detailed catalog of computational tools and analytical pipelines is

  19. Plant Aquaporins: Genome-Wide Identification, Transcriptomics, Proteomics, and Advanced Analytical Tools

    PubMed Central

    Deshmukh, Rupesh K.; Sonah, Humira; Bélanger, Richard R.

    2016-01-01

    Aquaporins (AQPs) are channel-forming integral membrane proteins that facilitate the movement of water and many other small molecules. Compared to animals, plants contain a much higher number of AQPs in their genome. Homology-based identification of AQPs in sequenced species is feasible because of the high level of conservation of protein sequences across plant species. Genome-wide characterization of AQPs has highlighted several important aspects such as distribution, genetic organization, evolution and conserved features governing solute specificity. From a functional point of view, the understanding of AQP transport system has expanded rapidly with the help of transcriptomics and proteomics data. The efficient analysis of enormous amounts of data generated through omic scale studies has been facilitated through computational advancements. Prediction of protein tertiary structures, pore architecture, cavities, phosphorylation sites, heterodimerization, and co-expression networks has become more sophisticated and accurate with increasing computational tools and pipelines. However, the effectiveness of computational approaches is based on the understanding of physiological and biochemical properties, transport kinetics, solute specificity, molecular interactions, sequence variations, phylogeny and evolution of aquaporins. For this purpose, tools like Xenopus oocyte assays, yeast expression systems, artificial proteoliposomes, and lipid membranes have been efficiently exploited to study the many facets that influence solute transport by AQPs. In the present review, we discuss genome-wide identification of AQPs in plants in relation with recent advancements in analytical tools, and their availability and technological challenges as they apply to AQPs. An exhaustive review of omics resources available for AQP research is also provided in order to optimize their efficient utilization. Finally, a detailed catalog of computational tools and analytical pipelines is

  20. A Universal Genome Array and Transcriptome Atlas for Brachypodium Distachyon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mockler, Todd

    Brachypodium distachyon is the premier experimental model grass platform and is related to candidate feedstock crops for bioethanol production. Based on the DOE-JGI Brachypodium Bd21 genome sequence and annotation we designed a whole genome DNA microarray platform. The quality of this array platform is unprecedented due to the exceptional quality of the Brachypodium genome assembly and annotation and the stringent probe selection criteria employed in the design. We worked with members of the international community and the bioinformatics/design team at Affymetrix at all stages in the development of the array. We used the Brachypodium arrays to interrogate the transcriptomes ofmore » plants grown in a variety of environmental conditions including diurnal and circadian light/temperature conditions and under a variety of environmental conditions. We examined the transciptional responses of Brachypodium seedlings subjected to various abiotic stresses including heat, cold, salt, and high intensity light. We generated a gene expression atlas representing various organs and developmental stages. The results of these efforts including all microarray datasets are published and available at online public databases.« less

  1. Transcriptome and metabolome of synthetic Solanum autotetraploids reveal key genomic stress events following polyploidization.

    PubMed

    Fasano, Carlo; Diretto, Gianfranco; Aversano, Riccardo; D'Agostino, Nunzio; Di Matteo, Antonio; Frusciante, Luigi; Giuliano, Giovanni; Carputo, Domenico

    2016-06-01

    Polyploids are generally classified as autopolyploids, derived from a single species, and allopolyploids, arising from interspecific hybridization. The former represent ideal materials with which to study the consequences of genome doubling and ascertain whether there are molecular and functional rules operating following polyploidization events. To investigate whether the effects of autopolyploidization are common to different species, or if species-specific or stochastic events are prevalent, we performed a comprehensive transcriptomic and metabolomic characterization of diploids and autotetraploids of Solanum commersonii and Solanum bulbocastanum. Autopolyploidization remodelled the transcriptome and the metabolome of both species. In S. commersonii, differentially expressed genes (DEGs) were highly enriched in pericentromeric regions. Most changes were stochastic, suggesting a strong genotypic response. However, a set of robustly regulated transcripts and metabolites was also detected, including purine bases and nucleosides, which are likely to underlie a common response to polyploidization. We hypothesize that autopolyploidization results in nucleotide pool imbalance, which in turn triggers a genomic shock responsible for the stochastic events observed. The more extensive genomic stress and the higher number of stochastic events observed in S. commersonii with respect to S. bulbocastanum could be the result of the higher nucleoside depletion observed in this species. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  2. The genome and transcriptome of the enteric parasite Entamoeba invadens, a model for encystation

    PubMed Central

    2013-01-01

    Background Several eukaryotic parasites form cysts that transmit infection. The process is found in diverse organisms such as Toxoplasma, Giardia, and nematodes. In Entamoeba histolytica this process cannot be induced in vitro, making it difficult to study. In Entamoeba invadens, stage conversion can be induced, but its utility as a model system to study developmental biology has been limited by a lack of genomic resources. We carried out genome and transcriptome sequencing of E. invadens to identify molecular processes involved in stage conversion. Results We report the sequencing and assembly of the E. invadens genome and use whole transcriptome sequencing to characterize changes in gene expression during encystation and excystation. The E. invadens genome is larger than that of E. histolytica, apparently largely due to expansion of intergenic regions; overall gene number and the machinery for gene regulation are conserved between the species. Over half the genes are regulated during the switch between morphological forms and a key signaling molecule, phospholipase D, appears to regulate encystation. We provide evidence for the occurrence of meiosis during encystation, suggesting that stage conversion may play a key role in recombination between strains. Conclusions Our analysis demonstrates that a number of core processes are common to encystation between distantly related parasites, including meiosis, lipid signaling and RNA modification. These data provide a foundation for understanding the developmental cascade in the important human pathogen E. histolytica and highlight conserved processes more widely relevant in enteric pathogens. PMID:23889909

  3. Transcriptome profiling of the demosponge Amphimedon queenslandica reveals genome-wide events that accompany major life cycle transitions

    PubMed Central

    2012-01-01

    Background The biphasic life cycle with pelagic larva and benthic adult stages is widely observed in the animal kingdom, including the Porifera (sponges), which are the earliest branching metazoans. The demosponge, Amphimedon queenslandica, undergoes metamorphosis from a free-swimming larva into a sessile adult that bears no morphological resemblance to other animals. While the genome of A. queenslandica contains an extensive repertoire of genes very similar to that of complex bilaterians, it is as yet unclear how this is drawn upon to coordinate changing morphological features and ecological demands throughout the sponge life cycle. Results To identify genome-wide events that accompany the pelagobenthic transition in A. queenslandica, we compared global gene expression profiles at four key developmental stages by sequencing the poly(A) transcriptome using SOLiD technology. Large-scale changes in transcription were observed as sponge larvae settled on the benthos and began metamorphosis. Although previous systematics suggest that the only clear homology between Porifera and other animals is in the embryonic and larval stages, we observed extensive use of genes involved in metazoan-associated cellular processes throughout the sponge life cycle. Sponge-specific transcripts are not over-represented in the morphologically distinct adult; rather, many genes that encode typical metazoan features, such as cell adhesion and immunity, are upregulated. Our analysis further revealed gene families with candidate roles in competence, settlement, and metamorphosis in the sponge, including transcription factors, G-protein coupled receptors and other signaling molecules. Conclusions This first genome-wide study of the developmental transcriptome in an early branching metazoan highlights major transcriptional events that accompany the pelagobenthic transition and point to a network of regulatory mechanisms that coordinate changes in morphology with shifting environmental demands

  4. Quantifying whole transcriptome size, a prerequisite for understanding transcriptome evolution across species: an example from a plant allopolyploid.

    PubMed

    Coate, Jeremy E; Doyle, Jeff J

    2010-01-01

    Evolutionary biologists are increasingly comparing gene expression patterns across species. Due to the way in which expression assays are normalized, such studies provide no direct information about expression per gene copy (dosage responses) or per cell and can give a misleading picture of genes that are differentially expressed. We describe an assay for estimating relative expression per cell. When used in conjunction with transcript profiling data, it is possible to compare the sizes of whole transcriptomes, which in turn makes it possible to compare expression per cell for each gene in the transcript profiling data set. We applied this approach, using quantitative reverse transcriptase-polymerase chain reaction and high throughput RNA sequencing, to a recently formed allopolyploid and showed that its leaf transcriptome was approximately 1.4-fold larger than either progenitor transcriptome (70% of the sum of the progenitor transcriptomes). In contrast, the allopolyploid genome is 94.3% as large as the sum of its progenitor genomes and retains > or =93.5% of the sum of its progenitor gene complements. Thus, "transcriptome downsizing" is greater than genome downsizing. Using this transcriptome size estimate, we inferred dosage responses for several thousand genes and showed that the majority exhibit partial dosage compensation. Homoeologue silencing is nonrandomly distributed across dosage responses, with genes showing extreme responses in either direction significantly more likely to have a silent homoeologue. This experimental approach will add value to transcript profiling experiments involving interspecies and interploidy comparisons by converting expression per transcriptome to expression per genome, eliminating the need for assumptions about transcriptome size.

  5. The Draft Genome and Transcriptome of Amaranthus hypochondriacus: A C4 Dicot Producing High-Lysine Edible Pseudo-Cereal

    PubMed Central

    Sunil, Meeta; Hariharan, Arun K.; Nayak, Soumya; Gupta, Saurabh; Nambisan, Suran R.; Gupta, Ravi P.; Panda, Binay; Choudhary, Bibha; Srinivasan, Subhashini

    2014-01-01

    Grain amaranths, edible C4 dicots, produce pseudo-cereals high in lysine. Lysine being one of the most limiting essential amino acids in cereals and C4 photosynthesis being one of the most sought-after phenotypes in protein-rich legume crops, the genome of one of the grain amaranths is likely to play a critical role in crop research. We have sequenced the genome and transcriptome of Amaranthus hypochondriacus, a diploid (2n = 32) belonging to the order Caryophyllales with an estimated genome size of 466 Mb. Of the 411 linkage single-nucleotide polymorphisms (SNPs) reported for grain amaranths, 355 SNPs (86%) are represented in the scaffolds and 74% of the 8.6 billion bases of the sequenced transcriptome map to the genomic scaffolds. The genome of A. hypochondriacus, codes for at least 24,829 proteins, shares the paleohexaploidy event with species under the superorders Rosids and Asterids, harbours 1 SNP in 1,000 bases, and contains 13.76% of repeat elements. Annotation of all the genes in the lysine biosynthetic pathway using comparative genomics and expression analysis offers insights into the high-lysine phenotype. As the first grain species under Caryophyllales and the first C4 dicot genome reported, the work presented here will be beneficial in improving crops and in expanding our understanding of angiosperm evolution. PMID:25071079

  6. Genomic, transcriptomic and phenomic variation reveals the complex adaptation to stress response of modern maize breeding

    USDA-ARS?s Scientific Manuscript database

    Early maize adaptation to different agricultural environments was an important process associated with the creation of a stable food supply that allowed the evolution of human civilization in the Americas. To explore the mechanisms of maize adaptation, genomic, transcriptomic and phenomic data were ...

  7. Lentinula edodes Genome Survey and Postharvest Transcriptome Analysis.

    PubMed

    Sakamoto, Yuichi; Nakade, Keiko; Sato, Shiho; Yoshida, Kentaro; Miyazaki, Kazuhiro; Natsume, Satoshi; Konno, Naotake

    2017-05-15

    Lentinula edodes is a popular, cultivated edible and medicinal mushroom. Lentinula edodes is susceptible to postharvest problems, such as gill browning, fruiting body softening, and lentinan degradation. We constructed a de novo assembly draft genome sequence and performed gene prediction for Lentinula edodes De novo assembly was carried out using short reads from paired-end and mate-paired libraries and by using long reads by PacBio, resulting in a contig number of 1,951 and an N 50 of 1 Mb. Furthermore, we predicted genes by Augustus using transcriptome sequencing (RNA-seq) data from the whole life cycle of Lentinula edodes , resulting in 12,959 predicted genes. This analysis revealed that Lentinula edodes lacks lignin peroxidase. To reveal genes involved in the loss of quality of Lentinula edodes postharvest fruiting bodies, transcriptome analysis was carried out using serial analysis of gene expression (SuperSAGE). This analysis revealed that many cell wall-related enzymes are upregulated after harvest, such as β-1,3-1,6-glucan-degrading enzymes in glycoside hydrolase (GH) families GH5, GH16, GH30, GH55, and GH128, and thaumatin-like proteins. In addition, we found that several chitin-related genes are upregulated, such as putative chitinases in GH family 18, exochitinases in GH20, and a putative chitosanase in GH family 75. The results suggest that cell wall-degrading enzymes synergistically cooperate for rapid fruiting body autolysis. Many putative transcription factor genes were upregulated postharvest, such as genes containing high-mobility-group (HMG) domains and zinc finger domains. Several cell death-related proteins were also upregulated postharvest. IMPORTANCE Our data collectively suggest that there is a rapid fruiting body autolysis system in Lentinula edodes The genes for the loss of postharvest quality newly found in this research will be targets for the future breeding of strains that keep fresh longer than present strains. De novo Lentinula

  8. Lentinula edodes Genome Survey and Postharvest Transcriptome Analysis

    PubMed Central

    Nakade, Keiko; Sato, Shiho; Yoshida, Kentaro; Miyazaki, Kazuhiro; Natsume, Satoshi; Konno, Naotake

    2017-01-01

    ABSTRACT Lentinula edodes is a popular, cultivated edible and medicinal mushroom. Lentinula edodes is susceptible to postharvest problems, such as gill browning, fruiting body softening, and lentinan degradation. We constructed a de novo assembly draft genome sequence and performed gene prediction for Lentinula edodes. De novo assembly was carried out using short reads from paired-end and mate-paired libraries and by using long reads by PacBio, resulting in a contig number of 1,951 and an N50 of 1 Mb. Furthermore, we predicted genes by Augustus using transcriptome sequencing (RNA-seq) data from the whole life cycle of Lentinula edodes, resulting in 12,959 predicted genes. This analysis revealed that Lentinula edodes lacks lignin peroxidase. To reveal genes involved in the loss of quality of Lentinula edodes postharvest fruiting bodies, transcriptome analysis was carried out using serial analysis of gene expression (SuperSAGE). This analysis revealed that many cell wall-related enzymes are upregulated after harvest, such as β-1,3-1,6-glucan-degrading enzymes in glycoside hydrolase (GH) families GH5, GH16, GH30, GH55, and GH128, and thaumatin-like proteins. In addition, we found that several chitin-related genes are upregulated, such as putative chitinases in GH family 18, exochitinases in GH20, and a putative chitosanase in GH family 75. The results suggest that cell wall-degrading enzymes synergistically cooperate for rapid fruiting body autolysis. Many putative transcription factor genes were upregulated postharvest, such as genes containing high-mobility-group (HMG) domains and zinc finger domains. Several cell death-related proteins were also upregulated postharvest. IMPORTANCE Our data collectively suggest that there is a rapid fruiting body autolysis system in Lentinula edodes. The genes for the loss of postharvest quality newly found in this research will be targets for the future breeding of strains that keep fresh longer than present strains. De novo

  9. Combined Analysis of the Chloroplast Genome and Transcriptome of the Antarctic Vascular Plant Deschampsia antarctica Desv

    PubMed Central

    Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok

    2014-01-01

    Background Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. Results The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5′- or 3′-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. Conclusions We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the

  10. Combined analysis of the chloroplast genome and transcriptome of the Antarctic vascular plant Deschampsia antarctica Desv.

    PubMed

    Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok

    2014-01-01

    Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5'- or 3'-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the characteristics of the chloroplast

  11. Genomic and transcriptomic insights into the efficient entomopathogenicity of Bacillus thuringiensis.

    PubMed

    Zhu, Lei; Peng, Donghai; Wang, Yueying; Ye, Weixing; Zheng, Jinshui; Zhao, Changming; Han, Dongmei; Geng, Ce; Ruan, Lifang; He, Jin; Yu, Ziniu; Sun, Ming

    2015-09-28

    Bacillus thuringiensis has been globally used as a microbial pesticide for over 70 years. However, information regarding its various adaptions and virulence factors and their roles in the entomopathogenic process remains limited. In this work, we present the complete genomes of two industrially patented Bacillus thuringiensis strains (HD-1 and YBT-1520). A comparative genomic analysis showed a larger and more complicated genome constitution that included novel insecticidal toxicity-related genes (ITRGs). All of the putative ITRGs were summarized according to the steps of infection. A comparative genomic analysis showed that highly toxic strains contained significantly more ITRGs, thereby providing additional strategies for infection, immune evasion, and cadaver utilization. Furthermore, a comparative transcriptomic analysis suggested that a high expression of these ITRGs was a key factor in efficient entomopathogenicity. We identified an active extra urease synthesis system in the highly toxic strains that may aid B. thuringiensis survival in insects (similar to previous results with well-known pathogens). Taken together, these results explain the efficient entomopathogenicity of B. thuringiensis. It provides novel insights into the strategies used by B. thuringiensis to resist and overcome host immune defenses and helps identify novel toxicity factors.

  12. Genomic and transcriptomic insights into the efficient entomopathogenicity of Bacillus thuringiensis

    PubMed Central

    Zhu, Lei; Peng, Donghai; Wang, Yueying; Ye, Weixing; Zheng, Jinshui; Zhao, Changming; Han, Dongmei; Geng, Ce; Ruan, Lifang; He, Jin; Yu, Ziniu; Sun, Ming

    2015-01-01

    Bacillus thuringiensis has been globally used as a microbial pesticide for over 70 years. However, information regarding its various adaptions and virulence factors and their roles in the entomopathogenic process remains limited. In this work, we present the complete genomes of two industrially patented Bacillus thuringiensis strains (HD-1 and YBT-1520). A comparative genomic analysis showed a larger and more complicated genome constitution that included novel insecticidal toxicity-related genes (ITRGs). All of the putative ITRGs were summarized according to the steps of infection. A comparative genomic analysis showed that highly toxic strains contained significantly more ITRGs, thereby providing additional strategies for infection, immune evasion, and cadaver utilization. Furthermore, a comparative transcriptomic analysis suggested that a high expression of these ITRGs was a key factor in efficient entomopathogenicity. We identified an active extra urease synthesis system in the highly toxic strains that may aid B. thuringiensis survival in insects (similar to previous results with well-known pathogens). Taken together, these results explain the efficient entomopathogenicity of B. thuringiensis. It provides novel insights into the strategies used by B. thuringiensis to resist and overcome host immune defenses and helps identify novel toxicity factors. PMID:26411888

  13. Genome-scale cold stress response regulatory networks in ten Arabidopsis thaliana ecotypes

    PubMed Central

    2013-01-01

    Background Low temperature leads to major crop losses every year. Although several studies have been conducted focusing on diversity of cold tolerance level in multiple phenotypically divergent Arabidopsis thaliana (A. thaliana) ecotypes, genome-scale molecular understanding is still lacking. Results In this study, we report genome-scale transcript response diversity of 10 A. thaliana ecotypes originating from different geographical locations to non-freezing cold stress (10°C). To analyze the transcriptional response diversity, we initially compared transcriptome changes in all 10 ecotypes using Arabidopsis NimbleGen ATH6 microarrays. In total 6061 transcripts were significantly cold regulated (p < 0.01) in 10 ecotypes, including 498 transcription factors and 315 transposable elements. The majority of the transcripts (75%) showed ecotype specific expression pattern. By using sequence data available from Arabidopsis thaliana 1001 genome project, we further investigated sequence polymorphisms in the core cold stress regulon genes. Significant numbers of non-synonymous amino acid changes were observed in the coding region of the CBF regulon genes. Considering the limited knowledge about regulatory interactions between transcription factors and their target genes in the model plant A. thaliana, we have adopted a powerful systems genetics approach- Network Component Analysis (NCA) to construct an in-silico transcriptional regulatory network model during response to cold stress. The resulting regulatory network contained 1,275 nodes and 7,720 connections, with 178 transcription factors and 1,331 target genes. Conclusions A. thaliana ecotypes exhibit considerable variation in transcriptome level responses to non-freezing cold stress treatment. Ecotype specific transcripts and related gene ontology (GO) categories were identified to delineate natural variation of cold stress regulated differential gene expression in the model plant A. thaliana. The predicted

  14. A comprehensive resource of genomic, epigenomic and transcriptomic sequencing data for the black truffle Tuber melanosporum

    PubMed Central

    2014-01-01

    Background Tuber melanosporum, also known in the gastronomic community as “truffle”, features one of the largest fungal genomes (125 Mb) with an exceptionally high transposable element (TE) and repetitive DNA content (>58%). The main purpose of DNA methylation in fungi is TE silencing. As obligate outcrossing organisms, truffles are bound to a sexual mode of propagation, which together with TEs is thought to represent a major force driving the evolution of DNA methylation. Thus, it was of interest to examine if and how T. melanosporum exploits DNA methylation to maintain genome integrity. Findings We performed whole-genome DNA bisulfite sequencing and mRNA sequencing on different developmental stages of T. melanosporum; namely, fruitbody (“truffle”), free-living mycelium and ectomycorrhiza. The data revealed a high rate of cytosine methylation (>44%), selectively targeting TEs rather than genes with a strong preference for CpG sites. Whole genome DNA sequencing uncovered multiple TE-enriched, copy number variant regions bearing a significant fraction of hypomethylated and expressed TEs, almost exclusively in free-living mycelium propagated in vitro. Treatment of mycelia with 5-azacytidine partially reduced DNA methylation and increased TE transcription. Our transcriptome assembly also resulted in the identification of a set of novel transcripts from 614 genes. Conclusions The datasets presented here provide valuable and comprehensive (epi)genomic information that can be of interest for evolutionary genomics studies of multicellular (filamentous) fungi, in particular Ascomycetes belonging to the subphylum, Pezizomycotina. Evidence derived from comparative methylome and transcriptome analyses indicates that a non-exhaustive and partly reversible methylation process operates in truffles. PMID:25392735

  15. A comprehensive resource of genomic, epigenomic and transcriptomic sequencing data for the black truffle Tuber melanosporum.

    PubMed

    Chen, Pao-Yang; Montanini, Barbara; Liao, Wen-Wei; Morselli, Marco; Jaroszewicz, Artur; Lopez, David; Ottonello, Simone; Pellegrini, Matteo

    2014-01-01

    Tuber melanosporum, also known in the gastronomic community as "truffle", features one of the largest fungal genomes (125 Mb) with an exceptionally high transposable element (TE) and repetitive DNA content (>58%). The main purpose of DNA methylation in fungi is TE silencing. As obligate outcrossing organisms, truffles are bound to a sexual mode of propagation, which together with TEs is thought to represent a major force driving the evolution of DNA methylation. Thus, it was of interest to examine if and how T. melanosporum exploits DNA methylation to maintain genome integrity. We performed whole-genome DNA bisulfite sequencing and mRNA sequencing on different developmental stages of T. melanosporum; namely, fruitbody ("truffle"), free-living mycelium and ectomycorrhiza. The data revealed a high rate of cytosine methylation (>44%), selectively targeting TEs rather than genes with a strong preference for CpG sites. Whole genome DNA sequencing uncovered multiple TE-enriched, copy number variant regions bearing a significant fraction of hypomethylated and expressed TEs, almost exclusively in free-living mycelium propagated in vitro. Treatment of mycelia with 5-azacytidine partially reduced DNA methylation and increased TE transcription. Our transcriptome assembly also resulted in the identification of a set of novel transcripts from 614 genes. The datasets presented here provide valuable and comprehensive (epi)genomic information that can be of interest for evolutionary genomics studies of multicellular (filamentous) fungi, in particular Ascomycetes belonging to the subphylum, Pezizomycotina. Evidence derived from comparative methylome and transcriptome analyses indicates that a non-exhaustive and partly reversible methylation process operates in truffles.

  16. Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

    PubMed

    Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

    2015-02-10

    Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.

  17. Large-scale transcriptome characterization and mass discovery of SNPs in globe artichoke and its related taxa.

    PubMed

    Scaglione, Davide; Lanteri, Sergio; Acquadro, Alberto; Lai, Zhao; Knapp, Steven J; Rieseberg, Loren; Portis, Ezio

    2012-10-01

    Cynara cardunculus (2n = 2× = 34) is a member of the Asteraceae family that contributes significantly to the agricultural economy of the Mediterranean basin. The species includes two cultivated varieties, globe artichoke and cardoon, which are grown mainly for food. Cynara cardunculus is an orphan crop species whose genome/transcriptome has been relatively unexplored, especially in comparison to other Asteraceae crops. Hence, there is a significant need to improve its genomic resources through the identification of novel genes and sequence-based markers, to design new breeding schemes aimed at increasing quality and crop productivity. We report the outcome of cDNA sequencing and assembly for eleven accessions of C. cardunculus. Sequencing of three mapping parental genotypes using Roche 454-Titanium technology generated 1.7 × 10⁶ reads, which were assembled into 38,726 reference transcripts covering 32 Mbp. Putative enzyme-encoding genes were annotated using the KEGG-database. Transcription factors and candidate resistance genes were surveyed as well. Paired-end sequencing was done for cDNA libraries of eight other representative C. cardunculus accessions on an Illumina Genome Analyzer IIx, generating 46 × 10⁶ reads. Alignment of the IGA and 454 reads to reference transcripts led to the identification of 195,400 SNPs with a Bayesian probability exceeding 95%; a validation rate of 90% was obtained by Sanger-sequencing of a subset of contigs. These results demonstrate that the integration of data from different NGS platforms enables large-scale transcriptome characterization, along with massive SNP discovery. This information will contribute to the dissection of key agricultural traits in C. cardunculus and facilitate the implementation of marker-assisted selection programs. © 2012 The Authors. Plant Biotechnology Journal © 2012 Society for Experimental Biology, Association of Applied Biologists and Blackwell Publishing Ltd.

  18. Genomic and Transcriptomic Associations Identify a New Insecticide Resistance Phenotype for the Selective Sweep at the Cyp6g1 Locus of Drosophila melanogaster.

    PubMed

    Battlay, Paul; Schmidt, Joshua M; Fournier-Level, Alexandre; Robin, Charles

    2016-08-09

    Scans of the Drosophila melanogaster genome have identified organophosphate resistance loci among those with the most pronounced signature of positive selection. In this study, the molecular basis of resistance to the organophosphate insecticide azinphos-methyl was investigated using the Drosophila Genetic Reference Panel, and genome-wide association. Recently released full transcriptome data were used to extend the utility of the Drosophila Genetic Reference Panel resource beyond traditional genome-wide association studies to allow systems genetics analyses of phenotypes. We found that both genomic and transcriptomic associations independently identified Cyp6g1, a gene involved in resistance to DDT and neonicotinoid insecticides, as the top candidate for azinphos-methyl resistance. This was verified by transgenically overexpressing Cyp6g1 using natural regulatory elements from a resistant allele, resulting in a 6.5-fold increase in resistance. We also identified four novel candidate genes associated with azinphos-methyl resistance, all of which are involved in either regulation of fat storage, or nervous system development. In Cyp6g1, we find a demonstrable resistance locus, a verification that transcriptome data can be used to identify variants associated with insecticide resistance, and an overlap between peaks of a genome-wide association study, and a genome-wide selective sweep analysis. Copyright © 2016 Battlay et al.

  19. Optimizing de novo transcriptome assembly and extending genomic resources for striped catfish (Pangasianodon hypophthalmus).

    PubMed

    Thanh, Nguyen Minh; Jung, Hyungtaek; Lyons, Russell E; Njaci, Isaac; Yoon, Byoung-Ha; Chand, Vincent; Tuan, Nguyen Viet; Thu, Vo Thi Minh; Mather, Peter

    2015-10-01

    Striped catfish (Pangasianodon hypophthalmus) is a commercially important freshwater fish used in inland aquaculture in the Mekong Delta, Vietnam. The culture industry is facing a significant challenge however from saltwater intrusion into many low topographical coastal provinces across the Mekong Delta as a result of predicted climate change impacts. Developing genomic resources for this species can facilitate the production of improved culture lines that can withstand raised salinity conditions, and so we have applied high-throughput Ion Torrent sequencing of transcriptome libraries from six target osmoregulatory organs from striped catfish as a genomic resource for use in future selection strategies. We obtained 12,177,770 reads after trimming and processing with an average length of 97bp. De novo assemblies were generated using CLC Genomic Workbench, Trinity and Velvet/Oases with the best overall contig performance resulting from the CLC assembly. De novo assembly using CLC yielded 66,451 contigs with an average length of 478bp and N50 length of 506bp. A total of 37,969 contigs (57%) possessed significant similarity with proteins in the non-redundant database. Comparative analyses revealed that a significant number of contigs matched sequences reported in other teleost fishes, ranging in similarity from 45.2% with Atlantic cod to 52% with zebrafish. In addition, 28,879 simple sequence repeats (SSRs) and 55,721 single nucleotide polymorphisms (SNPs) were detected in the striped catfish transcriptome. The sequence collection generated in the current study represents the most comprehensive genomic resource for P. hypophthalmus available to date. Our results illustrate the utility of next-generation sequencing as an efficient tool for constructing a large genomic database for marker development in non-model species. Copyright © 2015 Elsevier B.V. All rights reserved.

  20. Proteomics informed by transcriptomics for characterising active transposable elements and genome annotation in Aedes aegypti.

    PubMed

    Maringer, Kevin; Yousuf, Amjad; Heesom, Kate J; Fan, Jun; Lee, David; Fernandez-Sesma, Ana; Bessant, Conrad; Matthews, David A; Davidson, Andrew D

    2017-01-19

    Aedes aegypti is a vector for the (re-)emerging human pathogens dengue, chikungunya, yellow fever and Zika viruses. Almost half of the Ae. aegypti genome is comprised of transposable elements (TEs). Transposons have been linked to diverse cellular processes, including the establishment of viral persistence in insects, an essential step in the transmission of vector-borne viruses. However, up until now it has not been possible to study the overall proteome derived from an organism's mobile genetic elements, partly due to the highly divergent nature of TEs. Furthermore, as for many non-model organisms, incomplete genome annotation has hampered proteomic studies on Ae. aegypti. We analysed the Ae. aegypti proteome using our new proteomics informed by transcriptomics (PIT) technique, which bypasses the need for genome annotation by identifying proteins through matched transcriptomic (rather than genomic) data. Our data vastly increase the number of experimentally confirmed Ae. aegypti proteins. The PIT analysis also identified hotspots of incomplete genome annotation, and showed that poor sequence and assembly quality do not explain all annotation gaps. Finally, in a proof-of-principle study, we developed criteria for the characterisation of proteomically active TEs. Protein expression did not correlate with a TE's genomic abundance at different levels of classification. Most notably, long terminal repeat (LTR) retrotransposons were markedly enriched compared to other elements. PIT was superior to 'conventional' proteomic approaches in both our transposon and genome annotation analyses. We present the first proteomic characterisation of an organism's repertoire of mobile genetic elements, which will open new avenues of research into the function of transposon proteins in health and disease. Furthermore, our study provides a proof-of-concept that PIT can be used to evaluate a genome's annotation to guide annotation efforts which has the potential to improve the

  1. Genome-wide investigation and transcriptome analysis of the WRKY gene family in Gossypium.

    PubMed

    Ding, Mingquan; Chen, Jiadong; Jiang, Yurong; Lin, Lifeng; Cao, YueFen; Wang, Minhua; Zhang, Yuting; Rong, Junkang; Ye, Wuwei

    2015-02-01

    WRKY transcription factors play important roles in various stress responses in diverse plant species. In cotton, this family has not been well studied, especially in relation to fiber development. Here, the genomes and transcriptomes of Gossypium raimondii and Gossypium arboreum were investigated to identify fiber development related WRKY genes. This represents the first comprehensive comparative study of WRKY transcription factors in both diploid A and D cotton species. In total, 112 G. raimondii and 109 G. arboreum WRKY genes were identified. No significant gene structure or domain alterations were detected between the two species, but many SNPs distributed unequally in exon and intron regions. Physical mapping revealed that the WRKY genes in G. arboreum were not located in the corresponding chromosomes of G. raimondii, suggesting great chromosome rearrangement in the diploid cotton genomes. The cotton WRKY genes, especially subgroups I and II, have expanded through multiple whole genome duplications and tandem duplications compared with other plant species. Sequence comparison showed many functionally divergent sites between WRKY subgroups, while the genes within each group are under strong purifying selection. Transcriptome analysis suggested that many WRKY genes participate in specific fiber development processes such as fiber initiation, elongation and maturation with different expression patterns between species. Complex WRKY gene expression such as differential Dt and At allelic gene expression in G. hirsutum and alternative splicing events were also observed in both diploid and tetraploid cottons during fiber development process. In conclusion, this study provides important information on the evolution and function of WRKY gene family in cotton species.

  2. International network of cancer genome projects

    PubMed Central

    2010-01-01

    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumors from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of over 25,000 cancer genomes at the genomic, epigenomic, and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically-relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies. PMID:20393554

  3. Single-cell transcriptomics for microbial eukaryotes.

    PubMed

    Kolisko, Martin; Boscaro, Vittorio; Burki, Fabien; Lynn, Denis H; Keeling, Patrick J

    2014-11-17

    One of the greatest hindrances to a comprehensive understanding of microbial genomics, cell biology, ecology, and evolution is that most microbial life is not in culture. Solutions to this problem have mainly focused on whole-community surveys like metagenomics, but these analyses inevitably loose information and present particular challenges for eukaryotes, which are relatively rare and possess large, gene-sparse genomes. Single-cell analyses present an alternative solution that allows for specific species to be targeted, while retaining information on cellular identity, morphology, and partitioning of activities within microbial communities. Single-cell transcriptomics, pioneered in medical research, offers particular potential advantages for uncultivated eukaryotes, but the efficiency and biases have not been tested. Here we describe a simple and reproducible method for single-cell transcriptomics using manually isolated cells from five model ciliate species; we examine impacts of amplification bias and contamination, and compare the efficacy of gene discovery to traditional culture-based transcriptomics. Gene discovery using single-cell transcriptomes was found to be comparable to mass-culture methods, suggesting single-cell transcriptomics is an efficient entry point into genomic data from the vast majority of eukaryotic biodiversity. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. Transcriptome sequencing and whole genome expression profiling of chrysanthemum under dehydration stress

    PubMed Central

    2013-01-01

    Background Chrysanthemum is one of the most important ornamental crops in the world and drought stress seriously limits its production and distribution. In order to generate a functional genomics resource and obtain a deeper understanding of the molecular mechanisms regarding chrysanthemum responses to dehydration stress, we performed large-scale transcriptome sequencing of chrysanthemum plants under dehydration stress using the Illumina sequencing technology. Results Two cDNA libraries constructed from mRNAs of control and dehydration-treated seedlings were sequenced by Illumina technology. A total of more than 100 million reads were generated and de novo assembled into 98,180 unique transcripts which were further extensively annotated by comparing their sequencing to different protein databases. Biochemical pathways were predicted from these transcript sequences. Furthermore, we performed gene expression profiling analysis upon dehydration treatment in chrysanthemum and identified 8,558 dehydration-responsive unique transcripts, including 307 transcription factors and 229 protein kinases and many well-known stress responsive genes. Gene ontology (GO) term enrichment and biochemical pathway analyses showed that dehydration stress caused changes in hormone response, secondary and amino acid metabolism, and light and photoperiod response. These findings suggest that drought tolerance of chrysanthemum plants may be related to the regulation of hormone biosynthesis and signaling, reduction of oxidative damage, stabilization of cell proteins and structures, and maintenance of energy and carbon supply. Conclusions Our transcriptome sequences can provide a valuable resource for chrysanthemum breeding and research and novel insights into chrysanthemum responses to dehydration stress and offer candidate genes or markers that can be used to guide future studies attempting to breed drought tolerant chrysanthemum cultivars. PMID:24074255

  5. Genome and transcriptome sequencing in prospective metastatic triple-negative breast cancer uncovers therapeutic vulnerabilities.

    PubMed

    Craig, David W; O'Shaughnessy, Joyce A; Kiefer, Jeffrey A; Aldrich, Jessica; Sinari, Shripad; Moses, Tracy M; Wong, Shukmei; Dinh, Jennifer; Christoforides, Alexis; Blum, Joanne L; Aitelli, Cristi L; Osborne, Cynthia R; Izatt, Tyler; Kurdoglu, Ahmet; Baker, Angela; Koeman, Julie; Barbacioru, Catalin; Sakarya, Onur; De La Vega, Francisco M; Siddiqui, Asim; Hoang, Linh; Billings, Paul R; Salhia, Bodour; Tolcher, Anthony W; Trent, Jeffrey M; Mousses, Spyro; Von Hoff, Daniel; Carpten, John D

    2013-01-01

    Triple-negative breast cancer (TNBC) is characterized by the absence of expression of estrogen receptor, progesterone receptor, and HER-2. Thirty percent of patients recur after first-line treatment, and metastatic TNBC (mTNBC) has a poor prognosis with median survival of one year. Here, we present initial analyses of whole genome and transcriptome sequencing data from 14 prospective mTNBC. We have cataloged the collection of somatic genomic alterations in these advanced tumors, particularly those that may inform targeted therapies. Genes mutated in multiple tumors included TP53, LRP1B, HERC1, CDH5, RB1, and NF1. Notable genes involved in focal structural events were CTNNA1, PTEN, FBXW7, BRCA2, WT1, FGFR1, KRAS, HRAS, ARAF, BRAF, and PGCP. Homozygous deletion of CTNNA1 was detected in 2 of 6 African Americans. RNA sequencing revealed consistent overexpression of the FOXM1 gene when tumor gene expression was compared with nonmalignant breast samples. Using an outlier analysis of gene expression comparing one cancer with all the others, we detected expression patterns unique to each patient's tumor. Integrative DNA/RNA analysis provided evidence for deregulation of mutated genes, including the monoallelic expression of TP53 mutations. Finally, molecular alterations in several cancers supported targeted therapeutic intervention on clinical trials with known inhibitors, particularly for alterations in the RAS/RAF/MEK/ERK and PI3K/AKT/mTOR pathways. In conclusion, whole genome and transcriptome profiling of mTNBC have provided insights into somatic events occurring in this difficult to treat cancer. These genomic data have guided patients to investigational treatment trials and provide hypotheses for future trials in this irremediable cancer.

  6. Integrated whole-genome and transcriptome sequence analysis reveals the genetic characteristics of a riboflavin-overproducing Bacillus subtilis.

    PubMed

    Wang, Guanglu; Shi, Ting; Chen, Tao; Wang, Xiaoyue; Wang, Yongcheng; Liu, Dingyu; Guo, Jiaxin; Fu, Jing; Feng, Lili; Wang, Zhiwen; Zhao, Xueming

    2018-06-02

    Commercial riboflavin production with Bacillus subtilis has been developed by combining rational and classical strain development for almost two decades, but how an improved riboflavin producer can be created rationally is still not completely understood. In this study, we demonstrate the combined use of integrated genomic and transcriptomic analysis of the genetic basis for riboflavin over-production in B. subtilis. This methodology succeeded in discerning the positive mutations in the mutagenesis derived riboflavin producer B. subtilis 24/pMX45 through whole-genome sequencing and transcriptome sequencing. These included RibC (G199D), ribD + (G+39A), PurA (P242L), CcpN(A44S), YvrH (R222Q) and two nonsense mutations YhcF (R90*) and YwaA (Q68*). Reintroducing these specific mutations into the wild-type strain recovered the riboflavin overproduction phenotype and subsequent metabolic engineering greatly improved riboflavin production, achieving an up to 3.4-fold increase of the riboflavin titer over the sequenced producer. A novel mutation, YvrH (R222Q), involved in a typical two-component regulatory system deregulated the purine de novo synthesis pathway and increased the pool of intracellular purine metabolites, which in turn increased riboflavin production. Taken together, we present a case study of combining genome and transcriptome analysis to elucidate the genetic underpinnings of a complex cellular property, which enabled the transfer of beneficial mutations to engineer a reference strain into an overproducer. Copyright © 2018 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.

  7. Elucidating and mining the Tulipa and Lilium transcriptomes.

    PubMed

    Moreno-Pachon, Natalia M; Leeggangers, Hendrika A C F; Nijveen, Harm; Severing, Edouard; Hilhorst, Henk; Immink, Richard G H

    2016-10-01

    Genome sequencing remains a challenge for species with large and complex genomes containing extensive repetitive sequences, of which the bulbous and monocotyledonous plants tulip and lily are examples. In such a case, sequencing of only the active part of the genome, represented by the transcriptome, is a good alternative to obtain information about gene content. In this study we aimed to generate a high quality transcriptome of tulip and lily and to make this data available as an open-access resource via a user-friendly web-based interface. The Illumina HiSeq 2000 platform was applied and the transcribed RNA was sequenced from a collection of different lily and tulip tissues, respectively. In order to obtain good transcriptome coverage and to facilitate effective data mining, assembly was done using different filtering parameters for clearing out contamination and noise of the RNAseq datasets. This analysis revealed limitations of commonly applied methods and parameter settings used in de novo transcriptome assembly. The final created transcriptomes are publicly available via a user friendly Transcriptome browser ( http://www.bioinformatics.nl/bulbs/db/species/index ). The usefulness of this resource has been exemplified by a search for all potential transcription factors in lily and tulip, with special focus on the TCP transcription factor family. This analysis and other quality parameters point out the quality of the transcriptomes, which can serve as a basis for further genomics studies in lily, tulip, and bulbous plants in general.

  8. Transcriptome analysis of root response to citrus blight based on the newly assembled Swingle citrumelo draft genome.

    PubMed

    Zhang, Yunzeng; Barthe, Gary; Grosser, Jude W; Wang, Nian

    2016-07-08

    Citrus blight is a citrus tree overall decline disease and causes serious losses in the citrus industry worldwide. Although it was described more than one hundred years ago, its causal agent remains unknown and its pathophysiology is not well determined, which hampers our understanding of the disease and design of suitable disease management. In this study, we sequenced and assembled the draft genome for Swingle citrumelo, one important citrus rootstock. The draft genome is approximately 280 Mb, which covers 74 % of the estimated Swingle citrumelo genome and the average coverage is around 15X. The draft genome of Swingle citrumelo enabled us to conduct transcriptome analysis of roots of blight and healthy Swingle citrumelo using RNA-seq. The RNA-seq was reliable as evidenced by the high consistence of RNA-seq analysis and quantitative reverse transcription PCR results (R(2) = 0.966). Comparison of the gene expression profiles between blight and healthy root samples revealed the molecular mechanism underneath the characteristic blight phenotypes including decline, starch accumulation, and drought stress. The JA and ET biosynthesis and signaling pathways showed decreased transcript abundance, whereas SA-mediated defense-related genes showed increased transcript abundance in blight trees, suggesting unclassified biotrophic pathogen was involved in this disease. Overall, the Swingle citrumelo draft genome generated in this study will advance our understanding of plant biology and contribute to the citrus breeding. Transcriptome analysis of blight and healthy trees deepened our understanding of the pathophysiology of citrus blight.

  9. Integrating transcriptome and genome re-sequencing data to identify key genes and mutations affecting chicken eggshell qualities.

    PubMed

    Zhang, Quan; Zhu, Feng; Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

    2015-01-01

    Eggshell damages lead to economic losses in the egg production industry and are a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways as revealed by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations which decreased the eggshell qualities. These findings further advance our understanding of eggshell calcification in the chicken uterus.

  10. Improved annotation with de novo transcriptome assembly in four social amoeba species.

    PubMed

    Singh, Reema; Lawal, Hajara M; Schilde, Christina; Glöckner, Gernot; Barton, Geoffrey J; Schaap, Pauline; Cole, Christian

    2017-01-31

    Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.

  11. Genome and Transcriptome Sequences Reveal the Specific Parasitism of the Nematophagous Purpureocillium lilacinum 36-1

    PubMed Central

    Xie, Jialian; Li, Shaojun; Mo, Chenmi; Xiao, Xueqiong; Peng, Deliang; Wang, Gaofeng; Xiao, Yannong

    2016-01-01

    Purpureocillium lilacinum is a promising nematophagous ascomycete able to adapt diverse environments and it is also an opportunistic fungus that infects humans. A microbial inoculant of P. lilacinum has been registered to control plant parasitic nematodes. However, the molecular mechanism of the toxicological processes is still unclear because of the relatively few reports on the subject. In this study, using Illumina paired-end sequencing, the draft genome sequence and the transcriptome of P. lilacinum strain 36-1 infecting nematode-eggs were determined. Whole genome alignment indicated that P. lilacinum 36-1 possessed a more dynamic genome in comparison with P. lilacinum India strain. Moreover, a phylogenetic analysis showed that the P. lilacinum 36-1 had a closer relation to entomophagous fungi. The protein-coding genes in P. lilacinum 36-1 occurred much more frequently than they did in other fungi, which was a result of the depletion of repeat-induced point mutations (RIP). Comparative genome and transcriptome analyses revealed the genes that were involved in pathogenicity, particularly in the recognition, adhesion of nematode-eggs, downstream signal transduction pathways and hydrolase genes. By contrast, certain numbers of cellulose and xylan degradation genes and a lack of polysaccharide lyase genes showed the potential of P. lilacinum 36-1 as an endophyte. Notably, the expression of appressorium-formation and antioxidants-related genes exhibited similar infection patterns in P. lilacinum strain 36-1 to those of the model entomophagous fungi Metarhizium spp. These results uncovered the specific parasitism of P. lilacinum and presented the genes responsible for the infection of nematode-eggs. PMID:27486440

  12. Assembled genomic and tissue-specific transcriptomic data resources for two genetically distinct lines of Cowpea ( Vigna unguiculata (L.) Walp)

    PubMed Central

    Spriggs, Andrew; Henderson, Steven T.; Hand, Melanie L.; Johnson, Susan D.; Taylor, Jennifer M.; Koltunow, Anna

    2018-01-01

    Cowpea ( Vigna unguiculata (L.) Walp) is an important legume crop for food security in areas of low-input and smallholder farming throughout Africa and Asia. Genetic improvements are required to increase yield and resilience to biotic and abiotic stress and to enhance cowpea crop performance. An integrated cowpea genomic and gene expression data resource has the potential to greatly accelerate breeding and the delivery of novel genetic traits for cowpea. Extensive genomic resources for cowpea have been absent from the public domain; however, a recent early release reference genome for IT97K-499-35 ( Vigna unguiculata v1.0, NSF, UCR, USAID, DOE-JGI, http://phytozome.jgi.doe.gov/) has now been established in a collaboration between the Joint Genome Institute (JGI) and University California (UC) Riverside. Here we release supporting genomic and transcriptomic data for IT97K-499-35 and a second transformable cowpea variety, IT86D-1010. The transcriptome resource includes six tissue-specific datasets for each variety, with particular emphasis on reproductive tissues that extend and support the V. unguiculata v1.0 reference. Annotations have been included in our resource to allow direct mapping to the v1.0 cowpea reference. Access to this resource provided here is supported by raw and assembled data downloads. PMID:29528046

  13. Assembled genomic and tissue-specific transcriptomic data resources for two genetically distinct lines of Cowpea ( Vigna unguiculata (L.) Walp).

    PubMed

    Spriggs, Andrew; Henderson, Steven T; Hand, Melanie L; Johnson, Susan D; Taylor, Jennifer M; Koltunow, Anna

    2018-02-09

    Cowpea ( Vigna unguiculata (L.) Walp) is an important legume crop for food security in areas of low-input and smallholder farming throughout Africa and Asia. Genetic improvements are required to increase yield and resilience to biotic and abiotic stress and to enhance cowpea crop performance. An integrated cowpea genomic and gene expression data resource has the potential to greatly accelerate breeding and the delivery of novel genetic traits for cowpea. Extensive genomic resources for cowpea have been absent from the public domain; however, a recent early release reference genome for IT97K-499-35 ( Vigna unguiculata  v1.0, NSF, UCR, USAID, DOE-JGI, http://phytozome.jgi.doe.gov/) has now been established in a collaboration between the Joint Genome Institute (JGI) and University California (UC) Riverside. Here we release supporting genomic and transcriptomic data for IT97K-499-35 and a second transformable cowpea variety, IT86D-1010. The transcriptome resource includes six tissue-specific datasets for each variety, with particular emphasis on reproductive tissues that extend and support the V. unguiculata v1.0 reference. Annotations have been included in our resource to allow direct mapping to the v1.0 cowpea reference. Access to this resource provided here is supported by raw and assembled data downloads.

  14. Comparative description of ten transcriptomes of newly sequenced invertebrates and efficiency estimation of genomic sampling in non-model taxa

    PubMed Central

    2012-01-01

    Introduction Traditionally, genomic or transcriptomic data have been restricted to a few model or emerging model organisms, and to a handful of species of medical and/or environmental importance. Next-generation sequencing techniques have the capability of yielding massive amounts of gene sequence data for virtually any species at a modest cost. Here we provide a comparative analysis of de novo assembled transcriptomic data for ten non-model species of previously understudied animal taxa. Results cDNA libraries of ten species belonging to five animal phyla (2 Annelida [including Sipuncula], 2 Arthropoda, 2 Mollusca, 2 Nemertea, and 2 Porifera) were sequenced in different batches with an Illumina Genome Analyzer II (read length 100 or 150 bp), rendering between ca. 25 and 52 million reads per species. Read thinning, trimming, and de novo assembly were performed under different parameters to optimize output. Between 67,423 and 207,559 contigs were obtained across the ten species, post-optimization. Of those, 9,069 to 25,681 contigs retrieved blast hits against the NCBI non-redundant database, and approximately 50% of these were assigned with Gene Ontology terms, covering all major categories, and with similar percentages in all species. Local blasts against our datasets, using selected genes from major signaling pathways and housekeeping genes, revealed high efficiency in gene recovery compared to available genomes of closely related species. Intriguingly, our transcriptomic datasets detected multiple paralogues in all phyla and in nearly all gene pathways, including housekeeping genes that are traditionally used in phylogenetic applications for their purported single-copy nature. Conclusions We generated the first study of comparative transcriptomics across multiple animal phyla (comparing two species per phylum in most cases), established the first Illumina-based transcriptomic datasets for sponge, nemertean, and sipunculan species, and generated a tractable

  15. The First Chameleon Transcriptome: Comparative Genomic Analysis of the OXPHOS System Reveals Loss of COX8 in Iguanian Lizards

    PubMed Central

    Bar-Yaacov, Dan; Bouskila, Amos; Mishmar, Dan

    2013-01-01

    Recently, we found dramatic mitochondrial DNA divergence of Israeli Chamaeleo chamaeleon populations into two geographically distinct groups. We aimed to examine whether the same pattern of divergence could be found in nuclear genes. However, no genomic resource is available for any chameleon species. Here we present the first chameleon transcriptome, obtained using deep sequencing (SOLiD). Our analysis identified 164,000 sequence contigs of which 19,000 yielded unique BlastX hits. To test the efficacy of our sequencing effort, we examined whether the chameleon and other available reptilian transcriptomes harbored complete sets of genes comprising known biochemical pathways, focusing on the nDNA-encoded oxidative phosphorylation (OXPHOS) genes as a model. As a reference for the screen, we used the human 86 (including isoforms) known structural nDNA-encoded OXPHOS subunits. Analysis of 34 publicly available vertebrate transcriptomes revealed orthologs for most human OXPHOS genes. However, OXPHOS subunit COX8 (Cytochrome C oxidase subunit 8), including all its known isoforms, was consistently absent in transcriptomes of iguanian lizards, implying loss of this subunit during the radiation of this suborder. The lack of COX8 in the suborder Iguania is intriguing, since it is important for cellular respiration and ATP production. Our sequencing effort added a new resource for comparative genomic studies, and shed new light on the evolutionary dynamics of the OXPHOS system. PMID:24009133

  16. The first Chameleon transcriptome: comparative genomic analysis of the OXPHOS system reveals loss of COX8 in Iguanian lizards.

    PubMed

    Bar-Yaacov, Dan; Bouskila, Amos; Mishmar, Dan

    2013-01-01

    Recently, we found dramatic mitochondrial DNA divergence of Israeli Chamaeleo chamaeleon populations into two geographically distinct groups. We aimed to examine whether the same pattern of divergence could be found in nuclear genes. However, no genomic resource is available for any chameleon species. Here we present the first chameleon transcriptome, obtained using deep sequencing (SOLiD). Our analysis identified 164,000 sequence contigs of which 19,000 yielded unique BlastX hits. To test the efficacy of our sequencing effort, we examined whether the chameleon and other available reptilian transcriptomes harbored complete sets of genes comprising known biochemical pathways, focusing on the nDNA-encoded oxidative phosphorylation (OXPHOS) genes as a model. As a reference for the screen, we used the human 86 (including isoforms) known structural nDNA-encoded OXPHOS subunits. Analysis of 34 publicly available vertebrate transcriptomes revealed orthologs for most human OXPHOS genes. However, OXPHOS subunit COX8 (Cytochrome C oxidase subunit 8), including all its known isoforms, was consistently absent in transcriptomes of iguanian lizards, implying loss of this subunit during the radiation of this suborder. The lack of COX8 in the suborder Iguania is intriguing, since it is important for cellular respiration and ATP production. Our sequencing effort added a new resource for comparative genomic studies, and shed new light on the evolutionary dynamics of the OXPHOS system.

  17. Dictyocaulus viviparus genome, variome and transcriptome elucidate lungworm biology and support future intervention

    PubMed Central

    McNulty, Samantha N.; Strübe, Christina; Rosa, Bruce A.; Martin, John C.; Tyagi, Rahul; Choi, Young-Jun; Wang, Qi; Hallsworth Pepin, Kymberlie; Zhang, Xu; Ozersky, Philip; Wilson, Richard K.; Sternberg, Paul W.; Gasser, Robin B.; Mitreva, Makedonka

    2016-01-01

    The bovine lungworm, Dictyocaulus viviparus (order Strongylida), is an important parasite of livestock that causes substantial economic and production losses worldwide. Here we report the draft genome, variome, and developmental transcriptome of D. viviparus. The genome (161 Mb) is smaller than those of related bursate nematodes and encodes fewer proteins (14,171 total). In the first genome-wide assessment of genomic variation in any parasitic nematode, we found a high degree of sequence variability in proteins predicted to be involved host-parasite interactions. Next, we used extensive RNA sequence data to track gene transcription across the life cycle of D. viviparus, and identified genes that might be important in nematode development and parasitism. Finally, we predicted genes that could be vital in host-parasite interactions, genes that could serve as drug targets, and putative RNAi effectors with a view to developing functional genomic tools. This extensive, well-curated dataset should provide a basis for developing new anthelmintics, vaccines, and improved diagnostic tests and serve as a platform for future investigations of drug resistance and epidemiology of the bovine lungworm and related nematodes. PMID:26856411

  18. Genomic and transcriptomic approaches to study immunology in cyprinids: What is next?

    PubMed

    Petit, Jules; David, Lior; Dirks, Ron; Wiegertjes, Geert F

    2017-10-01

    Accelerated by the introduction of Next-Generation Sequencing (NGS), a number of genomes of cyprinid fish species have been drafted, leading to a highly valuable collective resource of comparative genome information on cyprinids (Cyprinidae). In addition, NGS-based transcriptome analyses of different developmental stages, organs, or cell types, increasingly contribute to the understanding of complex physiological processes, including immune responses. Cyprinids are a highly interesting family because they comprise one of the most-diversified families of teleosts and because of their variation in ploidy level, with diploid, triploid, tetraploid, hexaploid and sometimes even octoploid species. The wealth of data obtained from NGS technologies provides both challenges and opportunities for immunological research, which will be discussed here. Correct interpretation of ploidy effects on immune responses requires knowledge of the degree of functional divergence between duplicated genes, which can differ even between closely-related cyprinid fish species. We summarize NGS-based progress in analysing immune responses and discuss the importance of respecting the presence of (multiple) duplicated gene sequences when performing transcriptome analyses for detailed understanding of complex physiological processes. Progressively, advances in NGS technology are providing workable methods to further elucidate the implications of gene duplication events and functional divergence of duplicates genes and proteins involved in immune responses in cyprinids. We conclude with discussing how future applications of NGS technologies and analysis methods could enhance immunological research and understanding. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.

  19. Draft assembly of elite inbred line PH207 provides insights into genomic and transcriptome diversity in maize

    USDA-ARS?s Scientific Manuscript database

    Intense artificial selection over the last 100 years has produced elite maize (Zea mays) inbred lines that combine to produce high-yielding hybrids. To further our understanding of how genome and transcriptome variation contribute to the production of high-yielding hybrids, we generated a draft geno...

  20. De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis)

    PubMed Central

    Lok, Si; Paton, Tara A.; Wang, Zhuozhi; Kaur, Gaganjot; Walker, Susan; Yuen, Ryan K. C.; Sung, Wilson W. L.; Whitney, Joseph; Buchanan, Janet A.; Trost, Brett; Singh, Naina; Apresto, Beverly; Chen, Nan; Coole, Matthew; Dawson, Travis J.; Ho, Karen; Hu, Zhizhou; Pullenayegum, Sanjeev; Samler, Kozue; Shipstone, Arun; Tsoi, Fiona; Wang, Ting; Pereira, Sergio L.; Rostami, Pirooz; Ryan, Carol Ann; Tong, Amy Hin Yan; Ng, Karen; Sundaravadanam, Yogi; Simpson, Jared T.; Lim, Burton K.; Engstrom, Mark D.; Dutton, Christopher J.; Kerr, Kevin C. R.; Franke, Maria; Rapley, William; Wintle, Richard F.; Scherer, Stephen W.

    2017-01-01

    The Canadian beaver (Castor canadensis) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon–gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly was demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology. PMID:28087693

  1. De Novo Transcriptome of the Hemimetabolous German Cockroach (Blattella germanica)

    PubMed Central

    Zhou, Xiaojie; Qian, Kun; Tong, Ying; Zhu, Junwei Jerry; Qiu, Xinghui; Zeng, Xiaopeng

    2014-01-01

    Background The German cockroach, Blattella germanica, is an important insect pest that transmits various pathogens mechanically and causes severe allergic diseases. This insect has long served as a model system for studies of insect biology, physiology and ecology. However, the lack of genome or transcriptome information heavily hinder our further understanding about the German cockroach in every aspect at a molecular level and on a genome-wide scale. To explore the transcriptome and identify unique sequences of interest, we subjected the B. germanica transcriptome to massively parallel pyrosequencing and generated the first reference transcriptome for B. germanica. Methodology/Principal Findings A total of 1,365,609 raw reads with an average length of 529 bp were generated via pyrosequencing the mixed cDNA library from different life stages of German cockroach including maturing oothecae, nymphs, adult females and males. The raw reads were de novo assembled to 48,800 contigs and 3,961 singletons with high-quality unique sequences. These sequences were annotated and classified functionally in terms of BLAST, GO and KEGG, and the genes putatively coding detoxification enzyme systems, insecticide targets, key components in systematic RNA interference, immunity and chemoreception pathways were identified. A total of 3,601 SSRs (Simple Sequence Repeats) loci were also predicted. Conclusions/Significance The whole transcriptome pyrosequencing data from this study provides a usable genetic resource for future identification of potential functional genes involved in various biological processes. PMID:25265537

  2. A resource of large-scale molecular markers for monitoring Agropyron cristatum chromatin introgression in wheat background based on transcriptome sequences.

    PubMed

    Zhang, Jinpeng; Liu, Weihua; Lu, Yuqing; Liu, Qunxing; Yang, Xinming; Li, Xiuquan; Li, Lihui

    2017-09-20

    Agropyron cristatum is a wild grass of the tribe Triticeae and serves as a gene donor for wheat improvement. However, very few markers can be used to monitor A. cristatum chromatin introgressions in wheat. Here, we reported a resource of large-scale molecular markers for tracking alien introgressions in wheat based on transcriptome sequences. By aligning A. cristatum unigenes with the Chinese Spring reference genome sequences, we designed 9602 A. cristatum expressed sequence tag-sequence-tagged site (EST-STS) markers for PCR amplification and experimental screening. As a result, 6063 polymorphic EST-STS markers were specific for the A. cristatum P genome in the single-receipt wheat background. A total of 4956 randomly selected polymorphic EST-STS markers were further tested in eight wheat variety backgrounds, and 3070 markers displaying stable and polymorphic amplification were validated. These markers covered more than 98% of the A. cristatum genome, and the marker distribution density was approximately 1.28 cM. An application case of all EST-STS markers was validated on the A. cristatum 6 P chromosome. These markers were successfully applied in the tracking of alien A. cristatum chromatin. Altogether, this study provided a universal method of large-scale molecular marker development to monitor wild relative chromatin in wheat.

  3. Genomic and transcriptomic alterations following intergeneric hybridization and polyploidization in the Chrysanthemum nankingense×Tanacetum vulgare hybrid and allopolyploid (Asteraceae).

    PubMed

    Qi, Xiangyu; Wang, Haibin; Song, Aiping; Jiang, Jiafu; Chen, Sumei; Chen, Fadi

    2018-01-01

    Allopolyploid formation involves two major events: interspecific hybridization and polyploidization. A number of species in the Asteraceae family are polyploids because of frequent hybridization. The effects of hybridization on genomics and transcriptomics in Chrysanthemum nankingense×Tanacetum vulgare hybrids have been reported. In this study, we obtained allopolyploids by applying a colchicine treatment to a synthesized C. nankingense × T. vulgare hybrid. Sequence-related amplified polymorphism (SRAP), methylation-sensitive amplification polymorphism (MSAP), and high-throughput RNA sequencing (RNA-Seq) technologies were used to investigate the genomic, epigenetic, and transcriptomic alterations in both the hybrid and allopolyploids. The genomic alterations in the hybrid and allopolyploids mainly involved the loss of parental fragments and the gain of novel fragments. The DNA methylation level of the hybrid was reduced by hybridization but was restored somewhat after polyploidization. There were more significant differences in gene expression between the hybrid/allopolyploid and the paternal parent than between the hybrid/allopolyploid and the maternal parent. Most differentially expressed genes (DEGs) showed down-regulation in the hybrid/allopolyploid relative to the parents. Among the non-additive genes, transgressive patterns appeared to be dominant, especially repression patterns. Maternal expression dominance was observed specifically for down-regulated genes. Many methylase and methyltransferase genes showed differential expression between the hybrid and parents and between the allopolyploid and parents. Our data indicate that hybridization may be a major factor affecting genomic and transcriptomic changes in newly formed allopolyploids. The formation of allopolyploids may not simply be the sum of hybridization and polyploidization changes but also may be influenced by the interaction between these processes.

  4. Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis.

    PubMed

    Davidson, Nadia M; Oshlack, Alicia

    2018-05-01

    RNA sequencing (RNA-seq) analyses can benefit from performing a genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating an assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Necklace allows a comprehensive transcriptome to be built from a combination of assembled and annotated transcripts, which results in a more comprehensive transcriptome for the majority of organisms. In addition RNA-seq data are mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods.

  5. Genome and Transcriptome Analyses Provide Insight into the Euryhaline Adaptation Mechanism of Crassostrea gigas

    PubMed Central

    Zhang, Linlin; Li, Chunyan; Li, Li; She, Zhicai; Huang, Baoyu; Zhang, Guofan

    2013-01-01

    the most important effectors for oyster euryhaline adaptation. This study was the first to explain oyster euryhaline adaptation at a genome-wide scale in C. gigas. PMID:23554902

  6. Integrating Transcriptome and Genome Re-Sequencing Data to Identify Key Genes and Mutations Affecting Chicken Eggshell Qualities

    PubMed Central

    Liu, Long; Zheng, Chuan Wei; Wang, De He; Hou, Zhuo Cheng; Ning, Zhong Hua

    2015-01-01

    Eggshell damages lead to economic losses in the egg production industry and are a threat to human health. We examined 49-wk-old Rhode Island White hens (Gallus gallus) that laid eggs having shells with significantly different strengths and thicknesses. We used HiSeq 2000 (Illumina) sequencing to characterize the chicken transcriptome and whole genome to identify the key genes and genetic mutations associated with eggshell calcification. We identified a total of 14,234 genes expressed in the chicken uterus, representing 89% of all annotated chicken genes. A total of 889 differentially expressed genes were identified by comparing low eggshell strength (LES) and normal eggshell strength (NES) genomes. The DEGs are enriched in calcification-related processes, including calcium ion transport and calcium signaling pathways as reveled by gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis. Some important matrix proteins, such as OC-116, LTF and SPP1, were also expressed differentially between two groups. A total of 3,671,919 single-nucleotide polymorphisms (SNPs) and 508,035 Indels were detected in protein coding genes by whole-genome re-sequencing, including 1775 non-synonymous variations and 19 frame-shift Indels in DEGs. SNPs and Indels found in this study could be further investigated for eggshell traits. This is the first report to integrate the transcriptome and genome re-sequencing to target the genetic variations which decreased the eggshell qualities. These findings further advance our understanding of eggshell calcification in the chicken uterus. PMID:25974068

  7. Genomics, transcriptomics and proteomics: enabling insights into social evolution and disease challenges for managed and wild bees.

    PubMed

    Trapp, Judith; McAfee, Alison; Foster, Leonard J

    2017-02-01

    Globally, there are over 20 000 bee species (Hymenoptera: Apoidea: Anthophila) with a host of biologically fascinating characteristics. Although they have long been studied as models for social evolution, recent challenges to bee health (mainly diseases and pesticides) have gathered the attention of both public and research communities. Genome sequences of twelve bee species are now complete or under progress, facilitating the application of additional 'omic technologies. Here, we review recent developments in honey bee and native bee research in the genomic era. We discuss the progress in genome sequencing and functional annotation, followed by the enabled comparative genomics, proteomics and transcriptomics applications regarding social evolution and health. Finally, we end with comments on future challenges in the postgenomic era. © 2016 John Wiley & Sons Ltd.

  8. The Genomic HyperBrowser: an analysis web server for genome-scale data

    PubMed Central

    Sandve, Geir K.; Gundersen, Sveinung; Johansen, Morten; Glad, Ingrid K.; Gunathasan, Krishanthi; Holden, Lars; Holden, Marit; Liestøl, Knut; Nygård, Ståle; Nygaard, Vegard; Paulsen, Jonas; Rydbeck, Halfdan; Trengereid, Kai; Clancy, Trevor; Drabløs, Finn; Ferkingstad, Egil; Kalaš, Matúš; Lien, Tonje; Rye, Morten B.; Frigessi, Arnoldo; Hovig, Eivind

    2013-01-01

    The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens for a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome. PMID:23632163

  9. The Genomic HyperBrowser: an analysis web server for genome-scale data.

    PubMed

    Sandve, Geir K; Gundersen, Sveinung; Johansen, Morten; Glad, Ingrid K; Gunathasan, Krishanthi; Holden, Lars; Holden, Marit; Liestøl, Knut; Nygård, Ståle; Nygaard, Vegard; Paulsen, Jonas; Rydbeck, Halfdan; Trengereid, Kai; Clancy, Trevor; Drabløs, Finn; Ferkingstad, Egil; Kalas, Matús; Lien, Tonje; Rye, Morten B; Frigessi, Arnoldo; Hovig, Eivind

    2013-07-01

    The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through the provision of several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens for a range of genomic investigations, related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.

  10. De Novo Genome and Transcriptome Assembly of the Canadian Beaver (Castor canadensis).

    PubMed

    Lok, Si; Paton, Tara A; Wang, Zhuozhi; Kaur, Gaganjot; Walker, Susan; Yuen, Ryan K C; Sung, Wilson W L; Whitney, Joseph; Buchanan, Janet A; Trost, Brett; Singh, Naina; Apresto, Beverly; Chen, Nan; Coole, Matthew; Dawson, Travis J; Ho, Karen; Hu, Zhizhou; Pullenayegum, Sanjeev; Samler, Kozue; Shipstone, Arun; Tsoi, Fiona; Wang, Ting; Pereira, Sergio L; Rostami, Pirooz; Ryan, Carol Ann; Tong, Amy Hin Yan; Ng, Karen; Sundaravadanam, Yogi; Simpson, Jared T; Lim, Burton K; Engstrom, Mark D; Dutton, Christopher J; Kerr, Kevin C R; Franke, Maria; Rapley, William; Wintle, Richard F; Scherer, Stephen W

    2017-02-09

    The Canadian beaver ( Castor canadensis ) is the largest indigenous rodent in North America. We report a draft annotated assembly of the beaver genome, the first for a large rodent and the first mammalian genome assembled directly from uncorrected and moderate coverage (< 30 ×) long reads generated by single-molecule sequencing. The genome size is 2.7 Gb estimated by k-mer analysis. We assembled the beaver genome using the new Canu assembler optimized for noisy reads. The resulting assembly was refined using Pilon supported by short reads (80 ×) and checked for accuracy by congruency against an independent short read assembly. We scaffolded the assembly using the exon-gene models derived from 9805 full-length open reading frames (FL-ORFs) constructed from the beaver leukocyte and muscle transcriptomes. The final assembly comprised 22,515 contigs with an N50 of 278,680 bp and an N50-scaffold of 317,558 bp. Maximum contig and scaffold lengths were 3.3 and 4.2 Mb, respectively, with a combined scaffold length representing 92% of the estimated genome size. The completeness and accuracy of the scaffold assembly was demonstrated by the precise exon placement for 91.1% of the 9805 assembled FL-ORFs and 83.1% of the BUSCO (Benchmarking Universal Single-Copy Orthologs) gene set used to assess the quality of genome assemblies. Well-represented were genes involved in dentition and enamel deposition, defining characteristics of rodents with which the beaver is well-endowed. The study provides insights for genome assembly and an important genomics resource for Castoridae and rodent evolutionary biology. Copyright © 2017 Lok et al.

  11. Genome-Scale Reconstruction of the Human Astrocyte Metabolic Network

    PubMed Central

    Martín-Jiménez, Cynthia A.; Salazar-Barreto, Diego; Barreto, George E.; González, Janneth

    2017-01-01

    Astrocytes are the most abundant cells of the central nervous system; they have a predominant role in maintaining brain metabolism. In this sense, abnormal metabolic states have been found in different neuropathological diseases. Determination of metabolic states of astrocytes is difficult to model using current experimental approaches given the high number of reactions and metabolites present. Thus, genome-scale metabolic networks derived from transcriptomic data can be used as a framework to elucidate how astrocytes modulate human brain metabolic states during normal conditions and in neurodegenerative diseases. We performed a Genome-Scale Reconstruction of the Human Astrocyte Metabolic Network with the purpose of elucidating a significant portion of the metabolic map of the astrocyte. This is the first global high-quality, manually curated metabolic reconstruction network of a human astrocyte. It includes 5,007 metabolites and 5,659 reactions distributed among 8 cell compartments, (extracellular, cytoplasm, mitochondria, endoplasmic reticle, Golgi apparatus, lysosome, peroxisome and nucleus). Using the reconstructed network, the metabolic capabilities of human astrocytes were calculated and compared both in normal and ischemic conditions. We identified reactions activated in these two states, which can be useful for understanding the astrocytic pathways that are affected during brain disease. Additionally, we also showed that the obtained flux distributions in the model, are in accordance with literature-based findings. Up to date, this is the most complete representation of the human astrocyte in terms of inclusion of genes, proteins, reactions and metabolic pathways, being a useful guide for in-silico analysis of several metabolic behaviors of the astrocyte during normal and pathologic states. PMID:28243200

  12. Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences

    PubMed Central

    2013-01-01

    Background Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information. Results We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes. Conclusions The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes. PMID:23537002

  13. Breaking the 1000-gene barrier for Mimivirus using ultra-deep genome and transcriptome sequencing.

    PubMed

    Legendre, Matthieu; Santini, Sébastien; Rico, Alain; Abergel, Chantal; Claverie, Jean-Michel

    2011-03-04

    Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb-genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. We now report a much deeper analysis using the SOLiD™ technology combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 Million reads), and a complete genome re-sequencing (45.3 Million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.

  14. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

    PubMed

    Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

    2013-12-29

    Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition

  15. Transcriptome profiling reveals mosaic genomic origins of modern cultivated barley.

    PubMed

    Dai, Fei; Chen, Zhong-Hua; Wang, Xiaolei; Li, Zefeng; Jin, Gulei; Wu, Dezhi; Cai, Shengguan; Wang, Ning; Wu, Feibo; Nevo, Eviatar; Zhang, Guoping

    2014-09-16

    The domestication of cultivated barley has been used as a model system for studying the origins and early spread of agrarian culture. Our previous results indicated that the Tibetan Plateau and its vicinity is one of the centers of domestication of cultivated barley. Here we reveal multiple origins of domesticated barley using transcriptome profiling of cultivated and wild-barley genotypes. Approximately 48-Gb of clean transcript sequences in 12 Hordeum spontaneum and 9 Hordeum vulgare accessions were generated. We reported 12,530 de novo assembled transcripts in all of the 21 samples. Population structure analysis showed that Tibetan hulless barley (qingke) might have existed in the early stage of domestication. Based on the large number of unique genomic regions showing the similarity between cultivated and wild-barley groups, we propose that the genomic origin of modern cultivated barley is derived from wild-barley genotypes in the Fertile Crescent (mainly in chromosomes 1H, 2H, and 3H) and Tibet (mainly in chromosomes 4H, 5H, 6H, and 7H). This study indicates that the domestication of barley may have occurred over time in geographically distinct regions.

  16. Extensive Transcriptomic and Genomic Analysis Provides New Insights about Luminal Breast Cancers

    PubMed Central

    Tishchenko, Inna; Milioli, Heloisa Helena; Riveros, Carlos; Moscato, Pablo

    2016-01-01

    Despite constituting approximately two thirds of all breast cancers, the luminal A and B tumours are poorly classified at both clinical and molecular levels. There are contradictory reports on the nature of these subtypes: some define them as intrinsic entities, others as a continuum. With the aim of addressing these uncertainties and identifying molecular signatures of patients at risk, we conducted a comprehensive transcriptomic and genomic analysis of 2,425 luminal breast cancer samples. Our results indicate that the separation between the molecular luminal A and B subtypes—per definition—is not associated with intrinsic characteristics evident in the differentiation between other subtypes. Moreover, t-SNE and MST-kNN clustering approaches based on 10,000 probes, associated with luminal tumour initiation and/or development, revealed the close connections between luminal A and B tumours, with no evidence of a clear boundary between them. Thus, we considered all luminal tumours as a single heterogeneous group for analysis purposes. We first stratified luminal tumours into two distinct groups by their HER2 gene cluster co-expression: HER2-amplified luminal and ordinary-luminal. The former group is associated with distinct transcriptomic and genomic profiles, and poor prognosis; it comprises approximately 8% of all luminal cases. For the remaining ordinary-luminal tumours we further identified the molecular signature correlated with disease outcomes, exhibiting an approximately continuous gene expression range from low to high risk. Thus, we employed four virtual quantiles to segregate the groups of patients. The clinico-pathological characteristics and ratios of genomic aberrations are concordant with the variations in gene expression profiles, hinting at a progressive staging. The comparison with the current separation into luminal A and B subtypes revealed a substantially improved survival stratification. Concluding, we suggest a review of the definition of

  17. Genomic and Transcriptomic Analyses of Indole-3-Acetic Acid Biosynthesis in Diatoms

    NASA Astrophysics Data System (ADS)

    Lim, R.; Armbrust, V.

    2016-02-01

    Indole-3-acetic acid (IAA) is a major plant growth hormone and a common mediator of plant-bacterial interactions. Recently, IAA has also been found to play a role in interactions between diatoms and bacteria, with IAA production by an associated Sulfitobacter leading to increased growth rates in the marine diatom Pseudo-nitzschia multiseries. It is unclear, however, if diatoms themselves are able to synthesize IAA and whether this capability is widespread throughout Bacillariophyta. Four major tryptophan-dependent IAA biosynthesis pathways have been identified in plants and bacteria, each denoted by the first intermediate downstream of tryptophan: the indole-3-pyruvate (IPyA), tryptamine (TAM), indole-3-acetaldoxime (IAOx) and indole-3-acetamide (IAM) pathways. To investigate the possibility of IAA biosynthesis in diatoms, we first analyzed publicly available genomes of raphid pennates P. multiseries, Phaeodactylum tricornutum, Fragilariopsis cylindrus and centric Thalassiosira pseudonana for potential homologs to plant and bacterial IAA biosynthesis genes. The P. multiseries, F. cylindrus and P. tricornutum genomes encode downstream enzymes for bacterial TAM and IAM and plant IPyA pathways. The more evolutionarily ancient T. pseudonana encodes one TAM enzyme in its genome. To investigate the potential distribution of these pathways more broadly, we surveyed the transcriptomes of 11 diatom species that include representatives from all four Bacillariophyta classes. Datasets used were sequenced as part of the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) and obtained from cultures maintained axenically. Transcripts associated with the TAM pathway were most frequently detected, with potential homologs to required enzymes identified in 10 of the 11 species examined. Transcripts homologous to rate-limiting IPyA enzymes were detected in six species. Only two centric and araphid pennate species expressed transcripts associated with enzymes in the

  18. Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology

    PubMed Central

    Udy, Dylan B.; Voorhies, Mark; Chan, Patricia P.; Lowe, Todd M.; Dumont, Sophie

    2015-01-01

    The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain large and few chromosomes—and are thus ideal for imaging intra-cellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome have been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics. PMID:26252667

  19. Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology.

    PubMed

    Udy, Dylan B; Voorhies, Mark; Chan, Patricia P; Lowe, Todd M; Dumont, Sophie

    2015-01-01

    The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain large and few chromosomes-and are thus ideal for imaging intra-cellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome have been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics.

  20. A survey of the sorghum transcriptome using single-molecule long reads

    DOE PAGES

    Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...

    2016-06-24

    Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novelmore » splice isoforms. Additionally, we uncover APA ofB11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.« less

  1. A survey of the sorghum transcriptome using single-molecule long reads

    PubMed Central

    Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.

    2016-01-01

    Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290

  2. Genome-wide inference of regulatory networks in Streptomyces coelicolor.

    PubMed

    Castro-Melchor, Marlene; Charaniya, Salim; Karypis, George; Takano, Eriko; Hu, Wei-Shou

    2010-10-18

    The onset of antibiotics production in Streptomyces species is co-ordinated with differentiation events. An understanding of the genetic circuits that regulate these coupled biological phenomena is essential to discover and engineer the pharmacologically important natural products made by these species. The availability of genomic tools and access to a large warehouse of transcriptome data for the model organism, Streptomyces coelicolor, provides incentive to decipher the intricacies of the regulatory cascades and develop biologically meaningful hypotheses. In this study, more than 500 samples of genome-wide temporal transcriptome data, comprising wild-type and more than 25 regulatory gene mutants of Streptomyces coelicolor probed across multiple stress and medium conditions, were investigated. Information based on transcript and functional similarity was used to update a previously-predicted whole-genome operon map and further applied to predict transcriptional networks constituting modules enriched in diverse functions such as secondary metabolism, and sigma factor. The predicted network displays a scale-free architecture with a small-world property observed in many biological networks. The networks were further investigated to identify functionally-relevant modules that exhibit functional coherence and a consensus motif in the promoter elements indicative of DNA-binding elements. Despite the enormous experimental as well as computational challenges, a systems approach for integrating diverse genome-scale datasets to elucidate complex regulatory networks is beginning to emerge. We present an integrated analysis of transcriptome data and genomic features to refine a whole-genome operon map and to construct regulatory networks at the cistron level in Streptomyces coelicolor. The functionally-relevant modules identified in this study pose as potential targets for further studies and verification.

  3. Automated multiplex genome-scale engineering in yeast

    PubMed Central

    Si, Tong; Chao, Ran; Min, Yuhao; Wu, Yuying; Ren, Wen; Zhao, Huimin

    2017-01-01

    Genome-scale engineering is indispensable in understanding and engineering microorganisms, but the current tools are mainly limited to bacterial systems. Here we report an automated platform for multiplex genome-scale engineering in Saccharomyces cerevisiae, an important eukaryotic model and widely used microbial cell factory. Standardized genetic parts encoding overexpression and knockdown mutations of >90% yeast genes are created in a single step from a full-length cDNA library. With the aid of CRISPR-Cas, these genetic parts are iteratively integrated into the repetitive genomic sequences in a modular manner using robotic automation. This system allows functional mapping and multiplex optimization on a genome scale for diverse phenotypes including cellulase expression, isobutanol production, glycerol utilization and acetic acid tolerance, and may greatly accelerate future genome-scale engineering endeavours in yeast. PMID:28469255

  4. Global Genome and Transcriptome Analyses of Magnaporthe oryzae Epidemic Isolate 98-06 Uncover Novel Effectors and Pathogenicity-Related Genes, Revealing Gene Gain and Lose Dynamics in Genome Evolution

    PubMed Central

    Dong, Yanhan; Li, Ying; Zhao, Miaomiao; Jing, Maofeng; Liu, Xinyu; Liu, Muxing; Guo, Xianxian; Zhang, Xing; Chen, Yue; Liu, Yongfeng; Liu, Yanhong; Ye, Wenwu; Zhang, Haifeng; Wang, Yuanchao; Zheng, Xiaobo; Wang, Ping; Zhang, Zhengguang

    2015-01-01

    Genome dynamics of pathogenic organisms are driven by pathogen and host co-evolution, in which pathogen genomes are shaped to overcome stresses imposed by hosts with various genetic backgrounds through generation of a variety of isolates. This same principle applies to the rice blast pathogen Magnaporthe oryzae and the rice host; however, genetic variations among different isolates of M. oryzae remain largely unknown, particularly at genome and transcriptome levels. Here, we applied genomic and transcriptomic analytical tools to investigate M. oryzae isolate 98-06 that is the most aggressive in infection of susceptible rice cultivars. A unique 1.4 Mb of genomic sequences was found in isolate 98-06 in comparison to reference strain 70-15. Genome-wide expression profiling revealed the presence of two critical expression patterns of M. oryzae based on 64 known pathogenicity-related (PaR) genes. In addition, 134 candidate effectors with various segregation patterns were identified. Five tested proteins could suppress BAX-mediated programmed cell death in Nicotiana benthamiana leaves. Characterization of isolate-specific effector candidates Iug6 and Iug9 and PaR candidate Iug18 revealed that they have a role in fungal propagation and pathogenicity. Moreover, Iug6 and Iug9 are located exclusively in the biotrophic interfacial complex (BIC) and their overexpression leads to suppression of defense-related gene expression in rice, suggesting that they might participate in biotrophy by inhibiting the SA and ET pathways within the host. Thus, our studies identify novel effector and PaR proteins involved in pathogenicity of the highly aggressive M. oryzae field isolate 98-06, and reveal molecular and genomic dynamics in the evolution of M. oryzae and rice host interactions. PMID:25837042

  5. Effects of Space Environment on Genome, Transcriptome, and Proteome of Klebsiella pneumoniae.

    PubMed

    Guo, Yinghua; Li, Jia; Liu, Jinwen; Wang, Tong; Li, Yinhu; Yuan, Yanting; Zhao, Jiao; Chang, De; Fang, Xiangqun; Li, Tianzhi; Wang, Junfeng; Dai, Wenkui; Fang, Chengxiang; Liu, Changting

    2015-11-01

    The aim of this study was to explore the effects of space flight on Klebsiella pneumoniae. A strain of K. pneumoniae was sent to space for 398 h aboard the ShenZhou VIII spacecraft during November 1, 2011-November 17, 2011. At the same time, a ground simulation with similar temperature conditions during the space flight was performed as a control. After the space mission, the flight and control strains were analyzed using phenotypic, genomic, transcriptomic and proteomic techniques. The flight strains LCT-KP289 exhibited a higher cotrimoxazole resistance level and changes in metabolism relative to the ground control strain LCT-KP214. After the space flight, 73 SNPs and a plasmid copy number variation were identified in the flight strain. Based on the transcriptomic analysis, there are 232 upregulated and 1879 downregulated genes, of which almost all were for metabolism. Proteomic analysis revealed that there were 57 upregulated and 125 downregulated proteins. These differentially expressed proteins had several functions that included energy production and conversion, carbohydrate transport and metabolism, translation, ribosomal structure and biogenesis, posttranslational modification, protein turnover, and chaperone functions. At a systems biology level, the ytfG gene had a synonymous mutation that resulted in significantly downregulated expression at both transcriptomic and proteomic levels. The mutation of the ytfG gene may influence fructose and mannose metabolic processes of K. pneumoniae during space flight, which may be beneficial to the field of space microbiology, providing potential therapeutic strategies to combat or prevent infection in astronauts. Copyright © 2015 IMSS. Published by Elsevier Inc. All rights reserved.

  6. Transcriptome instability as a molecular pan-cancer characteristic of carcinomas.

    PubMed

    Sveen, Anita; Johannessen, Bjarne; Teixeira, Manuel R; Lothe, Ragnhild A; Skotheim, Rolf I

    2014-08-10

    We have previously proposed transcriptome instability as a genome-wide, pre-mRNA splicing-related characteristic of colorectal cancer. Here, we explore the hypothesis of transcriptome instability being a general characteristic of cancer. Exon-level microarray expression data from ten cancer datasets were analyzed, including breast cancer, cervical cancer, colorectal cancer, gastric cancer, lung cancer, neuroblastoma, and prostate cancer (555 samples), as well as paired normal tissue samples from the colon, lung, prostate, and stomach (93 samples). Based on alternative splicing scores across the genomes, we calculated sample-wise relative amounts of aberrant exon skipping and inclusion. Strong and non-random (P < 0.001) correlations between these estimates and the expression levels of splicing factor genes (n = 280) were found in most cancer types analyzed (breast-, cervical-, colorectal-, lung- and prostate cancer). This suggests a biological explanation for the splicing variation. Surprisingly, these associations prevailed in pan-cancer analyses. This is in contrast to the tissue and cancer specific patterns observed in comparisons across healthy tissue samples from the colon, lung, prostate, and stomach, and between paired cancer-normal samples from the same four tissue types. Based on exon-level expression profiling and computational analyses of alternative splicing, we propose transcriptome instability as a molecular pan-cancer characteristic. The affected cancers show strong and non-random associations between low expression levels of splicing factor genes, and high amounts of aberrant exon skipping and inclusion, and vice versa, on a genome-wide scale.

  7. Sequencing, Annotation and Analysis of the Syrian Hamster (Mesocricetus auratus) Transcriptome

    PubMed Central

    Tchitchek, Nicolas; Safronetz, David; Rasmussen, Angela L.; Martens, Craig; Virtaneva, Kimmo; Porcella, Stephen F.; Feldmann, Heinz

    2014-01-01

    Background The Syrian hamster (golden hamster, Mesocricetus auratus) is gaining importance as a new experimental animal model for multiple pathogens, including emerging zoonotic diseases such as Ebola. Nevertheless there are currently no publicly available transcriptome reference sequences or genome for this species. Results A cDNA library derived from mRNA and snRNA isolated and pooled from the brains, lungs, spleens, kidneys, livers, and hearts of three adult female Syrian hamsters was sequenced. Sequence reads were assembled into 62,482 contigs and 111,796 reads remained unassembled (singletons). This combined contig/singleton dataset, designated as the Syrian hamster transcriptome, represents a total of 60,117,204 nucleotides. Our Mesocricetus auratus Syrian hamster transcriptome mapped to 11,648 mouse transcripts representing 9,562 distinct genes, and mapped to a similar number of transcripts and genes in the rat. We identified 214 quasi-complete transcripts based on mouse annotations. Canonical pathways involved in a broad spectrum of fundamental biological processes were significantly represented in the library. The Syrian hamster transcriptome was aligned to the current release of the Chinese hamster ovary (CHO) cell transcriptome and genome to improve the genomic annotation of this species. Finally, our Syrian hamster transcriptome was aligned against 14 other rodents, primate and laurasiatheria species to gain insights about the genetic relatedness and placement of this species. Conclusions This Syrian hamster transcriptome dataset significantly improves our knowledge of the Syrian hamster's transcriptome, especially towards its future use in infectious disease research. Moreover, this library is an important resource for the wider scientific community to help improve genome annotation of the Syrian hamster and other closely related species. Furthermore, these data provide the basis for development of expression microarrays that can be used in functional

  8. Ensembl Genomes 2013: scaling up access to genome-wide data

    USDA-ARS?s Scientific Manuscript database

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provi...

  9. Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea

    PubMed Central

    2014-01-01

    Background Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus. Results We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event. Conclusions Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes. PMID:24916971

  10. Genome assembly and transcriptome resource for river buffalo, Bubalus bubalis (2n = 50)

    PubMed Central

    Iamartino, Daniela; Pruitt, Kim D; Sonstegard, Tad; Smith, Timothy P L; Low, Wai Yee; Biagini, Tommaso; Bomba, Lorenzo; Capomaccio, Stefano; Castiglioni, Bianca; Coletta, Angelo; Corrado, Federica; Ferré, Fabrizio; Iannuzzi, Leopoldo; Lawley, Cynthia; Macciotta, Nicolò; McClure, Matthew; Mancini, Giordano; Matassino, Donato; Mazza, Raffaele; Milanesi, Marco; Moioli, Bianca; Morandi, Nicola; Ramunno, Luigi; Peretti, Vincenzo; Pilla, Fabio; Ramelli, Paola; Schroeder, Steven; Strozzi, Francesco; Thibaud-Nissen, Francoise; Zicarelli, Luigi; Ajmone-Marsan, Paolo; Valentini, Alessio; Chillemi, Giovanni; Zimin, Aleksey

    2017-01-01

    Abstract Water buffalo is a globally important species for agriculture and local economies. A de novo assembled, well-annotated reference sequence for the water buffalo is an important prerequisite for studying the biology of this species, and is necessary to manage genetic diversity and to use modern breeding and genomic selection techniques. However, no such genome assembly has been previously reported. There are 2 species of domestic water buffalo, the river (2n = 50) and the swamp (2n = 48) buffalo. Here we describe a draft quality reference sequence for the river buffalo created from Illumina GA and Roche 454 short read sequences using the MaSuRCA assembler. The assembled sequence is 2.83 Gb, consisting of 366 983 scaffolds with a scaffold N50 of 1.41 Mb and contig N50 of 21 398 bp. Annotation of the genome was supported by transcriptome data from 30 tissues and identified 21 711 predicted protein coding genes. Searches for complete mammalian BUSCO gene groups found 98.6% of curated single copy orthologs present among predicted genes, which suggests a high level of completeness of the genome. The annotated sequence is available from NCBI at accession GCA_000471725.1. PMID:29048578

  11. Genome assembly and transcriptome resource for river buffalo, Bubalus bubalis (2n = 50).

    PubMed

    Williams, John L; Iamartino, Daniela; Pruitt, Kim D; Sonstegard, Tad; Smith, Timothy P L; Low, Wai Yee; Biagini, Tommaso; Bomba, Lorenzo; Capomaccio, Stefano; Castiglioni, Bianca; Coletta, Angelo; Corrado, Federica; Ferré, Fabrizio; Iannuzzi, Leopoldo; Lawley, Cynthia; Macciotta, Nicolò; McClure, Matthew; Mancini, Giordano; Matassino, Donato; Mazza, Raffaele; Milanesi, Marco; Moioli, Bianca; Morandi, Nicola; Ramunno, Luigi; Peretti, Vincenzo; Pilla, Fabio; Ramelli, Paola; Schroeder, Steven; Strozzi, Francesco; Thibaud-Nissen, Francoise; Zicarelli, Luigi; Ajmone-Marsan, Paolo; Valentini, Alessio; Chillemi, Giovanni; Zimin, Aleksey

    2017-10-01

    Water buffalo is a globally important species for agriculture and local economies. A de novo assembled, well-annotated reference sequence for the water buffalo is an important prerequisite for studying the biology of this species, and is necessary to manage genetic diversity and to use modern breeding and genomic selection techniques. However, no such genome assembly has been previously reported. There are 2 species of domestic water buffalo, the river (2 n = 50) and the swamp (2 n = 48) buffalo. Here we describe a draft quality reference sequence for the river buffalo created from Illumina GA and Roche 454 short read sequences using the MaSuRCA assembler. The assembled sequence is 2.83 Gb, consisting of 366 983 scaffolds with a scaffold N50 of 1.41 Mb and contig N50 of 21 398 bp. Annotation of the genome was supported by transcriptome data from 30 tissues and identified 21 711 predicted protein coding genes. Searches for complete mammalian BUSCO gene groups found 98.6% of curated single copy orthologs present among predicted genes, which suggests a high level of completeness of the genome. The annotated sequence is available from NCBI at accession GCA_000471725.1. © The Author 2017. Published by Oxford University Press.

  12. Transcriptome characterization and SSR discovery in large-scale loach Paramisgurnus dabryanus (Cobitidae, Cypriniformes).

    PubMed

    Li, Caijuan; Ling, Qufei; Ge, Chen; Ye, Zhuqing; Han, Xiaofei

    2015-02-25

    The large-scale loach (Paramisgurnus dabryanus, Cypriniformes) is a bottom-dwelling freshwater species of fish found mainly in eastern Asia. The natural germplasm resources of this important aquaculture species has been recently threatened due to overfishing and artificial propagation. The objective of this study is to obtain the first functional genomic resource and candidate molecular markers for future conservation and breeding research. Illumina paired-end sequencing generated over one hundred million reads that resulted in 71,887 assembled transcripts, with an average length of 1465bp. 42,093 (58.56%) protein-coding sequences were predicted; and 43,837 transcripts had significant matches to NCBI nonredundant protein (Nr) database. 29,389 and 14,419 transcripts were assigned into gene ontology (GO) categories and Eukaryotic Orthologous Groups (KOG), respectively. 22,102 (31.14%) transcripts were mapped to 302 KEGG pathways. In addition, 15,106 candidate SSR markers were identified, with 11,037 pairs of PCR primers designed. 400 primers pairs of SSR selected randomly were validated, of which 364 (91%) pairs of primers were able to produce PCR products. Further test with 41 loci and 20 large-scale loach specimens collected from the four largest lakes in China showed that 36 (87.8%) loci were polymorphic. The transcriptomic profile and SSR repertoire obtained in this study will facilitate population genetic studies and selective breeding of large-scale loach in the future. Copyright © 2015. Published by Elsevier B.V.

  13. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing

    PubMed Central

    2013-01-01

    Background Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Results Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. Conclusions The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with

  14. Discovering Functions of Unannotated Genes from a Transcriptome Survey of Wild Fungal Isolates

    PubMed Central

    Ellison, Christopher E.; Kowbel, David; Glass, N. Louise; Taylor, John W.

    2014-01-01

    ABSTRACT Most fungal genomes are poorly annotated, and many fungal traits of industrial and biomedical relevance are not well suited to classical genetic screens. Assigning genes to phenotypes on a genomic scale thus remains an urgent need in the field. We developed an approach to infer gene function from expression profiles of wild fungal isolates, and we applied our strategy to the filamentous fungus Neurospora crassa. Using transcriptome measurements in 70 strains from two well-defined clades of this microbe, we first identified 2,247 cases in which the expression of an unannotated gene rose and fell across N. crassa strains in parallel with the expression of well-characterized genes. We then used image analysis of hyphal morphologies, quantitative growth assays, and expression profiling to test the functions of four genes predicted from our population analyses. The results revealed two factors that influenced regulation of metabolism of nonpreferred carbon and nitrogen sources, a gene that governed hyphal architecture, and a gene that mediated amino acid starvation resistance. These findings validate the power of our population-transcriptomic approach for inference of novel gene function, and we suggest that this strategy will be of broad utility for genome-scale annotation in many fungal systems. PMID:24692637

  15. Comparative genomics and transcriptome analysis of Aspergillus niger and metabolic engineering for citrate production

    PubMed Central

    Yin, Xian; Shin, Hyun-dong; Li, Jianghua; Du, Guocheng; Liu, Long; Chen, Jian

    2017-01-01

    Despite a long and successful history of citrate production in Aspergillus niger, the molecular mechanism of citrate accumulation is only partially understood. In this study, we used comparative genomics and transcriptome analysis of citrate-producing strains—namely, A. niger H915-1 (citrate titer: 157 g L−1), A1 (117 g L−1), and L2 (76 g L−1)—to gain a genome-wide view of the mechanism of citrate accumulation. Compared with A. niger A1 and L2, A. niger H915-1 contained 92 mutated genes, including a succinate-semialdehyde dehydrogenase in the γ-aminobutyric acid shunt pathway and an aconitase family protein involved in citrate synthesis. Furthermore, transcriptome analysis of A. niger H915-1 revealed that the transcription levels of 479 genes changed between the cell growth stage (6 h) and the citrate synthesis stage (12 h, 24 h, 36 h, and 48 h). In the glycolysis pathway, triosephosphate isomerase was up-regulated, whereas pyruvate kinase was down-regulated. Two cytosol ATP-citrate lyases, which take part in the cycle of citrate synthesis, were up-regulated, and may coordinate with the alternative oxidases in the alternative respiratory pathway for energy balance. Finally, deletion of the oxaloacetate acetylhydrolase gene in H915-1 eliminated oxalate formation but neither influence on pH decrease nor difference in citrate production were observed. PMID:28106122

  16. Transcriptome sequencing and annotation of the halophytic microalga Dunaliella salina * #

    PubMed Central

    Hong, Ling; Liu, Jun-li; Midoun, Samira Z.; Miller, Philip C.

    2017-01-01

    The unicellular green alga Dunaliella salina is well adapted to salt stress and contains compounds (including β-carotene and vitamins) with potential commercial value. A large transcriptome database of D. salina during the adjustment, exponential and stationary growth phases was generated using a high throughput sequencing platform. We characterized the metabolic processes in D. salina with a focus on valuable metabolites, with the aim of manipulating D. salina to achieve greater economic value in large-scale production through a bioengineering strategy. Gene expression profiles under salt stress verified using quantitative polymerase chain reaction (qPCR) implied that salt can regulate the expression of key genes. This study generated a substantial fraction of D. salina transcriptional sequences for the entire growth cycle, providing a basis for the discovery of novel genes. This first full-scale transcriptome study of D. salina establishes a foundation for further comparative genomic studies. PMID:28990374

  17. Search for sarcoidosis candidate genes by integration of data from genomic, transcriptomic and proteomic studies.

    PubMed

    Maver, Ales; Medica, Igor; Peterlin, Borut

    2009-12-01

    The search for gene candidates in multifactorial diseases such as sarcoidosis can be based on the integration of linkage association data, gene expression data, and protein profile data from genomic, transcriptomic and proteomic studies, respectively. In this study we performed a literature-based search for studies reporting such data, followed by integration of collected information. Different databases were examined--Medline, HugGE Navigator, ArrayExpress and Gene Expression Omnibus (GEO). Candidate genes were defined as genes which were reported in at least 2 different types of omics studies. Genes previously investigated in sarcoidosis were excluded from further analyses. We identified 177 genes associated with sarcoidosis as potential new candidate genes. Subsequently, 9 gene candidates identified to overlap in 2 different types of studies (genomic, transcriptomic and/or proteomic) were consistently reported in at least 3 studies: SERPINB1, FABP4, S100A8, HBEGF, IL7R, LRIG1, PTPN23, DPM2 and NUP214. These genes are involved in regulation of immune response, cellular proliferation, apoptosis, inhibition of protease activity, lipid metabolism. Exact biological functions of HBEGF, LRIG1, PTPN23, DPM2 and NUP214 remain to be completely elucidated. We propose 9 candidate genes: SERPINB1, FABP4, S100A8, HBEGF, IL7R, LRIG1, PTPN23, DPM2 and NUP214, as genes with high potential for association with sarcoidosis.

  18. Transcriptome characterisation of Pinus tabuliformis and evolution of genes in the Pinus phylogeny

    PubMed Central

    2013-01-01

    Background The Chinese pine (Pinus tabuliformis) is an indigenous conifer species in northern China but is relatively underdeveloped as a genomic resource; thus, limiting gene discovery and breeding. Large-scale transcriptome data were obtained using a next-generation sequencing platform to compensate for the lack of P. tabuliformis genomic information. Results The increasing amount of transcriptome data on Pinus provides an excellent resource for multi-gene phylogenetic analysis and studies on how conserved genes and functions are maintained in the face of species divergence. The first P. tabuliformis transcriptome from a normalised cDNA library of multiple tissues and individuals was sequenced in a full 454 GS-FLX run, producing 911,302 sequencing reads. The high quality overlapping expressed sequence tags (ESTs) were assembled into 46,584 putative transcripts, and more than 700 SSRs and 92,000 SNPs/InDels were characterised. Comparative analysis of the transcriptome of six conifer species yielded 191 orthologues, from which we inferred a phylogenetic tree, evolutionary patterns and calculated rates of gene diversion. We also identified 938 fast evolving sequences that may be useful for identifying genes that perhaps evolved in response to positive selection and might be responsible for speciation in the Pinus lineage. Conclusions A large collection of high-quality ESTs was obtained, de novo assembled and characterised, which represents a dramatic expansion of the current transcript catalogues of P. tabuliformis and which will gradually be applied in breeding programs of P. tabuliformis. Furthermore, these data will facilitate future studies of the comparative genomics of P. tabuliformis and other related species. PMID:23597112

  19. Optimal Scaling of Digital Transcriptomes

    PubMed Central

    Glusman, Gustavo; Caballero, Juan; Robinson, Max; Kutlu, Burak; Hood, Leroy

    2013-01-01

    Deep sequencing of transcriptomes has become an indispensable tool for biology, enabling expression levels for thousands of genes to be compared across multiple samples. Since transcript counts scale with sequencing depth, counts from different samples must be normalized to a common scale prior to comparison. We analyzed fifteen existing and novel algorithms for normalizing transcript counts, and evaluated the effectiveness of the resulting normalizations. For this purpose we defined two novel and mutually independent metrics: (1) the number of “uniform” genes (genes whose normalized expression levels have a sufficiently low coefficient of variation), and (2) low Spearman correlation between normalized expression profiles of gene pairs. We also define four novel algorithms, one of which explicitly maximizes the number of uniform genes, and compared the performance of all fifteen algorithms. The two most commonly used methods (scaling to a fixed total value, or equalizing the expression of certain ‘housekeeping’ genes) yielded particularly poor results, surpassed even by normalization based on randomly selected gene sets. Conversely, seven of the algorithms approached what appears to be optimal normalization. Three of these algorithms rely on the identification of “ubiquitous” genes: genes expressed in all the samples studied, but never at very high or very low levels. We demonstrate that these include a “core” of genes expressed in many tissues in a mutually consistent pattern, which is suitable for use as an internal normalization guide. The new methods yield robustly normalized expression values, which is a prerequisite for the identification of differentially expressed and tissue-specific genes as potential biomarkers. PMID:24223126

  20. Genome-wide analysis of miRNA and mRNA transcriptomes during amelogenesis.

    PubMed

    Yin, Kaifeng; Hacia, Joseph G; Zhong, Zhe; Paine, Michael L

    2014-11-19

    In the rodent incisor during amelogenesis, as ameloblast cells transition from secretory stage to maturation stage, their morphology and transcriptome profiles change dramatically. Prior whole genome transcriptome analysis has given a broad picture of the molecular activities dominating both stages of amelogenesis, but this type of analysis has not included miRNA transcript profiling. In this study, we set out to document which miRNAs and corresponding target genes change significantly as ameloblasts transition from secretory- to maturation-stage amelogenesis. Total RNA samples from both secretory- and maturation-stage rat enamel organs were subjected to genome-wide miRNA and mRNA transcript profiling. We identified 59 miRNAs that were differentially expressed at the maturation stage relative to the secretory stage of enamel development (False Discovery Rate (FDR)<0.05, fold change (FC)≥1.8). In parallel, transcriptome profiling experiments identified 1,729 mRNA transcripts that were differentially expressed in the maturation stage compared to the secretory stage (FDR<0.05, FC≥1.8). Based on bioinformatics analyses, 5.8% (629 total) of these differentially expressed genes (DEGS) were highlighted as being the potential targets of 59 miRNAs that were differentially expressed in the opposite direction, in the same tissue samples. Although the number of predicted target DEGs was not higher than baseline expectations generated by examination of stably expressed miRNAs, Gene Ontology (GO) analysis showed that these 629 DEGS were enriched for ion transport, pH regulation, calcium handling, endocytotic, and apoptotic activities. Seven differentially expressed miRNAs (miR-21, miR-31, miR-488, miR-153, miR-135b, miR-135a and miR298) in secretory- and/or maturation-stage enamel organs were confirmed by in situ hybridization. Further, we used luciferase reporter assays to provide evidence that two of these differentially expressed miRNAs, miR-153 and miR-31, are potential

  1. Comparative transcriptome analysis reveals insights into the streamlined genomes of haplosclerid demosponges

    PubMed Central

    Guzman, Christine; Conaco, Cecilia

    2016-01-01

    Sponges (Porifera) are one of the most ancestral metazoan groups. They are characterized by a simple body plan lacking the true tissues and organ systems found in other animals. Members of this phylum display a remarkable diversity of form and function and yet little is known about the composition and complexity of their genomes. In this study, we sequenced the transcriptomes of two marine haplosclerid sponges belonging to Demospongiae, the largest and most diverse class within phylum Porifera, and compared their gene content with members of other sponge classes. We recovered 44,693 and 50,067 transcripts expressed in adult tissues of Haliclona amboinensis and Haliclona tubifera, respectively. These transcripts translate into 20,280 peptides in H. amboinensis and 18,000 peptides in H. tubifera. Genes associated with important signaling and metabolic pathways, regulatory networks, as well as genes that may be important in the organismal stress response, were identified in the transcriptomes. Futhermore, lineage-specific innovations were identified that may be correlated with observed sponge characters and ecological adaptations. The core gene complement expressed within the tissues of adult haplosclerid demosponges may represent a streamlined and flexible genetic toolkit that underlies the ecological success and resilience of sponges to environmental stress. PMID:26738846

  2. Comparative transcriptome analysis reveals insights into the streamlined genomes of haplosclerid demosponges

    NASA Astrophysics Data System (ADS)

    Guzman, Christine; Conaco, Cecilia

    2016-01-01

    Sponges (Porifera) are one of the most ancestral metazoan groups. They are characterized by a simple body plan lacking the true tissues and organ systems found in other animals. Members of this phylum display a remarkable diversity of form and function and yet little is known about the composition and complexity of their genomes. In this study, we sequenced the transcriptomes of two marine haplosclerid sponges belonging to Demospongiae, the largest and most diverse class within phylum Porifera, and compared their gene content with members of other sponge classes. We recovered 44,693 and 50,067 transcripts expressed in adult tissues of Haliclona amboinensis and Haliclona tubifera, respectively. These transcripts translate into 20,280 peptides in H. amboinensis and 18,000 peptides in H. tubifera. Genes associated with important signaling and metabolic pathways, regulatory networks, as well as genes that may be important in the organismal stress response, were identified in the transcriptomes. Futhermore, lineage-specific innovations were identified that may be correlated with observed sponge characters and ecological adaptations. The core gene complement expressed within the tissues of adult haplosclerid demosponges may represent a streamlined and flexible genetic toolkit that underlies the ecological success and resilience of sponges to environmental stress.

  3. Revealing Less Derived Nature of Cartilaginous Fish Genomes with Their Evolutionary Time Scale Inferred with Nuclear Genes

    PubMed Central

    Renz, Adina J.; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmoblanchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon. PMID:23825540

  4. Revealing less derived nature of cartilaginous fish genomes with their evolutionary time scale inferred with nuclear genes.

    PubMed

    Renz, Adina J; Meyer, Axel; Kuraku, Shigehiro

    2013-01-01

    Cartilaginous fishes, divided into Holocephali (chimaeras) and Elasmoblanchii (sharks, rays and skates), occupy a key phylogenetic position among extant vertebrates in reconstructing their evolutionary processes. Their accurate evolutionary time scale is indispensable for better understanding of the relationship between phenotypic and molecular evolution of cartilaginous fishes. However, our current knowledge on the time scale of cartilaginous fish evolution largely relies on estimates using mitochondrial DNA sequences. In this study, making the best use of the still partial, but large-scale sequencing data of cartilaginous fish species, we estimate the divergence times between the major cartilaginous fish lineages employing nuclear genes. By rigorous orthology assessment based on available genomic and transcriptomic sequence resources for cartilaginous fishes, we selected 20 protein-coding genes in the nuclear genome, spanning 2973 amino acid residues. Our analysis based on the Bayesian inference resulted in the mean divergence time of 421 Ma, the late Silurian, for the Holocephali-Elasmobranchii split, and 306 Ma, the late Carboniferous, for the split between sharks and rays/skates. By applying these results and other documented divergence times, we measured the relative evolutionary rate of the Hox A cluster sequences in the cartilaginous fish lineages, which resulted in a lower substitution rate with a factor of at least 2.4 in comparison to tetrapod lineages. The obtained time scale enables mapping phenotypic and molecular changes in a quantitative framework. It is of great interest to corroborate the less derived nature of cartilaginous fish at the molecular level as a genome-wide phenomenon.

  5. An integrated genomic and transcriptomic survey of mucormycosis-causing fungi

    PubMed Central

    Chibucos, Marcus C.; Soliman, Sameh; Gebremariam, Teclegiorgis; Lee, Hongkyu; Daugherty, Sean; Orvis, Joshua; Shetty, Amol C.; Crabtree, Jonathan; Hazen, Tracy H.; Etienne, Kizee A.; Kumari, Priti; O'Connor, Timothy D.; Rasko, David A.; Filler, Scott G.; Fraser, Claire M.; Lockhart, Shawn R.; Skory, Christopher D.; Ibrahim, Ashraf S.; Bruno, Vincent M.

    2016-01-01

    Mucormycosis is a life-threatening infection caused by Mucorales fungi. Here we sequence 30 fungal genomes, and perform transcriptomics with three representative Rhizopus and Mucor strains and with human airway epithelial cells during fungal invasion, to reveal key host and fungal determinants contributing to pathogenesis. Analysis of the host transcriptional response to Mucorales reveals platelet-derived growth factor receptor B (PDGFRB) signaling as part of a core response to divergent pathogenic fungi; inhibition of PDGFRB reduces Mucorales-induced damage to host cells. The unique presence of CotH invasins in all invasive Mucorales, and the correlation between CotH gene copy number and clinical prevalence, are consistent with an important role for these proteins in mucormycosis pathogenesis. Our work provides insight into the evolution of this medically and economically important group of fungi, and identifies several molecular pathways that might be exploited as potential therapeutic targets. PMID:27447865

  6. Diversity and Genome Analysis of Australian and Global Oilseed Brassica napus L. Germplasm Using Transcriptomics and Whole Genome Re-sequencing.

    PubMed

    Malmberg, M Michelle; Shi, Fan; Spangenberg, German C; Daetwyler, Hans D; Cogan, Noel O I

    2018-01-01

    Intensive breeding of Brassica napus has resulted in relatively low diversity, such that B. napus would benefit from germplasm improvement schemes that sustain diversity. As such, samples representative of global germplasm pools need to be assessed for existing population structure, diversity and linkage disequilibrium (LD). Complexity reduction genotyping-by-sequencing (GBS) methods, including GBS-transcriptomics (GBS-t), enable cost-effective screening of a large number of samples, while whole genome re-sequencing (WGR) delivers the ability to generate large numbers of unbiased genomic single nucleotide polymorphisms (SNPs), and identify structural variants (SVs). Furthermore, the development of genomic tools based on whole genomes representative of global oilseed diversity and orientated by the reference genome has substantial industry relevance and will be highly beneficial for canola breeding. As recent studies have focused on European and Chinese varieties, a global diversity panel as well as a substantial number of Australian spring types were included in this study. Focusing on industry relevance, 633 varieties were initially genotyped using GBS-t to examine population structure using 61,037 SNPs. Subsequently, 149 samples representative of global diversity were selected for WGR and both data sets used for a side-by-side evaluation of diversity and LD. The WGR data was further used to develop genomic resources consisting of a list of 4,029,750 high-confidence SNPs annotated using SnpEff, and SVs in the form of 10,976 deletions and 2,556 insertions. These resources form the basis of a reliable and repeatable system allowing greater integration between canola genomics studies, with a strong focus on breeding germplasm and industry applicability.

  7. Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data

    PubMed Central

    2017-01-01

    Mapping gene expression as a quantitative trait using whole genome-sequencing and transcriptome analysis allows to discover the functional consequences of genetic variation. We developed a novel method and ultra-fast software Findr for higly accurate causal inference between gene expression traits using cis-regulatory DNA variations as causal anchors, which improves current methods by taking into consideration hidden confounders and weak regulations. Findr outperformed existing methods on the DREAM5 Systems Genetics challenge and on the prediction of microRNA and transcription factor targets in human lymphoblastoid cells, while being nearly a million times faster. Findr is publicly available at https://github.com/lingfeiwang/findr. PMID:28821014

  8. BLIND ordering of large-scale transcriptomic developmental timecourses.

    PubMed

    Anavy, Leon; Levin, Michal; Khair, Sally; Nakanishi, Nagayasu; Fernandez-Valverde, Selene L; Degnan, Bernard M; Yanai, Itai

    2014-03-01

    RNA-Seq enables the efficient transcriptome sequencing of many samples from small amounts of material, but the analysis of these data remains challenging. In particular, in developmental studies, RNA-Seq is challenged by the morphological staging of samples, such as embryos, since these often lack clear markers at any particular stage. In such cases, the automatic identification of the stage of a sample would enable previously infeasible experimental designs. Here we present the 'basic linear index determination of transcriptomes' (BLIND) method for ordering samples comprising different developmental stages. The method is an implementation of a traveling salesman algorithm to order the transcriptomes according to their inter-relationships as defined by principal components analysis. To establish the direction of the ordered samples, we show that an appropriate indicator is the entropy of transcriptomic gene expression levels, which increases over developmental time. Using BLIND, we correctly recover the annotated order of previously published embryonic transcriptomic timecourses for frog, mosquito, fly and zebrafish. We further demonstrate the efficacy of BLIND by collecting 59 embryos of the sponge Amphimedon queenslandica and ordering their transcriptomes according to developmental stage. BLIND is thus useful in establishing the temporal order of samples within large datasets and is of particular relevance to the study of organisms with asynchronous development and when morphological staging is difficult.

  9. Transcriptome assembly, gene annotation and tissue gene expression atlas of the rainbow trout

    USDA-ARS?s Scientific Manuscript database

    Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complimented by transcriptome information that will enhance genome assembly and annotation. Previously, we reported a transcriptome reference sequence using a 19X coverage of Sanger and 454-pyrosequencing dat...

  10. EUCANEXT: an integrated database for the exploration of genomic and transcriptomic data from Eucalyptus species

    PubMed Central

    Nascimento, Leandro Costa; Salazar, Marcela Mendes; Lepikson-Neto, Jorge; Camargo, Eduardo Leal Oliveira; Parreiras, Lucas Salera; Carazzolle, Marcelo Falsarella

    2017-01-01

    Abstract Tree species of the genus Eucalyptus are the most valuable and widely planted hardwoods in the world. Given the economic importance of Eucalyptus trees, much effort has been made towards the generation of specimens with superior forestry properties that can deliver high-quality feedstocks, customized to the industrýs needs for both cellulosic (paper) and lignocellulosic biomass production. In line with these efforts, large sets of molecular data have been generated by several scientific groups, providing invaluable information that can be applied in the development of improved specimens. In order to fully explore the potential of available datasets, the development of a public database that provides integrated access to genomic and transcriptomic data from Eucalyptus is needed. EUCANEXT is a database that analyses and integrates publicly available Eucalyptus molecular data, such as the E. grandis genome assembly and predicted genes, ESTs from several species and digital gene expression from 26 RNA-Seq libraries. The database has been implemented in a Fedora Linux machine running MySQL and Apache, while Perl CGI was used for the web interfaces. EUCANEXT provides a user-friendly web interface for easy access and analysis of publicly available molecular data from Eucalyptus species. This integrated database allows for complex searches by gene name, keyword or sequence similarity and is publicly accessible at http://www.lge.ibi.unicamp.br/eucalyptusdb. Through EUCANEXT, users can perform complex analysis to identify genes related traits of interest using RNA-Seq libraries and tools for differential expression analysis. Moreover, all the bioinformatics pipeline here described, including the database schema and PERL scripts, are readily available and can be applied to any genomic and transcriptomic project, regardless of the organism. Database URL: http://www.lge.ibi.unicamp.br/eucalyptusdb PMID:29220468

  11. Metastatic breast carcinomas display genomic and transcriptomic heterogeneity

    PubMed Central

    Weigelt, Britta; Ng, Charlotte KY; Shen, Ronglai; Popova, Tatiana; Schizas, Michail; Natrajan, Rachael; Mariani, Odette; Stern, Marc-Henri; Norton, Larry; Vincent-Salomon, Anne; Reis-Filho, Jorge S

    2015-01-01

    features of metaplastic breast carcinomas is reflected at the transcriptomic level, and an association between molecular subtypes and histology was observed. BRCA1-like genomic profiles were found only in a subset (31%) of metaplastic breast cancers, and were not associated with a specific molecular or histologic subtype. PMID:25412848

  12. Ensembl Genomes 2013: scaling up access to genome-wide data.

    PubMed

    Kersey, Paul Julian; Allen, James E; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Hughes, Daniel Seth Toney; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Langridge, Nicholas; McDowall, Mark D; Maheswari, Uma; Maslen, Gareth; Nuhn, Michael; Ong, Chuang Kee; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Tuli, Mary Ann; Walts, Brandon; Williams, Gareth; Wilson, Derek; Youens-Clark, Ken; Monaco, Marcela K; Stein, Joshua; Wei, Xuehong; Ware, Doreen; Bolser, Daniel M; Howe, Kevin Lee; Kulesha, Eugene; Lawson, Daniel; Staines, Daniel Michael

    2014-01-01

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools to allow targeted selection of data for visualization and download are likely to become increasingly important in future as the number of available genomes increases within all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes in future.

  13. Population genetics, phylogenomics and hybrid speciation of Juglans in China determined from whole chloroplast genomes, transcriptomes, and genotyping-by-sequencing (GBS)

    Treesearch

    Peng Zhao; Hui-Juan Zhou; Daniel Potter; Yi-Heng Hu; Xiao-Jia Feng; Meng Dang; Li Feng; Saman Zulfiqar; Wen-Zhe Liu; Gui-Fang Zhao; Keith Woeste

    2018-01-01

    Genomic data are a powerful tool for elucidating the processes involved in the evolution and divergence of species. The speciation and phylogenetic relationships among Chinese Juglans remain unclear. Here, we used results from phylogenomic and population genetic analyses, transcriptomics, Genotyping-By-Sequencing (GBS), and whole chloroplast...

  14. Comparative transcriptome assembly and genome-guided profiling for Brettanomyces bruxellensis LAMAP2480 during p-coumaric acid stress.

    PubMed

    Godoy, Liliana; Vera-Wolf, Patricia; Martinez, Claudio; Ugalde, Juan A; Ganga, María Angélica

    2016-09-28

    Brettanomyces bruxellensis has been described as the main contaminant yeast in wine production, due to its ability to convert the hydroxycinnamic acids naturally present in the grape phenolic derivatives, into volatile phenols. Currently, there are no studies in B. bruxellensis which explains the resistance mechanisms to hydroxycinnamic acids, and in particular to p-coumaric acid which is directly involved in alterations to wine. In this work, we performed a transcriptome analysis of B. bruxellensis LAMAP248rown in the presence and absence of p-coumaric acid during lag phase. Because of reported genetic variability among B. bruxellensis strains, to complement de novo assembly of the transcripts, we used the high-quality genome of B. bruxellensis AWRI1499, as well as the draft genomes of strains CBS2499 and0 g LAMAP2480. The results from the transcriptome analysis allowed us to propose a model in which the entrance of p-coumaric acid to the cell generates a generalized stress condition, in which the expression of proton pump and efflux of toxic compounds are induced. In addition, these mechanisms could be involved in the outflux of nitrogen compounds, such as amino acids, decreasing the overall concentration and triggering the expression of nitrogen metabolism genes.

  15. Comparative transcriptome assembly and genome-guided profiling for Brettanomyces bruxellensis LAMAP2480 during p-coumaric acid stress

    PubMed Central

    Godoy, Liliana; Vera-Wolf, Patricia; Martinez, Claudio; Ugalde, Juan A.; Ganga, María Angélica

    2016-01-01

    Brettanomyces bruxellensis has been described as the main contaminant yeast in wine production, due to its ability to convert the hydroxycinnamic acids naturally present in the grape phenolic derivatives, into volatile phenols. Currently, there are no studies in B. bruxellensis which explains the resistance mechanisms to hydroxycinnamic acids, and in particular to p-coumaric acid which is directly involved in alterations to wine. In this work, we performed a transcriptome analysis of B. bruxellensis LAMAP248rown in the presence and absence of p-coumaric acid during lag phase. Because of reported genetic variability among B. bruxellensis strains, to complement de novo assembly of the transcripts, we used the high-quality genome of B. bruxellensis AWRI1499, as well as the draft genomes of strains CBS2499 and0 g LAMAP2480. The results from the transcriptome analysis allowed us to propose a model in which the entrance of p-coumaric acid to the cell generates a generalized stress condition, in which the expression of proton pump and efflux of toxic compounds are induced. In addition, these mechanisms could be involved in the outflux of nitrogen compounds, such as amino acids, decreasing the overall concentration and triggering the expression of nitrogen metabolism genes. PMID:27678167

  16. New in-depth rainbow trout transcriptome reference and digital atlas of gene expression

    USDA-ARS?s Scientific Manuscript database

    Sequencing the rainbow trout genome is underway and a transcriptome reference sequence is required to help in genome assembly and gene discovery. Previously, we reported a transcriptome reference sequence using a 19X coverage of 454-pyrosequencing data. Although this work added a great wealth of ann...

  17. Genome-wide retinal transcriptome analysis of endotoxin-induced uveitis in mice with next-generation sequencing

    PubMed Central

    Qiu, Yiguo; Yu, Peng; Lin, Ru; Fu, Xinyu; Hao, Bingtao

    2017-01-01

    Purpose Endotoxin-induced uveitis (EIU) is a well-established mouse model for studying human acute inflammatory uveitis. The purpose of this study is to investigate the genome-wide retinal transcriptome profile of EIU. Methods The anterior segment of the mice was examined with a slit-lamp, and clinical scores were evaluated simultaneously. The histological changes in the posterior segment of the eyes were evaluated with hematoxylin and eosin (H&E) staining. A high throughput RNA sequencing (RNA-seq) strategy using the Illumina Hiseq 2500 platform was applied to characterize the retinal transcriptome profile from lipopolysaccharide (LPS)-treated and untreated mice. The validation of the differentially expressed genes (DEGs) was analyzed with real-time PCR. Results At the 24th hour after challenge, the clinical score of the LPS group was significantly higher (3.83±0.75, mean ± standard deviation [SD]) than that of the control group (0.08±0.20, mean ± SD; p<0.001). The histological evaluation showed a large number of inflammatory cells infiltrated into the vitreous cavity in the LPS group compared with the control group. A total of 478 DEGs were identified with RNA-seq. Among these genes, 406 were upregulated and 72 were downregulated in the LPS group. Gene Ontology (GO) enrichment showed three significantly enriched upregulated terms. Twenty-one upregulated and seven downregulated pathways were remarkably enriched by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Eleven inflammatory response–, complement system–, fibrinolytic system–, and cell stress–related genes were validated to show similar results as the RNA-seq. Conclusions We first reported the retinal transcriptome profile of the EIU mouse with RNA-seq. The results indicate that the abnormal changes in the inflammatory response–, complement system–, fibrinolytic system–, and cell stress–related genes occurred concurrently in EIU. These genes may play an important role

  18. Genome-wide retinal transcriptome analysis of endotoxin-induced uveitis in mice with next-generation sequencing.

    PubMed

    Qiu, Yiguo; Yu, Peng; Lin, Ru; Fu, Xinyu; Hao, Bingtao; Lei, Bo

    2017-01-01

    Endotoxin-induced uveitis (EIU) is a well-established mouse model for studying human acute inflammatory uveitis. The purpose of this study is to investigate the genome-wide retinal transcriptome profile of EIU. The anterior segment of the mice was examined with a slit-lamp, and clinical scores were evaluated simultaneously. The histological changes in the posterior segment of the eyes were evaluated with hematoxylin and eosin (H&E) staining. A high throughput RNA sequencing (RNA-seq) strategy using the Illumina Hiseq 2500 platform was applied to characterize the retinal transcriptome profile from lipopolysaccharide (LPS)-treated and untreated mice. The validation of the differentially expressed genes (DEGs) was analyzed with real-time PCR. At the 24th hour after challenge, the clinical score of the LPS group was significantly higher (3.83±0.75, mean ± standard deviation [SD]) than that of the control group (0.08±0.20, mean ± SD; p<0.001). The histological evaluation showed a large number of inflammatory cells infiltrated into the vitreous cavity in the LPS group compared with the control group. A total of 478 DEGs were identified with RNA-seq. Among these genes, 406 were upregulated and 72 were downregulated in the LPS group. Gene Ontology (GO) enrichment showed three significantly enriched upregulated terms. Twenty-one upregulated and seven downregulated pathways were remarkably enriched by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Eleven inflammatory response-, complement system-, fibrinolytic system-, and cell stress-related genes were validated to show similar results as the RNA-seq. We first reported the retinal transcriptome profile of the EIU mouse with RNA-seq. The results indicate that the abnormal changes in the inflammatory response-, complement system-, fibrinolytic system-, and cell stress-related genes occurred concurrently in EIU. These genes may play an important role in the pathogenesis of EIU. This study will lead to a

  19. The testes transcriptome derived from the New World Screwworm, Cochliomyia hominivorax TSA

    USDA-ARS?s Scientific Manuscript database

    In a collaboration with National Center for Genome Resources researchers, we sequenced and assembled the testes transcriptome derived from the Pacora, Panama, production plant strain of the New World Screwworm, Cochliomyia hominivorax. This transcriptome contains 4,149 unigenes and the Transcriptome...

  20. A refined genome-scale reconstruction of Chlamydomonas metabolism provides a platform for systems-level analyses

    DOE PAGES

    Imam, Saheed; Schäuble, Sascha; Valenzuela, Jacob; ...

    2015-10-20

    Microalgae have reemerged as organisms of prime biotechnological interest due to their ability to synthesize a suite of valuable chemicals. To harness the capabilities of these organisms, we need a comprehensive systems-level understanding of their metabolism, which can be fundamentally achieved through large-scale mechanistic models of metabolism. In this study, we present a revised and significantly improved genome-scale metabolic model for the widely-studied microalga, Chlamydomonas reinhardtii. The model, iCre1355, represents a major advance over previous models, both in content and predictive power. iCre1355 encompasses a broad range of metabolic functions encoded across the nuclear, chloroplast and mitochondrial genomes accounting formore » 1355 genes (1460 transcripts), 2394 and 1133 metabolites. We found improved performance over the previous metabolic model based on comparisons of predictive accuracy across 306 phenotypes (from 81 mutants), lipid yield analysis and growth rates derived from chemostat-grown cells (under three conditions). Measurement of macronutrient uptake revealed carbon and phosphate to be good predictors of growth rate, while nitrogen consumption appeared to be in excess. We analyzed high-resolution time series transcriptomics data using iCre1355 to uncover dynamic pathway-level changes that occur in response to nitrogen starvation and changes in light intensity. This approach enabled accurate prediction of growth rates, the cessation of growth and accumulation of triacylglycerols during nitrogen starvation, and the temporal response of different growth-associated pathways to increased light intensity. Thus, iCre1355 represents an experimentally validated genome-scale reconstruction of C. reinhardtii metabolism that should serve as a useful resource for studying the metabolic processes of this and related microalgae.« less

  1. Genome-wide transcriptome profiling reveals novel insights into Luffa cylindrica browning.

    PubMed

    Chen, Xia; Tan, Taiming; Xu, Changcheng; Huang, Shuping; Tan, Jie; Zhang, Min; Wang, Chunli; Xie, Conghua

    2015-08-07

    Luffa cylindrica (sponge gourd) is one of the most popular vegetables in China. Production and consumption of L. cylindrica are limited due to postharvest browning; however, little is known about the genetic regulation of the browning process. In the present study, transcriptome profiles of L. cylindrica cultivars, YLB05 (browning resistant) and XTR05 (browning sensitive), were analyzed using next-generation sequencing to clarify the genes and mechanisms associated with browning. A total of 9.1 Gb of valid data including 116,703 unigenes (>200 bp) were obtained and 39,473 sequences were annotated by alignment against five public databases. Of these, there were 27,407 genes assigned to 747 Gene Ontology functional categories; and 12,350 genes were annotated with 25 Eukaryotic Orthologous Groups (KOG) categories with 343 KOG functional terms. Additionally, by searching against the Kyoto Encyclopedia of Genes and Genomes database, 8689 unigenes were mapped to 189 pathways. Furthermore, there were 24,556 sequences found to be differentially regulated, including 4344 annotated unigenes. Several genes potentially associated with phenolic oxidation, carbohydrate and hormone metabolism were found differentially regulated between the cultivars of different browning sensitivities. Our results suggest that elements involved in enzymatic processes and other pathways might be responsible for L. cylindrica browning. The present study provides a comprehensive transcriptome sequence resource, which will facilitate further studies on gene discovery and exploiting the fruit browning mechanism of L. cylindrica. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Pore-scale simulation of microbial growth using a genome-scale metabolic model: Implications for Darcy-scale reactive transport

    NASA Astrophysics Data System (ADS)

    Tartakovsky, G. D.; Tartakovsky, A. M.; Scheibe, T. D.; Fang, Y.; Mahadevan, R.; Lovley, D. R.

    2013-09-01

    Recent advances in microbiology have enabled the quantitative simulation of microbial metabolism and growth based on genome-scale characterization of metabolic pathways and fluxes. We have incorporated a genome-scale metabolic model of the iron-reducing bacteria Geobacter sulfurreducens into a pore-scale simulation of microbial growth based on coupling of iron reduction to oxidation of a soluble electron donor (acetate). In our model, fluid flow and solute transport is governed by a combination of the Navier-Stokes and advection-diffusion-reaction equations. Microbial growth occurs only on the surface of soil grains where solid-phase mineral iron oxides are available. Mass fluxes of chemical species associated with microbial growth are described by the genome-scale microbial model, implemented using a constraint-based metabolic model, and provide the Robin-type boundary condition for the advection-diffusion equation at soil grain surfaces. Conventional models of microbially-mediated subsurface reactions use a lumped reaction model that does not consider individual microbial reaction pathways, and describe reactions rates using empirically-derived rate formulations such as the Monod-type kinetics. We have used our pore-scale model to explore the relationship between genome-scale metabolic models and Monod-type formulations, and to assess the manifestation of pore-scale variability (microenvironments) in terms of apparent Darcy-scale microbial reaction rates. The genome-scale model predicted lower biomass yield, and different stoichiometry for iron consumption, in comparison to prior Monod formulations based on energetics considerations. We were able to fit an equivalent Monod model, by modifying the reaction stoichiometry and biomass yield coefficient, that could effectively match results of the genome-scale simulation of microbial behaviors under excess nutrient conditions, but predictions of the fitted Monod model deviated from those of the genome-scale model

  3. Pore-scale simulation of microbial growth using a genome-scale metabolic model: Implications for Darcy-scale reactive transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tartakovsky, Guzel D.; Tartakovsky, Alexandre M.; Scheibe, Timothy D.

    2013-09-07

    Recent advances in microbiology have enabled the quantitative simulation of microbial metabolism and growth based on genome-scale characterization of metabolic pathways and fluxes. We have incorporated a genome-scale metabolic model of the iron-reducing bacteria Geobacter sulfurreducens into a pore-scale simulation of microbial growth based on coupling of iron reduction to oxidation of a soluble electron donor (acetate). In our model, fluid flow and solute transport is governed by a combination of the Navier-Stokes and advection-diffusion-reaction equations. Microbial growth occurs only on the surface of soil grains where solid-phase mineral iron oxides are available. Mass fluxes of chemical species associated withmore » microbial growth are described by the genome-scale microbial model, implemented using a constraint-based metabolic model, and provide the Robin-type boundary condition for the advection-diffusion equation at soil grain surfaces. Conventional models of microbially-mediated subsurface reactions use a lumped reaction model that does not consider individual microbial reaction pathways, and describe reactions rates using empirically-derived rate formulations such as the Monod-type kinetics. We have used our pore-scale model to explore the relationship between genome-scale metabolic models and Monod-type formulations, and to assess the manifestation of pore-scale variability (microenvironments) in terms of apparent Darcy-scale microbial reaction rates. The genome-scale model predicted lower biomass yield, and different stoichiometry for iron consumption, in comparisonto prior Monod formulations based on energetics considerations. We were able to fit an equivalent Monod model, by modifying the reaction stoichiometry and biomass yield coefficient, that could effectively match results of the genome-scale simulation of microbial behaviors under excess nutrient conditions, but predictions of the fitted Monod model deviated from those of the genome-scale

  4. Pore-scale simulation of microbial growth using a genome-scale metabolic model: Implications for Darcy-scale reactive transport

    NASA Astrophysics Data System (ADS)

    Scheibe, T. D.; Tartakovsky, G.; Tartakovsky, A. M.; Fang, Y.; Mahadevan, R.; Lovley, D. R.

    2012-12-01

    Recent advances in microbiology have enabled the quantitative simulation of microbial metabolism and growth based on genome-scale characterization of metabolic pathways and fluxes. We have incorporated a genome-scale metabolic model of the iron-reducing bacteria Geobacter sulfurreducens into a pore-scale simulation of microbial growth based on coupling of iron reduction to oxidation of a soluble electron donor (acetate). In our model, fluid flow and solute transport is governed by a combination of the Navier-Stokes and advection-diffusion-reaction equations. Microbial growth occurs only on the surface of soil grains where solid-phase mineral iron oxides are available. Mass fluxes of chemical species associated with microbial growth are described by the genome-scale microbial model, implemented using a constraint-based metabolic model, and provide the Robin-type boundary condition for the advection-diffusion equation at soil grain surfaces. Conventional models of microbially-mediated subsurface reactions use a lumped reaction model that does not consider individual microbial reaction pathways, and describe reactions rates using empirically-derived rate formulations such as the Monod-type kinetics. We have used our pore-scale model to explore the relationship between genome-scale metabolic models and Monod-type formulations, and to assess the manifestation of pore-scale variability (microenvironments) in terms of apparent Darcy-scale microbial reaction rates. The genome-scale model predicted lower biomass yield, and different stoichiometry for iron consumption, in comparison to prior Monod formulations based on energetics considerations. We were able to fit an equivalent Monod model, by modifying the reaction stoichiometry and biomass yield coefficient, that could effectively match results of the genome-scale simulation of microbial behaviors under excess nutrient conditions, but predictions of the fitted Monod model deviated from those of the genome-scale model

  5. Genome-wide comparative transcriptome analysis of CMS-D2 and its maintainer and restorer lines in upland cotton.

    PubMed

    Wu, Jianyong; Zhang, Meng; Zhang, Bingbing; Zhang, Xuexian; Guo, Liping; Qi, Tingxiang; Wang, Hailin; Zhang, Jinfa; Xing, Chaozhu

    2017-06-08

    Cytoplasmic male sterility (CMS) conferred by the cytoplasm from Gossypium harknessii (D2) is an important system for hybrid seed production in Upland cotton (G. hirsutum). The male sterility of CMS-D2 (i.e., A line) can be restored to fertility by a restorer (i.e., R line) carrying the restorer gene Rf1 transferred from the D2 nuclear genome. However, the molecular mechanisms of CMS-D2 and its restoration are poorly understood. In this study, a genome-wide comparative transcriptome analysis was performed to identify differentially expressed genes (DEGs) in flower buds among the isogenic fertile R line and sterile A line derived from a backcross population (BC 8 F 1 ) and the recurrent parent, i.e., the maintainer (B line). A total of 1464 DEGs were identified among the three isogenic lines, and the Rf1-carrying Chr_D05 and its homeologous Chr_A05 had more DEGs than other chromosomes. The results of GO and KEGG enrichment analysis showed differences in circadian rhythm between the fertile and sterile lines. Eleven DEGs were selected for validation using qRT-PCR, confirming the accuracy of the RNA-seq results. Through genome-wide comparative transcriptome analysis, the differential expression profiles of CMS-D2 and its maintainer and restorer lines in Upland cotton were identified. Our results provide an important foundation for further studies into the molecular mechanisms of the interactions between the restorer gene Rf1 and the CMS-D2 cytoplasm.

  6. Analysis of Aspergillus nidulans metabolism at the genome-scale

    PubMed Central

    David, Helga; Özçelik, İlknur Ş; Hofmann, Gerald; Nielsen, Jens

    2008-01-01

    Background Aspergillus nidulans is a member of a diverse group of filamentous fungi, sharing many of the properties of its close relatives with significance in the fields of medicine, agriculture and industry. Furthermore, A. nidulans has been a classical model organism for studies of development biology and gene regulation, and thus it has become one of the best-characterized filamentous fungi. It was the first Aspergillus species to have its genome sequenced, and automated gene prediction tools predicted 9,451 open reading frames (ORFs) in the genome, of which less than 10% were assigned a function. Results In this work, we have manually assigned functions to 472 orphan genes in the metabolism of A. nidulans, by using a pathway-driven approach and by employing comparative genomics tools based on sequence similarity. The central metabolism of A. nidulans, as well as biosynthetic pathways of relevant secondary metabolites, was reconstructed based on detailed metabolic reconstructions available for A. niger and Saccharomyces cerevisiae, and information on the genetics, biochemistry and physiology of A. nidulans. Thereby, it was possible to identify metabolic functions without a gene associated, and to look for candidate ORFs in the genome of A. nidulans by comparing its sequence to sequences of well-characterized genes in other species encoding the function of interest. A classification system, based on defined criteria, was developed for evaluating and selecting the ORFs among the candidates, in an objective and systematic manner. The functional assignments served as a basis to develop a mathematical model, linking 666 genes (both previously and newly annotated) to metabolic roles. The model was used to simulate metabolic behavior and additionally to integrate, analyze and interpret large-scale gene expression data concerning a study on glucose repression, thereby providing a means of upgrading the information content of experimental data and getting further

  7. Transcriptome analysis and related databases of Lactococcus lactis.

    PubMed

    Kuipers, Oscar P; de Jong, Anne; Baerends, Richard J S; van Hijum, Sacha A F T; Zomer, Aldert L; Karsens, Harma A; den Hengst, Chris D; Kramer, Naomi E; Buist, Girbe; Kok, Jan

    2002-08-01

    Several complete genome sequences of Lactococcus lactis and their annotations will become available in the near future, next to the already published genome sequence of L. lactis ssp. lactis IL 1403. This will allow intraspecies comparative genomics studies as well as functional genomics studies aimed at a better understanding of physiological processes and regulatory networks operating in lactococci. This paper describes the initial set-up of a DNA-microarray facility in our group, to enable transcriptome analysis of various Gram-positive bacteria, including a ssp. lactis and a ssp. cremoris strain of Lactococcus lactis. Moreover a global description will be given of the hardware and software requirements for such a set-up, highlighting the crucial integration of relevant bioinformatics tools and methods. This includes the development of MolGenIS, an information system for transcriptome data storage and retrieval, and LactococCye, a metabolic pathway/genome database of Lactococcus lactis.

  8. ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data.

    PubMed

    Gonzalez, Sergio; Clavijo, Bernardo; Rivarola, Máximo; Moreno, Patricio; Fernandez, Paula; Dopazo, Joaquín; Paniego, Norma

    2017-02-22

    In the last years, applications based on massively parallelized RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species. We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with new generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results. ATGC transcriptomics provides access to non-expert computer users and small research groups to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of GNU public license at http://atgcinta.sourceforge.net .

  9. Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu

    PubMed Central

    Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

    2015-01-01

    Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat. PMID:26132381

  10. Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu.

    PubMed

    Zhang, Yanlin; Luo, Guangbin; Liu, Dongcheng; Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

    2015-01-01

    Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat.

  11. Analyses of transcriptome sequences reveal multiple ancient large-scale duplication events in the ancestor of Sphagnopsida (Bryophyta).

    PubMed

    Devos, Nicolas; Szövényi, Péter; Weston, David J; Rothfels, Carl J; Johnson, Matthew G; Shaw, A Jonathan

    2016-07-01

    The goal of this research was to investigate whether there has been a whole-genome duplication (WGD) in the ancestry of Sphagnum (peatmoss) or the class Sphagnopsida, and to determine if the timing of any such duplication(s) and patterns of paralog retention could help explain the rapid radiation and current ecological dominance of peatmosses. RNA sequencing (RNA-seq) data were generated for nine taxa in Sphagnopsida (Bryophyta). Analyses of frequency plots for synonymous substitutions per synonymous site (Ks ) between paralogous gene pairs and reconciliation of 578 gene trees were conducted to assess evidence of large-scale or genome-wide duplication events in each transcriptome. Both Ks frequency plots and gene tree-based analyses indicate multiple duplication events in the history of the Sphagnopsida. The most recent WGD event predates divergence of Sphagnum from the two other genera of Sphagnopsida. Duplicate retention is highly variable across species, which might be best explained by local adaptation. Our analyses indicate that the last WGD could have been an important factor underlying the diversification of peatmosses and facilitated their rise to ecological dominance in peatlands. The timing of the duplication events and their significance in the evolutionary history of peat mosses are discussed. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  12. Genomic structural differences between cattle and river buffalo identified through a combination and genomic and transcriptomic analysis

    USDA-ARS?s Scientific Manuscript database

    Water buffalo (Bubalus bubalis L.) is an important livestock species worldwide. Like many other livestock species, water buffalo lacks high quality and continuous reference genome assembly required for fine-scale comparative genomics studies. In this work, we present a dataset, which characterizes g...

  13. Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

    PubMed

    Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

    2014-07-01

    Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, non-repetitive rich genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large sized repetitive-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  14. Global transcriptomic profiling using small volumes of whole blood: a cost-effective method for translational genomic biomarker identification in small animals.

    PubMed

    Fricano, Meagan M; Ditewig, Amy C; Jung, Paul M; Liguori, Michael J; Blomme, Eric A G; Yang, Yi

    2011-01-01

    Blood is an ideal tissue for the identification of novel genomic biomarkers for toxicity or efficacy. However, using blood for transcriptomic profiling presents significant technical challenges due to the transcriptomic changes induced by ex vivo handling and the interference of highly abundant globin mRNA. Most whole blood RNA stabilization and isolation methods also require significant volumes of blood, limiting their effective use in small animal species, such as rodents. To overcome these challenges, a QIAzol-based RNA stabilization and isolation method (QSI) was developed to isolate sufficient amounts of high quality total RNA from 25 to 500 μL of rat whole blood. The method was compared to the standard PAXgene Blood RNA System using blood collected from rats exposed to saline or lipopolysaccharide (LPS). The QSI method yielded an average of 54 ng total RNA per μL of rat whole blood with an average RNA Integrity Number (RIN) of 9, a performance comparable with the standard PAXgene method. Total RNA samples were further processed using the NuGEN Ovation Whole Blood Solution system and cDNA was hybridized to Affymetrix Rat Genome 230 2.0 Arrays. The microarray QC parameters using RNA isolated with the QSI method were within the acceptable range for microarray analysis. The transcriptomic profiles were highly correlated with those using RNA isolated with the PAXgene method and were consistent with expected LPS-induced inflammatory responses. The present study demonstrated that the QSI method coupled with NuGEN Ovation Whole Blood Solution system is cost-effective and particularly suitable for transcriptomic profiling of minimal volumes of whole blood, typical of those obtained with small animal species.

  15. Transcriptome Assembly, Gene Annotation and Tissue Gene Expression Atlas of the Rainbow Trout

    PubMed Central

    Salem, Mohamed; Paneru, Bam; Al-Tobasei, Rafet; Abdouni, Fatima; Thorgaard, Gary H.; Rexroad, Caird E.; Yao, Jianbo

    2015-01-01

    Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler yielding 474,524 contigs > 500 base-pairs. Of them, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000–32,000 genes (35–71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with a large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the genome. PMID

  16. Developmental Transcriptome of Aplysia californica

    PubMed Central

    HEYLAND, ANDREAS; VUE, ZER; VOOLSTRA, CHRISTIAN R.; MEDINA, MÓNICA; MOROZ, LEONID L.

    2014-01-01

    Genome-wide transcriptional changes in development provide important insight into mechanisms underlying growth, differentiation, and patterning. However, such large-scale developmental studies have been limited to a few representatives of Ecdysozoans and Chordates. Here, we characterize transcriptomes of embryonic, larval, and metamorphic development in the marine mollusc Aplysia californica and reveal novel molecular components associated with life history transitions. Specifically, we identify more than 20 signal peptides, putative hormones, and transcription factors in association with early development and metamorphic stages—many of which seem to be evolutionarily conserved elements of signal transduction pathways. We also characterize genes related to biomineralization—a critical process of molluscan development. In summary, our experiment provides the first large-scale survey of gene expression in mollusc development, and complements previous studies on the regulatory mechanisms underlying body plan patterning and the formation of larval and juvenile structures. This study serves as a resource for further functional annotation of transcripts and genes in Aplysia, specifically and molluscs in general. A comparison of the Aplysia developmental transcriptome with similar studies in the zebra fish Danio rerio, the fruit fly Drosophila melanogaster, the nematode Caenorhabditis elegans, and other studies on molluscs suggests an overall highly divergent pattern of gene regulatory mechanisms that are likely a consequence of the different developmental modes of these organisms. PMID:21328528

  17. Insight into Catechins Metabolic Pathways of Camellia sinensis Based on Genome and Transcriptome Analysis.

    PubMed

    Wang, Wenzhao; Zhou, Yihui; Wu, Yingling; Dai, Xinlong; Liu, Yajun; Qian, Yumei; Li, Mingzhuo; Jiang, Xiaolan; Wang, Yunsheng; Gao, Liping; Xia, Tao

    2018-04-25

    Tea is an important economic crop with a 3.02 Gb genome. It accumulates various bioactive compounds, especially catechins, which are closely associated with tea flavor and quality. Catechins are biosynthesized through the phenylpropanoid and flavonoid pathways, with 12 structural genes being involved in their synthesis. However, we found that in Camellia sinensis the understanding of the basic profile of catechins biosynthesis is still unclear. The gene structure, locus, transcript number, transcriptional variation, and function of multigene families have not yet been clarified. Our previous studies demonstrated that the accumulation of flavonoids in tea is species, tissue, and induction specific, which indicates that gene coexpression patterns may be involved in tea catechins and flavonoids biosynthesis. In this paper, we screened candidate genes of multigene families involved in the phenylpropanoid and flavonoid pathways based on an analysis of genome and transcriptome sequence data. The authenticity of candidate genes was verified by PCR cloning, and their function was validated by reverse genetic methods. In the present study, 36 genes from 12 gene families were identified and were accessed in the NCBI database. During this process, some intron retention events of the CsCHI and CsDFR genes were found. Furthermore, the transcriptome sequencing of various tea tissues and subcellular location assays revealed coexpression and colocalization patterns. The correlation analysis showed that CsCHIc, CsF3'H, and CsANRb expression levels are associated significantly with the concentration of soluble PA as well as the expression levels of CsPALc and CsPALf with the concentration of insoluble PA. This work provides insights into catechins metabolism in tea and provides a foundation for future studies.

  18. Next-Generation Genomics Facility at C-CAMP: Accelerating Genomic Research in India

    PubMed Central

    S, Chandana; Russiachand, Heikham; H, Pradeep; S, Shilpa; M, Ashwini; S, Sahana; B, Jayanth; Atla, Goutham; Jain, Smita; Arunkumar, Nandini; Gowda, Malali

    2014-01-01

    Next-Generation Sequencing (NGS; http://www.genome.gov/12513162) is a recent life-sciences technological revolution that allows scientists to decode genomes or transcriptomes at a much faster rate with a lower cost. Genomic-based studies are in a relatively slow pace in India due to the non-availability of genomics experts, trained personnel and dedicated service providers. Using NGS there is a lot of potential to study India's national diversity (of all kinds). We at the Centre for Cellular and Molecular Platforms (C-CAMP) have launched the Next Generation Genomics Facility (NGGF) to provide genomics service to scientists, to train researchers and also work on national and international genomic projects. We have HiSeq1000 from Illumina and GS-FLX Plus from Roche454. The long reads from GS FLX Plus, and high sequence depth from HiSeq1000, are the best and ideal hybrid approaches for de novo and re-sequencing of genomes and transcriptomes. At our facility, we have sequenced around 70 different organisms comprising of more than 388 genomes and 615 transcriptomes – prokaryotes and eukaryotes (fungi, plants and animals). In addition we have optimized other unique applications such as small RNA (miRNA, siRNA etc), long Mate-pair sequencing (2 to 20 Kb), Coding sequences (Exome), Methylome (ChIP-Seq), Restriction Mapping (RAD-Seq), Human Leukocyte Antigen (HLA) typing, mixed genomes (metagenomes) and target amplicons, etc. Translating DNA sequence data from NGS sequencer into meaningful information is an important exercise. Under NGGF, we have bioinformatics experts and high-end computing resources to dissect NGS data such as genome assembly and annotation, gene expression, target enrichment, variant calling (SSR or SNP), comparative analysis etc. Our services (sequencing and bioinformatics) have been utilized by more than 45 organizations (academia and industry) both within India and outside, resulting several publications in peer-reviewed journals and several genomic/transcriptomic

  19. DOGMA: domain-based transcriptome and proteome quality assessment.

    PubMed

    Dohmen, Elias; Kremer, Lukas P M; Bornberg-Bauer, Erich; Kemena, Carsten

    2016-09-01

    Genome studies have become cheaper and easier than ever before, due to the decreased costs of high-throughput sequencing and the free availability of analysis software. However, the quality of genome or transcriptome assemblies can vary a lot. Therefore, quality assessment of assemblies and annotations are crucial aspects of genome analysis pipelines. We developed DOGMA, a program for fast and easy quality assessment of transcriptome and proteome data based on conserved protein domains. DOGMA measures the completeness of a given transcriptome or proteome and provides information about domain content for further analysis. DOGMA provides a very fast way to do quality assessment within seconds. DOGMA is implemented in Python and published under GNU GPL v.3 license. The source code is available on https://ebbgit.uni-muenster.de/domainWorld/DOGMA/ CONTACTS: e.dohmen@wwu.de or c.kemena@wwu.de Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Genome-scale biological models for industrial microbial systems.

    PubMed

    Xu, Nan; Ye, Chao; Liu, Liming

    2018-04-01

    The primary aims and challenges associated with microbial fermentation include achieving faster cell growth, higher productivity, and more robust production processes. Genome-scale biological models, predicting the formation of an interaction among genetic materials, enzymes, and metabolites, constitute a systematic and comprehensive platform to analyze and optimize the microbial growth and production of biological products. Genome-scale biological models can help optimize microbial growth-associated traits by simulating biomass formation, predicting growth rates, and identifying the requirements for cell growth. With regard to microbial product biosynthesis, genome-scale biological models can be used to design product biosynthetic pathways, accelerate production efficiency, and reduce metabolic side effects, leading to improved production performance. The present review discusses the development of microbial genome-scale biological models since their emergence and emphasizes their pertinent application in improving industrial microbial fermentation of biological products.

  1. Genomic and transcriptomic evidence for scavenging of diverse organic compounds by widespread deep-sea archaea.

    PubMed

    Li, Meng; Baker, Brett J; Anantharaman, Karthik; Jain, Sunit; Breier, John A; Dick, Gregory J

    2015-11-17

    Microbial activity is one of the most important processes to mediate the flux of organic carbon from the ocean surface to the seafloor. However, little is known about the microorganisms that underpin this key step of the global carbon cycle in the deep oceans. Here we present genomic and transcriptomic evidence that five ubiquitous archaeal groups actively use proteins, carbohydrates, fatty acids and lipids as sources of carbon and energy at depths ranging from 800 to 4,950 m in hydrothermal vent plumes and pelagic background seawater across three different ocean basins. Genome-enabled metabolic reconstructions and gene expression patterns show that these marine archaea are motile heterotrophs with extensive mechanisms for scavenging organic matter. Our results shed light on the ecological and physiological properties of ubiquitous marine archaea and highlight their versatile metabolic strategies in deep oceans that might play a critical role in global carbon cycling.

  2. The OME Framework for genome-scale systems biology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palsson, Bernhard O.; Ebrahim, Ali; Federowicz, Steve

    The life sciences are undergoing continuous and accelerating integration with computational and engineering sciences. The biology that many in the field have been trained on may be hardly recognizable in ten to twenty years. One of the major drivers for this transformation is the blistering pace of advancements in DNA sequencing and synthesis. These advances have resulted in unprecedented amounts of new data, information, and knowledge. Many software tools have been developed to deal with aspects of this transformation and each is sorely needed [1-3]. However, few of these tools have been forced to deal with the full complexity ofmore » genome-scale models along with high throughput genome- scale data. This particular situation represents a unique challenge, as it is simultaneously necessary to deal with the vast breadth of genome-scale models and the dizzying depth of high-throughput datasets. It has been observed time and again that as the pace of data generation continues to accelerate, the pace of analysis significantly lags behind [4]. It is also evident that, given the plethora of databases and software efforts [5-12], it is still a significant challenge to work with genome-scale metabolic models, let alone next-generation whole cell models [13-15]. We work at the forefront of model creation and systems scale data generation [16-18]. The OME Framework was borne out of a practical need to enable genome-scale modeling and data analysis under a unified framework to drive the next generation of genome-scale biological models. Here we present the OME Framework. It exists as a set of Python classes. However, we want to emphasize the importance of the underlying design as an addition to the discussions on specifications of a digital cell. A great deal of work and valuable progress has been made by a number of communities [13, 19-24] towards interchange formats and implementations designed to achieve similar goals. While many software tools exist for handling

  3. Transcriptome analysis in Concholepas concholepas (Gastropoda, Muricidae): mining and characterization of new genomic and molecular markers.

    PubMed

    Cárdenas, Leyla; Sánchez, Roland; Gomez, Daniela; Fuenzalida, Gonzalo; Gallardo-Escárate, Cristián; Tanguy, Arnaud

    2011-09-01

    The marine gastropod Concholepas concholepas, locally known as the "loco", is the main target species of the benthonic Chilean fisheries. Genetic and genomic tools are necessary to study the genome of this species in order to understand the molecular basis of its development, growth, and other key traits to improve the management strategies and to identify local adaptation to prevent loss of biodiversity. Here, we use pyrosequencing technologies to generate the first transcriptomic database from adult specimens of the loco. After trimming, a total of 140,756 Expressed Sequence Tag sequences were achieved. Clustering and assembly analysis identified 19,219 contigs and 105,435 singleton sequences. BlastN analysis showed a significant identity with Expressed Sequence Tags of different gastropod species available in public databases. Similarly, BlastX results showed that only 895 out of the total 124,654 had significant hits and may represent novel genes for marine gastropods. From this database, simple sequence repeat motifs were also identified and a total of 38 primer pairs were designed and tested to assess their potential as informative markers and to investigate their cross-species amplification in different related gastropod species. This dataset represents the first publicly available 454 data for a marine gastropod endemic to the southeastern Pacific coast, providing a valuable transcriptomic resource for future efforts of gene discovery and development of functional markers in other marine gastropods. Copyright © 2011 Elsevier B.V. All rights reserved.

  4. Genome and Transcriptome Analysis of the Fungal Pathogen Fusarium oxysporum f. sp. cubense Causing Banana Vascular Wilt Disease

    PubMed Central

    Zeng, Huicai; Fan, Dingding; Zhu, Yabin; Feng, Yue; Wang, Guofen; Peng, Chunfang; Jiang, Xuanting; Zhou, Dajie; Ni, Peixiang; Liang, Changcong; Liu, Lei; Wang, Jun; Mao, Chao

    2014-01-01

    Background The asexual fungus Fusarium oxysporum f. sp. cubense (Foc) causing vascular wilt disease is one of the most devastating pathogens of banana (Musa spp.). To understand the molecular underpinning of pathogenicity in Foc, the genomes and transcriptomes of two Foc isolates were sequenced. Methodology/Principal Findings Genome analysis revealed that the genome structures of race 1 and race 4 isolates were highly syntenic with those of F. oxysporum f. sp. lycopersici strain Fol4287. A large number of putative virulence associated genes were identified in both Foc genomes, including genes putatively involved in root attachment, cell degradation, detoxification of toxin, transport, secondary metabolites biosynthesis and signal transductions. Importantly, relative to the Foc race 1 isolate (Foc1), the Foc race 4 isolate (Foc4) has evolved with some expanded gene families of transporters and transcription factors for transport of toxins and nutrients that may facilitate its ability to adapt to host environments and contribute to pathogenicity to banana. Transcriptome analysis disclosed a significant difference in transcriptional responses between Foc1 and Foc4 at 48 h post inoculation to the banana ‘Brazil’ in comparison with the vegetative growth stage. Of particular note, more virulence-associated genes were up regulated in Foc4 than in Foc1. Several signaling pathways like the mitogen-activated protein kinase Fmk1 mediated invasion growth pathway, the FGA1-mediated G protein signaling pathway and a pathogenicity associated two-component system were activated in Foc4 rather than in Foc1. Together, these differences in gene content and transcription response between Foc1 and Foc4 might account for variation in their virulence during infection of the banana variety ‘Brazil’. Conclusions/Significance Foc genome sequences will facilitate us to identify pathogenicity mechanism involved in the banana vascular wilt disease development. These will thus advance

  5. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, CY; Yang, H; Wei, CL

    Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Using high-throughput Illumina RNA-seq, the transcriptome from poly (A){sup +} RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled intomore » 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and

  6. Deep sequencing of the Camellia sinensis transcriptome revealed candidate genes for major metabolic pathways of tea-specific compounds

    PubMed Central

    2011-01-01

    Background Tea is one of the most popular non-alcoholic beverages worldwide. However, the tea plant, Camellia sinensis, is difficult to culture in vitro, to transform, and has a large genome, rendering little genomic information available. Recent advances in large-scale RNA sequencing (RNA-seq) provide a fast, cost-effective, and reliable approach to generate large expression datasets for functional genomic analysis, which is especially suitable for non-model species with un-sequenced genomes. Results Using high-throughput Illumina RNA-seq, the transcriptome from poly (A)+ RNA of C. sinensis was analyzed at an unprecedented depth (2.59 gigabase pairs). Approximate 34.5 million reads were obtained, trimmed, and assembled into 127,094 unigenes, with an average length of 355 bp and an N50 of 506 bp, which consisted of 788 contig clusters and 126,306 singletons. This number of unigenes was 10-fold higher than existing C. sinensis sequences deposited in GenBank (as of August 2010). Sequence similarity analyses against six public databases (Uniprot, NR and COGs at NCBI, Pfam, InterPro and KEGG) found 55,088 unigenes that could be annotated with gene descriptions, conserved protein domains, or gene ontology terms. Some of the unigenes were assigned to putative metabolic pathways. Targeted searches using these annotations identified the majority of genes associated with several primary metabolic pathways and natural product pathways that are important to tea quality, such as flavonoid, theanine and caffeine biosynthesis pathways. Novel candidate genes of these secondary pathways were discovered. Comparisons with four previously prepared cDNA libraries revealed that this transcriptome dataset has both a high degree of consistency with previous EST data and an approximate 20 times increase in coverage. Thirteen unigenes related to theanine and flavonoid synthesis were validated. Their expression patterns in different organs of the tea plant were analyzed by RT-PCR and

  7. A refined genome-scale reconstruction of Chlamydomonas metabolism provides a platform for systems-level analyses.

    PubMed

    Imam, Saheed; Schäuble, Sascha; Valenzuela, Jacob; López García de Lomana, Adrián; Carter, Warren; Price, Nathan D; Baliga, Nitin S

    2015-12-01

    Microalgae have reemerged as organisms of prime biotechnological interest due to their ability to synthesize a suite of valuable chemicals. To harness the capabilities of these organisms, we need a comprehensive systems-level understanding of their metabolism, which can be fundamentally achieved through large-scale mechanistic models of metabolism. In this study, we present a revised and significantly improved genome-scale metabolic model for the widely-studied microalga, Chlamydomonas reinhardtii. The model, iCre1355, represents a major advance over previous models, both in content and predictive power. iCre1355 encompasses a broad range of metabolic functions encoded across the nuclear, chloroplast and mitochondrial genomes accounting for 1355 genes (1460 transcripts), 2394 and 1133 metabolites. We found improved performance over the previous metabolic model based on comparisons of predictive accuracy across 306 phenotypes (from 81 mutants), lipid yield analysis and growth rates derived from chemostat-grown cells (under three conditions). Measurement of macronutrient uptake revealed carbon and phosphate to be good predictors of growth rate, while nitrogen consumption appeared to be in excess. We analyzed high-resolution time series transcriptomics data using iCre1355 to uncover dynamic pathway-level changes that occur in response to nitrogen starvation and changes in light intensity. This approach enabled accurate prediction of growth rates, the cessation of growth and accumulation of triacylglycerols during nitrogen starvation, and the temporal response of different growth-associated pathways to increased light intensity. Thus, iCre1355 represents an experimentally validated genome-scale reconstruction of C. reinhardtii metabolism that should serve as a useful resource for studying the metabolic processes of this and related microalgae. © 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.

  8. The Mitochondrial Genome and Transcriptome of the Basal Dinoflagellate Hematodinium sp.: Character Evolution within the Highly Derived Mitochondrial Genomes of Dinoflagellates

    PubMed Central

    Gornik, S. G.; Waller, R. F.

    2012-01-01

    The sister phyla dinoflagellates and apicomplexans inherited a drastically reduced mitochondrial genome (mitochondrial DNA, mtDNA) containing only three protein-coding (cob, cox1, and cox3) genes and two ribosomal RNA (rRNA) genes. In apicomplexans, single copies of these genes are encoded on the smallest known mtDNA chromosome (6 kb). In dinoflagellates, however, the genome has undergone further substantial modifications, including massive genome amplification and recombination resulting in multiple copies of each gene and gene fragments linked in numerous combinations. Furthermore, protein-encoding genes have lost standard stop codons, trans-splicing of messenger RNAs (mRNAs) is required to generate complete cox3 transcripts, and extensive RNA editing recodes most genes. From taxa investigated to date, it is unclear when many of these unusual dinoflagellate mtDNA characters evolved. To address this question, we investigated the mitochondrial genome and transcriptome character states of the deep branching dinoflagellate Hematodinium sp. Genomic data show that like later-branching dinoflagellates Hematodinium sp. also contains an inflated, heavily recombined genome of multicopy genes and gene fragments. Although stop codons are also lacking for cox1 and cob, cox3 still encodes a conventional stop codon. Extensive editing of mRNAs also occurs in Hematodinium sp. The mtDNA of basal dinoflagellate Hematodinium sp. indicates that much of the mtDNA modification in dinoflagellates occurred early in this lineage, including genome amplification and recombination, and decreased use of standard stop codons. Trans-splicing, on the other hand, occurred after Hematodinium sp. diverged. Only RNA editing presents a nonlinear pattern of evolution in dinoflagellates as this process occurs in Hematodinium sp. but is absent in some later-branching taxa indicating that this process was either lost in some lineages or developed more than once during the evolution of the highly unusual

  9. The mitochondrial genome and transcriptome of the basal dinoflagellate Hematodinium sp.: character evolution within the highly derived mitochondrial genomes of dinoflagellates.

    PubMed

    Jackson, C J; Gornik, S G; Waller, R F

    2012-01-01

    The sister phyla dinoflagellates and apicomplexans inherited a drastically reduced mitochondrial genome (mitochondrial DNA, mtDNA) containing only three protein-coding (cob, cox1, and cox3) genes and two ribosomal RNA (rRNA) genes. In apicomplexans, single copies of these genes are encoded on the smallest known mtDNA chromosome (6 kb). In dinoflagellates, however, the genome has undergone further substantial modifications, including massive genome amplification and recombination resulting in multiple copies of each gene and gene fragments linked in numerous combinations. Furthermore, protein-encoding genes have lost standard stop codons, trans-splicing of messenger RNAs (mRNAs) is required to generate complete cox3 transcripts, and extensive RNA editing recodes most genes. From taxa investigated to date, it is unclear when many of these unusual dinoflagellate mtDNA characters evolved. To address this question, we investigated the mitochondrial genome and transcriptome character states of the deep branching dinoflagellate Hematodinium sp. Genomic data show that like later-branching dinoflagellates Hematodinium sp. also contains an inflated, heavily recombined genome of multicopy genes and gene fragments. Although stop codons are also lacking for cox1 and cob, cox3 still encodes a conventional stop codon. Extensive editing of mRNAs also occurs in Hematodinium sp. The mtDNA of basal dinoflagellate Hematodinium sp. indicates that much of the mtDNA modification in dinoflagellates occurred early in this lineage, including genome amplification and recombination, and decreased use of standard stop codons. Trans-splicing, on the other hand, occurred after Hematodinium sp. diverged. Only RNA editing presents a nonlinear pattern of evolution in dinoflagellates as this process occurs in Hematodinium sp. but is absent in some later-branching taxa indicating that this process was either lost in some lineages or developed more than once during the evolution of the highly unusual

  10. Epigenetic transgenerational inheritance of somatic transcriptomes and epigenetic control regions

    PubMed Central

    2012-01-01

    Background Environmentally induced epigenetic transgenerational inheritance of adult onset disease involves a variety of phenotypic changes, suggesting a general alteration in genome activity. Results Investigation of different tissue transcriptomes in male and female F3 generation vinclozolin versus control lineage rats demonstrated all tissues examined had transgenerational transcriptomes. The microarrays from 11 different tissues were compared with a gene bionetwork analysis. Although each tissue transgenerational transcriptome was unique, common cellular pathways and processes were identified between the tissues. A cluster analysis identified gene modules with coordinated gene expression and each had unique gene networks regulating tissue-specific gene expression and function. A large number of statistically significant over-represented clusters of genes were identified in the genome for both males and females. These gene clusters ranged from 2-5 megabases in size, and a number of them corresponded to the epimutations previously identified in sperm that transmit the epigenetic transgenerational inheritance of disease phenotypes. Conclusions Combined observations demonstrate that all tissues derived from the epigenetically altered germ line develop transgenerational transcriptomes unique to the tissue, but common epigenetic control regions in the genome may coordinately regulate these tissue-specific transcriptomes. This systems biology approach provides insight into the molecular mechanisms involved in the epigenetic transgenerational inheritance of a variety of adult onset disease phenotypes. PMID:23034163

  11. The utility of transcriptomics in fish conservation.

    PubMed

    Connon, Richard E; Jeffries, Ken M; Komoroske, Lisa M; Todgham, Anne E; Fangue, Nann A

    2018-01-29

    There is growing recognition of the need to understand the mechanisms underlying organismal resilience (i.e. tolerance, acclimatization) to environmental change to support the conservation management of sensitive and economically important species. Here, we discuss how functional genomics can be used in conservation biology to provide a cellular-level understanding of organismal responses to environmental conditions. In particular, the integration of transcriptomics with physiological and ecological research is increasingly playing an important role in identifying functional physiological thresholds predictive of compensatory responses and detrimental outcomes, transforming the way we can study issues in conservation biology. Notably, with technological advances in RNA sequencing, transcriptome-wide approaches can now be applied to species where no prior genomic sequence information is available to develop species-specific tools and investigate sublethal impacts that can contribute to population declines over generations and undermine prospects for long-term conservation success. Here, we examine the use of transcriptomics as a means of determining organismal responses to environmental stressors and use key study examples of conservation concern in fishes to highlight the added value of transcriptome-wide data to the identification of functional response pathways. Finally, we discuss the gaps between the core science and policy frameworks and how thresholds identified through transcriptomic evaluations provide evidence that can be more readily used by resource managers. © 2018. Published by The Company of Biologists Ltd.

  12. TRAM (Transcriptome Mapper): database-driven creation and analysis of transcriptome maps from multiple sources

    PubMed Central

    2011-01-01

    Background Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input. Results TRAM (Transcriptome Mapper) is a new general tool that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays), implemented as a relational database. It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies if segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genome genes. Transcriptome maps, showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with

  13. Extreme-Scale De Novo Genome Assembly

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Georganas, Evangelos; Hofmeyr, Steven; Egan, Rob

    De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMER, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different components of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and communication costs in detail. We present performance results of assembling the human genome and themore » large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.« less

  14. Spider Transcriptomes Identify Ancient Large-Scale Gene Duplication Event Potentially Important in Silk Gland Evolution

    PubMed Central

    Clarke, Thomas H.; Garb, Jessica E.; Hayashi, Cheryl Y.; Arensburger, Peter; Ayoub, Nadia A.

    2015-01-01

    The evolution of specialized tissues with novel functions, such as the silk synthesizing glands in spiders, is likely an influential driver of adaptive success. Large-scale gene duplication events and subsequent paralog divergence are thought to be required for generating evolutionary novelty. Such an event has been proposed for spiders, but not tested. We de novo assembled transcriptomes from three cobweb weaving spider species. Based on phylogenetic analyses of gene families with representatives from each of the three species, we found numerous duplication events indicative of a whole genome or segmental duplication. We estimated the age of the gene duplications relative to several speciation events within spiders and arachnids and found that the duplications likely occurred after the divergence of scorpions (order Scorpionida) and spiders (order Araneae), but before the divergence of the spider suborders Mygalomorphae and Araneomorphae, near the evolutionary origin of spider silk glands. Transcripts that are expressed exclusively or primarily within black widow silk glands are more likely to have a paralog descended from the ancient duplication event and have elevated amino acid replacement rates compared with other transcripts. Thus, an ancient large-scale gene duplication event within the spider lineage was likely an important source of molecular novelty during the evolution of silk gland-specific expression. This duplication event may have provided genetic material for subsequent silk gland diversification in the true spiders (Araneomorphae). PMID:26058392

  15. Impact of a short-term exposure to spaceflight on the phenotype, genome, transcriptome and proteome of Escherichia coli

    NASA Astrophysics Data System (ADS)

    Li, Tianzhi; Chang, De; Xu, Huiwen; Chen, Jiapeng; Su, Longxiang; Guo, Yinghua; Chen, Zhenhong; Wang, Yajuan; Wang, Li; Wang, Junfeng; Fang, Xiangqun; Liu, Changting

    2015-07-01

    Escherichia coli (E. coli) is the most widely applied model organism in current biological science. As a widespread opportunistic pathogen, E. coli can survive not only by symbiosis with human, but also outside the host as well, which necessitates the evaluation of its response to the space environment. Therefore, to keep humans safe in space, it is necessary to understand how the bacteria respond to this environment. Despite extensive investigations for a few decades, the response of E. coli to the real space environment is still controversial. To better understand the mechanisms how E. coli overcomes harsh environments such as microgravity in space and to investigate whether these factors may induce pathogenic changes in E. coli that are potentially detrimental to astronauts, we conducted detailed genomics, transcriptomic and proteomic studies on E. coli that experienced 17 days of spaceflight. By comparing two flight strains LCT-EC52 and LCT-EC59 to a control strain LCT-EC106 that was cultured under the same temperature conditions on the ground, we identified metabolism changes, polymorphism changes, differentially expressed genes and proteins in the two flight strains. The flight strains differed from the control in the utilization of more than 30 carbon sources. Two single nucleotide polymorphisms (SNPs) and one deletion were identified in the flight strains. The expression level of more than 1000 genes altered in flight strains. Genes involved in chemotaxis, lipid metabolism and cell motility express differently. Moreover, the two flight strains also differed extensively from each other in terms of metabolism, transcriptome and proteome, indicating the impact of space environment on individual cells is heterogeneous and probably genotype-dependent. This study presents the first systematic profile of E. coli genome, transcriptome and proteome after spaceflight, which helps to elucidate the mechanism that controls the adaptation of microbes to the space

  16. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan

    2014-01-01

    To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel andmore » fiber production. The latter fact suggests that research aimed at improving quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied in molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large scale SNP discovery and identification of large scale polymorphisms are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enable technologies on Populus genomic and genetic studies of complex and specialized-traits.« less

  17. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  18. The first whole genome and transcriptome of the cinereous vulture reveals adaptation in the gastric and immune defense systems and possible convergent evolution between the Old and New World vultures.

    PubMed

    Chung, Oksung; Jin, Seondeok; Cho, Yun Sung; Lim, Jeongheui; Kim, Hyunho; Jho, Sungwoong; Kim, Hak-Min; Jun, JeHoon; Lee, HyeJin; Chon, Alvin; Ko, Junsu; Edwards, Jeremy; Weber, Jessica A; Han, Kyudong; O'Brien, Stephen J; Manica, Andrea; Bhak, Jong; Paek, Woon Kee

    2015-10-21

    The cinereous vulture, Aegypius monachus, is the largest bird of prey and plays a key role in the ecosystem by removing carcasses, thus preventing the spread of diseases. Its feeding habits force it to cope with constant exposure to pathogens, making this species an interesting target for discovering functionally selected genetic variants. Furthermore, the presence of two independently evolved vulture groups, Old World and New World vultures, provides a natural experiment in which to investigate convergent evolution due to obligate scavenging. We sequenced the genome of a cinereous vulture, and mapped it to the bald eagle reference genome, a close relative with a divergence time of 18 million years. By comparing the cinereous vulture to other avian genomes, we find positively selected genetic variations in this species associated with respiration, likely linked to their ability of immune defense responses and gastric acid secretion, consistent with their ability to digest carcasses. Comparisons between the Old World and New World vulture groups suggest convergent gene evolution. We assemble the cinereous vulture blood transcriptome from a second individual, and annotate genes. Finally, we infer the demographic history of the cinereous vulture which shows marked fluctuations in effective population size during the late Pleistocene. We present the first genome and transcriptome analyses of the cinereous vulture compared to other avian genomes and transcriptomes, revealing genetic signatures of dietary and environmental adaptations accompanied by possible convergent evolution between the Old World and New World vultures.

  19. Large-Scale Sequencing: The Future of Genomic Sciences Colloquium

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Margaret Riley; Merry Buckley

    2009-01-01

    Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencingmore » is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place

  20. Genome and transcriptome sequencing characterises the gene space of Macadamia integrifolia (Proteaceae).

    PubMed

    Nock, Catherine J; Baten, Abdul; Barkla, Bronwyn J; Furtado, Agnelo; Henry, Robert J; King, Graham J

    2016-11-17

    The large Gondwanan plant family Proteaceae is an early-diverging eudicot lineage renowned for its morphological, taxonomic and ecological diversity. Macadamia is the most economically important Proteaceae crop and represents an ancient rainforest-restricted lineage. The family is a focus for studies of adaptive radiation due to remarkable species diversification in Mediterranean-climate biodiversity hotspots, and numerous evolutionary transitions between biomes. Despite a long history of research, comparative analyses in the Proteaceae and macadamia breeding programs are restricted by a paucity of genetic information. To address this, we sequenced the genome and transcriptome of the widely grown Macadamia integrifolia cultivar 741. Over 95 gigabases of DNA and RNA-seq sequence data were de novo assembled and annotated. The draft assembly has a total length of 518 Mb and spans approximately 79% of the estimated genome size. Following annotation, 35,337 protein-coding genes were predicted of which over 90% were expressed in at least one of the leaf, shoot or flower tissues examined. Gene family comparisons with five other eudicot species revealed 13,689 clusters containing macadamia genes and 1005 macadamia-specific clusters, and provides evidence for linage-specific expansion of gene families involved in pathogen recognition, plant defense and monoterpene synthesis. Cyanogenesis is an important defense strategy in the Proteaceae, and a detailed analysis of macadamia gene homologues potentially involved in cyanogenic glycoside biosynthesis revealed several highly expressed candidate genes. The gene space of macadamia provides a foundation for comparative genomics, gene discovery and the acceleration of molecular-assisted breeding. This study presents the first available genomic resources for the large basal eudicot family Proteaceae, access to most macadamia genes and opportunities to uncover the genetic basis of traits of importance for adaptation and crop

  1. Plasmodium vivax Biology: Insights Provided by Genomics, Transcriptomics and Proteomics

    PubMed Central

    Bourgard, Catarina; Albrecht, Letusa; Kayano, Ana C. A. V.; Sunnerhagen, Per; Costa, Fabio T. M.

    2018-01-01

    During the last decade, the vast omics field has revolutionized biological research, especially the genomics, transcriptomics and proteomics branches, as technological tools become available to the field researcher and allow difficult question-driven studies to be addressed. Parasitology has greatly benefited from next generation sequencing (NGS) projects, which have resulted in a broadened comprehension of basic parasite molecular biology, ecology and epidemiology. Malariology is one example where application of this technology has greatly contributed to a better understanding of Plasmodium spp. biology and host-parasite interactions. Among the several parasite species that cause human malaria, the neglected Plasmodium vivax presents great research challenges, as in vitro culturing is not yet feasible and functional assays are heavily limited. Therefore, there are gaps in our P. vivax biology knowledge that affect decisions for control policies aiming to eradicate vivax malaria in the near future. In this review, we provide a snapshot of key discoveries already achieved in P. vivax sequencing projects, focusing on developments, hurdles, and limitations currently faced by the research community, as well as perspectives on future vivax malaria research. PMID:29473024

  2. Genomic analysis and temperature-dependent transcriptome profiles of the rhizosphere originating strain Pseudomonas aeruginosa M18

    PubMed Central

    2011-01-01

    Background Our previously published reports have described an effective biocontrol agent named Pseudomonas sp. M18 as its 16S rDNA sequence and several regulator genes share homologous sequences with those of P. aeruginosa, but there are several unusual phenotypic features. This study aims to explore its strain specific genomic features and gene expression patterns at different temperatures. Results The complete M18 genome is composed of a single chromosome of 6,327,754 base pairs containing 5684 open reading frames. Seven genomic islands, including two novel prophages and five specific non-phage islands were identified besides the conserved P. aeruginosa core genome. Each prophage contains a putative chitinase coding gene, and the prophage II contains a capB gene encoding a putative cold stress protein. The non-phage genomic islands contain genes responsible for pyoluteorin biosynthesis, environmental substance degradation and type I and III restriction-modification systems. Compared with other P. aeruginosa strains, the fewest number (3) of insertion sequences and the most number (3) of clustered regularly interspaced short palindromic repeats in M18 genome may contribute to the relative genome stability. Although the M18 genome is most closely related to that of P. aeruginosa strain LESB58, the strain M18 is more susceptible to several antimicrobial agents and easier to be erased in a mouse acute lung infection model than the strain LESB58. The whole M18 transcriptomic analysis indicated that 10.6% of the expressed genes are temperature-dependent, with 22 genes up-regulated at 28°C in three non-phage genomic islands and one prophage but none at 37°C. Conclusions The P. aeruginosa strain M18 has evolved its specific genomic structures and temperature dependent expression patterns to meet the requirement of its fitness and competitiveness under selective pressures imposed on the strain in rhizosphere niche. PMID:21884571

  3. The aquatic animals' transcriptome resource for comparative functional analysis.

    PubMed

    Chou, Chih-Hung; Huang, Hsi-Yuan; Huang, Wei-Chih; Hsu, Sheng-Da; Hsiao, Chung-Der; Liu, Chia-Yu; Chen, Yu-Hung; Liu, Yu-Chen; Huang, Wei-Yun; Lee, Meng-Lin; Chen, Yi-Chang; Huang, Hsien-Da

    2018-05-09

    Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, currently there is no comprehensive resource exist for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. This aquatic animal transcriptome map database dbATM provides de novo assembly of transcriptome, gene annotation and comparative analysis of more than twenty aquatic organisms without draft genome. To improve the assembly quality, three computational tools (Trinity, Oases and SOAPdenovo-Trans) were employed to enhance individual transcriptome assembly, and CAP3 and CD-HIT-EST software were then used to merge these three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology and KEGG pathways. Furthermore, all aquatic animal genes are essential for comparative genomics tasks such as constructing homologous gene groups and blast databases and phylogenetic analysis. In conclusion, we establish a resource for non model organism aquatic animals, which is great economic and ecological importance and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is now publically accessible through the URL http://dbATM.mbc.nctu.edu.tw/ .

  4. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups.

    PubMed

    Curtis, Christina; Shah, Sohrab P; Chin, Suet-Feung; Turashvili, Gulisa; Rueda, Oscar M; Dunning, Mark J; Speed, Doug; Lynch, Andy G; Samarajiwa, Shamith; Yuan, Yinyin; Gräf, Stefan; Ha, Gavin; Haffari, Gholamreza; Bashashati, Ali; Russell, Roslin; McKinney, Steven; Langerød, Anita; Green, Andrew; Provenzano, Elena; Wishart, Gordon; Pinder, Sarah; Watson, Peter; Markowetz, Florian; Murphy, Leigh; Ellis, Ian; Purushotham, Arnie; Børresen-Dale, Anne-Lise; Brenton, James D; Tavaré, Simon; Caldas, Carlos; Aparicio, Samuel

    2012-04-18

    The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome.

  5. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups

    PubMed Central

    Curtis, Christina; Shah, Sohrab P.; Chin, Suet-Feung; Turashvili, Gulisa; Rueda, Oscar M.; Dunning, Mark J.; Speed, Doug; Lynch, Andy G.; Samarajiwa, Shamith; Yuan, Yinyin; Gräf, Stefan; Ha, Gavin; Haffari, Gholamreza; Bashashati, Ali; Russell, Roslin; McKinney, Steven; Langerød, Anita; Green, Andrew; Provenzano, Elena; Wishart, Gordon; Pinder, Sarah; Watson, Peter; Markowetz, Florian; Murphy, Leigh; Ellis, Ian; Purushotham, Arnie; Børresen-Dale, Anne-Lise; Brenton, James D.; Tavaré, Simon; Caldas, Carlos; Aparicio, Samuel

    2012-01-01

    The elucidation of breast cancer subgroups and their molecular drivers requires integrated views of the genome and transcriptome from representative numbers of patients. We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up. Inherited variants (copy number variants and single nucleotide polymorphisms) and acquired somatic copy number aberrations (CNAs) were associated with expression in ~40% of genes, with the landscape dominated by cis- and trans-acting CNAs. By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4. Unsupervised analysis of paired DNA–RNA profiles revealed novel subgroups with distinct clinical outcomes, which reproduced in the validation cohort. These include a high-risk, oestrogen-receptor-positive 11q13/14 cis-acting subgroup and a favourable prognosis subgroup devoid of CNAs. Trans-acting aberration hotspots were found to modulate subgroup-specific gene networks, including a TCR deletion-mediated adaptive immune response in the ‘CNA-devoid’ subgroup and a basal-specific chromosome 5 deletion-associated mitotic network. Our results provide a novel molecular stratification of the breast cancer population, derived from the impact of somatic CNAs on the transcriptome. PMID:22522925

  6. Separating homeologs by phasing in the tetraploid wheat transcriptome.

    PubMed

    Krasileva, Ksenia V; Buffalo, Vince; Bailey, Paul; Pearce, Stephen; Ayling, Sarah; Tabbita, Facundo; Soria, Marcelo; Wang, Shichen; Akhunov, Eduard; Uauy, Cristobal; Dubcovsky, Jorge

    2013-06-25

    The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs. A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble in 140,118 contigs, including 96% of the benchmark cDNAs. We used a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing. Our study shows that de novo transcriptome assembly of tetraploid wheat benefit from multiple k-mer assembly strategies more than diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies.

  7. Genome-Wide Transcriptome Analysis Reveals Extensive Alternative Splicing Events in the Protoscoleces of Echinococcus granulosus and Echinococcus multilocularis

    PubMed Central

    Liu, Shuai; Zhou, Xiaosu; Hao, Lili; Piao, Xianyu; Hou, Nan; Chen, Qijun

    2017-01-01

    Alternative splicing (AS), as one of the most important topics in the post-genomic era, has been extensively studied in numerous organisms. However, little is known about the prevalence and characteristics of AS in Echinococcus species, which can cause significant health problems to humans and domestic animals. Based on high-throughput RNA-sequencing data, we performed a genome-wide survey of AS in two major pathogens of echinococcosis-Echinococcus granulosus and Echinococcus multilocularis. Our study revealed that the prevalence and characteristics of AS in protoscoleces of the two parasites were generally consistent with each other. A total of 6,826 AS events from 3,774 E. granulosus genes and 6,644 AS events from 3,611 E. multilocularis genes were identified in protoscolex transcriptomes, indicating that 33–36% of genes were subject to AS in the two parasites. Strikingly, intron retention instead of exon skipping was the predominant type of AS in Echinococcus species. Moreover, analysis of the Kyoto Encyclopedia of Genes and Genomes pathway indicated that genes that underwent AS events were significantly enriched in multiple pathways mainly related to metabolism (e.g., purine, fatty acid, galactose, and glycerolipid metabolism), signal transduction (e.g., Jak-STAT, VEGF, Notch, and GnRH signaling pathways), and genetic information processing (e.g., RNA transport and mRNA surveillance pathways). The landscape of AS obtained in this study will not only facilitate future investigations on transcriptome complexity and AS regulation during the life cycle of Echinococcus species, but also provide an invaluable resource for future functional and evolutionary studies of AS in platyhelminth parasites. PMID:28588571

  8. Comparative Life Cycle Transcriptomics Revises Leishmania mexicana Genome Annotation and Links a Chromosome Duplication with Parasitism of Vertebrates

    PubMed Central

    Fiebig, Michael; Kelly, Steven; Gluenz, Eva

    2015-01-01

    Leishmania spp. are protozoan parasites that have two principal life cycle stages: the motile promastigote forms that live in the alimentary tract of the sandfly and the amastigote forms, which are adapted to survive and replicate in the harsh conditions of the phagolysosome of mammalian macrophages. Here, we used Illumina sequencing of poly-A selected RNA to characterise and compare the transcriptomes of L. mexicana promastigotes, axenic amastigotes and intracellular amastigotes. These data allowed the production of the first transcriptome evidence-based annotation of gene models for this species, including genome-wide mapping of trans-splice sites and poly-A addition sites. The revised genome annotation encompassed 9,169 protein-coding genes including 936 novel genes as well as modifications to previously existing gene models. Comparative analysis of gene expression across promastigote and amastigote forms revealed that 3,832 genes are differentially expressed between promastigotes and intracellular amastigotes. A large proportion of genes that were downregulated during differentiation to amastigotes were associated with the function of the motile flagellum. In contrast, those genes that were upregulated included cell surface proteins, transporters, peptidases and many uncharacterized genes, including 293 of the 936 novel genes. Genome-wide distribution analysis of the differentially expressed genes revealed that the tetraploid chromosome 30 is highly enriched for genes that were upregulated in amastigotes, providing the first evidence of a link between this whole chromosome duplication event and adaptation to the vertebrate host in this group. Peptide evidence for 42 proteins encoded by novel transcripts supports the idea of an as yet uncharacterised set of small proteins in Leishmania spp. with possible implications for host-pathogen interactions. PMID:26452044

  9. The Carcinogenic Liver Fluke, Clonorchis sinensis: New Assembly, Reannotation and Analysis of the Genome and Characterization of Tissue Transcriptomes

    PubMed Central

    Wang, Xiaoyun; Liu, Hailiang; Chen, Yangyi; Guo, Lei; Luo, Fang; Sun, Jiufeng; Mao, Qiang; Liang, Pei; Xie, Zhizhi; Zhou, Chenhui; Tian, Yanli; Lv, Xiaoli; Huang, Lisi; Zhou, Juanjuan; Hu, Yue; Li, Ran; Zhang, Fan; Lei, Huali; Li, Wenfang; Hu, Xuchu; Liang, Chi; Xu, Jin; Li, Xuerong; Yu, Xinbing

    2013-01-01

    Clonorchis sinensis (C. sinensis), an important food-borne parasite that inhabits the intrahepatic bile duct and causes clonorchiasis, is of interest to both the public health field and the scientific research community. To learn more about the migration, parasitism and pathogenesis of C. sinensis at the molecular level, the present study developed an upgraded genomic assembly and annotation by sequencing paired-end and mate-paired libraries. We also performed transcriptome sequence analyses on multiple C. sinensis tissues (sucker, muscle, ovary and testis). Genes encoding molecules involved in responses to stimuli and muscle-related development were abundantly expressed in the oral sucker. Compared with other species, genes encoding molecules that facilitate the recognition and transport of cholesterol were observed in high copy numbers in the genome and were highly expressed in the oral sucker. Genes encoding transporters for fatty acids, glucose, amino acids and oxygen were also highly expressed, along with other molecules involved in metabolizing these substrates. All genes involved in energy metabolism pathways, including the β-oxidation of fatty acids, the citrate cycle, oxidative phosphorylation, and fumarate reduction, were expressed in the adults. Finally, we also provide valuable insights into the mechanism underlying the process of pathogenesis by characterizing the secretome of C. sinensis. The characterization and elaborate analysis of the upgraded genome and the tissue transcriptomes not only form a detailed and fundamental C. sinensis resource but also provide novel insights into the physiology and pathogenesis of C. sinensis. We anticipate that this work will aid the development of innovative strategies for the prevention and control of clonorchiasis. PMID:23382950

  10. Genome-scale transcriptional analyses of first-generation interspecific sunflower hybrids reveals broad regulatory compatibility

    PubMed Central

    2013-01-01

    Background Interspecific hybridization creates individuals harboring diverged genomes. The interaction of these genomes can generate successful evolutionary novelty or disadvantageous genomic conflict. Annual sunflowers Helianthus annuus and H. petiolaris have a rich history of hybridization in natural populations. Although first-generation hybrids generally have low fertility, hybrid swarms that include later generation and fully fertile backcross plants have been identified, as well as at least three independently-originated stable hybrid taxa. We examine patterns of transcript accumulation in the earliest stages of hybridization of these species via analyses of transcriptome sequences from laboratory-derived F1 offspring of an inbred H. annuus cultivar and a wild H. petiolaris accession. Results While nearly 14% of the reference transcriptome showed significant accumulation differences between parental accessions, total F1 transcript levels showed little evidence of dominance, as midparent transcript levels were highly predictive of transcript accumulation in F1 plants. Allelic bias in F1 transcript accumulation was detected in 20% of transcripts containing sufficient polymorphism to distinguish parental alleles; however the magnitude of these biases were generally smaller than differences among parental accessions. Conclusions While analyses of allelic bias suggest that cis regulatory differences between H. annuus and H. petiolaris are common, their effect on transcript levels may be more subtle than trans-acting regulatory differences. Overall, these analyses found little evidence of regulatory incompatibility or dominance interactions between parental genomes within F1 hybrid individuals, although it is unclear whether this is a legacy or an enabler of introgression between species. PMID:23701699

  11. Family-specific scaling laws in bacterial genomes.

    PubMed

    De Lazzari, Eleonora; Grilli, Jacopo; Maslov, Sergei; Cosentino Lagomarsino, Marco

    2017-07-27

    Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The size of these functional categories change, on average, as power-laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view on the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Design of a 9K illumina BeadChip for polar bears (Ursus maritimus) from RAD and transcriptome sequencing.

    PubMed

    Malenfant, René M; Coltman, David W; Davis, Corey S

    2015-05-01

    Single-nucleotide polymorphisms (SNPs) offer numerous advantages over anonymous markers such as microsatellites, including improved estimation of population parameters, finer-scale resolution of population structure and more precise genomic dissection of quantitative traits. However, many SNPs are needed to equal the resolution of a single microsatellite, and reliable large-scale genotyping of SNPs remains a challenge in nonmodel species. Here, we document the creation of a 9K Illumina Infinium BeadChip for polar bears (Ursus maritimus), which will be used to investigate: (i) the fine-scale population structure among Canadian polar bears and (ii) the genomic architecture of phenotypic traits in the Western Hudson Bay subpopulation. To this end, we used restriction-site associated DNA (RAD) sequencing from 38 bears across their circumpolar range, as well as blood/fat transcriptome sequencing of 10 individuals from Western Hudson Bay. Six-thousand RAD SNPs and 3000 transcriptomic SNPs were selected for the chip, based primarily on genomic spacing and gene function respectively. Of the 9000 SNPs ordered from Illumina, 8042 were successfully printed, and - after genotyping 1450 polar bears - 5441 of these SNPs were found to be well clustered and polymorphic. Using this array, we show rapid linkage disequilibrium decay among polar bears, we demonstrate that in a subsample of 78 individuals, our SNPs detect known genetic structure more clearly than 24 microsatellites genotyped for the same individuals and that these results are not driven by the SNP ascertainment scheme. Here, we present one of the first large-scale genotyping resources designed for a threatened species. © 2014 John Wiley & Sons Ltd.

  13. RNA-Seq Technology and Its Application in Fish Transcriptomics

    PubMed Central

    Ba, Yi; Zhuang, Qianfeng

    2014-01-01

    Abstract High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing method. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptome but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species. PMID:24380445

  14. Genome-enabled transcriptomics reveals archaeal populations that drive nitrification in a deep-sea hydrothermal plume.

    PubMed

    Baker, Brett J; Lesniewski, Ryan A; Dick, Gregory J

    2012-12-01

    Ammonia-oxidizing Archaea (AOA) are among the most abundant microorganisms in the oceans and have crucial roles in biogeochemical cycling of nitrogen and carbon. To better understand AOA inhabiting the deep sea, we obtained community genomic and transcriptomic data from ammonium-rich hydrothermal plumes in the Guaymas Basin (GB) and from surrounding deep waters of the Gulf of California. Among the most abundant and active lineages in the sequence data were marine group I (MGI) Archaea related to the cultured autotrophic ammonia-oxidizer, Nitrosopumilus maritimus. Assembly of MGI genomic fragments yielded 2.9 Mb of sequence containing seven 16S rRNA genes (95.4-98.4% similar to N. maritimus), including two near-complete genomes and several lower-abundance variants. Equal copy numbers of MGI 16S rRNA genes and ammonia monooxygenase genes and transcription of ammonia oxidation genes indicates that all of these genotypes actively oxidize ammonia. De novo genomic assembly revealed the functional potential of MGI populations and enhanced interpretation of metatranscriptomic data. Physiological distinction from N. maritimus is evident in the transcription of novel genes, including genes for urea utilization, suggesting an alternative source of ammonia. We were also able to determine which genotypes are most active in the plume. Transcripts involved in nitrification were more prominent in the plume and were among the most abundant transcripts in the community. These unique data sets reveal populations of deep-sea AOA thriving in the ammonium-rich GB that are related to surface types, but with key genomic and physiological differences.

  15. Genome-wide methylomic and transcriptomic analyses identify subtype-specific epigenetic signatures commonly dysregulated in glioma stem cells and glioblastoma.

    PubMed

    Pangeni, Rajendra P; Zhang, Zhou; Alvarez, Angel A; Wan, Xuechao; Sastry, Namratha; Lu, Songjian; Shi, Taiping; Huang, Tianzhi; Lei, Charles X; James, C David; Kessler, John A; Brennan, Cameron W; Nakano, Ichiro; Lu, Xinghua; Hu, Bo; Zhang, Wei; Cheng, Shi-Yuan

    2018-06-21

    Glioma stem cells (GSCs), a subpopulation of tumor cells, contribute to tumor heterogeneity and therapy resistance. Gene expression profiling classified glioblastoma (GBM) and GSCs into four transcriptomically-defined subtypes. Here, we determined the DNA methylation signatures in transcriptomically pre-classified GSC and GBM bulk tumors subtypes. We hypothesized that these DNA methylation signatures correlate with gene expression and are uniquely associated either with only GSCs or only GBM bulk tumors. Additional methylation signatures may be commonly associated with both GSCs and GBM bulk tumors, i.e., common to non-stem-like and stem-like tumor cell populations and correlating with the clinical prognosis of glioma patients. We analyzed Illumina 450K methylation array and expression data from a panel of 23 patient-derived GSCs. We referenced these results with The Cancer Genome Atlas (TCGA) GBM datasets to generate methylomic and transcriptomic signatures for GSCs and GBM bulk tumors of each transcriptomically pre-defined tumor subtype. Survival analyses were carried out for these signature genes using publicly available datasets, including from TCGA. We report that DNA methylation signatures in proneural and mesenchymal tumor subtypes are either unique to GSCs, unique to GBM bulk tumors, or common to both. Further, dysregulated DNA methylation correlates with gene expression and clinical prognoses. Additionally, many previously identified transcriptionally-regulated markers are also dysregulated due to DNA methylation. The subtype-specific DNA methylation signatures described in this study could be useful for refining GBM sub-classification, improving prognostic accuracy, and making therapeutic decisions.

  16. Effects of pseudorabies virus infection on the tracheobronchial lymph node transcriptome

    USDA-ARS?s Scientific Manuscript database

    This study represents the first swine transcriptome hiveplots created from GSEA data and provides a novel insight into the global transcriptome changes spanning the swine genome. RNA isolated from draining tracheobronchial lymph nodes (TBLN) from 5-week old pigs clinically infected with a feral iso...

  17. Selection and evaluation of novel reference genes for quantitative reverse transcription PCR (qRT-PCR) based on genome and transcriptome data in Brassica napus L.

    PubMed

    Yang, Hongli; Liu, Jing; Huang, Shunmou; Guo, Tingting; Deng, Linbin; Hua, Wei

    2014-03-15

    Selection of reference genes in Brassica napus, a tetraploid (4×) species, is a very difficult task without information on genome and transcriptome. By now, only several traditional reference genes which show significant expression differentiation under different conditions are used in B. napus. In the present study, based on genome and transcriptome data of the rapeseed Zhongshuang-11 cultivar, 14 candidate reference genes were screened for investigation in different tissues, cultivars, and treated conditions of B. napus. These genes were as follows: ELF5, ENTH, F-BOX7, F-BOX2, FYPP1, GDI1, GYF, MCP2d, OTP80, PPR, SPOC, Unknown1, Unknown2 and UBA. Among them, excluding GYF and FYPP1, another 12 genes, were identified to perform better than traditional reference genes ACTIN7 and GAPDH. To further validate the accuracy of the newly developed reference genes in normalization, expression levels of BnCAT1 (B. napus catalase 1) in different rapeseed tissues and seedlings under stress conditions were normalized by the three most stable reference genes PPR, GDI1, and ENTH and little difference existed in normalization results. To the best of our knowledge, this is the first time B. napus reference genes have been provided with the help of complete genome and transcriptome information. The new reference genes provided in this study are more accurate than previously reported reference genes in quantifying expression levels of B. napus genes. Crown Copyright © 2014. Published by Elsevier B.V. All rights reserved.

  18. Genome and Transcriptome of Clostridium phytofermentans, Catalyst for the Direct Conversion of Plant Feedstocks to Fuels

    DOE PAGES

    Petit, Elsa; Coppi, Maddalena V.; Hayes, James C.; ...

    2015-06-02

    Clostridium phytofermentans was isolated from forest soil and is distinguished by its capacity to directly ferment plant cell wall polysaccharides into ethanol as the primary product, suggesting that it possesses unusual catabolic pathways. The objective of our present study was to understand the molecular mechanisms of biomass conversion to ethanol in a single organism, Clostridium phytofermentans, by analyzing its complete genome and transcriptome during growth on plant carbohydrates. The saccharolytic versatility of C. phytofermentans is reflected in a diversity of genes encoding ATP-binding cassette sugar transporters and glycoside hydrolases, many of which may have been acquired through horizontal gene transfer.more » These genes are frequently organized as operons that may be controlled individually by the many transcriptional regulators identified in the genome. Preferential ethanol production may be due to high levels of expression of multiple ethanol dehydrogenases and additional pathways maximizing ethanol yield. The genome also encodes three different proteinaceous bacterial microcompartments with the capacity to compartmentalize pathways that divert fermentation intermediates to various products. Lastly, these characteristics make C. phytofermentans an attractive resource for improving the efficiency and speed of biomass conversion to biofuels.« less

  19. Genome and Transcriptome of Clostridium phytofermentans, Catalyst for the Direct Conversion of Plant Feedstocks to Fuels.

    PubMed

    Petit, Elsa; Coppi, Maddalena V; Hayes, James C; Tolonen, Andrew C; Warnick, Thomas; Latouf, William G; Amisano, Danielle; Biddle, Amy; Mukherjee, Supratim; Ivanova, Natalia; Lykidis, Athanassios; Land, Miriam; Hauser, Loren; Kyrpides, Nikos; Henrissat, Bernard; Lau, Joanne; Schnell, Danny J; Church, George M; Leschine, Susan B; Blanchard, Jeffrey L

    2015-01-01

    Clostridium phytofermentans was isolated from forest soil and is distinguished by its capacity to directly ferment plant cell wall polysaccharides into ethanol as the primary product, suggesting that it possesses unusual catabolic pathways. The objective of the present study was to understand the molecular mechanisms of biomass conversion to ethanol in a single organism, Clostridium phytofermentans, by analyzing its complete genome and transcriptome during growth on plant carbohydrates. The saccharolytic versatility of C. phytofermentans is reflected in a diversity of genes encoding ATP-binding cassette sugar transporters and glycoside hydrolases, many of which may have been acquired through horizontal gene transfer. These genes are frequently organized as operons that may be controlled individually by the many transcriptional regulators identified in the genome. Preferential ethanol production may be due to high levels of expression of multiple ethanol dehydrogenases and additional pathways maximizing ethanol yield. The genome also encodes three different proteinaceous bacterial microcompartments with the capacity to compartmentalize pathways that divert fermentation intermediates to various products. These characteristics make C. phytofermentans an attractive resource for improving the efficiency and speed of biomass conversion to biofuels.

  20. Transcriptome characterization for genome annotation and functional genomics in Theobroma cacao

    USDA-ARS?s Scientific Manuscript database

    Evidence from leaf transcriptome sequencing using two technology platforms, in combination with protein homology and trained ab initio predictions, previously enabled us to build 35,000 gene models in T. cacao (www.cacaogenomedb.org). Here we review the contribution of each data type to cacao gene a...

  1. Reconstructing genome-wide regulatory network of E. coli using transcriptome data and predicted transcription factor activities

    PubMed Central

    2011-01-01

    Background Gene regulatory networks play essential roles in living organisms to control growth, keep internal metabolism running and respond to external environmental changes. Understanding the connections and the activity levels of regulators is important for the research of gene regulatory networks. While relevance score based algorithms that reconstruct gene regulatory networks from transcriptome data can infer genome-wide gene regulatory networks, they are unfortunately prone to false positive results. Transcription factor activities (TFAs) quantitatively reflect the ability of the transcription factor to regulate target genes. However, classic relevance score based gene regulatory network reconstruction algorithms use models do not include the TFA layer, thus missing a key regulatory element. Results This work integrates TFA prediction algorithms with relevance score based network reconstruction algorithms to reconstruct gene regulatory networks with improved accuracy over classic relevance score based algorithms. This method is called Gene expression and Transcription factor activity based Relevance Network (GTRNetwork). Different combinations of TFA prediction algorithms and relevance score functions have been applied to find the most efficient combination. When the integrated GTRNetwork method was applied to E. coli data, the reconstructed genome-wide gene regulatory network predicted 381 new regulatory links. This reconstructed gene regulatory network including the predicted new regulatory links show promising biological significances. Many of the new links are verified by known TF binding site information, and many other links can be verified from the literature and databases such as EcoCyc. The reconstructed gene regulatory network is applied to a recent transcriptome analysis of E. coli during isobutanol stress. In addition to the 16 significantly changed TFAs detected in the original paper, another 7 significantly changed TFAs have been detected by

  2. Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut

    PubMed Central

    Armero, Alix; Bocs, Stéphanie; This, Dominique

    2017-01-01

    The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as references species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of proteins products in A. thaliana by 13%, allowing, often, to recover 100% of the protein sequence length. In addition redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/). PMID:28334050

  3. Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut.

    PubMed

    Armero, Alix; Baudouin, Luc; Bocs, Stéphanie; This, Dominique

    2017-01-01

    The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as references species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of proteins products in A. thaliana by 13%, allowing, often, to recover 100% of the protein sequence length. In addition redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).

  4. Separating homeologs by phasing in the tetraploid wheat transcriptome

    PubMed Central

    2013-01-01

    Background The high level of identity among duplicated homoeologous genomes in tetraploid pasta wheat presents substantial challenges for de novo transcriptome assembly. To solve this problem, we develop a specialized bioinformatics workflow that optimizes transcriptome assembly and separation of merged homoeologs. To evaluate our strategy, we sequence and assemble the transcriptome of one of the diploid ancestors of pasta wheat, and compare both assemblies with a benchmark set of 13,472 full-length, non-redundant bread wheat cDNAs. Results A total of 489 million 100 bp paired-end reads from tetraploid wheat assemble in 140,118 contigs, including 96% of the benchmark cDNAs. We used a comparative genomics approach to annotate 66,633 open reading frames. The multiple k-mer assembly strategy increases the proportion of cDNAs assembled full-length in a single contig by 22% relative to the best single k-mer size. Homoeologs are separated using a post-assembly pipeline that includes polymorphism identification, phasing of SNPs, read sorting, and re-assembly of phased reads. Using a reference set of genes, we determine that 98.7% of SNPs analyzed are correctly separated by phasing. Conclusions Our study shows that de novo transcriptome assembly of tetraploid wheat benefit from multiple k-mer assembly strategies more than diploid wheat. Our results also demonstrate that phasing approaches originally designed for heterozygous diploid organisms can be used to separate the close homoeologous genomes of tetraploid wheat. The predicted tetraploid wheat proteome and gene models provide a valuable tool for the wheat research community and for those interested in comparative genomic studies. PMID:23800085

  5. Genome scale engineering techniques for metabolic engineering.

    PubMed

    Liu, Rongming; Bassalo, Marcelo C; Zeitoun, Ramsey I; Gill, Ryan T

    2015-11-01

    Metabolic engineering has expanded from a focus on designs requiring a small number of genetic modifications to increasingly complex designs driven by advances in genome-scale engineering technologies. Metabolic engineering has been generally defined by the use of iterative cycles of rational genome modifications, strain analysis and characterization, and a synthesis step that fuels additional hypothesis generation. This cycle mirrors the Design-Build-Test-Learn cycle followed throughout various engineering fields that has recently become a defining aspect of synthetic biology. This review will attempt to summarize recent genome-scale design, build, test, and learn technologies and relate their use to a range of metabolic engineering applications. Copyright © 2015 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.

  6. Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction.

    PubMed

    Foth, Bernardo J; Tsai, Isheng J; Reid, Adam J; Bancroft, Allison J; Nichol, Sarah; Tracey, Alan; Holroyd, Nancy; Cotton, James A; Stanley, Eleanor J; Zarowiecki, Magdalena; Liu, Jimmy Z; Huckvale, Thomas; Cooper, Philip J; Grencis, Richard K; Berriman, Matthew

    2014-07-01

    Whipworms are common soil-transmitted helminths that cause debilitating chronic infections in man. These nematodes are only distantly related to Caenorhabditis elegans and have evolved to occupy an unusual niche, tunneling through epithelial cells of the large intestine. We report here the whole-genome sequences of the human-infective Trichuris trichiura and the mouse laboratory model Trichuris muris. On the basis of whole-transcriptome analyses, we identify many genes that are expressed in a sex- or life stage-specific manner and characterize the transcriptional landscape of a morphological region with unique biological adaptations, namely, bacillary band and stichosome, found only in whipworms and related parasites. Using RNA sequencing data from whipworm-infected mice, we describe the regulated T helper 1 (TH1)-like immune response of the chronically infected cecum in unprecedented detail. In silico screening identified numerous new potential drug targets against trichuriasis. Together, these genomes and associated functional data elucidate key aspects of the molecular host-parasite interactions that define chronic whipworm infection.

  7. Whipworm genome and dual-species transcriptome analyses provide molecular insights into an intimate host-parasite interaction

    PubMed Central

    Nichol, Sarah; Tracey, Alan; Holroyd, Nancy; Cotton, James A.; Stanley, Eleanor J.; Zarowiecki, Magdalena; Liu, Jimmy Z.; Huckvale, Thomas; Cooper, Philip J.; Grencis, Richard K.; Berriman, Matthew

    2014-01-01

    Whipworms are common soil-transmitted helminths that cause debilitating chronic infections in man. These nematodes are only distantly related to Caenorhabditis elegans and have evolved to occupy an unusual niche, tunneling through epithelial cells of the large intestine. Here we present the genome sequences of the human-infective Trichuris trichiura and the murine laboratory model T. muris. Based on whole transcriptome analyses we identify many genes that are expressed in a gender- or life stage-specific manner and characterise the transcriptional landscape of a morphological region with unique biological adaptations, namely bacillary band and stichosome, found only in whipworms and related parasites. Using RNAseq data from whipworm-infected mice we describe the regulated Th1-like immune response of the chronically infected cecum in unprecedented detail. In silico screening identifies numerous potential new drug targets against trichuriasis. Together, these genomes and associated functional data elucidate key aspects of the molecular host-parasite interactions that define chronic whipworm infection. PMID:24929830

  8. Genomic and transcriptomic analysis of NDM-1 Klebsiella pneumoniae in spaceflight reveal mechanisms underlying environmental adaptability

    PubMed Central

    Li, Jia; Liu, Fei; Wang, Qi; Ge, Pupu; Woo, Patrick C. Y.; Yan, Jinghua; Zhao, Yanlin; Gao, George F.; Liu, Cui Hua; Liu, Changting

    2014-01-01

    The emergence and rapid spread of New Delhi Metallo-beta-lactamase-1 (NDM-1)-producing Klebsiella pneumoniae strains has caused a great concern worldwide. To better understand the mechanisms underlying environmental adaptation of those highly drug-resistant K. pneumoniae strains, we took advantage of the China's Shenzhou 10 spacecraft mission to conduct comparative genomic and transcriptomic analysis of a NDM-1 K. pneumoniae strain (ATCC BAA-2146) being cultivated under different conditions. The samples were recovered from semisolid medium placed on the ground (D strain), in simulated space condition (M strain), or in Shenzhou 10 spacecraft (T strain) for analysis. Our data revealed multiple variations underlying pathogen adaptation into different environments in terms of changes in morphology, H2O2 tolerance and biofilm formation ability, genomic stability and regulation of metabolic pathways. Additionally, we found a few non-coding RNAs to be differentially regulated. The results are helpful for better understanding the adaptive mechanisms of drug-resistant bacterial pathogens. PMID:25163721

  9. The Embryonic Transcriptome of the Red-Eared Slider Turtle (Trachemys scripta)

    PubMed Central

    Kaplinsky, Nicholas J.; Gilbert, Scott F.; Cebra-Thomas, Judith; Lilleväli, Kersti; Saare, Merly; Chang, Eric Y.; Edelman, Hannah E.; Frick, Melissa A.; Guan, Yin; Hammond, Rebecca M.; Hampilos, Nicholas H.; Opoku, David S. B.; Sariahmed, Karim; Sherman, Eric A.; Watson, Ray

    2013-01-01

    The bony shell of the turtle is an evolutionary novelty not found in any other group of animals, however, research into its formation has suggested that it has evolved through modification of conserved developmental mechanisms. Although these mechanisms have been extensively characterized in model organisms, the tools for characterizing them in non-model organisms such as turtles have been limited by a lack of genomic resources. We have used a next generation sequencing approach to generate and assemble a transcriptome from stage 14 and 17 Trachemys scripta embryos, stages during which important events in shell development are known to take place. The transcriptome consists of 231,876 sequences with an N50 of 1,166 bp. GO terms and EC codes were assigned to the 61,643 unique predicted proteins identified in the transcriptome sequences. All major GO categories and metabolic pathways are represented in the transcriptome. Transcriptome sequences were used to amplify several cDNA fragments designed for use as RNA in situ probes. One of these, BMP5, was hybridized to a T. scripta embryo and exhibits both conserved and novel expression patterns. The transcriptome sequences should be of broad use for understanding the evolution and development of the turtle shell and for annotating any future T. scripta genome sequences. PMID:23840449

  10. Analysis of the Legionella longbeachae Genome and Transcriptome Uncovers Unique Strategies to Cause Legionnaires' Disease

    PubMed Central

    Rusniok, Christophe; Lomma, Mariella; Dervins-Ravault, Delphine; Newton, Hayley J.; Sansom, Fiona M.; Jarraud, Sophie; Zidane, Nora; Ma, Laurence; Bouchier, Christiane; Etienne, Jerôme; Hartland, Elizabeth L.; Buchrieser, Carmen

    2010-01-01

    Legionella pneumophila and L. longbeachae are two species of a large genus of bacteria that are ubiquitous in nature. L. pneumophila is mainly found in natural and artificial water circuits while L. longbeachae is mainly present in soil. Under the appropriate conditions both species are human pathogens, capable of causing a severe form of pneumonia termed Legionnaires' disease. Here we report the sequencing and analysis of four L. longbeachae genomes, one complete genome sequence of L. longbeachae strain NSW150 serogroup (Sg) 1, and three draft genome sequences another belonging to Sg1 and two to Sg2. The genome organization and gene content of the four L. longbeachae genomes are highly conserved, indicating strong pressure for niche adaptation. Analysis and comparison of L. longbeachae strain NSW150 with L. pneumophila revealed common but also unexpected features specific to this pathogen. The interaction with host cells shows distinct features from L. pneumophila, as L. longbeachae possesses a unique repertoire of putative Dot/Icm type IV secretion system substrates, eukaryotic-like and eukaryotic domain proteins, and encodes additional secretion systems. However, analysis of the ability of a dotA mutant of L. longbeachae NSW150 to replicate in the Acanthamoeba castellanii and in a mouse lung infection model showed that the Dot/Icm type IV secretion system is also essential for the virulence of L. longbeachae. In contrast to L. pneumophila, L. longbeachae does not encode flagella, thereby providing a possible explanation for differences in mouse susceptibility to infection between the two pathogens. Furthermore, transcriptome analysis revealed that L. longbeachae has a less pronounced biphasic life cycle as compared to L. pneumophila, and genome analysis and electron microscopy suggested that L. longbeachae is encapsulated. These species-specific differences may account for the different environmental niches and disease epidemiology of these two Legionella

  11. The Transcriptomics of Secondary Growth and Wood Formation in Conifers

    PubMed Central

    Carvalho, Ana; Paiva, Jorge; Louzada, José; Lima-Brito, José

    2013-01-01

    In the last years, forestry scientists have adapted genomics and next-generation sequencing (NGS) technologies to the search for candidate genes related to the transcriptomics of secondary growth and wood formation in several tree species. Gymnosperms, in particular, the conifers, are ecologically and economically important, namely, for the production of wood and other forestry end products. Until very recently, no whole genome sequencing of a conifer genome was available. Due to the gradual improvement of the NGS technologies and inherent bioinformatics tools, two draft assemblies of the whole genomes sequence of Picea abies and Picea glauca arose in the current year. These draft genome assemblies will bring new insights about the structure, content, and evolution of the conifer genomes. Furthermore, new directions in the forestry, breeding and research of conifers will be discussed in the following. The identification of genes associated with the xylem transcriptome and the knowledge of their regulatory mechanisms will provide less time-consuming breeding cycles and a high accuracy for the selection of traits related to wood production and quality. PMID:24288610

  12. Genome-wide Annotation, Identification, and Global Transcriptomic Analysis of Regulatory or Small RNA Gene Expression in Staphylococcus aureus.

    PubMed

    Carroll, Ronan K; Weiss, Andy; Broach, William H; Wiemels, Richard E; Mogen, Austin B; Rice, Kelly C; Shaw, Lindsey N

    2016-02-09

    In Staphylococcus aureus, hundreds of small regulatory or small RNAs (sRNAs) have been identified, yet this class of molecule remains poorly understood and severely understudied. sRNA genes are typically absent from genome annotation files, and as a consequence, their existence is often overlooked, particularly in global transcriptomic studies. To facilitate improved detection and analysis of sRNAs in S. aureus, we generated updated GenBank files for three commonly used S. aureus strains (MRSA252, NCTC 8325, and USA300), in which we added annotations for >260 previously identified sRNAs. These files, the first to include genome-wide annotation of sRNAs in S. aureus, were then used as a foundation to identify novel sRNAs in the community-associated methicillin-resistant strain USA300. This analysis led to the discovery of 39 previously unidentified sRNAs. Investigating the genomic loci of the newly identified sRNAs revealed a surprising degree of inconsistency in genome annotation in S. aureus, which may be hindering the analysis and functional exploration of these elements. Finally, using our newly created annotation files as a reference, we perform a global analysis of sRNA gene expression in S. aureus and demonstrate that the newly identified tsr25 is the most highly upregulated sRNA in human serum. This study provides an invaluable resource to the S. aureus research community in the form of our newly generated annotation files, while at the same time presenting the first examination of differential sRNA expression in pathophysiologically relevant conditions. Despite a large number of studies identifying regulatory or small RNA (sRNA) genes in Staphylococcus aureus, their annotation is notably lacking in available genome files. In addition to this, there has been a considerable lack of cross-referencing in the wealth of studies identifying these elements, often leading to the same sRNA being identified multiple times and bearing multiple names. In this work

  13. The Human Blood Metabolome-Transcriptome Interface

    PubMed Central

    Schramm, Katharina; Adamski, Jerzy; Gieger, Christian; Herder, Christian; Carstensen, Maren; Peters, Annette; Rathmann, Wolfgang; Roden, Michael; Strauch, Konstantin; Suhre, Karsten; Kastenmüller, Gabi; Prokisch, Holger; Theis, Fabian J.

    2015-01-01

    Biological systems consist of multiple organizational levels all densely interacting with each other to ensure function and flexibility of the system. Simultaneous analysis of cross-sectional multi-omics data from large population studies is a powerful tool to comprehensively characterize the underlying molecular mechanisms on a physiological scale. In this study, we systematically analyzed the relationship between fasting serum metabolomics and whole blood transcriptomics data from 712 individuals of the German KORA F4 cohort. Correlation-based analysis identified 1,109 significant associations between 522 transcripts and 114 metabolites summarized in an integrated network, the ‘human blood metabolome-transcriptome interface’ (BMTI). Bidirectional causality analysis using Mendelian randomization did not yield any statistically significant causal associations between transcripts and metabolites. A knowledge-based interpretation and integration with a genome-scale human metabolic reconstruction revealed systematic signatures of signaling, transport and metabolic processes, i.e. metabolic reactions mainly belonging to lipid, energy and amino acid metabolism. Moreover, the construction of a network based on functional categories illustrated the cross-talk between the biological layers at a pathway level. Using a transcription factor binding site enrichment analysis, this pathway cross-talk was further confirmed at a regulatory level. Finally, we demonstrated how the constructed networks can be used to gain novel insights into molecular mechanisms associated to intermediate clinical traits. Overall, our results demonstrate the utility of a multi-omics integrative approach to understand the molecular mechanisms underlying both normal physiology and disease. PMID:26086077

  14. C-RAF function at the genome-wide transcriptome level: A systematic view.

    PubMed

    Huang, Ying; Zhang, Xin-Yu; An, Su; Yang, Yang; Liu, Ying; Hao, Qian; Guo, Xiao-Xi; Xu, Tian-Rui

    2018-05-20

    C-RAF was the first member of the RAF kinase family to be discovered. Since its discovery, C-RAF has been found to regulate many fundamental cell processes, such as cell proliferation, cell death, and metabolism. However, the majority of these functions are achieved through interactions with different proteins; the genes regulated by C-RAF in its active or inactive state remain unclear. In the work, we used RNA-seq analysis to study the global transcriptomes of C-RAF bearing or C-RAF knockout cells in quiescent or EGF activated states. We identified 3353 genes that are promoted or suppressed by C-RAF. Gene ontology and Kyoto Encyclopedia of Genes and Genomes analyses revealed that these genes are involved in drug addiction, cardiomyopathy, autoimmunity, and regulation of cell metabolism. Our results provide a panoramic view of C-RAF function, including known and novel functions, and have revealed potential targets for elucidating the role of C-RAF. Copyright © 2018 Elsevier B.V. All rights reserved.

  15. The genomic and transcriptomic basis of the potential of Lactobacillus plantarum A6 to improve the nutritional quality of a cereal based fermented food.

    PubMed

    Turpin, Williams; Weiman, Marion; Guyot, Jean-Pierre; Lajus, Aurélie; Cruveiller, Stéphane; Humblot, Christèle

    2018-02-02

    The objective of this work was to investigate the nutritional potential of Lactobacillus plantarum A6 in a food matrix using next generation sequencing. To this end, we characterized the genome of the A6 strain for a complete overview of its potential. We then compared its transcriptome when grown in a food matrix made from pearl millet to and its transcriptome when cultivated in a laboratory medium. Genomic comparison of the strain L. plantarum A6 with the strains WCFS1, ST-III, JDM1 and ATCC14917 led to the identification of five regions of genomic plasticity. More specifically, 362 coding sequences, mostly annotated as coding for proteins of unknown functions, were specific to L. plantarum A6. A total of 1201 genes were significantly differentially expressed in laboratory medium and food matrix. Among them, 821 genes were up-regulated in the food matrix compared to the laboratory medium, representing 23% of whole genomic objects. In the laboratory medium, the expression of 380 genes, representing 11% of the all genomic objects was at least double than in the food matrix. Genes encoding important functions for the nutritional quality of the food were identified. Considering its efficiency as an amylolytic strain, we investigated all genes involved in carbohydrate metabolism, paying particular attention to starch metabolism. An extracellular alpha amylase, a neopullulanase and maltodextrin transporters were identified, all of which were highly expressed in the food matrix. In addition, genes involved in alpha-galactoside metabolism were identified but only two of them were induced in food matrix than in laboratory medium. This may be because alpha galactosides were already eliminated during soaking. Different biosynthetic pathways involved in the synthesis of vitamin B (folate, riboflavin, and cobalamin) were identified. They allowed the identification of a potential of vitamin synthesis, which should be confirmed through biochemical analysis in further work

  16. Comparison between the Amount of Environmental Change and the Amount of Transcriptome Change

    PubMed Central

    Ogata, Norichika; Kozaki, Toshinori; Yokoyama, Takeshi; Hata, Tamako; Iwabuchi, Kikuo

    2015-01-01

    Cells must coordinate adjustments in genome expression to accommodate changes in their environment. We hypothesized that the amount of transcriptome change is proportional to the amount of environmental change. To capture the effects of environmental changes on the transcriptome, we compared transcriptome diversities (defined as the Shannon entropy of frequency distribution) of silkworm fat-body tissues cultured with several concentrations of phenobarbital. Although there was no proportional relationship, we did identify a drug concentration “tipping point” between 0.25 and 1.0 mM. Cells cultured in media containing lower drug concentrations than the tipping point showed uniformly high transcriptome diversities, while those cultured at higher drug concentrations than the tipping point showed uniformly low transcriptome diversities. The plasticity of transcriptome diversity was corroborated by cultivations of fat bodies in MGM-450 insect medium without phenobarbital and in 0.25 mM phenobarbital-supplemented MGM-450 insect medium after previous cultivation (cultivation for 80 hours in MGM-450 insect medium without phenobarbital, followed by cultivation for 10 hours in 1.0 mM phenobarbital-supplemented MGM-450 insect medium). Interestingly, the transcriptome diversities of cells cultured in media containing 0.25 mM phenobarbital after previous cultivation (cultivation for 80 hours in MGM-450 insect medium without phenobarbital, followed by cultivation for 10 hours in 1.0 mM phenobarbital-supplemented MGM-450 insect medium) were different from cells cultured in media containing 0.25 mM phenobarbital after previous cultivation (cultivation for 80 hours in MGM-450 insect medium without phenobarbital). This hysteretic phenomenon of transcriptome diversities indicates multi-stability of the genome expression system. Cellular memories were recorded in genome expression networks as in DNA/histone modifications. PMID:26657512

  17. GENOMICS AND ENVIRONMENTAL RESEARCH

    EPA Science Inventory

    The impact of recently developed and emerging genomics technologies on environmental sciences has significant implications for human and ecological risk assessment issues. The linkage of data generated from genomics, transcriptomics, proteomics, metabalomics, and ecology can be ...

  18. De Novo Assembly and Annotation of the Transcriptome of the Agricultural Weed Ipomoea purpurea Uncovers Gene Expression Changes Associated with Herbicide Resistance

    PubMed Central

    Leslie, Trent; Baucom, Regina S.

    2014-01-01

    Human-mediated selection can lead to rapid evolution in very short time scales, and the evolution of herbicide resistance in agricultural weeds is an excellent example of this phenomenon. The common morning glory, Ipomoea purpurea, is resistant to the herbicide glyphosate, but genetic investigations of this trait have been hampered by the lack of genomic resources for this species. Here, we present the annotated transcriptome of the common morning glory, Ipomoea purpurea, along with an examination of whole genome expression profiling to assess potential gene expression differences between three artificially selected herbicide resistant lines and three susceptible lines. The assembled Ipomoea transcriptome reported in this work contains 65,459 assembled transcripts, ~28,000 of which were functionally annotated by assignment to Gene Ontology categories. Our RNA-seq survey using this reference transcriptome identified 19 differentially expressed genes associated with resistance—one of which, a cytochrome P450, belongs to a large plant family of genes involved in xenobiotic detoxification. The differentially expressed genes also broadly implicated receptor-like kinases, which were down-regulated in the resistant lines, and other growth and defense genes, which were up-regulated in resistant lines. Interestingly, the target of glyphosate—EPSP synthase—was not overexpressed in the resistant Ipomoea lines as in other glyphosate resistant weeds. Overall, this work identifies potential candidate resistance loci for future investigations and dramatically increases genomic resources for this species. The assembled transcriptome presented herein will also provide a valuable resource to the Ipomoea community, as well as to those interested in utilizing the close relationship between the Convolvulaceae and the Solanaceae for phylogenetic and comparative genomics examinations. PMID:25155274

  19. De novo assembly and annotation of the transcriptome of the agricultural weed Ipomoea purpurea uncovers gene expression changes associated with herbicide resistance.

    PubMed

    Leslie, Trent; Baucom, Regina S

    2014-08-25

    Human-mediated selection can lead to rapid evolution in very short time scales, and the evolution of herbicide resistance in agricultural weeds is an excellent example of this phenomenon. The common morning glory, Ipomoea purpurea, is resistant to the herbicide glyphosate, but genetic investigations of this trait have been hampered by the lack of genomic resources for this species. Here, we present the annotated transcriptome of the common morning glory, Ipomoea purpurea, along with an examination of whole genome expression profiling to assess potential gene expression differences between three artificially selected herbicide resistant lines and three susceptible lines. The assembled Ipomoea transcriptome reported in this work contains 65,459 assembled transcripts, ~28,000 of which were functionally annotated by assignment to Gene Ontology categories. Our RNA-seq survey using this reference transcriptome identified 19 differentially expressed genes associated with resistance-one of which, a cytochrome P450, belongs to a large plant family of genes involved in xenobiotic detoxification. The differentially expressed genes also broadly implicated receptor-like kinases, which were down-regulated in the resistant lines, and other growth and defense genes, which were up-regulated in resistant lines. Interestingly, the target of glyphosate-EPSP synthase-was not overexpressed in the resistant Ipomoea lines as in other glyphosate resistant weeds. Overall, this work identifies potential candidate resistance loci for future investigations and dramatically increases genomic resources for this species. The assembled transcriptome presented herein will also provide a valuable resource to the Ipomoea community, as well as to those interested in utilizing the close relationship between the Convolvulaceae and the Solanaceae for phylogenetic and comparative genomics examinations. Copyright © 2014 Leslie and Baucom.

  20. Genomic, Transcriptomic and Metabolomic Studies of Two Well-Characterized, Laboratory-Derived Vancomycin-Intermediate Staphylococcus aureus Strains Derived from the Same Parent Strain

    PubMed Central

    Hattangady, Dipti S.; Singh, Atul K.; Muthaiyan, Arun; Jayaswal, Radheshyam K.; Gustafson, John E.; Ulanov, Alexander V.; Li, Zhong; Wilkinson, Brian J.; Pfeltz, Richard F.

    2015-01-01

    Complete genome comparisons, transcriptomic and metabolomic studies were performed on two laboratory-selected, well-characterized vancomycin-intermediate Staphylococcus aureus (VISA) derived from the same parent MRSA that have changes in cell wall composition and decreased autolysis. A variety of mutations were found in the VISA, with more in strain 13136p−m+V20 (vancomycin MIC = 16 µg/mL) than strain 13136p−m+V5 (MIC = 8 µg/mL). Most of the mutations have not previously been associated with the VISA phenotype; some were associated with cell wall metabolism and many with stress responses, notably relating to DNA damage. The genomes and transcriptomes of the two VISA support the importance of gene expression regulation to the VISA phenotype. Similarities in overall transcriptomic and metabolomic data indicated that the VISA physiologic state includes elements of the stringent response, such as downregulation of protein and nucleotide synthesis, the pentose phosphate pathway and nutrient transport systems. Gene expression for secreted virulence determinants was generally downregulated, but was more variable for surface-associated virulence determinants, although capsule formation was clearly inhibited. The importance of activated stress response elements could be seen across all three analyses, as in the accumulation of osmoprotectant metabolites such as proline and glutamate. Concentrations of potential cell wall precursor amino acids and glucosamine were increased in the VISA strains. Polyamines were decreased in the VISA, which may facilitate the accrual of mutations. Overall, the studies confirm the wide variability in mutations and gene expression patterns that can lead to the VISA phenotype. PMID:27025616

  1. Genome and Transcriptome sequence of Finger millet (Eleusine coracana (L.) Gaertn.) provides insights into drought tolerance and nutraceutical properties.

    PubMed

    Hittalmani, Shailaja; Mahesh, H B; Shirke, Meghana Deepak; Biradar, Hanamareddy; Uday, Govindareddy; Aruna, Y R; Lohithaswa, H C; Mohanrao, A

    2017-06-15

    Finger millet (Eleusine coracana (L.) Gaertn.) is an important staple food crop widely grown in Africa and South Asia. Among the millets, finger millet has high amount of calcium, methionine, tryptophan, fiber, and sulphur containing amino acids. In addition, it has C4 photosynthetic carbon assimilation mechanism, which helps to utilize water and nitrogen efficiently under hot and arid conditions without severely affecting yield. Therefore, development and utilization of genomic resources for genetic improvement of this crop is immensely useful. Experimental results from whole genome sequencing and assembling process of ML-365 finger millet cultivar yielded 1196 Mb covering approximately 82% of total estimated genome size. Genome analysis showed the presence of 85,243 genes and one half of the genome is repetitive in nature. The finger millet genome was found to have higher colinearity with foxtail millet and rice as compared to other Poaceae species. Mining of simple sequence repeats (SSRs) yielded abundance of SSRs within the finger millet genome. Functional annotation and mining of transcription factors revealed finger millet genome harbors large number of drought tolerance related genes. Transcriptome analysis of low moisture stress and non-stress samples revealed the identification of several drought-induced candidate genes, which could be used in drought tolerance breeding. This genome sequencing effort will strengthen plant breeders for allele discovery, genetic mapping, and identification of candidate genes for agronomically important traits. Availability of genomic resources of finger millet will enhance the novel breeding possibilities to address potential challenges of finger millet improvement.

  2. Fish-T1K (Transcriptomes of 1,000 Fishes) Project: large-scale transcriptome data for fish evolution studies.

    PubMed

    Sun, Ying; Huang, Yu; Li, Xiaofeng; Baldwin, Carole C; Zhou, Zhuocheng; Yan, Zhixiang; Crandall, Keith A; Zhang, Yong; Zhao, Xiaomeng; Wang, Min; Wong, Alex; Fang, Chao; Zhang, Xinhui; Huang, Hai; Lopez, Jose V; Kilfoyle, Kirk; Zhang, Yong; Ortí, Guillermo; Venkatesh, Byrappa; Shi, Qiong

    2016-01-01

    Ray-finned fishes (Actinopterygii) represent more than 50 % of extant vertebrates and are of great evolutionary, ecologic and economic significance, but they are relatively underrepresented in 'omics studies. Increased availability of transcriptome data for these species will allow researchers to better understand changes in gene expression, and to carry out functional analyses. An international project known as the "Transcriptomes of 1,000 Fishes" (Fish-T1K) project has been established to generate RNA-seq transcriptome sequences for 1,000 diverse species of ray-finned fishes. The first phase of this project has produced transcriptomes from more than 180 ray-finned fishes, representing 142 species and covering 51 orders and 109 families. Here we provide an overview of the goals of this project and the work done so far.

  3. Sugar Metabolism of the First Thermophilic Planctomycete Thermogutta terrifontis: Comparative Genomic and Transcriptomic Approaches

    PubMed Central

    Elcheninov, Alexander G.; Menzel, Peter; Gudbergsdottir, Soley R.; Slesarev, Alexei I.; Kadnikov, Vitaly V.; Krogh, Anders; Bonch-Osmolovskaya, Elizaveta A.; Peng, Xu; Kublanov, Ilya V.

    2017-01-01

    Xanthan gum, a complex polysaccharide comprising glucose, mannose and glucuronic acid residues, is involved in numerous biotechnological applications in cosmetics, agriculture, pharmaceuticals, food and petroleum industries. Additionally, its oligosaccharides were shown to possess antimicrobial, antioxidant, and few other properties. Yet, despite its extensive usage, little is known about xanthan gum degradation pathways and mechanisms. Thermogutta terrifontis, isolated from a sample of microbial mat developed in a terrestrial hot spring of Kunashir island (Far-East of Russia), was described as the first thermophilic representative of the Planctomycetes phylum. It grows well on xanthan gum either at aerobic or anaerobic conditions. Genomic analysis unraveled the pathways of oligo- and polysaccharides utilization, as well as the mechanisms of aerobic and anaerobic respiration. The combination of genomic and transcriptomic approaches suggested a novel xanthan gum degradation pathway which involves novel glycosidase(s) of DUF1080 family, hydrolyzing xanthan gum backbone beta-glucosidic linkages and beta-mannosidases instead of xanthan lyases, catalyzing cleavage of terminal beta-mannosidic linkages. Surprisingly, the genes coding DUF1080 proteins were abundant in T. terrifontis and in many other Planctomycetes genomes, which, together with our observation that xanthan gum being a selective substrate for many planctomycetes, suggest crucial role of DUF1080 in xanthan gum degradation. Our findings shed light on the metabolism of the first thermophilic planctomycete, capable to degrade a number of polysaccharides, either aerobically or anaerobically, including the biotechnologically important bacterial polysaccharide xanthan gum. PMID:29163426

  4. A genome-wide transcriptome map of pistachio (Pistacia vera L.) provides novel insights into salinity-related genes and marker discovery.

    PubMed

    Moazzzam Jazi, Maryam; Seyedi, Seyed Mahdi; Ebrahimie, Esmaeil; Ebrahimi, Mansour; De Moro, Gianluca; Botanga, Christopher

    2017-08-17

    Pistachio (Pistacia vera L.) is one of the most important commercial nut crops worldwide. It is a salt-tolerant and long-lived tree, with the largest cultivation area in Iran. Climate change and subsequent increased soil salt content have adversely affected the pistachio yield in recent years. However, the lack of genomic/global transcriptomic sequences on P. vera impedes comprehensive researches at the molecular level. Hence, whole transcriptome sequencing is required to gain insight into functional genes and pathways in response to salt stress. RNA sequencing of a pooled sample representing 24 different tissues of two pistachio cultivars with contrasting salinity tolerance under control and salt treatment by Illumina Hiseq 2000 platform resulted in 368,953,262 clean 100 bp paired-ends reads (90 Gb). Following creating several assemblies and assessing their quality from multiple perspectives, we found that using the annotation-based metrics together with the length-based parameters allows an improved assessment of the transcriptome assembly quality, compared to the solely use of the length-based parameters. The generated assembly by Trinity was adopted for functional annotation and subsequent analyses. In total, 29,119 contigs annotated against all of five public databases, including NR, UniProt, TAIR10, KOG and InterProScan. Among 279 KEGG pathways supported by our assembly, we further examined the pathways involved in the plant hormone biosynthesis and signaling as well as those to be contributed to secondary metabolite biosynthesis due to their importance under salinity stress. In total, 11,337 SSRs were also identified, which the most abundant being dinucleotide repeats. Besides, 13,097 transcripts as candidate stress-responsive genes were identified. Expression of some of these genes experimentally validated through quantitative real-time PCR (qRT-PCR) that further confirmed the accuracy of the assembly. From this analysis, the contrasting expression pattern

  5. Multi-scale structural community organisation of the human genome.

    PubMed

    Boulos, Rasha E; Tremblay, Nicolas; Arneodo, Alain; Borgnat, Pierre; Audit, Benjamin

    2017-04-11

    Structural interaction frequency matrices between all genome loci are now experimentally achievable thanks to high-throughput chromosome conformation capture technologies. This ensues a new methodological challenge for computational biology which consists in objectively extracting from these data the structural motifs characteristic of genome organisation. We deployed the fast multi-scale community mining algorithm based on spectral graph wavelets to characterise the networks of intra-chromosomal interactions in human cell lines. We observed that there exist structural domains of all sizes up to chromosome length and demonstrated that the set of structural communities forms a hierarchy of chromosome segments. Hence, at all scales, chromosome folding predominantly involves interactions between neighbouring sites rather than the formation of links between distant loci. Multi-scale structural decomposition of human chromosomes provides an original framework to question structural organisation and its relationship to functional regulation across the scales. By construction the proposed methodology is independent of the precise assembly of the reference genome and is thus directly applicable to genomes whose assembly is not fully determined.

  6. Complete Genome Sequence of Sporisorium scitamineum and Biotrophic Interaction Transcriptome with Sugarcane

    PubMed Central

    Benevenuto, Juliana; Peters, Leila P.; Carvalho, Giselle; Palhares, Alessandra; Quecine, Maria C.; Nunes, Filipe R. S.; Kmit, Maria C. P.; Wai, Alvan; Hausner, Georg; Aitken, Karen S.; Berkman, Paul J.; Fraser, James A.; Moolhuijzen, Paula M.; Coutinho, Luiz L.; Creste, Silvana; Vieira, Maria L. C.; Kitajima, João P.; Monteiro-Vitorello, Claudia B.

    2015-01-01

    Sporisorium scitamineum is a biotrophic fungus responsible for the sugarcane smut, a worldwide spread disease. This study provides the complete sequence of individual chromosomes of S. scitamineum from telomere to telomere achieved by a combination of PacBio long reads and Illumina short reads sequence data, as well as a draft sequence of a second fungal strain. Comparative analysis to previous available sequences of another strain detected few polymorphisms among the three genomes. The novel complete sequence described herein allowed us to identify and annotate extended subtelomeric regions, repetitive elements and the mitochondrial DNA sequence. The genome comprises 19,979,571 bases, 6,677 genes encoding proteins, 111 tRNAs and 3 assembled copies of rDNA, out of our estimated number of copies as 130. Chromosomal reorganizations were detected when comparing to sequences of S. reilianum, the closest smut relative, potentially influenced by repeats of transposable elements. Repetitive elements may have also directed the linkage of the two mating-type loci. The fungal transcriptome profiling from in vitro and from interaction with sugarcane at two time points (early infection and whip emergence) revealed that 13.5% of the genes were differentially expressed in planta and particular to each developmental stage. Among them are plant cell wall degrading enzymes, proteases, lipases, chitin modification and lignin degradation enzymes, sugar transporters and transcriptional factors. The fungus also modulates transcription of genes related to surviving against reactive oxygen species and other toxic metabolites produced by the plant. Previously described effectors in smut/plant interactions were detected but some new candidates are proposed. Ten genomic islands harboring some of the candidate genes unique to S. scitamineum were expressed only in planta. RNAseq data was also used to reassure gene predictions. PMID:26065709

  7. De Novo Transcriptome Assembly and Analyses of Gene Expression during Photomorphogenesis in Diploid Wheat Triticum monococcum

    PubMed Central

    Naithani, Sushma; Sullivan, Chris; Preece, Justin; Tiwari, Vijay K.; Elser, Justin; Leonard, Jeffrey M.; Sage, Abigail; Gresham, Cathy; Kerhornou, Arnaud; Bolser, Dan; McCarthy, Fiona; Kersey, Paul; Lazo, Gerard R.; Jaiswal, Pankaj

    2014-01-01

    Background Triticum monococcum (2n) is a close ancestor of T. urartu, the A-genome progenitor of cultivated hexaploid wheat, and is therefore a useful model for the study of components regulating photomorphogenesis in diploid wheat. In order to develop genetic and genomic resources for such a study, we constructed genome-wide transcriptomes of two Triticum monococcum subspecies, the wild winter wheat T. monococcum ssp. aegilopoides (accession G3116) and the domesticated spring wheat T. monococcum ssp. monococcum (accession DV92) by generating de novo assemblies of RNA-Seq data derived from both etiolated and green seedlings. Principal Findings The de novo transcriptome assemblies of DV92 and G3116 represent 120,911 and 117,969 transcripts, respectively. We successfully mapped ∼90% of these transcripts from each accession to barley and ∼95% of the transcripts to T. urartu genomes. However, only ∼77% transcripts mapped to the annotated barley genes and ∼85% transcripts mapped to the annotated T. urartu genes. Differential gene expression analyses revealed 22% more light up-regulated and 35% more light down-regulated transcripts in the G3116 transcriptome compared to DV92. The DV92 and G3116 mRNA sequence reads aligned against the reference barley genome led to the identification of ∼500,000 single nucleotide polymorphism (SNP) and ∼22,000 simple sequence repeat (SSR) sites. Conclusions De novo transcriptome assemblies of two accessions of the diploid wheat T. monococcum provide new empirical transcriptome references for improving Triticeae genome annotations, and insights into transcriptional programming during photomorphogenesis. The SNP and SSR sites identified in our analysis provide additional resources for the development of molecular markers. PMID:24821410

  8. Visualization for genomics: the Microbial Genome Viewer.

    PubMed

    Kerkhoven, Robert; van Enckevort, Frank H J; Boekhorst, Jos; Molenaar, Douwe; Siezen, Roland J

    2004-07-22

    A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a MySQL database. The generated images are in scalable vector graphics (SVG) format, which is suitable for creating high-quality scalable images and dynamic Web representations. Gene-related data such as transcriptome and time-course microarray experiments can be superimposed on the maps for visual inspection. The Microbial Genome Viewer 1.0 is freely available at http://www.cmbi.kun.nl/MGV

  9. Resources for Functional Genomics Studies in Drosophila melanogaster

    PubMed Central

    Mohr, Stephanie E.; Hu, Yanhui; Kim, Kevin; Housden, Benjamin E.; Perrimon, Norbert

    2014-01-01

    Drosophila melanogaster has become a system of choice for functional genomic studies. Many resources, including online databases and software tools, are now available to support design or identification of relevant fly stocks and reagents or analysis and mining of existing functional genomic, transcriptomic, proteomic, etc. datasets. These include large community collections of fly stocks and plasmid clones, “meta” information sites like FlyBase and FlyMine, and an increasing number of more specialized reagents, databases, and online tools. Here, we introduce key resources useful to plan large-scale functional genomics studies in Drosophila and to analyze, integrate, and mine the results of those studies in ways that facilitate identification of highest-confidence results and generation of new hypotheses. We also discuss ways in which existing resources can be used and might be improved and suggest a few areas of future development that would further support large- and small-scale studies in Drosophila and facilitate use of Drosophila information by the research community more generally. PMID:24653003

  10. Local Adaptation at the Transcriptome Level in Brown Trout: Evidence from Early Life History Temperature Genomic Reaction Norms

    PubMed Central

    Meier, Kristian; Hansen, Michael Møller; Normandeau, Eric; Mensberg, Karen-Lise D.; Frydenberg, Jane; Larsen, Peter Foged; Bekkevold, Dorte; Bernatchez, Louis

    2014-01-01

    Local adaptation and its underlying molecular basis has long been a key focus in evolutionary biology. There has recently been increased interest in the evolutionary role of plasticity and the molecular mechanisms underlying local adaptation. Using transcriptome analysis, we assessed differences in gene expression profiles for three brown trout (Salmo trutta) populations, one resident and two anadromous, experiencing different temperature regimes in the wild. The study was based on an F2 generation raised in a common garden setting. A previous study of the F1 generation revealed different reaction norms and significantly higher QST than FST among populations for two early life-history traits. In the present study we investigated if genomic reaction norm patterns were also present at the transcriptome level. Eggs from the three populations were incubated at two temperatures (5 and 8 degrees C) representing conditions encountered in the local environments. Global gene expression for fry at the stage of first feeding was analysed using a 32k cDNA microarray. The results revealed differences in gene expression between populations and temperatures and population × temperature interactions, the latter indicating locally adapted reaction norms. Moreover, the reaction norms paralleled those observed previously at early life-history traits. We identified 90 cDNA clones among the genes with an interaction effect that were differently expressed between the ecologically divergent populations. These included genes involved in immune- and stress response. We observed less plasticity in the resident as compared to the anadromous populations, possibly reflecting that the degree of environmental heterogeneity encountered by individuals throughout their life cycle will select for variable level of phenotypic plasticity at the transcriptome level. Our study demonstrates the usefulness of transcriptome approaches to identify genes with different temperature reaction norms. The

  11. Placental transcriptome co-expression analysis reveals conserved regulatory program across gestation

    USDA-ARS?s Scientific Manuscript database

    Mammalian development in utero is absolutely dependent on proper placental development, which is ultimately regulated by the placental genome. The regulation of the placental genome can be directly studied by exploring the underlying organization of the placental transcriptome through a systematic a...

  12. Probing the evolution, ecology and physiology of marine protists using transcriptomics.

    PubMed

    Caron, David A; Alexander, Harriet; Allen, Andrew E; Archibald, John M; Armbrust, E Virginia; Bachy, Charles; Bell, Callum J; Bharti, Arvind; Dyhrman, Sonya T; Guida, Stephanie M; Heidelberg, Karla B; Kaye, Jonathan Z; Metzner, Julia; Smith, Sarah R; Worden, Alexandra Z

    2017-01-01

    Protists, which are single-celled eukaryotes, critically influence the ecology and chemistry of marine ecosystems, but genome-based studies of these organisms have lagged behind those of other microorganisms. However, recent transcriptomic studies of cultured species, complemented by meta-omics analyses of natural communities, have increased the amount of genetic information available for poorly represented branches on the tree of eukaryotic life. This information is providing insights into the adaptations and interactions between protists and other microorganisms and macroorganisms, but many of the genes sequenced show no similarity to sequences currently available in public databases. A better understanding of these newly discovered genes will lead to a deeper appreciation of the functional diversity and metabolic processes in the ocean. In this Review, we summarize recent developments in our understanding of the ecology, physiology and evolution of protists, derived from transcriptomic studies of cultured strains and natural communities, and discuss how these novel large-scale genetic datasets will be used in the future.

  13. Spider Transcriptomes Identify Ancient Large-Scale Gene Duplication Event Potentially Important in Silk Gland Evolution.

    PubMed

    Clarke, Thomas H; Garb, Jessica E; Hayashi, Cheryl Y; Arensburger, Peter; Ayoub, Nadia A

    2015-06-08

    The evolution of specialized tissues with novel functions, such as the silk synthesizing glands in spiders, is likely an influential driver of adaptive success. Large-scale gene duplication events and subsequent paralog divergence are thought to be required for generating evolutionary novelty. Such an event has been proposed for spiders, but not tested. We de novo assembled transcriptomes from three cobweb weaving spider species. Based on phylogenetic analyses of gene families with representatives from each of the three species, we found numerous duplication events indicative of a whole genome or segmental duplication. We estimated the age of the gene duplications relative to several speciation events within spiders and arachnids and found that the duplications likely occurred after the divergence of scorpions (order Scorpionida) and spiders (order Araneae), but before the divergence of the spider suborders Mygalomorphae and Araneomorphae, near the evolutionary origin of spider silk glands. Transcripts that are expressed exclusively or primarily within black widow silk glands are more likely to have a paralog descended from the ancient duplication event and have elevated amino acid replacement rates compared with other transcripts. Thus, an ancient large-scale gene duplication event within the spider lineage was likely an important source of molecular novelty during the evolution of silk gland-specific expression. This duplication event may have provided genetic material for subsequent silk gland diversification in the true spiders (Araneomorphae). © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. A pipeline for the de novo assembly of the Themira biloba (Sepsidae: Diptera) transcriptome using a multiple k-mer length approach.

    PubMed

    Melicher, Dacotah; Torson, Alex S; Dworkin, Ian; Bowsher, Julia H

    2014-03-12

    The Sepsidae family of flies is a model for investigating how sexual selection shapes courtship and sexual dimorphism in a comparative framework. However, like many non-model systems, there are few molecular resources available. Large-scale sequencing and assembly have not been performed in any sepsid, and the lack of a closely related genome makes investigation of gene expression challenging. Our goal was to develop an automated pipeline for de novo transcriptome assembly, and to use that pipeline to assemble and analyze the transcriptome of the sepsid Themira biloba. Our bioinformatics pipeline uses cloud computing services to assemble and analyze the transcriptome with off-site data management, processing, and backup. It uses a multiple k-mer length approach combined with a second meta-assembly to extend transcripts and recover more bases of transcript sequences than standard single k-mer assembly. We used 454 sequencing to generate 1.48 million reads from cDNA generated from embryo, larva, and pupae of T. biloba and assembled a transcriptome consisting of 24,495 contigs. Annotation identified 16,705 transcripts, including those involved in embryogenesis and limb patterning. We assembled transcriptomes from an additional three non-model organisms to demonstrate that our pipeline assembled a higher-quality transcriptome than single k-mer approaches across multiple species. The pipeline we have developed for assembly and analysis increases contig length, recovers unique transcripts, and assembles more base pairs than other methods through the use of a meta-assembly. The T. biloba transcriptome is a critical resource for performing large-scale RNA-Seq investigations of gene expression patterns, and is the first transcriptome sequenced in this Dipteran family.

  15. Impact of Transcriptomics on Our Understanding of Pulmonary Fibrosis

    PubMed Central

    Vukmirovic, Milica; Kaminski, Naftali

    2018-01-01

    Idiopathic pulmonary fibrosis (IPF) is a lethal fibrotic lung disease characterized by aberrant remodeling of the lung parenchyma with extensive changes to the phenotypes of all lung resident cells. The introduction of transcriptomics, genome scale profiling of thousands of RNA transcripts, caused a significant inversion in IPF research. Instead of generating hypotheses based on animal models of disease, or biological plausibility, with limited validation in humans, investigators were able to generate hypotheses based on unbiased molecular analysis of human samples and then use animal models of disease to test their hypotheses. In this review, we describe the insights made from transcriptomic analysis of human IPF samples. We describe how transcriptomic studies led to identification of novel genes and pathways involved in the human IPF lung such as: matrix metalloproteinases, WNT pathway, epithelial genes, role of microRNAs among others, as well as conceptual insights such as the involvement of developmental pathways and deep shifts in epithelial and fibroblast phenotypes. The impact of lung and transcriptomic studies on disease classification, endotype discovery, and reproducible biomarkers is also described in detail. Despite these impressive achievements, the impact of transcriptomic studies has been limited because they analyzed bulk tissue and did not address the cellular and spatial heterogeneity of the IPF lung. We discuss new emerging technologies and applications, such as single-cell RNAseq and microenvironment analysis that may address cellular and spatial heterogeneity. We end by making the point that most current tissue collections and resources are not amenable to analysis using the novel technologies. To take advantage of the new opportunities, we need new efforts of sample collections, this time focused on access to all the microenvironments and cells in the IPF lung. PMID:29670881

  16. Comparative transcriptomics of early dipteran development

    PubMed Central

    2013-01-01

    Background Modern sequencing technologies have massively increased the amount of data available for comparative genomics. Whole-transcriptome shotgun sequencing (RNA-seq) provides a powerful basis for comparative studies. In particular, this approach holds great promise for emerging model species in fields such as evolutionary developmental biology (evo-devo). Results We have sequenced early embryonic transcriptomes of two non-drosophilid dipteran species: the moth midge Clogmia albipunctata, and the scuttle fly Megaselia abdita. Our analysis includes a third, published, transcriptome for the hoverfly Episyrphus balteatus. These emerging models for comparative developmental studies close an important phylogenetic gap between Drosophila melanogaster and other insect model systems. In this paper, we provide a comparative analysis of early embryonic transcriptomes across species, and use our data for a phylogenomic re-evaluation of dipteran phylogenetic relationships. Conclusions We show how comparative transcriptomics can be used to create useful resources for evo-devo, and to investigate phylogenetic relationships. Our results demonstrate that de novo assembly of short (Illumina) reads yields high-quality, high-coverage transcriptomic data sets. We use these data to investigate deep dipteran phylogenetic relationships. Our results, based on a concatenation of 160 orthologous genes, provide support for the traditional view of Clogmia being the sister group of Brachycera (Megaselia, Episyrphus, Drosophila), rather than that of Culicomorpha (which includes mosquitoes and blackflies). PMID:23432914

  17. Loss of stomach, loss of appetite? Sequencing of the ballan wrasse (Labrus bergylta) genome and intestinal transcriptomic profiling illuminate the evolution of loss of stomach function in fish.

    PubMed

    Lie, Kai K; Tørresen, Ole K; Solbakken, Monica Hongrø; Rønnestad, Ivar; Tooming-Klunderud, Ave; Nederbragt, Alexander J; Jentoft, Sissel; Sæle, Øystein

    2018-03-06

    The ballan wrasse (Labrus bergylta) belongs to a large teleost family containing more than 600 species showing several unique evolutionary traits such as lack of stomach and hermaphroditism. Agastric fish are found throughout the teleost phylogeny, in quite diverse and unrelated lineages, indicating stomach loss has occurred independently multiple times in the course of evolution. By assembling the ballan wrasse genome and transcriptome we aimed to determine the genetic basis for its digestive system function and appetite regulation. Among other, this knowledge will aid the formulation of aquaculture diets that meet the nutritional needs of agastric species. Long and short read sequencing technologies were combined to generate a ballan wrasse genome of 805 Mbp. Analysis of the genome and transcriptome assemblies confirmed the absence of genes that code for proteins involved in gastric function. The gene coding for the appetite stimulating protein ghrelin was also absent in wrasse. Gene synteny mapping identified several appetite-controlling genes and their paralogs previously undescribed in fish. Transcriptome profiling along the length of the intestine found a declining expression gradient from the anterior to the posterior, and a distinct expression profile in the hind gut. We showed gene loss has occurred for all known genes related to stomach function in the ballan wrasse, while the remaining functions of the digestive tract appear intact. The results also show appetite control in ballan wrasse has undergone substantial changes. The loss of ghrelin suggests that other genes, such as motilin, may play a ghrelin like role. The wrasse genome offers novel insight in to the evolutionary traits of this large family. As the stomach plays a major role in protein digestion, the lack of genes related to stomach digestion in wrasse suggests it requires formulated diets with higher levels of readily digestible protein than those for gastric species.

  18. PeanutDB: an integrated bioinformatics web portal for Arachis hypogaea transcriptomics

    PubMed Central

    2012-01-01

    Background The peanut (Arachis hypogaea) is an important crop cultivated worldwide for oil production and food sources. Its complex genetic architecture (e.g., the large and tetraploid genome possibly due to unique cross of wild diploid relatives and subsequent chromosome duplication: 2n = 4x = 40, AABB, 2800 Mb) presents a major challenge for its genome sequencing and makes it a less-studied crop. Without a doubt, transcriptome sequencing is the most effective way to harness the genome structure and gene expression dynamics of this non-model species that has a limited genomic resource. Description With the development of next generation sequencing technologies such as 454 pyro-sequencing and Illumina sequencing by synthesis, the transcriptomics data of peanut is rapidly accumulated in both the public databases and private sectors. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly that contains 32,619 contigs. We provided EC, KEGG and GO functional annotations to these contigs and detected SSRs, SNPs and other genetic polymorphisms for each contig. Based on both open-source and our in-house tools, PeanutDB presents many seamlessly integrated web interfaces that allow users to search, filter, navigate and visualize easily the whole transcript assembly, its annotations and detected polymorphisms and simple sequence repeats. For each contig, sequence alignment is presented in both bird’s-eye view and nucleotide level resolution, with colorfully highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors. Conclusion As a public genomic database that integrates peanut transcriptome data from different sources, PeanutDB (http://bioinfolab.muohio.edu/txid3818v1) provides the

  19. Transcriptomic Analysis of Phenotypic Changes in Birch (Betula platyphylla) Autotetraploids

    PubMed Central

    Mu, Huai-Zhi; Liu, Zi-Jia; Lin, Lin; Li, Hui-Yu; Jiang, Jing; Liu, Gui-Feng

    2012-01-01

    Plant breeders have focused much attention on polyploid trees because of their importance to forestry. To evaluate the impact of intraspecies genome duplication on the transcriptome, a series of Betula platyphylla autotetraploids and diploids were generated from four full-sib families. The phenotypes and transcriptomes of these autotetraploid individuals were compared with those of diploid trees. Autotetraploids were generally superior in breast-height diameter, volume, leaf, fruit and stoma and were generally inferior in height compared to diploids. Transcriptome data revealed numerous changes in gene expression attributable to autotetraploidization, which resulted in the upregulation of 7052 unigenes and the downregulation of 3658 unigenes. Pathway analysis revealed that the biosynthesis and signal transduction of indoleacetate (IAA) and ethylene were altered after genome duplication, which may have contributed to phenotypic changes. These results shed light on variations in birch autotetraploidization and help identify important genes for the genetic engineering of birch trees. PMID:23202935

  20. SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells.

    PubMed

    Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

    2018-01-01

    Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. © 2018 Han et al.; Published by Cold Spring Harbor Laboratory Press.

  1. SIDR: simultaneous isolation and parallel sequencing of genomic DNA and total RNA from single cells

    PubMed Central

    Han, Kyung Yeon; Kim, Kyu-Tae; Joung, Je-Gun; Son, Dae-Soon; Kim, Yeon Jeong; Jo, Areum; Jeon, Hyo-Jeong; Moon, Hui-Sung; Yoo, Chang Eun; Chung, Woosung; Eum, Hye Hyeon; Kim, Sangmin; Kim, Hong Kwan; Lee, Jeong Eon; Ahn, Myung-Ju; Lee, Hae-Ock; Park, Donghyun; Park, Woong-Yang

    2018-01-01

    Simultaneous sequencing of the genome and transcriptome at the single-cell level is a powerful tool for characterizing genomic and transcriptomic variation and revealing correlative relationships. However, it remains technically challenging to analyze both the genome and transcriptome in the same cell. Here, we report a novel method for simultaneous isolation of genomic DNA and total RNA (SIDR) from single cells, achieving high recovery rates with minimal cross-contamination, as is crucial for accurate description and integration of the single-cell genome and transcriptome. For reliable and efficient separation of genomic DNA and total RNA from single cells, the method uses hypotonic lysis to preserve nuclear lamina integrity and subsequently captures the cell lysate using antibody-conjugated magnetic microbeads. Evaluating the performance of this method using real-time PCR demonstrated that it efficiently recovered genomic DNA and total RNA. Thorough data quality assessments showed that DNA and RNA simultaneously fractionated by the SIDR method were suitable for genome and transcriptome sequencing analysis at the single-cell level. The integration of single-cell genome and transcriptome sequencing by SIDR (SIDR-seq) showed that genetic alterations, such as copy-number and single-nucleotide variations, were more accurately captured by single-cell SIDR-seq compared with conventional single-cell RNA-seq, although copy-number variations positively correlated with the corresponding gene expression levels. These results suggest that SIDR-seq is potentially a powerful tool to reveal genetic heterogeneity and phenotypic information inferred from gene expression patterns at the single-cell level. PMID:29208629

  2. E-Flux2 and SPOT: Validated Methods for Inferring Intracellular Metabolic Flux Distributions from Transcriptomic Data.

    PubMed

    Kim, Min Kyung; Lane, Anatoliy; Kelley, James J; Lun, Desmond S

    2016-01-01

    Several methods have been developed to predict system-wide and condition-specific intracellular metabolic fluxes by integrating transcriptomic data with genome-scale metabolic models. While powerful in many settings, existing methods have several shortcomings, and it is unclear which method has the best accuracy in general because of limited validation against experimentally measured intracellular fluxes. We present a general optimization strategy for inferring intracellular metabolic flux distributions from transcriptomic data coupled with genome-scale metabolic reconstructions. It consists of two different template models called DC (determined carbon source model) and AC (all possible carbon sources model) and two different new methods called E-Flux2 (E-Flux method combined with minimization of l2 norm) and SPOT (Simplified Pearson cOrrelation with Transcriptomic data), which can be chosen and combined depending on the availability of knowledge on carbon source or objective function. This enables us to simulate a broad range of experimental conditions. We examined E. coli and S. cerevisiae as representative prokaryotic and eukaryotic microorganisms respectively. The predictive accuracy of our algorithm was validated by calculating the uncentered Pearson correlation between predicted fluxes and measured fluxes. To this end, we compiled 20 experimental conditions (11 in E. coli and 9 in S. cerevisiae), of transcriptome measurements coupled with corresponding central carbon metabolism intracellular flux measurements determined by 13C metabolic flux analysis (13C-MFA), which is the largest dataset assembled to date for the purpose of validating inference methods for predicting intracellular fluxes. In both organisms, our method achieves an average correlation coefficient ranging from 0.59 to 0.87, outperforming a representative sample of competing methods. Easy-to-use implementations of E-Flux2 and SPOT are available as part of the open-source package MOST (http

  3. A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

    PubMed Central

    Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

    2016-01-01

    Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumoniae, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, in which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study

  4. Comparative Transcriptomes and EVO-DEVO Studies Depending on Next Generation Sequencing.

    PubMed

    Liu, Tiancheng; Yu, Lin; Liu, Lei; Li, Hong; Li, Yixue

    2015-01-01

    High throughput technology has prompted the progressive omics studies, including genomics and transcriptomics. We have reviewed the improvement of comparative omic studies, which are attributed to the high throughput measurement of next generation sequencing technology. Comparative genomics have been successfully applied to evolution analysis while comparative transcriptomics are adopted in comparison of expression profile from two subjects by differential expression or differential coexpression, which enables their application in evolutionary developmental biology (EVO-DEVO) studies. EVO-DEVO studies focus on the evolutionary pressure affecting the morphogenesis of development and previous works have been conducted to illustrate the most conserved stages during embryonic development. Old measurements of these studies are based on the morphological similarity from macro view and new technology enables the micro detection of similarity in molecular mechanism. Evolutionary model of embryo development, which includes the "funnel-like" model and the "hourglass" model, has been evaluated by combination of these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has promoted the EVO-DEVO studies into a new era, technological and material limitation still exist and further investigations require more subtle study design and procedure.

  5. Novel transcriptome assembly and improved annotation of the whiteleg shrimp (Litopenaeus vannamei), a dominant crustacean in global seafood mariculture.

    PubMed

    Ghaffari, Noushin; Sanchez-Flores, Alejandro; Doan, Ryan; Garcia-Orozco, Karina D; Chen, Patricia L; Ochoa-Leyva, Adrian; Lopez-Zavala, Alonso A; Carrasco, J Salvador; Hong, Chris; Brieba, Luis G; Rudiño-Piñera, Enrique; Blood, Philip D; Sawyer, Jason E; Johnson, Charles D; Dindot, Scott V; Sotelo-Mundo, Rogerio R; Criscitiello, Michael F

    2014-11-25

    We present a new transcriptome assembly of the Pacific whiteleg shrimp (Litopenaeus vannamei), the species most farmed for human consumption. Its functional annotation, a substantial improvement over previous ones, is provided freely. RNA-Seq with Illumina HiSeq technology was used to analyze samples extracted from shrimp abdominal muscle, hepatopancreas, gills and pleopods. We used the Trinity and Trinotate software suites for transcriptome assembly and annotation, respectively. The quality of this assembly and the affiliated targeted homology searches greatly enrich the curated transcripts currently available in public databases for this species. Comparison with the model arthropod Daphnia allows some insights into defining characteristics of decapod crustaceans. This large-scale gene discovery gives the broadest depth yet to the annotated transcriptome of this important species and should be of value to ongoing genomics and immunogenetic resistance studies in this shrimp of paramount global economic importance.

  6. Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles.

    PubMed

    Tzika, Athanasia C; Helaers, Raphaël; Schramm, Gerrit; Milinkovitch, Michel C

    2011-09-26

    Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation. Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes. The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics.

  7. Reptilian-transcriptome v1.0, a glimpse in the brain transcriptome of five divergent Sauropsida lineages and the phylogenetic position of turtles

    PubMed Central

    2011-01-01

    Background Reptiles are largely under-represented in comparative genomics despite the fact that they are substantially more diverse in many respects than mammals. Given the high divergence of reptiles from classical model species, next-generation sequencing of their transcriptomes is an approach of choice for gene identification and annotation. Results Here, we use 454 technology to sequence the brain transcriptome of four divergent reptilian and one reference avian species: the Nile crocodile, the corn snake, the bearded dragon, the red-eared turtle, and the chicken. Using an in-house pipeline for recursive similarity searches of >3,000,000 reads against multiple databases from 7 reference vertebrates, we compile a reptilian comparative transcriptomics dataset, with homology assignment for 20,000 to 31,000 transcripts per species and a cumulated non-redundant sequence length of 248.6 Mbases. Our approach identifies the majority (87%) of chicken brain transcripts and about 50% of de novo assembled reptilian transcripts. In addition to 57,502 microsatellite loci, we identify thousands of SNP and indel polymorphisms for population genetic and linkage analyses. We also build very large multiple alignments for Sauropsida and mammals (two million residues per species) and perform extensive phylogenetic analyses suggesting that turtles are not basal living reptiles but are rather associated with Archosaurians, hence, potentially answering a long-standing question in the phylogeny of Amniotes. Conclusions The reptilian transcriptome (freely available at http://www.reptilian-transcriptomes.org) should prove a useful new resource as reptiles are becoming important new models for comparative genomics, ecology, and evolutionary developmental genetics. PMID:21943375

  8. A blow to the fly - Lucilia cuprina draft genome and transcriptome to support advances in biology and biotechnology.

    PubMed

    Anstead, Clare A; Batterham, Philip; Korhonen, Pasi K; Young, Neil D; Hall, Ross S; Bowles, Vernon M; Richards, Stephen; Scott, Maxwell J; Gasser, Robin B

    2016-01-01

    The blow fly, Lucilia cuprina (Wiedemann, 1830) is a parasitic insect of major global economic importance. Maggots of this fly parasitize the skin of animal hosts, feed on excretions and tissues, and cause severe disease (flystrike or myiasis). Although there has been considerable research on L. cuprina over the years, little is understood about the molecular biology, biochemistry and genetics of this parasitic fly, as well as its relationship with its hosts and the disease that it causes. This situation might change with the recent report of the draft genome and transcriptome of this blow fly, which has given new and global insights into its biology, interactions with the host animal and aspects of insecticide resistance at the molecular level. This genomic resource will likely enable many fundamental and applied research areas in the future. The present article gives a background on L. cuprina and myiasis, a brief account of past and current treatment, prevention and control approaches, and provides a perspective on the impact that the L. cuprina genome should have on future research of this and related parasitic flies, and the design of new and improved interventions for myiasis. Copyright © 2016 Elsevier Inc. All rights reserved.

  9. Merkel Cell Polyomavirus Exhibits Dominant Control of the Tumor Genome and Transcriptome in Virus-Associated Merkel Cell Carcinoma.

    PubMed

    Starrett, Gabriel J; Marcelus, Christina; Cantalupo, Paul G; Katz, Joshua P; Cheng, Jingwei; Akagi, Keiko; Thakuria, Manisha; Rabinowits, Guilherme; Wang, Linda C; Symer, David E; Pipas, James M; Harris, Reuben S; DeCaprio, James A

    2017-01-03

    Merkel cell polyomavirus is the primary etiological agent of the aggressive skin cancer Merkel cell carcinoma (MCC). Recent studies have revealed that UV radiation is the primary mechanism for somatic mutagenesis in nonviral forms of MCC. Here, we analyze the whole transcriptomes and genomes of primary MCC tumors. Our study reveals that virus-associated tumors have minimally altered genomes compared to non-virus-associated tumors, which are dominated by UV-mediated mutations. Although virus-associated tumors contain relatively small mutation burdens, they exhibit a distinct mutation signature with observable transcriptionally biased kataegic events. In addition, viral integration sites overlap focal genome amplifications in virus-associated tumors, suggesting a potential mechanism for these events. Collectively, our studies indicate that Merkel cell polyomavirus is capable of hijacking cellular processes and driving tumorigenesis to the same severity as tens of thousands of somatic genome alterations. A variety of mutagenic processes that shape the evolution of tumors are critical determinants of disease outcome. Here, we sequenced the entire genome of virus-positive and virus-negative primary Merkel cell carcinomas (MCCs), revealing distinct mutation spectra and corresponding expression profiles. Our studies highlight the strong effect that Merkel cell polyomavirus has on the divergent development of viral MCC compared to the somatic alterations that typically drive nonviral tumorigenesis. A more comprehensive understanding of the distinct mutagenic processes operative in viral and nonviral MCCs has implications for the effective treatment of these tumors. Copyright © 2017 Starrett et al.

  10. Genome Sequence and Transcriptome Analyses of Chrysochromulina tobin: Metabolic Tools for Enhanced Algal Fitness in the Prominent Order Prymnesiales (Haptophyceae)

    DOE PAGES

    Hovde, Blake T.; Deodato, Chloe R.; Hunsperger, Heather M.; ...

    2015-09-23

    Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales), is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales), and transcriptome data collected at seven time points over a 24-hour light/dark cycle. Themore » nuclear genome of C. tobin is small (59 Mb), compact (~40% of the genome is protein coding) and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two “red” RuBisCO activases that are shared across many algal lineages. In conclusion, the Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes.« less

  11. Genome Sequence and Transcriptome Analyses of Chrysochromulina tobin: Metabolic Tools for Enhanced Algal Fitness in the Prominent Order Prymnesiales (Haptophyceae)

    PubMed Central

    Hovde, Blake T.; Deodato, Chloe R.; Hunsperger, Heather M.; Ryken, Scott A.; Yost, Will; Jha, Ramesh K.; Patterson, Johnathan; Monnat, Raymond J.; Barlow, Steven B.; Starkenburg, Shawn R.; Cattolico, Rose Ann

    2015-01-01

    Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales), is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales), and transcriptome data collected at seven time points over a 24-hour light/dark cycle. The nuclear genome of C. tobin is small (59 Mb), compact (∼40% of the genome is protein coding) and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two “red” RuBisCO activases that are shared across many algal lineages. The Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes. PMID:26397803

  12. Genome Sequence and Transcriptome Analyses of Chrysochromulina tobin: Metabolic Tools for Enhanced Algal Fitness in the Prominent Order Prymnesiales (Haptophyceae)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hovde, Blake T.; Deodato, Chloe R.; Hunsperger, Heather M.

    Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales), is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales), and transcriptome data collected at seven time points over a 24-hour light/dark cycle. Themore » nuclear genome of C. tobin is small (59 Mb), compact (~40% of the genome is protein coding) and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two “red” RuBisCO activases that are shared across many algal lineages. In conclusion, the Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes.« less

  13. Architecture of thermal adaptation in an Exiguobacterium sibiricum strain isolated from 3 million year old permafrost: A genome and transcriptome approach

    PubMed Central

    Rodrigues, Debora F; Ivanova, Natalia; He, Zhili; Huebner, Marianne; Zhou, Jizhong; Tiedje, James M

    2008-01-01

    Background Many microorganisms have a wide temperature growth range and versatility to tolerate large thermal fluctuations in diverse environments, however not many have been fully explored over their entire growth temperature range through a holistic view of its physiology, genome, and transcriptome. We used Exiguobacterium sibiricum strain 255-15, a psychrotrophic bacterium from 3 million year old Siberian permafrost that grows from -5°C to 39°C to study its thermal adaptation. Results The E. sibiricum genome has one chromosome and two small plasmids with a total of 3,015 protein-encoding genes (CDS), and a GC content of 47.7%. The genome and transcriptome analysis along with the organism's known physiology was used to better understand its thermal adaptation. A total of 27%, 3.2%, and 5.2% of E. sibiricum CDS spotted on the DNA microarray detected differentially expressed genes in cells grown at -2.5°C, 10°C, and 39°C, respectively, when compared to cells grown at 28°C. The hypothetical and unknown genes represented 10.6%, 0.89%, and 2.3% of the CDS differentially expressed when grown at -2.5°C, 10°C, and 39°C versus 28°C, respectively. Conclusion The results show that E. sibiricum is constitutively adapted to cold temperatures stressful to mesophiles since little differential gene expression was observed between 4°C and 28°C, but at the extremities of its Arrhenius growth profile, namely -2.5°C and 39°C, several physiological and metabolic adaptations associated with stress responses were observed. PMID:19019206

  14. Genome sequence and genetic diversity of the common carp, Cyprinus carpio.

    PubMed

    Xu, Peng; Zhang, Xiaofeng; Wang, Xumin; Li, Jiongtang; Liu, Guiming; Kuang, Youyi; Xu, Jian; Zheng, Xianhu; Ren, Lufeng; Wang, Guoliang; Zhang, Yan; Huo, Linhe; Zhao, Zixia; Cao, Dingchen; Lu, Cuiyun; Li, Chao; Zhou, Yi; Liu, Zhanjiang; Fan, Zhonghua; Shan, Guangle; Li, Xingang; Wu, Shuangxiu; Song, Lipu; Hou, Guangyuan; Jiang, Yanliang; Jeney, Zsigmond; Yu, Dan; Wang, Li; Shao, Changjun; Song, Lai; Sun, Jing; Ji, Peifeng; Wang, Jian; Li, Qiang; Xu, Liming; Sun, Fanyue; Feng, Jianxin; Wang, Chenghui; Wang, Shaolin; Wang, Baosen; Li, Yan; Zhu, Yaping; Xue, Wei; Zhao, Lan; Wang, Jintu; Gu, Ying; Lv, Weihua; Wu, Kejing; Xiao, Jingfa; Wu, Jiayan; Zhang, Zhang; Yu, Jun; Sun, Xiaowen

    2014-11-01

    The common carp, Cyprinus carpio, is one of the most important cyprinid species and globally accounts for 10% of freshwater aquaculture production. Here we present a draft genome of domesticated C. carpio (strain Songpu), whose current assembly contains 52,610 protein-coding genes and approximately 92.3% coverage of its paleotetraploidized genome (2n = 100). The latest round of whole-genome duplication has been estimated to have occurred approximately 8.2 million years ago. Genome resequencing of 33 representative individuals from worldwide populations demonstrates a single origin for C. carpio in 2 subspecies (C. carpio Haematopterus and C. carpio carpio). Integrative genomic and transcriptomic analyses were used to identify loci potentially associated with traits including scaling patterns and skin color. In combination with the high-resolution genetic map, the draft genome paves the way for better molecular studies and improved genome-assisted breeding of C. carpio and other closely related species.

  15. ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.

    PubMed

    Zeng, Victor; Extavour, Cassandra G

    2012-01-01

    The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental

  16. GenoMetric Query Language: a novel approach to large-scale genomic data management.

    PubMed

    Masseroli, Marco; Pinoli, Pietro; Venco, Francesco; Kaitoua, Abdulrahman; Jalili, Vahid; Palluzzi, Fernando; Muller, Heiko; Ceri, Stefano

    2015-06-15

    Improvement of sequencing technologies and data processing pipelines is rapidly providing sequencing data, with associated high-level features, of many individual genomes in multiple biological and clinical conditions. They allow for data-driven genomic, transcriptomic and epigenomic characterizations, but require state-of-the-art 'big data' computing strategies, with abstraction levels beyond available tool capabilities. We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides both abstractions of genomic region data and associated experimental, biological and clinical metadata and interoperability between many data formats. Based on Hadoop framework and Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets. The GMQL toolkit is freely available for non-commercial use at http://www.bioinformatics.deib.polimi.it/GMQL/. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Comparative genomics and transcriptome analysis of Lactobacillus rhamnosus ATCC 11443 and the mutant strain SCT-10-10-60 with enhanced L-lactic acid production capacity.

    PubMed

    Sun, Liang; Lu, Zhilong; Li, Jianxiu; Sun, Feifei; Huang, Ribo

    2018-02-01

    Mechanisms for high L-lactic acid production remain unclear in many bacteria. Lactobacillus rhamnosus SCT-10-10-60 was previously obtained from L. rhamnosus ATCC 11443 via mutagenesis and showed improved L-lactic acid production. In this study, the genomes of strains SCT-10-10-60 and ATCC 11443 were sequenced. Both genomes are a circular chromosome, 2.99 Mb in length with a GC content of approximately 46.8%. Eight split genes were identified in strain SCT-10-10-60, including two LytR family transcriptional regulators, two Rex redox-sensing transcriptional repressors, and four ABC transporters. In total, 60 significantly up-regulated genes (log 2 fold-change ≥ 2) and 39 significantly down-regulated genes (log 2 fold-change ≤ - 2) were identified by a transcriptome comparison between strains SCT-10-10-60 and ATCC 11443. KEGG pathway enrichment analysis revealed that "pyruvate metabolism" was significantly different (P < 0.05) between the two strains. The split genes and the differentially expressed genes involved in the "pyruvate metabolism" pathway are probably responsible for the increased L-lactic acid production by SCT-10-10-60. The genome and transcriptome sequencing information and comparison of SCT-10-10-60 with ATCC 11443 provide insights into the anabolism of L-lactic acid and a reference for improving L-lactic acid production using genetic engineering.

  18. Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing.

    PubMed

    Pang, Chi Nam Ignatius; Tay, Aidan P; Aya, Carlos; Twine, Natalie A; Harkness, Linda; Hart-Smith, Gene; Chia, Samantha Z; Chen, Zhiliang; Deshpande, Nandan P; Kaakoush, Nadeem O; Mitchell, Hazel M; Kassem, Moustapha; Wilkins, Marc R

    2014-01-03

    Direct links between proteomic and genomic/transcriptomic data are not frequently made, partly because of lack of appropriate bioinformatics tools. To help address this, we have developed the PG Nexus pipeline. The PG Nexus allows users to covisualize peptides in the context of genomes or genomic contigs, along with RNA-seq reads. This is done in the Integrated Genome Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus . For higher eukaryotes, the PG Nexus facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding. This is illustrated with an analysis of splice junctions covered by human phosphopeptides, and other examples of relevance to the Chromosome-Centric Human Proteome Project. The PG Nexus is open-source and available from https://github.com/IntersectAustralia/ap11_Samifier. It has been integrated into Galaxy and made available in the Galaxy tool shed.

  19. From genes to milk: genomic organization and epigenetic regulation of the mammary transcriptome.

    PubMed

    Lemay, Danielle G; Pollard, Katherine S; Martin, William F; Freeman Zadrowski, Courtneay; Hernandez, Joseph; Korf, Ian; German, J Bruce; Rijnkels, Monique

    2013-01-01

    Even in genomes lacking operons, a gene's position in the genome influences its potential for expression. The mechanisms by which adjacent genes are co-expressed are still not completely understood. Using lactation and the mammary gland as a model system, we explore the hypothesis that chromatin state contributes to the co-regulation of gene neighborhoods. The mammary gland represents a unique evolutionary model, due to its recent appearance, in the context of vertebrate genomes. An understanding of how the mammary gland is regulated to produce milk is also of biomedical and agricultural importance for human lactation and dairying. Here, we integrate epigenomic and transcriptomic data to develop a comprehensive regulatory model. Neighborhoods of mammary-expressed genes were determined using expression data derived from pregnant and lactating mice and a neighborhood scoring tool, G-NEST. Regions of open and closed chromatin were identified by ChIP-Seq of histone modifications H3K36me3, H3K4me2, and H3K27me3 in the mouse mammary gland and liver tissue during lactation. We found that neighborhoods of genes in regions of uniquely active chromatin in the lactating mammary gland, compared with liver tissue, were extremely rare. Rather, genes in most neighborhoods were suppressed during lactation as reflected in their expression levels and their location in regions of silenced chromatin. Chromatin silencing was largely shared between the liver and mammary gland during lactation, and what distinguished the mammary gland was mainly a small tissue-specific repertoire of isolated, expressed genes. These findings suggest that an advantage of the neighborhood organization is in the collective repression of groups of genes via a shared mechanism of chromatin repression. Genes essential to the mammary gland's uniqueness are isolated from neighbors, and likely have less tolerance for variation in expression, properties they share with genes responsible for an organism's survival.

  20. De novo transcriptome of Ischnura elegans provides insights into sensory biology, colour and vision genes.

    PubMed

    Chauhan, Pallavi; Hansson, Bengt; Kraaijeveld, Ken; de Knijff, Peter; Svensson, Erik I; Wellenreuther, Maren

    2014-09-22

    There is growing interest in odonates (damselflies and dragonflies) as model organisms in ecology and evolutionary biology but the development of genomic resources has been slow. So far only one draft genome (Ladona fulva) and one transcriptome assembly (Enallagma hageni) have been published. Odonates have some of the most advanced visual systems among insects and several species are colour polymorphic, and genomic and transcriptomic data would allow studying the genomic architecture of these interesting traits and make detailed comparative studies between related species possible. Here, we present a comprehensive de novo transcriptome assembly for the blue-tailed damselfly Ischnura elegans (Odonata: Coenagrionidae) built from short-read RNA-seq data. The transcriptome analysis in this paper provides a first step towards identifying genes and pathways underlying the visual and colour systems in this insect group. Illumina RNA sequencing performed on tissues from the head, thorax and abdomen generated 428,744,100 paired-ends reads amounting to 110 Gb of sequence data, which was assembled de novo with Trinity. A transcriptome was produced after filtering and quality checking yielding a final set of 60,232 high quality transcripts for analysis. CEGMA software identified 247 out of 248 ultra-conserved core proteins as 'complete' in the transcriptome assembly, yielding a completeness of 99.6%. BLASTX and InterProScan annotated 55% of the assembled transcripts and showed that the three tissue types differed both qualitatively and quantitatively in I. elegans. Differential expression identified 8,625 transcripts to be differentially expressed in head, thorax and abdomen. Targeted analyses of vision and colour functional pathways identified the presence of four different opsin types and three pigmentation pathways. We also identified transcripts involved in temperature sensitivity, thermoregulation and olfaction. All these traits and their associated transcripts are of

  1. Genome-scale analysis of positionally relocated genes

    PubMed Central

    Bhutkar, Arjun; Russo, Susan M.; Smith, Temple F.; Gelbart, William M.

    2007-01-01

    During evolution, genome reorganization includes large-scale events such as inversions, translocations, and segmental or even whole-genome duplications, as well as fine-scale events such as the relocation of individual genes. This latter category, which we will refer to as positionally relocated genes (PRGs), is the subject of this report. Assessment of the magnitude of such PRGs and of possible contributing mechanisms is aided by a comparative analysis of related genomes, where conserved chromosomal organization can aid in identifying genes that have acquired a new location in a lineage of these genomes. Here we utilize two methods to comprehensively identify relocated protein-coding genes in the recently sequenced genomes of 12 species of genus Drosophila. We use exceptions to the general rule of maintenance of chromosome arm (Muller element) association for most Drosophila genes to identify one major class of PRGs. We also identify a partially overlapping set of PRGs among “embedded genes,” located within the extents of other surrounding genes. We provide evidence that PRG movements have at least two different origins: Some events occur via retrotransposition of processed RNAs and others via a DNA-based transposition mechanism. Overall, we identify several hundred PRGs that arose within a lineage of the genus Drosophila phylogeny and provide suggestive evidence that a few thousand such events have occurred within the radiation of the insect order Diptera, thereby illustrating the magnitude of the contribution of PRG movement to chromosomal reorganization during evolution. PMID:17989252

  2. Genome-scale resources for Thermoanaerobacterium saccharolyticum.

    PubMed

    Currie, Devin H; Raman, Babu; Gowen, Christopher M; Tschaplinski, Timothy J; Land, Miriam L; Brown, Steven D; Covalla, Sean F; Klingeman, Dawn M; Yang, Zamin K; Engle, Nancy L; Johnson, Courtney M; Rodriguez, Miguel; Shaw, A Joe; Kenealy, William R; Lynd, Lee R; Fong, Stephen S; Mielenz, Jonathan R; Davison, Brian H; Hogsett, David A; Herring, Christopher D

    2015-06-26

    Thermoanaerobacterium saccharolyticum is a hemicellulose-degrading thermophilic anaerobe that was previously engineered to produce ethanol at high yield. A major project was undertaken to develop this organism into an industrial biocatalyst, but the lack of genome information and resources were recognized early on as a key limitation. Here we present a set of genome-scale resources to enable the systems level investigation and development of this potentially important industrial organism. Resources include a complete genome sequence for strain JW/SL-YS485, a genome-scale reconstruction of metabolism, tiled microarray data showing transcription units, mRNA expression data from 71 different growth conditions or timepoints and GC/MS-based metabolite analysis data from 42 different conditions or timepoints. Growth conditions include hemicellulose hydrolysate, the inhibitors HMF, furfural, diamide, and ethanol, as well as high levels of cellulose, xylose, cellobiose or maltodextrin. The genome consists of a 2.7 Mbp chromosome and a 110 Kbp megaplasmid. An active prophage was also detected, and the expression levels of CRISPR genes were observed to increase in association with those of the phage. Hemicellulose hydrolysate elicited a response of carbohydrate transport and catabolism genes, as well as poorly characterized genes suggesting a redox challenge. In some conditions, a time series of combined transcription and metabolite measurements were made to allow careful study of microbial physiology under process conditions. As a demonstration of the potential utility of the metabolic reconstruction, the OptKnock algorithm was used to predict a set of gene knockouts that maximize growth-coupled ethanol production. The predictions validated intuitive strain designs and matched previous experimental results. These data will be a useful asset for efforts to develop T. saccharolyticum for efficient industrial production of biofuels. The resources presented herein may also be

  3. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package accompanies script for automated installation of necessary external programs on Ubuntu Linux; however, the pipeline should be also compatible with other Linux and Unix systems after necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .

  4. Transcriptomics as a tool for assessing the scalability of mammalian cell perfusion systems.

    PubMed

    Jayapal, Karthik P; Goudar, Chetan T

    2014-01-01

    DNA microarray-based transcriptomics have been used to determine the time course of laboratory and manufacturing-scale perfusion bioreactors in an attempt to characterize cell physiological state at these two bioreactor scales. Given the limited availability of genomic data for baby hamster kidney (BHK) cells, a Chinese hamster ovary (CHO)-based microarray was used following a feasibility assessment of cross-species hybridization. A heat shock experiment was performed using both BHK and CHO cells and resulting DNA microarray data were analyzed using a filtering criteria of perfect match (PM)/single base mismatch (MM) > 1.5 and PM-MM > 50 to exclude probes with low specificity or sensitivity for cross-species hybridizations. For BHK cells, 8910 probe sets (39 %) passed the cutoff criteria, whereas 12,961 probe sets (56 %) passed the cutoff criteria for CHO cells. Yet, the data from BHK cells allowed distinct clustering of heat shock and control samples as well as identification of biologically relevant genes as being differentially expressed, indicating the utility of cross-species hybridization. Subsequently, DNA microarray analysis was performed on time course samples from laboratory- and manufacturing-scale perfusion bioreactors that were operated under the same conditions. A majority of the variability (37 %) was associated with the first principal component (PC-1). Although PC-1 changed monotonically with culture duration, the trends were very similar in both the laboratory and manufacturing-scale bioreactors. Therefore, despite time-related changes to the cell physiological state, transcriptomic fingerprints were similar across the two bioreactor scales at any given instance in culture. Multiple genes were identified with time-course expression profiles that were very highly correlated (> 0.9) with bioprocess variables of interest. Although the current incomplete annotation limits the biological interpretation of these observations, their full potential may be

  5. Genomics of Compositae crops: reference transcriptome assemblies and evidence of hybridization with wild relatives.

    PubMed

    Hodgins, Kathryn A; Lai, Zhao; Oliveira, Luiz O; Still, David W; Scascitelli, Moira; Barker, Michael S; Kane, Nolan C; Dempewolf, Hannes; Kozik, Alex; Kesseli, Richard V; Burke, John M; Michelmore, Richard W; Rieseberg, Loren H

    2014-01-01

    Although the Compositae harbours only two major food crops, sunflower and lettuce, many other species in this family are utilized by humans and have experienced various levels of domestication. Here, we have used next-generation sequencing technology to develop 15 reference transcriptome assemblies for Compositae crops or their wild relatives. These data allow us to gain insight into the evolutionary and genomic consequences of plant domestication. Specifically, we performed Illumina sequencing of Cichorium endivia, Cichorium intybus, Echinacea angustifolia, Iva annua, Helianthus tuberosus, Dahlia hybrida, Leontodon taraxacoides and Glebionis segetum, as well 454 sequencing of Guizotia scabra, Stevia rebaudiana, Parthenium argentatum and Smallanthus sonchifolius. Illumina reads were assembled using Trinity, and 454 reads were assembled using MIRA and CAP3. We evaluated the coverage of the transcriptomes using BLASTX analysis of a set of ultra-conserved orthologs (UCOs) and recovered most of these genes (88-98%). We found a correlation between contig length and read length for the 454 assemblies, and greater contig lengths for the 454 compared with the Illumina assemblies. This suggests that longer reads can aid in the assembly of more complete transcripts. Finally, we compared the divergence of orthologs at synonymous sites (Ks) between Compositae crops and their wild relatives and found greater divergence when the progenitors were self-incompatible. We also found greater divergence between pairs of taxa that had some evidence of postzygotic isolation. For several more distantly related congeners, such as chicory and endive, we identified a signature of introgression in the distribution of Ks values. © 2013 John Wiley & Sons Ltd.

  6. Transcriptome Wide Annotation of Eukaryotic RNase III Reactivity and Degradation Signals

    PubMed Central

    Gagnon, Jules; Lavoie, Mathieu; Catala, Mathieu; Malenfant, Francis; Elela, Sherif Abou

    2015-01-01

    Detection and validation of the RNA degradation signals controlling transcriptome stability are essential steps for understanding how cells regulate gene expression. Here we present complete genomic and biochemical annotations of the signals required for RNA degradation by the dsRNA specific ribonuclease III (Rnt1p) and examine its impact on transcriptome expression. Rnt1p cleavage signals are randomly distributed in the yeast genome, and encompass a wide variety of sequences, indicating that transcriptome stability is not determined by the recurrence of a fixed cleavage motif. Instead, RNA reactivity is defined by the sequence and structural context in which the cleavage sites are located. Reactive signals are often associated with transiently expressed genes, and their impact on RNA expression is linked to growth conditions. Together, the data suggest that Rnt1p reactivity is triggered by malleable RNA degradation signals that permit dynamic response to changes in growth conditions. PMID:25680180

  7. Guidelines for Genome-Scale Analysis of Biological Rhythms.

    PubMed

    Hughes, Michael E; Abruzzi, Katherine C; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M Fernanda; Chen, Zheng; Chiu, Joanna C; Cox, Juergen; Crowell, Alexander M; DeBruyne, Jason P; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J; Duffield, Giles E; Dunlap, Jay C; Eckel-Mahan, Kristin; Esser, Karyn A; FitzGerald, Garret A; Forger, Daniel B; Francey, Lauren J; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H; Herzel, Hanspeter; Herzog, Erik D; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J; Hurley, Jennifer M; de la Iglesia, Horacio O; Johnson, Carl; Kay, Steve A; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A; Li, Jiajia; Li, Xiaodong; Liu, Andrew C; Loros, Jennifer J; Martino, Tami A; Menet, Jerome S; Merrow, Martha; Millar, Andrew J; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N; Olmedo, Maria; Nusinow, Dmitri A; Ptáček, Louis J; Rand, David; Reddy, Akhilesh B; Robles, Maria S; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D; Rund, Samuel S C; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J; Storch, Kai-Florian; Takahashi, Joseph S; Ueda, Hiroki R; Wang, Han; Weitz, Charles; Westermark, Pål O; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B

    2017-10-01

    Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding "big data" that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them.

  8. Guidelines for Genome-Scale Analysis of Biological Rhythms

    PubMed Central

    Hughes, Michael E.; Abruzzi, Katherine C.; Allada, Ravi; Anafi, Ron; Arpat, Alaaddin Bulak; Asher, Gad; Baldi, Pierre; de Bekker, Charissa; Bell-Pedersen, Deborah; Blau, Justin; Brown, Steve; Ceriani, M. Fernanda; Chen, Zheng; Chiu, Joanna C.; Cox, Juergen; Crowell, Alexander M.; DeBruyne, Jason P.; Dijk, Derk-Jan; DiTacchio, Luciano; Doyle, Francis J.; Duffield, Giles E.; Dunlap, Jay C.; Eckel-Mahan, Kristin; Esser, Karyn A.; FitzGerald, Garret A.; Forger, Daniel B.; Francey, Lauren J.; Fu, Ying-Hui; Gachon, Frédéric; Gatfield, David; de Goede, Paul; Golden, Susan S.; Green, Carla; Harer, John; Harmer, Stacey; Haspel, Jeff; Hastings, Michael H.; Herzel, Hanspeter; Herzog, Erik D.; Hoffmann, Christy; Hong, Christian; Hughey, Jacob J.; Hurley, Jennifer M.; de la Iglesia, Horacio O.; Johnson, Carl; Kay, Steve A.; Koike, Nobuya; Kornacker, Karl; Kramer, Achim; Lamia, Katja; Leise, Tanya; Lewis, Scott A.; Li, Jiajia; Li, Xiaodong; Liu, Andrew C.; Loros, Jennifer J.; Martino, Tami A.; Menet, Jerome S.; Merrow, Martha; Millar, Andrew J.; Mockler, Todd; Naef, Felix; Nagoshi, Emi; Nitabach, Michael N.; Olmedo, Maria; Nusinow, Dmitri A.; Ptáček, Louis J.; Rand, David; Reddy, Akhilesh B.; Robles, Maria S.; Roenneberg, Till; Rosbash, Michael; Ruben, Marc D.; Rund, Samuel S.C.; Sancar, Aziz; Sassone-Corsi, Paolo; Sehgal, Amita; Sherrill-Mix, Scott; Skene, Debra J.; Storch, Kai-Florian; Takahashi, Joseph S.; Ueda, Hiroki R.; Wang, Han; Weitz, Charles; Westermark, Pål O.; Wijnen, Herman; Xu, Ying; Wu, Gang; Yoo, Seung-Hee; Young, Michael; Zhang, Eric Erquan; Zielinski, Tomasz; Hogenesch, John B.

    2017-01-01

    Genome biology approaches have made enormous contributions to our understanding of biological rhythms, particularly in identifying outputs of the clock, including RNAs, proteins, and metabolites, whose abundance oscillates throughout the day. These methods hold significant promise for future discovery, particularly when combined with computational modeling. However, genome-scale experiments are costly and laborious, yielding “big data” that are conceptually and statistically difficult to analyze. There is no obvious consensus regarding design or analysis. Here we discuss the relevant technical considerations to generate reproducible, statistically sound, and broadly useful genome-scale data. Rather than suggest a set of rigid rules, we aim to codify principles by which investigators, reviewers, and readers of the primary literature can evaluate the suitability of different experimental designs for measuring different aspects of biological rhythms. We introduce CircaInSilico, a web-based application for generating synthetic genome biology data to benchmark statistical methods for studying biological rhythms. Finally, we discuss several unmet analytical needs, including applications to clinical medicine, and suggest productive avenues to address them. PMID:29098954

  9. The consequences of chromosomal aneuploidy on the transcriptome of cancer cells☆

    PubMed Central

    Ried, Thomas; Hu, Yue; Difilippantonio, Michael J.; Ghadimi, B. Michael; Grade, Marian; Camps, Jordi

    2016-01-01

    Chromosomal aneuploidies are a defining feature of carcinomas, i.e., tumors of epithelial origin. Such aneuploidies result in tumor specific genomic copy number alterations. The patterns of genomic imbalances are tumor specific, and to a certain extent specific for defined stages of tumor development. Genomic imbalances occur already in premalignant precursor lesions, i.e., before the transition to invasive disease, and their distribution is maintained in metastases, and in cell lines derived from primary tumors. These observations are consistent with the interpretation that tumor specific genomic imbalances are drivers of malignant transformation. Naturally, this precipitates the question of how such imbalances influence the expression of resident genes. A number of laboratories have systematically integrated copy number alterations with gene expression changes in primary tumors and metastases, cell lines, and experimental models of aneuploidy to address the question as to whether genomic imbalances deregulate the expression of one or few key genes, or rather affect the cancer transcriptome more globally. The majority of these studies showed that gene expression levels follow genomic copy number. Therefore, gross genomic copy number changes, including aneuploidies of entire chromosome arms and chromosomes, result in a massive deregulation of the transcriptome of cancer cells. This article is part of a Special Issue entitled: Chromatin in time and space. PMID:22426433

  10. GenomeDiagram: a python package for the visualization of large-scale genomic data.

    PubMed

    Pritchard, Leighton; White, Jennifer A; Birch, Paul R J; Toth, Ian K

    2006-03-01

    We present GenomeDiagram, a flexible, open-source Python module for the visualization of large-scale genomic, comparative genomic and other data with reference to a single chromosome or other biological sequence. GenomeDiagram may be used to generate publication-quality vector graphics, rastered images and in-line streamed graphics for webpages. The package integrates with datatypes from the BioPython project, and is available for Windows, Linux and Mac OS X systems. GenomeDiagram is freely available as source code (under GNU Public License) at http://bioinf.scri.ac.uk/lp/programs.html, and requires Python 2.3 or higher, and recent versions of the ReportLab and BioPython packages. A user manual, example code and images are available at http://bioinf.scri.ac.uk/lp/programs.html.

  11. Status of duckweed genomics and transcriptomics.

    PubMed

    Wang, W; Messing, J

    2015-01-01

    Duckweeds belong to the smallest flowering plants that undergo fast vegetative growth in an aquatic environment. They are commonly used in wastewater treatment and animal feed. Whereas duckweeds have been studied at the biochemical level, their reduced morphology and wide environmental adaption had not been subjected to molecular analysis until recently. Here, we review the progress that has been made in using a DNA barcode system and the sequences of chloroplast and mitochondrial genomes to identify duckweed species at the species or population level. We also review analysis of the nuclear genome sequence of Spirodela that provides new insights into fundamental biological questions. Indeed, reduced gene families and missing genes are consistent with its compact morphogenesis, aquatic floating and suppression of juvenile-to-adult transition. Furthermore, deep RNA sequencing of Spirodela at the onset of dormancy and Landoltia in exposure of nutrient deficiency illustrate the molecular network for environmental adaption and stress response, constituting major progress towards a post-genome sequencing phase, where further functional genomic details can be explored. Rapid advances in sequencing technologies could continue to promote a proliferation of genome sequences for additional ecotypes as well as for other duckweed species. © 2014 German Botanical Society and The Royal Botanical Society of the Netherlands.

  12. Genomic and transcriptomic heterogeneity of colorectal tumours arising in Lynch syndrome.

    PubMed

    Binder, Hans; Hopp, Lydia; Schweiger, Michal R; Hoffmann, Steve; Jühling, Frank; Kerick, Martin; Timmermann, Bernd; Siebert, Susann; Grimm, Christina; Nersisyan, Lilit; Arakelyan, Arsen; Herberg, Maria; Buske, Peter; Loeffler-Wirth, Henry; Rosolowski, Maciej; Engel, Christoph; Przybilla, Jens; Peifer, Martin; Friedrichs, Nicolaus; Moeslein, Gabriela; Odenthal, Margarete; Hussong, Michelle; Peters, Sophia; Holzapfel, Stefanie; Nattermann, Jacob; Hueneburg, Robert; Schmiegel, Wolff; Royer-Pokora, Brigitte; Aretz, Stefan; Kloth, Michael; Kloor, Matthias; Buettner, Reinhard; Galle, Jörg; Loeffler, Markus

    2017-10-01

    Colorectal cancer (CRC) arising in Lynch syndrome (LS) comprises tumours with constitutional mutations in DNA mismatch repair genes. There is still a lack of whole-genome and transcriptome studies of LS-CRC to address questions about similarities and differences in mutation and gene expression characteristics between LS-CRC and sporadic CRC, about the molecular heterogeneity of LS-CRC, and about specific mechanisms of LS-CRC genesis linked to dysfunctional mismatch repair in LS colonic mucosa and the possible role of immune editing. Here, we provide a first molecular characterization of LS tumours and of matched tumour-distant reference colonic mucosa based on whole-genome DNA-sequencing and RNA-sequencing analyses. Our data support two subgroups of LS-CRCs, G1 and G2, whereby G1 tumours show a higher number of somatic mutations, a higher amount of microsatellite slippage, and a different mutation spectrum. The gene expression phenotypes support this difference. Reference mucosa of G1 shows a strong immune response associated with the expression of HLA and immune checkpoint genes and the invasion of CD4+ T cells. Such an immune response is not observed in LS tumours, G2 reference and normal (non-Lynch) mucosa, and sporadic CRC. We hypothesize that G1 tumours are edited for escape from a highly immunogenic microenvironment via loss of HLA presentation and T-cell exhaustion. In contrast, G2 tumours seem to develop in a less immunogenic microenvironment where tumour-promoting inflammation parallels tumourigenesis. Larger studies on non-neoplastic mucosa tissue of mutation carriers are required to better understand the early phases of emerging tumours. Copyright © 2017 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd. Copyright © 2017 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.

  13. Characterizing the developmental transcriptome of the oriental fruit fly, Bactrocera dorsalis (Diptera: Tephritidae) through comparative genomic analysis with Drosophila melanogaster utilizing modENCODE datasets.

    PubMed

    Geib, Scott M; Calla, Bernarda; Hall, Brian; Hou, Shaobin; Manoukis, Nicholas C

    2014-10-28

    The oriental fruit fly, Bactrocera dorsalis, is an important pest of fruit and vegetable crops throughout Asia, and is considered a high risk pest for establishment in the mainland United States. It is a member of the family Tephritidae, which are the most agriculturally important family of flies, and can be considered an out-group to well-studied members of the family Drosophilidae. Despite their importance as pests and their relatedness to Drosophila, little information is present on B. dorsalis transcripts and proteins. The objective of this paper is to comprehensively characterize the transcripts present throughout the life history of B. dorsalis and functionally annotate and analyse these transcripts relative to the presence, expression, and function of orthologous sequences present in Drosophila melanogaster. We present a detailed transcriptome assembly of B. dorsalis from egg through adult stages containing 20,666 transcripts across 10,799 unigene components. Utilizing data available through Flybase and the modENCODE project, we compared expression patterns of these transcripts to putative orthologs in D. melanogaster in terms of timing, abundance, and function. In addition, temporal expression patterns in B. dorsalis were characterized between stages, to establish the constitutive or stage-specific expression patterns of particular transcripts. A fully annotated transcriptome assembly is made available through NCBI, in addition to corresponding expression data. Through characterizing the transcriptome of B. dorsalis through its life history and comparing the transcriptome of B. dorsalis to the model organism D. melanogaster, a database has been developed that can be used as the foundation to functional genomic research in Bactrocera flies and help identify orthologous genes between B. dorsalis and D. melanogaster. This data provides the foundation for future functional genomic research that will focus on improving our understanding of the physiology and

  14. Multi-Omics Driven Assembly and Annotation of the Sandalwood (Santalum album) Genome.

    PubMed

    Mahesh, Hirehally Basavarajegowda; Subba, Pratigya; Advani, Jayshree; Shirke, Meghana Deepak; Loganathan, Ramya Malarini; Chandana, Shankara Lingu; Shilpa, Siddappa; Chatterjee, Oishi; Pinto, Sneha Maria; Prasad, Thottethodi Subrahmanya Keshava; Gowda, Malali

    2018-04-01

    Indian sandalwood ( Santalum album ) is an important tropical evergreen tree known for its fragrant heartwood-derived essential oil and its valuable carving wood. Here, we applied an integrated genomic, transcriptomic, and proteomic approach to assemble and annotate the Indian sandalwood genome. Our genome sequencing resulted in the establishment of a draft map of the smallest genome for any woody tree species to date (221 Mb). The genome annotation predicted 38,119 protein-coding genes and 27.42% repetitive DNA elements. In-depth proteome analysis revealed the identities of 72,325 unique peptides, which confirmed 10,076 of the predicted genes. The addition of transcriptomic and proteogenomic approaches resulted in the identification of 53 novel proteins and 34 gene-correction events that were missed by genomic approaches. Proteogenomic analysis also helped in reassigning 1,348 potential noncoding RNAs as bona fide protein-coding messenger RNAs. Gene expression patterns at the RNA and protein levels indicated that peptide sequencing was useful in capturing proteins encoded by nuclear and organellar genomes alike. Mass spectrometry-based proteomic evidence provided an unbiased approach toward the identification of proteins encoded by organellar genomes. Such proteins are often missed in transcriptome data sets due to the enrichment of only messenger RNAs that contain poly(A) tails. Overall, the use of integrated omic approaches enhanced the quality of the assembly and annotation of this nonmodel plant genome. The availability of genomic, transcriptomic, and proteomic data will enhance genomics-assisted breeding, germplasm characterization, and conservation of sandalwood trees. © 2018 American Society of Plant Biologists. All Rights Reserved.

  15. Reefgenomics.Org - a repository for marine genomics data.

    PubMed

    Liew, Yi Jin; Aranda, Manuel; Voolstra, Christian R

    2016-01-01

    Over the last decade, technological advancements have substantially decreased the cost and time of obtaining large amounts of sequencing data. Paired with the exponentially increased computing power, individual labs are now able to sequence genomes or transcriptomes to investigate biological questions of interest. This has led to a significant increase in available sequence data. Although the bulk of data published in articles are stored in public sequence databases, very often, only raw sequencing data are available; miscellaneous data such as assembled transcriptomes, genome annotations etc. are not easily obtainable through the same means. Here, we introduce our website (http://reefgenomics.org) that aims to centralize genomic and transcriptomic data from marine organisms. Besides providing convenient means to download sequences, we provide (where applicable) a genome browser to explore available genomic features, and a BLAST interface to search through the hosted sequences. Through the interface, multiple datasets can be queried simultaneously, allowing for the retrieval of matching sequences from organisms of interest. The minimalistic, no-frills interface reduces visual clutter, making it convenient for end-users to search and explore processed sequence data. DATABASE URL: http://reefgenomics.org. © The Author(s) 2016. Published by Oxford University Press.

  16. Omics and Environmental Science Genomic Approaches With Natural Fish Populations From Polluted Environments

    PubMed Central

    Bozinovic, Goran; Oleksiak, Marjorie F.

    2010-01-01

    Transcriptomics and population genomics are two complementary genomic approaches that can be used to gain insight into pollutant effects in natural populations. Transcriptomics identify altered gene expression pathways while population genomics approaches more directly target the causative genomic polymorphisms. Neither approach is restricted to a pre-determined set of genes or loci. Instead, both approaches allow a broad overview of genomic processes. Transcriptomics and population genomic approaches have been used to explore genomic responses in populations of fish from polluted environments and have identified sets of candidate genes and loci that appear biologically important in response to pollution. Often differences in gene expression or loci between polluted and reference populations are not conserved among polluted populations suggesting a biological complexity that we do not yet fully understand. As genomic approaches become less expensive with the advent of new sequencing and genotyping technologies, they will be more widely used in complimentary studies. However, while these genomic approaches are immensely powerful for identifying candidate gene and loci, the challenge of determining biological mechanisms that link genotypes and phenotypes remains. PMID:21072843

  17. Genomics and Weeds: A Synthesis

    USDA-ARS?s Scientific Manuscript database

    Genomics can be used to solve many problems associated with the management of weeds. New target sites for herbicides have been discovered through functional genomic approaches to determine gene function. Modes of action of herbicides can be clarified or discovered by transcriptome analysis. Under...

  18. Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan L.)

    USDA-ARS?s Scientific Manuscript database

    This study reports generation of large-scale genomic resources for pigeonpea, a so-called ‘orphan crop species’ of the semi-arid tropic regions. Roche FLX/454 sequencing was carried out on a normalized cDNA pool prepared from 31 tissues produced 494,353 short transcript reads (STRs). Cluster analysi...

  19. Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species.

    PubMed

    Kersey, Paul J; Staines, Daniel M; Lawson, Daniel; Kulesha, Eugene; Derwent, Paul; Humphrey, Jay C; Hughes, Daniel S T; Keenan, Stephan; Kerhornou, Arnaud; Koscielny, Gautier; Langridge, Nicholas; McDowall, Mark D; Megy, Karine; Maheswari, Uma; Nuhn, Michael; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Wilson, Derek; Yates, Andrew; Birney, Ewan

    2012-01-01

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.

  20. A New Model Army: Emerging fish models to study the genomics of vertebrate Evo-Devo

    PubMed Central

    Braasch, Ingo; Peterson, Samuel M.; Desvignes, Thomas; McCluskey, Braedan M.; Batzel, Peter; Postlethwait, John H.

    2014-01-01

    Many fields of biology – including vertebrate Evo-Devo research – are facing an explosion of genomic and transcriptomic sequence information and a multitude of fish species are now swimming in this ‘genomic tsunami’. Here, we first give an overview of recent developments in sequencing fish genomes and transcriptomes that identify properties of fish genomes requiring particular attention and propose strategies to overcome common challenges in fish genomics. We suggest that the generation of chromosome-level genome assemblies - for which we introduce the term ‘chromonome’ – should be a key component of genomic investigations in fish because they enable large-scale conserved synteny analyses that inform orthology detection, a process critical for connectivity of genomes. Orthology calls in vertebrates, especially in teleost fish, are complicated by divergent evolution of gene repertoires and functions following two rounds of genome duplication in the ancestor of vertebrates and a third round at the base of teleost fish. Second, using examples of spotted gar, basal teleosts, zebrafish-related cyprinids, cavefish, livebearers, icefish, and lobefin fish, we illustrate how next generation sequencing technologies liberate emerging fish systems from genomic ignorance and transform them into a new model army to answer longstanding questions on the genomic and developmental basis of their biodiversity. Finally, we discuss recent progress in the genetic toolbox for the major fish models for functional analysis, zebrafish and medaka, that can be transferred to many other fish species to study in vivo the functional effect of evolutionary genomic change as Evo-Devo research enters the postgenomic era. PMID:25111899

  1. Isoform Sequencing Provides a More Comprehensive View of the Panax ginseng Transcriptome.

    PubMed

    Jo, Ick-Hyun; Lee, Jinsu; Hong, Chi Eun; Lee, Dong Jin; Bae, Wonsil; Park, Sin-Gi; Ahn, Yong Ju; Kim, Young Chang; Kim, Jang Uk; Lee, Jung Woo; Hyun, Dong Yun; Rhee, Sung-Keun; Hong, Chang Pyo; Bang, Kyong Hwan; Ryu, Hojin

    2017-09-15

    Korean ginseng ( Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offers a more comprehensive view of functional genomics in P. ginseng , we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feed-back loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana . Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng . In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.

  2. Extraction of Molecular Features through Exome to Transcriptome Alignment

    PubMed Central

    Mudvari, Prakriti; Kowsari, Kamran; Cole, Charles; Mazumder, Raja; Horvath, Anelia

    2014-01-01

    Integrative Next Generation Sequencing (NGS) DNA and RNA analyses have very recently become feasible, and the published to date studies have discovered critical disease implicated pathways, and diagnostic and therapeutic targets. A growing number of exomes, genomes and transcriptomes from the same individual are quickly accumulating, providing unique venues for mechanistic and regulatory features analysis, and, at the same time, requiring new exploration strategies. In this study, we have integrated variation and expression information of four NGS datasets from the same individual: normal and tumor breast exomes and transcriptomes. Focusing on SNPcentered variant allelic prevalence, we illustrate analytical algorithms that can be applied to extract or validate potential regulatory elements, such as expression or growth advantage, imprinting, loss of heterozygosity (LOH), somatic changes, and RNA editing. In addition, we point to some critical elements that might bias the output and recommend alternative measures to maximize the confidence of findings. The need for such strategies is especially recognized within the growing appreciation of the concept of systems biology: integrative exploration of genome and transcriptome features reveal mechanistic and regulatory insights that reach far beyond linear addition of the individual datasets. PMID:24791251

  3. Transcriptome sequencing reveals high isoform diversity in the ant Formica exsecta

    PubMed Central

    Paviala, Jenni; Morandin, Claire; Wheat, Christopher; Sundström, Liselotte; Helanterä, Heikki

    2017-01-01

    Transcriptome resources for social insects have the potential to provide new insight into polyphenism, i.e., how divergent phenotypes arise from the same genome. Here we present a transcriptome based on paired-end RNA sequencing data for the ant Formica exsecta (Formicidae, Hymenoptera). The RNA sequencing libraries were constructed from samples of several life stages of both sexes and female castes of queens and workers, in order to maximize representation of expressed genes. We first compare the performance of common assembly and scaffolding software (Trinity, Velvet-Oases, and SOAPdenovo-trans), in producing de novo assemblies. Second, we annotate the resulting expressed contigs to the currently published genomes of ants, and other insects, including the honeybee, to filter genes that have annotation evidence of being true genes. Our pipeline resulted in a final assembly of altogether 39,262 mRNA transcripts, with an average coverage of >300X, belonging to 17,496 unique genes with annotation in the related ant species. From these genes, 536 genes were unique to one caste or sex only, highlighting the importance of comprehensive sampling. Our final assembly also showed expression of several splice variants in 6,975 genes, and we show that accounting for splice variants affects the outcome of downstream analyses such as gene ontologies. Our transcriptome provides an outstanding resource for future genetic studies on F. exsecta and other ant species, and the presented transcriptome assembly can be adapted to any non-model species that has genomic resources available from a related taxon. PMID:29177112

  4. Adaptation and evolution of deep-sea scale worms (Annelida: Polynoidae): insights from transcriptome comparison with a shallow-water species

    NASA Astrophysics Data System (ADS)

    Zhang, Yanjie; Sun, Jin; Chen, Chong; Watanabe, Hiromi K.; Feng, Dong; Zhang, Yu; Chiu, Jill M. Y.; Qian, Pei-Yuan; Qiu, Jian-Wen

    2017-04-01

    Polynoid scale worms (Polynoidae, Annelida) invaded deep-sea chemosynthesis-based ecosystems approximately 60 million years ago, but little is known about their genetic adaptation to the extreme deep-sea environment. In this study, we reported the first two transcriptomes of deep-sea polynoids (Branchipolynoe pettiboneae, Lepidonotopodium sp.) and compared them with the transcriptome of a shallow-water polynoid (Harmothoe imbricata). We determined codon and amino acid usage, positive selected genes, highly expressed genes and putative duplicated genes. Transcriptome assembly produced 98,806 to 225,709 contigs in the three species. There were more positively charged amino acids (i.e., histidine and arginine) and less negatively charged amino acids (i.e., aspartic acid and glutamic acid) in the deep-sea species. There were 120 genes showing clear evidence of positive selection. Among the 10% most highly expressed genes, there were more hemoglobin genes with high expression levels in both deep-sea species. The duplicated genes related to DNA recombination and metabolism, and gene expression were only enriched in deep-sea species. Deep-sea scale worms adopted two strategies of adaptation to hypoxia in the chemosynthesis-based habitats (i.e., rapid evolution of tetra-domain hemoglobin in Branchipolynoe or high expression of single-domain hemoglobin in Lepidonotopodium sp.).

  5. Adaptation and evolution of deep-sea scale worms (Annelida: Polynoidae): insights from transcriptome comparison with a shallow-water species

    PubMed Central

    Zhang, Yanjie; Sun, Jin; Chen, Chong; Watanabe, Hiromi K.; Feng, Dong; Zhang, Yu; Chiu, Jill M.Y.; Qian, Pei-Yuan; Qiu, Jian-Wen

    2017-01-01

    Polynoid scale worms (Polynoidae, Annelida) invaded deep-sea chemosynthesis-based ecosystems approximately 60 million years ago, but little is known about their genetic adaptation to the extreme deep-sea environment. In this study, we reported the first two transcriptomes of deep-sea polynoids (Branchipolynoe pettiboneae, Lepidonotopodium sp.) and compared them with the transcriptome of a shallow-water polynoid (Harmothoe imbricata). We determined codon and amino acid usage, positive selected genes, highly expressed genes and putative duplicated genes. Transcriptome assembly produced 98,806 to 225,709 contigs in the three species. There were more positively charged amino acids (i.e., histidine and arginine) and less negatively charged amino acids (i.e., aspartic acid and glutamic acid) in the deep-sea species. There were 120 genes showing clear evidence of positive selection. Among the 10% most highly expressed genes, there were more hemoglobin genes with high expression levels in both deep-sea species. The duplicated genes related to DNA recombination and metabolism, and gene expression were only enriched in deep-sea species. Deep-sea scale worms adopted two strategies of adaptation to hypoxia in the chemosynthesis-based habitats (i.e., rapid evolution of tetra-domain hemoglobin in Branchipolynoe or high expression of single-domain hemoglobin in Lepidonotopodium sp.). PMID:28397791

  6. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing

    PubMed Central

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-01-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome. PMID:20392818

  7. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing.

    PubMed

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-08-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.

  8. Exploring the genetic architecture and improving genomic prediction accuracy for mastitis and milk production traits in dairy cattle by mapping variants to hepatic transcriptomic regions responsive to intra-mammary infection.

    PubMed

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

    2017-05-12

    A better understanding of the genetic architecture of complex traits can contribute to improve genomic prediction. We hypothesized that genomic variants associated with mastitis and milk production traits in dairy cattle are enriched in hepatic transcriptomic regions that are responsive to intra-mammary infection (IMI). Genomic markers [e.g. single nucleotide polymorphisms (SNPs)] from those regions, if included, may improve the predictive ability of a genomic model. We applied a genomic feature best linear unbiased prediction model (GFBLUP) to implement the above strategy by considering the hepatic transcriptomic regions responsive to IMI as genomic features. GFBLUP, an extension of GBLUP, includes a separate genomic effect of SNPs within a genomic feature, and allows differential weighting of the individual marker relationships in the prediction equation. Since GFBLUP is computationally intensive, we investigated whether a SNP set test could be a computationally fast way to preselect predictive genomic features. The SNP set test assesses the association between a genomic feature and a trait based on single-SNP genome-wide association studies. We applied these two approaches to mastitis and milk production traits (milk, fat and protein yield) in Holstein (HOL, n = 5056) and Jersey (JER, n = 1231) cattle. We observed that a majority of genomic features were enriched in genomic variants that were associated with mastitis and milk production traits. Compared to GBLUP, the accuracy of genomic prediction with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction. The highest increase (164.4%) in prediction accuracy was observed in across-breed prediction. The significance of genomic features based on the SNP set test were correlated with changes in prediction accuracy of GFBLUP (P < 0.05). GFBLUP provides a framework for integrating multiple layers of biological knowledge to provide novel insights into the biological basis of complex traits

  9. Breast cancer genome and transcriptome integration implicates specific mutational signatures with immune cell infiltration

    PubMed Central

    Smid, Marcel; Rodríguez-González, F. Germán; Sieuwerts, Anieta M.; Salgado, Roberto; Prager-Van der Smissen, Wendy J. C.; Vlugt-Daane, Michelle van der; van Galen, Anne; Nik-Zainal, Serena; Staaf, Johan; Brinkman, Arie B.; van de Vijver, Marc J.; Richardson, Andrea L.; Fatima, Aquila; Berentsen, Kim; Butler, Adam; Martin, Sancha; Davies, Helen R.; Debets, Reno; Gelder, Marion E. Meijer-Van; van Deurzen, Carolien H. M.; MacGrogan, Gaëtan; Van den Eynden, Gert G. G. M.; Purdie, Colin; Thompson, Alastair M.; Caldas, Carlos; Span, Paul N.; Simpson, Peter T.; Lakhani, Sunil R.; Van Laere, Steven; Desmedt, Christine; Ringnér, Markus; Tommasi, Stefania; Eyford, Jorunn; Broeks, Annegien; Vincent-Salomon, Anne; Futreal, P. Andrew; Knappskog, Stian; King, Tari; Thomas, Gilles; Viari, Alain; Langerød, Anita; Børresen-Dale, Anne-Lise; Birney, Ewan; Stunnenberg, Hendrik G.; Stratton, Mike; Foekens, John A.; Martens, John W. M.

    2016-01-01

    A recent comprehensive whole genome analysis of a large breast cancer cohort was used to link known and novel drivers and substitution signatures to the transcriptome of 266 cases. Here, we validate that subtype-specific aberrations show concordant expression changes for, for example, TP53, PIK3CA, PTEN, CCND1 and CDH1. We find that CCND3 expression levels do not correlate with amplification, while increased GATA3 expression in mutant GATA3 cancers suggests GATA3 is an oncogene. In luminal cases the total number of substitutions, irrespective of type, associates with cell cycle gene expression and adverse outcome, whereas the number of mutations of signatures 3 and 13 associates with immune-response specific gene expression, increased numbers of tumour-infiltrating lymphocytes and better outcome. Thus, while earlier reports imply that the sheer number of somatic aberrations could trigger an immune-response, our data suggests that substitutions of a particular type are more effective in doing so than others. PMID:27666519

  10. The Long Noncoding RNA Transcriptome of Dictyostelium discoideum Development.

    PubMed

    Rosengarten, Rafael D; Santhanam, Balaji; Kokosar, Janez; Shaulsky, Gad

    2017-02-09

    Dictyostelium discoideum live in the soil as single cells, engulfing bacteria and growing vegetatively. Upon starvation, tens of thousands of amoebae enter a developmental program that includes aggregation, multicellular differentiation, and sporulation. Major shifts across the protein-coding transcriptome accompany these developmental changes. However, no study has presented a global survey of long noncoding RNAs (ncRNAs) in D. discoideum To characterize the antisense and long intergenic noncoding RNA (lncRNA) transcriptome, we analyzed previously published developmental time course samples using an RNA-sequencing (RNA-seq) library preparation method that selectively depletes ribosomal RNAs (rRNAs). We detected the accumulation of transcripts for 9833 protein-coding messenger RNAs (mRNAs), 621 lncRNAs, and 162 putative antisense RNAs (asRNAs). The noncoding RNAs were interspersed throughout the genome, and were distinct in expression level, length, and nucleotide composition. The noncoding transcriptome displayed a temporal profile similar to the coding transcriptome, with stages of gradual change interspersed with larger leaps. The transcription profiles of some noncoding RNAs were strongly correlated with known differentially expressed coding RNAs, hinting at a functional role for these molecules during development. Examining the mitochondrial transcriptome, we modeled two novel antisense transcripts. We applied yet another ribosomal depletion method to a subset of the samples to better retain transfer RNA (tRNA) transcripts. We observed polymorphisms in tRNA anticodons that suggested a post-transcriptional means by which D. discoideum compensates for codons missing in the genomic complement of tRNAs. We concluded that the prevalence and characteristics of long ncRNAs indicate that these molecules are relevant to the progression of molecular and cellular phenotypes during development. Copyright © 2017 Rosengarten et al.

  11. Scalable Parameter Estimation for Genome-Scale Biochemical Reaction Networks

    PubMed Central

    Kaltenbacher, Barbara; Hasenauer, Jan

    2017-01-01

    Mechanistic mathematical modeling of biochemical reaction networks using ordinary differential equation (ODE) models has improved our understanding of small- and medium-scale biological processes. While the same should in principle hold for large- and genome-scale processes, the computational methods for the analysis of ODE models which describe hundreds or thousands of biochemical species and reactions are missing so far. While individual simulations are feasible, the inference of the model parameters from experimental data is computationally too intensive. In this manuscript, we evaluate adjoint sensitivity analysis for parameter estimation in large scale biochemical reaction networks. We present the approach for time-discrete measurement and compare it to state-of-the-art methods used in systems and computational biology. Our comparison reveals a significantly improved computational efficiency and a superior scalability of adjoint sensitivity analysis. The computational complexity is effectively independent of the number of parameters, enabling the analysis of large- and genome-scale models. Our study of a comprehensive kinetic model of ErbB signaling shows that parameter estimation using adjoint sensitivity analysis requires a fraction of the computation time of established methods. The proposed method will facilitate mechanistic modeling of genome-scale cellular processes, as required in the age of omics. PMID:28114351

  12. Genome-scale engineering for systems and synthetic biology

    PubMed Central

    Esvelt, Kevin M; Wang, Harris H

    2013-01-01

    Genome-modification technologies enable the rational engineering and perturbation of biological systems. Historically, these methods have been limited to gene insertions or mutations at random or at a few pre-defined locations across the genome. The handful of methods capable of targeted gene editing suffered from low efficiencies, significant labor costs, or both. Recent advances have dramatically expanded our ability to engineer cells in a directed and combinatorial manner. Here, we review current technologies and methodologies for genome-scale engineering, discuss the prospects for extending efficient genome modification to new hosts, and explore the implications of continued advances toward the development of flexibly programmable chasses, novel biochemistries, and safer organismal and ecological engineering. PMID:23340847

  13. The developmental transcriptome of Drosophila melanogaster

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    University of Connecticut; Graveley, Brenton R.; Brooks, Angela N.

    Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, predictionmore » and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development. Drosophila melanogaster is an important non-mammalian model system that has had a critical role in basic biological discoveries, such as identifying chromosomes as the carriers of genetic information and uncovering the role of genes in development. Because it shares a substantial genic content with humans, Drosophila is increasingly used as a translational model for human development, homeostasis and disease. High-quality maps are needed for all functional genomic elements. Previous studies demonstrated that a rich collection of genes is deployed during the life cycle of the fly. Although expression profiling using microarrays has revealed the expression of, 13,000 annotated genes, it is difficult to map splice junctions and individual base modifications generated by RNA editing using such approaches. Single-base resolution is essential to define precisely the elements that comprise the Drosophila transcriptome. Estimates of the number of transcript isoforms are less accurate than estimates of the number

  14. Genome-Wide Fine-Scale Recombination Rate Variation in Drosophila melanogaster

    PubMed Central

    Song, Yun S.

    2012-01-01

    Estimating fine-scale recombination maps of Drosophila from population genomic data is a challenging problem, in particular because of the high background recombination rate. In this paper, a new computational method is developed to address this challenge. Through an extensive simulation study, it is demonstrated that the method allows more accurate inference, and exhibits greater robustness to the effects of natural selection and noise, compared to a well-used previous method developed for studying fine-scale recombination rate variation in the human genome. As an application, a genome-wide analysis of genetic variation data is performed for two Drosophila melanogaster populations, one from North America (Raleigh, USA) and the other from Africa (Gikongoro, Rwanda). It is shown that fine-scale recombination rate variation is widespread throughout the D. melanogaster genome, across all chromosomes and in both populations. At the fine-scale, a conservative, systematic search for evidence of recombination hotspots suggests the existence of a handful of putative hotspots each with at least a tenfold increase in intensity over the background rate. A wavelet analysis is carried out to compare the estimated recombination maps in the two populations and to quantify the extent to which recombination rates are conserved. In general, similarity is observed at very broad scales, but substantial differences are seen at fine scales. The average recombination rate of the X chromosome appears to be higher than that of the autosomes in both populations, and this pattern is much more pronounced in the African population than the North American population. The correlation between various genomic features—including recombination rates, diversity, divergence, GC content, gene content, and sequence quality—is examined using the wavelet analysis, and it is shown that the most notable difference between D. melanogaster and humans is in the correlation between recombination and

  15. Transcriptome complexity in cardiac development and diseases--an expanding universe between genome and phenome.

    PubMed

    Gao, Chen; Wang, Yibin

    2014-01-01

    With the advancement of transcriptome profiling by micro-arrays and high-throughput RNA-sequencing, transcriptome complexity and its dynamics are revealed at different levels in cardiovascular development and diseases. In this review, we will highlight the recent progress in our knowledge of cardiovascular transcriptome complexity contributed by RNA splicing, RNA editing and noncoding RNAs. The emerging importance of many of these previously under-explored aspects of gene regulation in cardiovascular development and pathology will be discussed.

  16. Assembled contigs of the synganglion transcriptome from an Australian population of the cattle tick, Rhipicephalus microplus

    USDA-ARS?s Scientific Manuscript database

    In a collaboration with National Center for Genome Resources and University of Texas at El Paso researchers, we sequenced and assembled the transcriptome of the synganglion of the Texas strain (Deutsch) of the cattle tick Rhipicephalus microplus. This transcriptome contains 43, 468 sequences and wa...

  17. Assembled contigs of the synganglion transcriptome from a Texas population of the cattle tick, Rhipicephalus microplus.

    USDA-ARS?s Scientific Manuscript database

    In a collaboration with National Center for Genome Resources and University of Texas at El Paso researchers, we sequenced and assembled the transcriptome of the synganglion of the Texas strain (Deutsch) of the cattle tick Rhipicephalus microplus. This transcriptome contains 43, 468 sequences and wa...

  18. Genomics and functional genomics in Chlamydomonas reinhardtii

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blaby, Ian K.; Blaby-Haas, Crysten E.

    The availability of the Chlamydomonas reinhardtii nuclear genome sequence continues to enable researchers to address biological questions relevant to algae, land plants and animals in unprecedented ways. As we continue to characterize and understand biological processes in C. reinhardtii and translate that knowledge to other systems, we are faced with the realization that many genes encode proteins without a defined function. The field of functional genomics aims to close this gap between genome sequence and protein function. Transcriptomes, proteomes and phenomes can each provide layers of gene-specific functional data while supplying a global snapshot of cellular behavior under different conditions.more » Herein we present a brief history of functional genomics, the present status of the C. reinhardtii genome, how genome-wide experiments can aid in supplying protein function inferences, and provide an outlook for functional genomics in C. reinhardtii.« less

  19. Genomics and functional genomics in Chlamydomonas reinhardtii

    DOE PAGES

    Blaby, Ian K.; Blaby-Haas, Crysten E.

    2017-03-21

    The availability of the Chlamydomonas reinhardtii nuclear genome sequence continues to enable researchers to address biological questions relevant to algae, land plants and animals in unprecedented ways. As we continue to characterize and understand biological processes in C. reinhardtii and translate that knowledge to other systems, we are faced with the realization that many genes encode proteins without a defined function. The field of functional genomics aims to close this gap between genome sequence and protein function. Transcriptomes, proteomes and phenomes can each provide layers of gene-specific functional data while supplying a global snapshot of cellular behavior under different conditions.more » Herein we present a brief history of functional genomics, the present status of the C. reinhardtii genome, how genome-wide experiments can aid in supplying protein function inferences, and provide an outlook for functional genomics in C. reinhardtii.« less

  20. Assessing the Metabolic Impact of Nitrogen Availability Using a Compartmentalized Maize Leaf Genome-Scale Model1[C][W][OPEN

    PubMed Central

    Simons, Margaret; Saha, Rajib; Amiour, Nardjis; Kumar, Akhil; Guillard, Lenaïg; Clément, Gilles; Miquel, Martine; Li, Zhenni; Mouille, Gregory; Lea, Peter J.; Hirel, Bertrand; Maranas, Costas D.

    2014-01-01

    Maize (Zea mays) is an important C4 plant due to its widespread use as a cereal and energy crop. A second-generation genome-scale metabolic model for the maize leaf was created to capture C4 carbon fixation and investigate nitrogen (N) assimilation by modeling the interactions between the bundle sheath and mesophyll cells. The model contains gene-protein-reaction relationships, elemental and charge-balanced reactions, and incorporates experimental evidence pertaining to the biomass composition, compartmentalization, and flux constraints. Condition-specific biomass descriptions were introduced that account for amino acids, fatty acids, soluble sugars, proteins, chlorophyll, lignocellulose, and nucleic acids as experimentally measured biomass constituents. Compartmentalization of the model is based on proteomic/transcriptomic data and literature evidence. With the incorporation of information from the MetaCrop and MaizeCyc databases, this updated model spans 5,824 genes, 8,525 reactions, and 9,153 metabolites, an increase of approximately 4 times the size of the earlier iRS1563 model. Transcriptomic and proteomic data have also been used to introduce regulatory constraints in the model to simulate an N-limited condition and mutants deficient in glutamine synthetase, gln1-3 and gln1-4. Model-predicted results achieved 90% accuracy when comparing the wild type grown under an N-complete condition with the wild type grown under an N-deficient condition. PMID:25248718

  1. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    PubMed

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economical and ecological importance of flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response and several key immune genes were identified for the first time in flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have largely increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in

  2. Aquatic Plant Genomics: Advances, Applications, and Prospects

    PubMed Central

    Li, Gaojie; Yang, Jingjing

    2017-01-01

    Genomics is a discipline in genetics that studies the genome composition of organisms and the precise structure of genes and their expression and regulation. Genomics research has resolved many problems where other biological methods have failed. Here, we summarize advances in aquatic plant genomics with a focus on molecular markers, the genes related to photosynthesis and stress tolerance, comparative study of genomes and genome/transcriptome sequencing technology. PMID:28900619

  3. Genome-scale CRISPR-Cas9 knockout screening in human cells.

    PubMed

    Shalem, Ophir; Sanjana, Neville E; Hartenian, Ella; Shi, Xi; Scott, David A; Mikkelson, Tarjei; Heckl, Dirk; Ebert, Benjamin L; Root, David E; Doench, John G; Zhang, Feng

    2014-01-03

    The simplicity of programming the CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease Cas9 to modify specific genomic loci suggests a new way to interrogate gene function on a genome-wide scale. We show that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells. First, we used the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, we screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic RAF inhibitor. Our highest-ranking candidates include previously validated genes NF1 and MED12, as well as novel hits NF2, CUL3, TADA2B, and TADA1. We observe a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, demonstrating the promise of genome-scale screening with Cas9.

  4. Tapping the promise of genomics in species with complex, nonmodel genomes.

    PubMed

    Hirsch, Candice N; Buell, C Robin

    2013-01-01

    Genomics is enabling a renaissance in all disciplines of plant biology. However, many plant genomes are complex and remain recalcitrant to current genomic technologies. The complexities of these nonmodel plant genomes are attributable to gene and genome duplication, heterozygosity, ploidy, and/or repetitive sequences. Methods are available to simplify the genome and reduce these barriers, including inbreeding and genome reduction, making these species amenable to current sequencing and assembly methods. Some, but not all, of the complexities in nonmodel genomes can be bypassed by sequencing the transcriptome rather than the genome. Additionally, comparative genomics approaches, which leverage phylogenetic relatedness, can aid in the interpretation of complex genomes. Although there are limitations in accessing complex nonmodel plant genomes using current sequencing technologies, genome manipulation and resourceful analyses can allow access to even the most recalcitrant plant genomes.

  5. Comprehensive analysis of RNA-seq data reveals the complexity of the transcriptome in Brassica rapa.

    PubMed

    Tong, Chaobo; Wang, Xiaowu; Yu, Jingyin; Wu, Jian; Li, Wanshun; Huang, Junyan; Dong, Caihua; Hua, Wei; Liu, Shengyi

    2013-10-07

    The species Brassica rapa (2n=20, AA) is an important vegetable and oilseed crop, and serves as an excellent model for genomic and evolutionary research in Brassica species. With the availability of whole genome sequence of B. rapa, it is essential to further determine the activity of all functional elements of the B. rapa genome and explore the transcriptome on a genome-wide scale. Here, RNA-seq data was employed to provide a genome-wide transcriptional landscape and characterization of the annotated and novel transcripts and alternative splicing events across tissues. RNA-seq reads were generated using the Illumina platform from six different tissues (root, stem, leaf, flower, silique and callus) of the B. rapa accession Chiifu-401-42, the same line used for whole genome sequencing. First, these data detected the widespread transcription of the B. rapa genome, leading to the identification of numerous novel transcripts and definition of 5'/3' UTRs of known genes. Second, 78.8% of the total annotated genes were detected as expressed and 45.8% were constitutively expressed across all tissues. We further defined several groups of genes: housekeeping genes, tissue-specific expressed genes and co-expressed genes across tissues, which will serve as a valuable repository for future crop functional genomics research. Third, alternative splicing (AS) is estimated to occur in more than 29.4% of intron-containing B. rapa genes, and 65% of them were commonly detected in more than two tissues. Interestingly, genes with high rate of AS were over-represented in GO categories relating to transcriptional regulation and signal transduction, suggesting potential importance of AS for playing regulatory role in these genes. Further, we observed that intron retention (IR) is predominant in the AS events and seems to preferentially occurred in genes with short introns. The high-resolution RNA-seq analysis provides a global transcriptional landscape as a complement to the B. rapa genome

  6. The Co-regulation Data Harvester: Automating gene annotation starting from a transcriptome database

    NASA Astrophysics Data System (ADS)

    Tsypin, Lev M.; Turkewitz, Aaron P.

    Identifying co-regulated genes provides a useful approach for defining pathway-specific machinery in an organism. To be efficient, this approach relies on thorough genome annotation, a process much slower than genome sequencing per se. Tetrahymena thermophila, a unicellular eukaryote, has been a useful model organism and has a fully sequenced but sparsely annotated genome. One important resource for studying this organism has been an online transcriptomic database. We have developed an automated approach to gene annotation in the context of transcriptome data in T. thermophila, called the Co-regulation Data Harvester (CDH). Beginning with a gene of interest, the CDH identifies co-regulated genes by accessing the Tetrahymena transcriptome database. It then identifies their closely related genes (orthologs) in other organisms by using reciprocal BLAST searches. Finally, it collates the annotations of those orthologs' functions, which provides the user with information to help predict the cellular role of the initial query. The CDH, which is freely available, represents a powerful new tool for analyzing cell biological pathways in Tetrahymena. Moreover, to the extent that genes and pathways are conserved between organisms, the inferences obtained via the CDH should be relevant, and can be explored, in many other systems.

  7. Ginseng Genome Database: an open-access platform for genomics of Panax ginseng.

    PubMed

    Jayakodi, Murukarthick; Choi, Beom-Soon; Lee, Sang-Choon; Kim, Nam-Hoon; Park, Jee Young; Jang, Woojong; Lakshmanan, Meiyappan; Mohan, Shobhana V G; Lee, Dong-Yup; Yang, Tae-Jin

    2018-04-12

    The ginseng (Panax ginseng C.A. Meyer) is a perennial herbaceous plant that has been used in traditional oriental medicine for thousands of years. Ginsenosides, which have significant pharmacological effects on human health, are the foremost bioactive constituents in this plant. Having realized the importance of this plant to humans, an integrated omics resource becomes indispensable to facilitate genomic research, molecular breeding and pharmacological study of this herb. The first draft genome sequences of P. ginseng cultivar "Chunpoong" were reported recently. Here, using the draft genome, transcriptome, and functional annotation datasets of P. ginseng, we have constructed the Ginseng Genome Database http://ginsengdb.snu.ac.kr /, the first open-access platform to provide comprehensive genomic resources of P. ginseng. The current version of this database provides the most up-to-date draft genome sequence (of approximately 3000 Mbp of scaffold sequences) along with the structural and functional annotations for 59,352 genes and digital expression of genes based on transcriptome data from different tissues, growth stages and treatments. In addition, tools for visualization and the genomic data from various analyses are provided. All data in the database were manually curated and integrated within a user-friendly query page. This database provides valuable resources for a range of research fields related to P. ginseng and other species belonging to the Apiales order as well as for plant research communities in general. Ginseng genome database can be accessed at http://ginsengdb.snu.ac.kr /.

  8. Evaluation of Sequencing Approaches for High-Throughput Transcriptomics - (BOSC)

    EPA Science Inventory

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. The generation of high-throughput global gene expression...

  9. Expanding frontiers in plant transcriptomics in aid of functional genomics and molecular breeding.

    PubMed

    Agarwal, Pinky; Parida, Swarup K; Mahto, Arunima; Das, Sweta; Mathew, Iny Elizebeth; Malik, Naveen; Tyagi, Akhilesh K

    2014-12-01

    The transcript pool of a plant part, under any given condition, is a collection of mRNAs that will pave the way for a biochemical reaction of the plant to stimuli. Over the past decades, transcriptome study has advanced from Northern blotting to RNA sequencing (RNA-seq), through other techniques, of which real-time quantitative polymerase chain reaction (PCR) and microarray are the most significant ones. The questions being addressed by such studies have also matured from a solitary process to expression atlas and marker-assisted genetic enhancement. Not only genes and their networks involved in various developmental processes of plant parts have been elucidated, but also stress tolerant genes have been highlighted. The transcriptome of a plant with altered expression of a target gene has given information about the downstream genes. Marker information has been used for breeding improved varieties. Fortunately, the data generated by transcriptome analysis has been made freely available for ample utilization and comparison. The review discusses this wide variety of transcriptome data being generated in plants, which includes developmental stages, abiotic and biotic stress, effect of altered gene expression, as well as comparative transcriptomics, with a special emphasis on microarray and RNA-seq. Such data can be used to determine the regulatory gene networks, which can subsequently be utilized for generating improved plant varieties. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics1

    PubMed Central

    Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron

    2014-01-01

    • Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629

  11. Comparative genomic and transcriptomic analyses of the Fuzhuan brick tea-fermentation fungus Aspergillus cristatus.

    PubMed

    Ge, Yongyi; Wang, Yuchen; Liu, YongXiang; Tan, Yumei; Ren, Xiuxiu; Zhang, Xinyu; Hyde, Kevin D; Liu, Yongfeng; Liu, Zuoyi

    2016-06-07

    Aspergillus cristatus is the dominant fungus involved in the fermentation of Chinese Fuzhuan brick tea. Aspergillus cristatus is a homothallic fungus that undergoes a sexual stage without asexual conidiation when cultured in hypotonic medium. The asexual stage is induced by a high salt concentration, which completely inhibits sexual development. The taxon is therefore appropriate for investigating the mechanisms of asexual and sexual reproduction in fungi. In this study, de novo genome sequencing and analysis of transcriptomes during culture under high- and low-osmolarity conditions were performed. These analyses facilitated investigation of the evolution of mating-type genes, which determine the mode of sexual reproduction, in A. cristatus, the response of the high-osmolarity glycerol (HOG) pathway to osmotic stimulation, and the detection of mycotoxins and evaluation of the relationship with the location of the encoding genes. The A. cristatus genome comprised 27.9 Mb and included 68 scaffolds, from which 10,136 protein-coding gene models were predicted. A phylogenetic analysis suggested a considerable phylogenetic distance between A. cristatus and A. nidulans. Comparison of the mating-type gene loci among Aspergillus species indicated that the mode in A. cristatus differs from those in other Aspergillus species. The components of the HOG pathway were conserved in the genome of A. cristatus. Differential gene expression analysis in A. cristatus using RNA-Seq demonstrated that the expression of most genes in the HOG pathway was unaffected by osmotic pressure. No gene clusters associated with the production of carcinogens were detected. A model of the mating-type locus in A. cristatus is reported for the first time. Aspergillus cristatus has evolved various mechanisms to cope with high osmotic stress. As a fungus associated with Fuzhuan tea, it is considered to be safe under low- and high-osmolarity conditions.

  12. Comparative transcriptome analysis in Sclerotinia sclerotiorum and S. trifoliorum by 454 Titanium RNA sequencing

    USDA-ARS?s Scientific Manuscript database

    Sclerotinia sclerotiorum and S. trifoliorum are two closely related devastating plant pathogens. Extensive research has been conducted on S. sclerotiorum and its genome sequences are available. To take advantages of the genomic information of S. sclerotiorum, we compared the transcriptome of S. tr...

  13. Quantitative RNA-seq analysis of the Campylobacter jejuni transcriptome

    PubMed Central

    Chaudhuri, Roy R.; Yu, Lu; Kanji, Alpa; Perkins, Timothy T.; Gardner, Paul P.; Choudhary, Jyoti; Maskell, Duncan J.

    2011-01-01

    Campylobacter jejuni is the most common bacterial cause of foodborne disease in the developed world. Its general physiology and biochemistry, as well as the mechanisms enabling it to colonize and cause disease in various hosts, are not well understood, and new approaches are required to understand its basic biology. High-throughput sequencing technologies provide unprecedented opportunities for functional genomic research. Recent studies have shown that direct Illumina sequencing of cDNA (RNA-seq) is a useful technique for the quantitative and qualitative examination of transcriptomes. In this study we report RNA-seq analyses of the transcriptomes of C. jejuni (NCTC11168) and its rpoN mutant. This has allowed the identification of hitherto unknown transcriptional units, and further defines the regulon that is dependent on rpoN for expression. The analysis of the NCTC11168 transcriptome was supplemented by additional proteomic analysis using liquid chromatography-MS. The transcriptomic and proteomic datasets represent an important resource for the Campylobacter research community. PMID:21816880

  14. Global impact of RNA splicing on transcriptome remodeling in the heart.

    PubMed

    Gao, Chen; Wang, Yibin

    2012-08-01

    In the eukaryotic transcriptome, both the numbers of genes and different RNA species produced by each gene contribute to the overall complexity. These RNA species are generated by the utilization of different transcriptional initiation or termination sites, or more commonly, from different messenger RNA (mRNA) splicing events. Among the 30,000+ genes in human genome, it is estimated that more than 95% of them can generate more than one gene product via alternative RNA splicing. The protein products generated from different RNA splicing variants can have different intracellular localization, activity, or tissue-distribution. Therefore, alternative RNA splicing is an important molecular process that contributes to the overall complexity of the genome and the functional specificity and diversity among different cell types. In this review, we will discuss current efforts to unravel the full complexity of the cardiac transcriptome using a deep-sequencing approach, and highlight the potential of this technology to uncover the global impact of RNA splicing on the transcriptome during development and diseases of the heart.

  15. RNASeq-based genome annotation and identification of long-noncoding RNAs in the grapevine cultivar 'Riesling'

    USDA-ARS?s Scientific Manuscript database

    The technological advances of RNA-seq and de novo transcriptome assembly have enabled genome annotation and transcriptome profiling in heterozygous species. This is a promising approach to improving the annotation of the reference genome sequence of grapevine (Vitis vinifera L.), a species of high-l...

  16. Cloud-Scale Genomic Signals Processing for Robust Large-Scale Cancer Genomic Microarray Data Analysis.

    PubMed

    Harvey, Benjamin Simeon; Ji, Soo-Yeon

    2017-01-01

    As microarray data available to scientists continues to increase in size and complexity, it has become overwhelmingly important to find multiple ways to bring forth oncological inference to the bioinformatics community through the analysis of large-scale cancer genomic (LSCG) DNA and mRNA microarray data that is useful to scientists. Though there have been many attempts to elucidate the issue of bringing forth biological interpretation by means of wavelet preprocessing and classification, there has not been a research effort that focuses on a cloud-scale distributed parallel (CSDP) separable 1-D wavelet decomposition technique for denoising through differential expression thresholding and classification of LSCG microarray data. This research presents a novel methodology that utilizes a CSDP separable 1-D method for wavelet-based transformation in order to initialize a threshold which will retain significantly expressed genes through the denoising process for robust classification of cancer patients. Additionally, the overall study was implemented and encompassed within CSDP environment. The utilization of cloud computing and wavelet-based thresholding for denoising was used for the classification of samples within the Global Cancer Map, Cancer Cell Line Encyclopedia, and The Cancer Genome Atlas. The results proved that separable 1-D parallel distributed wavelet denoising in the cloud and differential expression thresholding increased the computational performance and enabled the generation of higher quality LSCG microarray datasets, which led to more accurate classification results.

  17. Comparative Transcriptomics of Strawberries (Fragaria spp.) Provides Insights into Evolutionary Patterns.

    PubMed

    Qiao, Qin; Xue, Li; Wang, Qia; Sun, Hang; Zhong, Yang; Huang, Jinling; Lei, Jiajun; Zhang, Ticao

    2016-01-01

    Multiple closely related species with genomic sequences provide an ideal system for studies on comparative and evolutionary genomics, as well as the mechanism of speciation. The whole genome sequences of six strawberry species ( Fragaria spp.) have been released, which provide one of the richest genomic resources of any plant genus. In this study, we first generated seven transcriptome sequences of Fragaria species de novo , with a total of 48,557-82,537 unigenes per species. Combined with 13 other species genomes in Rosales, we reconstructed a phylogenetic tree at the genomic level. The phylogenic tree shows that Fragaria closed grouped with Rubus and the Fragaria clade is divided into three subclades. East Asian species appeared in every subclade, suggesting that the genus originated in this area at ∼7.99 Mya. Four species found in mountains of Southwest China originated at ∼3.98 Mya, suggesting that rapid speciation occurred to adapt to changing environments following the uplift of the Qinghai-Tibet Plateau. Moreover, we identified 510 very significantly positively selected genes in the cultivated species F . × ananassa genome. This set of genes was enriched in functions related to specific agronomic traits, such as carbon metabolism and plant hormone signal transduction processes, which are directly related to fruit quality and flavor. These findings illustrate comprehensive evolutionary patterns in Fragaria and the genetic basis of fruit domestication of cultivated strawberry at the genomic/transcriptomic level.

  18. Comparative Transcriptomics of Strawberries (Fragaria spp.) Provides Insights into Evolutionary Patterns

    PubMed Central

    Qiao, Qin; Xue, Li; Wang, Qia; Sun, Hang; Zhong, Yang; Huang, Jinling; Lei, Jiajun; Zhang, Ticao

    2016-01-01

    Multiple closely related species with genomic sequences provide an ideal system for studies on comparative and evolutionary genomics, as well as the mechanism of speciation. The whole genome sequences of six strawberry species (Fragaria spp.) have been released, which provide one of the richest genomic resources of any plant genus. In this study, we first generated seven transcriptome sequences of Fragaria species de novo, with a total of 48,557–82,537 unigenes per species. Combined with 13 other species genomes in Rosales, we reconstructed a phylogenetic tree at the genomic level. The phylogenic tree shows that Fragaria closed grouped with Rubus and the Fragaria clade is divided into three subclades. East Asian species appeared in every subclade, suggesting that the genus originated in this area at ∼7.99 Mya. Four species found in mountains of Southwest China originated at ∼3.98 Mya, suggesting that rapid speciation occurred to adapt to changing environments following the uplift of the Qinghai–Tibet Plateau. Moreover, we identified 510 very significantly positively selected genes in the cultivated species F. × ananassa genome. This set of genes was enriched in functions related to specific agronomic traits, such as carbon metabolism and plant hormone signal transduction processes, which are directly related to fruit quality and flavor. These findings illustrate comprehensive evolutionary patterns in Fragaria and the genetic basis of fruit domestication of cultivated strawberry at the genomic/transcriptomic level. PMID:28018379

  19. Stronger transferability but lower variability in transcriptomic- than in anonymous microsatellites: evidence from Hylid frogs.

    PubMed

    Dufresnes, Christophe; Brelsford, Alan; Béziers, Paul; Perrin, Nicolas

    2014-07-01

    A simple way to quickly optimize microsatellites in nonmodel organisms is to reuse loci available in closely related taxa; however, this approach can be limited by the stochastic and low cross-amplification success experienced in some groups (e.g. amphibians). An efficient alternative is to develop loci from transcriptome sequences. Transcriptomic microsatellites have been found to vary in their levels of cross-species amplification and variability, but this has to date never been tested in amphibians. Here, we compare the patterns of cross-amplification and levels of polymorphism of 18 published anonymous microsatellites isolated from genomic DNA vs. 17 loci derived from a transcriptome, across nine species of tree frogs (Hyla arborea and Hyla cinerea group). We established a clear negative relationship between divergence time and amplification success, which was much steeper for anonymous than transcriptomic markers, with half-lives (time at which 50% of the markers still amplify) of 1.1 and 37 My, respectively. Transcriptomic markers are significantly less polymorphic than anonymous loci, but remain variable across diverged taxa. We conclude that the exploitation of amphibian transcriptomes for developing microsatellites seems an optimal approach for multispecies surveys (e.g. analyses of hybrid zones, comparative linkage mapping), whereas anonymous microsatellites may be more informative for fine-scale analyses of intraspecific variation. Moreover, our results confirm the pattern that microsatellite cross-amplification is greatly variable among amphibians and should be assessed independently within target lineages. Finally, we provide a bank of microsatellites for Palaearctic tree frogs (so far only available for H. arborea), which will be useful for conservation and evolutionary studies in this radiation. © 2013 John Wiley & Sons Ltd.

  20. Systems metabolic engineering: genome-scale models and beyond.

    PubMed

    Blazeck, John; Alper, Hal

    2010-07-01

    The advent of high throughput genome-scale bioinformatics has led to an exponential increase in available cellular system data. Systems metabolic engineering attempts to use data-driven approaches--based on the data collected with high throughput technologies--to identify gene targets and optimize phenotypical properties on a systems level. Current systems metabolic engineering tools are limited for predicting and defining complex phenotypes such as chemical tolerances and other global, multigenic traits. The most pragmatic systems-based tool for metabolic engineering to arise is the in silico genome-scale metabolic reconstruction. This tool has seen wide adoption for modeling cell growth and predicting beneficial gene knockouts, and we examine here how this approach can be expanded for novel organisms. This review will highlight advances of the systems metabolic engineering approach with a focus on de novo development and use of genome-scale metabolic reconstructions for metabolic engineering applications. We will then discuss the challenges and prospects for this emerging field to enable model-based metabolic engineering. Specifically, we argue that current state-of-the-art systems metabolic engineering techniques represent a viable first step for improving product yield that still must be followed by combinatorial techniques or random strain mutagenesis to achieve optimal cellular systems.

  1. Analysis, annotation, and profiling of the oat seed transcriptome

    USDA-ARS?s Scientific Manuscript database

    Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...

  2. Genomic understanding of dinoflagellates.

    PubMed

    Lin, Senjie

    2011-01-01

    The phylum of dinoflagellates is characterized by many unusual and interesting genomic and physiological features, the imprint of which, in its immense genome, remains elusive. Much novel understanding has been achieved in the last decade on various aspects of dinoflagellate biology, but most remarkably about the structure, expression pattern and epigenetic modification of protein-coding genes in the nuclear and organellar genomes. Major findings include: 1) the great diversity of dinoflagellates, especially at the base of the dinoflagellate tree of life; 2) mini-circularization of the genomes of typical dinoflagellate plastids (with three membranes, chlorophylls a, c1 and c2, and carotenoid peridinin), the scrambled mitochondrial genome and the extensive mRNA editing occurring in both systems; 3) ubiquitous spliced leader trans-splicing of nuclear-encoded mRNA and demonstrated potential as a novel tool for studying dinoflagellate transcriptomes in mixed cultures and natural assemblages; 4) existence and expression of histones and other nucleosomal proteins; 5) a ribosomal protein set expected of typical eukaryotes; 6) genetic potential of non-photosynthetic solar energy utilization via proton-pump rhodopsin; 7) gene candidates in the toxin synthesis pathways; and 8) evidence of a highly redundant, high gene number and highly recombined genome. Despite this progress, much more work awaits genome-wide transcriptome and whole genome sequencing in order to unfold the molecular mechanisms underlying the numerous mysterious attributes of dinoflagellates. Copyright © 2011 Institut Pasteur. Published by Elsevier SAS. All rights reserved.

  3. From Genes to Milk: Genomic Organization and Epigenetic Regulation of the Mammary Transcriptome

    PubMed Central

    Lemay, Danielle G.; Pollard, Katherine S.; Martin, William F.; Freeman Zadrowski, Courtneay; Hernandez, Joseph; Korf, Ian; German, J. Bruce; Rijnkels, Monique

    2013-01-01

    Even in genomes lacking operons, a gene's position in the genome influences its potential for expression. The mechanisms by which adjacent genes are co-expressed are still not completely understood. Using lactation and the mammary gland as a model system, we explore the hypothesis that chromatin state contributes to the co-regulation of gene neighborhoods. The mammary gland represents a unique evolutionary model, due to its recent appearance, in the context of vertebrate genomes. An understanding of how the mammary gland is regulated to produce milk is also of biomedical and agricultural importance for human lactation and dairying. Here, we integrate epigenomic and transcriptomic data to develop a comprehensive regulatory model. Neighborhoods of mammary-expressed genes were determined using expression data derived from pregnant and lactating mice and a neighborhood scoring tool, G-NEST. Regions of open and closed chromatin were identified by ChIP-Seq of histone modifications H3K36me3, H3K4me2, and H3K27me3 in the mouse mammary gland and liver tissue during lactation. We found that neighborhoods of genes in regions of uniquely active chromatin in the lactating mammary gland, compared with liver tissue, were extremely rare. Rather, genes in most neighborhoods were suppressed during lactation as reflected in their expression levels and their location in regions of silenced chromatin. Chromatin silencing was largely shared between the liver and mammary gland during lactation, and what distinguished the mammary gland was mainly a small tissue-specific repertoire of isolated, expressed genes. These findings suggest that an advantage of the neighborhood organization is in the collective repression of groups of genes via a shared mechanism of chromatin repression. Genes essential to the mammary gland's uniqueness are isolated from neighbors, and likely have less tolerance for variation in expression, properties they share with genes responsible for an organism's survival

  4. Comprehensive reconstruction and in silico analysis of Aspergillus niger genome-scale metabolic network model that accounts for 1210 ORFs.

    PubMed

    Lu, Hongzhong; Cao, Weiqiang; Ouyang, Liming; Xia, Jianye; Huang, Mingzhi; Chu, Ju; Zhuang, Yingping; Zhang, Siliang; Noorman, Henk

    2017-03-01

    Aspergillus niger is one of the most important cell factories for industrial enzymes and organic acids production. A comprehensive genome-scale metabolic network model (GSMM) with high quality is crucial for efficient strain improvement and process optimization. The lack of accurate reaction equations and gene-protein-reaction associations (GPRs) in the current best model of A. niger named GSMM iMA871, however, limits its application scope. To overcome these limitations, we updated the A. niger GSMM by combining the latest genome annotation and literature mining technology. Compared with iMA871, the number of reactions in iHL1210 was increased from 1,380 to 1,764, and the number of unique ORFs from 871 to 1,210. With the aid of our transcriptomics analysis, the existence of 63% ORFs and 68% reactions in iHL1210 can be verified when glucose was used as the only carbon source. Physiological data from chemostat cultivations, 13 C-labeled and molecular experiments from the published literature were further used to check the performance of iHL1210. The average correlation coefficients between the predicted fluxes and estimated fluxes from 13 C-labeling data were sufficiently high (above 0.89) and the prediction of cell growth on most of the reported carbon and nitrogen sources was consistent. Using the updated genome-scale model, we evaluated gene essentiality on synthetic and yeast extract medium, as well as the effects of NADPH supply on glucoamylase production in A. niger. In summary, the new A. niger GSMM iHL1210 contains significant improvements with respect to the metabolic coverage and prediction performance, which paves the way for systematic metabolic engineering of A. niger. Biotechnol. Bioeng. 2017;114: 685-695. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  5. The Global Invertebrate Genomics Alliance (GIGA): Developing Community Resources to Study Diverse Invertebrate Genomes

    PubMed Central

    2014-01-01

    Over 95% of all metazoan (animal) species comprise the “invertebrates,” but very few genomes from these organisms have been sequenced. We have, therefore, formed a “Global Invertebrate Genomics Alliance” (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site (http://giga.nova.edu) has been launched to facilitate this collaborative venture. PMID:24336862

  6. The Global Invertebrate Genomics Alliance (GIGA): developing community resources to study diverse invertebrate genomes.

    PubMed

    Bracken-Grissom, Heather; Collins, Allen G; Collins, Timothy; Crandall, Keith; Distel, Daniel; Dunn, Casey; Giribet, Gonzalo; Haddock, Steven; Knowlton, Nancy; Martindale, Mark; Medina, Mónica; Messing, Charles; O'Brien, Stephen J; Paulay, Gustav; Putnam, Nicolas; Ravasi, Timothy; Rouse, Greg W; Ryan, Joseph F; Schulze, Anja; Wörheide, Gert; Adamska, Maja; Bailly, Xavier; Breinholt, Jesse; Browne, William E; Diaz, M Christina; Evans, Nathaniel; Flot, Jean-François; Fogarty, Nicole; Johnston, Matthew; Kamel, Bishoy; Kawahara, Akito Y; Laberge, Tammy; Lavrov, Dennis; Michonneau, François; Moroz, Leonid L; Oakley, Todd; Osborne, Karen; Pomponi, Shirley A; Rhodes, Adelaide; Santos, Scott R; Satoh, Nori; Thacker, Robert W; Van de Peer, Yves; Voolstra, Christian R; Welch, David Mark; Winston, Judith; Zhou, Xin

    2014-01-01

    Over 95% of all metazoan (animal) species comprise the "invertebrates," but very few genomes from these organisms have been sequenced. We have, therefore, formed a "Global Invertebrate Genomics Alliance" (GIGA). Our intent is to build a collaborative network of diverse scientists to tackle major challenges (e.g., species selection, sample collection and storage, sequence assembly, annotation, analytical tools) associated with genome/transcriptome sequencing across a large taxonomic spectrum. We aim to promote standards that will facilitate comparative approaches to invertebrate genomics and collaborations across the international scientific community. Candidate study taxa include species from Porifera, Ctenophora, Cnidaria, Placozoa, Mollusca, Arthropoda, Echinodermata, Annelida, Bryozoa, and Platyhelminthes, among others. GIGA will target 7000 noninsect/nonnematode species, with an emphasis on marine taxa because of the unrivaled phyletic diversity in the oceans. Priorities for selecting invertebrates for sequencing will include, but are not restricted to, their phylogenetic placement; relevance to organismal, ecological, and conservation research; and their importance to fisheries and human health. We highlight benefits of sequencing both whole genomes (DNA) and transcriptomes and also suggest policies for genomic-level data access and sharing based on transparency and inclusiveness. The GIGA Web site (http://giga.nova.edu) has been launched to facilitate this collaborative venture.

  7. Automation on the generation of genome-scale metabolic models.

    PubMed

    Reyes, R; Gamermann, D; Montagud, A; Fuente, D; Triana, J; Urchueguía, J F; de Córdoba, P Fernández

    2012-12-01

    Nowadays, the reconstruction of genome-scale metabolic models is a nonautomatized and interactive process based on decision making. This lengthy process usually requires a full year of one person's work in order to satisfactory collect, analyze, and validate the list of all metabolic reactions present in a specific organism. In order to write this list, one manually has to go through a huge amount of genomic, metabolomic, and physiological information. Currently, there is no optimal algorithm that allows one to automatically go through all this information and generate the models taking into account probabilistic criteria of unicity and completeness that a biologist would consider. This work presents the automation of a methodology for the reconstruction of genome-scale metabolic models for any organism. The methodology that follows is the automatized version of the steps implemented manually for the reconstruction of the genome-scale metabolic model of a photosynthetic organism, Synechocystis sp. PCC6803. The steps for the reconstruction are implemented in a computational platform (COPABI) that generates the models from the probabilistic algorithms that have been developed. For validation of the developed algorithm robustness, the metabolic models of several organisms generated by the platform have been studied together with published models that have been manually curated. Network properties of the models, like connectivity and average shortest mean path of the different models, have been compared and analyzed.

  8. Probing the Xenopus laevis inner ear transcriptome for biological function

    PubMed Central

    2012-01-01

    Background The senses of hearing and balance depend upon mechanoreception, a process that originates in the inner ear and shares features across species. Amphibians have been widely used for physiological studies of mechanotransduction by sensory hair cells. In contrast, much less is known of the genetic basis of auditory and vestibular function in this class of animals. Among amphibians, the genus Xenopus is a well-characterized genetic and developmental model that offers unique opportunities for inner ear research because of the amphibian capacity for tissue and organ regeneration. For these reasons, we implemented a functional genomics approach as a means to undertake a large-scale analysis of the Xenopus laevis inner ear transcriptome through microarray analysis. Results Microarray analysis uncovered genes within the X. laevis inner ear transcriptome associated with inner ear function and impairment in other organisms, thereby supporting the inclusion of Xenopus in cross-species genetic studies of the inner ear. The use of gene categories (inner ear tissue; deafness; ion channels; ion transporters; transcription factors) facilitated the assignment of functional significance to probe set identifiers. We enhanced the biological relevance of our microarray data by using a variety of curation approaches to increase the annotation of the Affymetrix GeneChip® Xenopus laevis Genome array. In addition, annotation analysis revealed the prevalence of inner ear transcripts represented by probe set identifiers that lack functional characterization. Conclusions We identified an abundance of targets for genetic analysis of auditory and vestibular function. The orthologues to human genes with known inner ear function and the highly expressed transcripts that lack annotation are particularly interesting candidates for future analyses. We used informatics approaches to impart biologically relevant information to the Xenopus inner ear transcriptome, thereby addressing the

  9. 13C metabolic flux analysis at a genome-scale.

    PubMed

    Gopalakrishnan, Saratram; Maranas, Costas D

    2015-11-01

    Metabolic models used in 13C metabolic flux analysis generally include a limited number of reactions primarily from central metabolism. They typically omit degradation pathways, complete cofactor balances, and atom transition contributions for reactions outside central metabolism. This study addresses the impact on prediction fidelity of scaling-up mapping models to a genome-scale. The core mapping model employed in this study accounts for (75 reactions and 65 metabolites) primarily from central metabolism. The genome-scale metabolic mapping model (GSMM) (697 reaction and 595 metabolites) is constructed using as a basis the iAF1260 model upon eliminating reactions guaranteed not to carry flux based on growth and fermentation data for a minimal glucose growth medium. Labeling data for 17 amino acid fragments obtained from cells fed with glucose labeled at the second carbon was used to obtain fluxes and ranges. Metabolic fluxes and confidence intervals are estimated, for both core and genome-scale mapping models, by minimizing the sum of square of differences between predicted and experimentally measured labeling patterns using the EMU decomposition algorithm. Overall, we find that both topology and estimated values of the metabolic fluxes remain largely consistent between core and GSM model. Stepping up to a genome-scale mapping model leads to wider flux inference ranges for 20 key reactions present in the core model. The glycolysis flux range doubles due to the possibility of active gluconeogenesis, the TCA flux range expanded by 80% due to the availability of a bypass through arginine consistent with labeling data, and the transhydrogenase reaction flux was essentially unresolved due to the presence of as many as five routes for the inter-conversion of NADPH to NADH afforded by the genome-scale model. By globally accounting for ATP demands in the GSMM model the unused ATP decreased drastically with the lower bound matching the maintenance ATP requirement. A non

  10. Transcriptomes Reveal Genetic Signatures Underlying Physiological Variations Imposed by Different Fermentation Conditions in Lactobacillus plantarum

    PubMed Central

    Bongers, Roger S.; van Bokhorst-van de Veen, Hermien; Wiersma, Anne; Overmars, Lex; Marco, Maria L.; Kleerebezem, Michiel

    2012-01-01

    Lactic acid bacteria (LAB) are utilized widely for the fermentation of foods. In the current post-genomic era, tools have been developed that explore genetic diversity among LAB strains aiming to link these variations to differential phenotypes observed in the strains investigated. However, these genotype-phenotype matching approaches fail to assess the role of conserved genes in the determination of physiological characteristics of cultures by environmental conditions. This manuscript describes a complementary approach in which Lactobacillus plantarum WCFS1 was fermented under a variety of conditions that differ in temperature, pH, as well as NaCl, amino acid, and O2 levels. Samples derived from these fermentations were analyzed by full-genome transcriptomics, paralleled by the assessment of physiological characteristics, e.g., maximum growth rate, yield, and organic acid profiles. A data-storage and -mining suite designated FermDB was constructed and exploited to identify correlations between fermentation conditions and industrially relevant physiological characteristics of L. plantarum, as well as the associated transcriptome signatures. Finally, integration of the specific fermentation variables with the transcriptomes enabled the reconstruction of the gene-regulatory networks involved. The fermentation-genomics platform presented here is a valuable complementary approach to earlier described genotype-phenotype matching strategies which allows the identification of transcriptome signatures underlying physiological variations imposed by different fermentation conditions. PMID:22802930

  11. Integrated genomic and transcriptomic analysis of human brain metastases identifies alterations of potential clinical significance.

    PubMed

    Saunus, Jodi M; Quinn, Michael C J; Patch, Ann-Marie; Pearson, John V; Bailey, Peter J; Nones, Katia; McCart Reed, Amy E; Miller, David; Wilson, Peter J; Al-Ejeh, Fares; Mariasegaram, Mythily; Lau, Queenie; Withers, Teresa; Jeffree, Rosalind L; Reid, Lynne E; Da Silva, Leonard; Matsika, Admire; Niland, Colleen M; Cummings, Margaret C; Bruxner, Timothy J C; Christ, Angelika N; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Anderson, Matthew J; Fink, J Lynn; Holmes, Oliver; Kazakoff, Stephen; Leonard, Conrad; Newell, Felicity; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Kassahn, Karin S; Narayanan, Vairavan; Taib, Nur Aishah; Teo, Soo-Hwang; Chow, Yock Ping; kConFab; Jat, Parmjit S; Brandner, Sebastian; Flanagan, Adrienne M; Khanna, Kum Kum; Chenevix-Trench, Georgia; Grimmond, Sean M; Simpson, Peter T; Waddell, Nicola; Lakhani, Sunil R

    2015-11-01

    Treatment options for patients with brain metastases (BMs) have limited efficacy and the mortality rate is virtually 100%. Targeted therapy is critically under-utilized, and our understanding of mechanisms underpinning metastatic outgrowth in the brain is limited. To address these deficiencies, we investigated the genomic and transcriptomic landscapes of 36 BMs from breast, lung, melanoma and oesophageal cancers, using DNA copy-number analysis and exome- and RNA-sequencing. The key findings were as follows. (a) Identification of novel candidates with possible roles in BM development, including the significantly mutated genes DSC2, ST7, PIK3R1 and SMC5, and the DNA repair, ERBB-HER signalling, axon guidance and protein kinase-A signalling pathways. (b) Mutational signature analysis was applied to successfully identify the primary cancer type for two BMs with unknown origins. (c) Actionable genomic alterations were identified in 31/36 BMs (86%); in one case we retrospectively identified ERBB2 amplification representing apparent HER2 status conversion, then confirmed progressive enrichment for HER2-positivity across four consecutive metastatic deposits by IHC and SISH, resulting in the deployment of HER2-targeted therapy for the patient. (d) In the ERBB/HER pathway, ERBB2 expression correlated with ERBB3 (r(2)  = 0.496; p < 0.0001) and HER3 and HER4 were frequently activated in an independent cohort of 167 archival BM from seven primary cancer types: 57.6% and 52.6% of cases were phospho-HER3(Y1222) or phospho-HER4(Y1162) membrane-positive, respectively. The HER3 ligands NRG1/2 were barely detectable by RNAseq, with NRG1 (8p12) genomic loss in 63.6% breast cancer-BMs, suggesting a microenvironmental source of ligand. In summary, this is the first study to characterize the genomic landscapes of BM. The data revealed novel candidates, potential clinical applications for genomic profiling of resectable BMs, and highlighted the possibility of therapeutically targeting

  12. Genome-wide Annotation, Identification, and Global Transcriptomic Analysis of Regulatory or Small RNA Gene Expression in Staphylococcus aureus

    PubMed Central

    Weiss, Andy; Broach, William H.; Wiemels, Richard E.; Mogen, Austin B.; Rice, Kelly C.

    2016-01-01

    ABSTRACT In Staphylococcus aureus, hundreds of small regulatory or small RNAs (sRNAs) have been identified, yet this class of molecule remains poorly understood and severely understudied. sRNA genes are typically absent from genome annotation files, and as a consequence, their existence is often overlooked, particularly in global transcriptomic studies. To facilitate improved detection and analysis of sRNAs in S. aureus, we generated updated GenBank files for three commonly used S. aureus strains (MRSA252, NCTC 8325, and USA300), in which we added annotations for >260 previously identified sRNAs. These files, the first to include genome-wide annotation of sRNAs in S. aureus, were then used as a foundation to identify novel sRNAs in the community-associated methicillin-resistant strain USA300. This analysis led to the discovery of 39 previously unidentified sRNAs. Investigating the genomic loci of the newly identified sRNAs revealed a surprising degree of inconsistency in genome annotation in S. aureus, which may be hindering the analysis and functional exploration of these elements. Finally, using our newly created annotation files as a reference, we perform a global analysis of sRNA gene expression in S. aureus and demonstrate that the newly identified tsr25 is the most highly upregulated sRNA in human serum. This study provides an invaluable resource to the S. aureus research community in the form of our newly generated annotation files, while at the same time presenting the first examination of differential sRNA expression in pathophysiologically relevant conditions. PMID:26861020

  13. Using deep RNA sequencing for the structural annotation of the laccaria bicolor mycorrhizal transcriptome.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larsen, P. E.; Trivedi, G.; Sreedasyam, A.

    2010-07-06

    Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derivedmore » from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. 69% of expressed mycorrhizal JGI 'best' gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. The method described here can be applied to improve gene structural annotation in other species, provided

  14. JGI Plant Genomics Gene Annotation Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David

    2014-07-14

    Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Gene encodes a wealth of information useful in studying organism and it is critical to have high quality and stable gene annotation. Thanks to advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward thismore » aim with aid of a RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.« less

  15. Genome-scale CRISPR-Cas9 knockout and transcriptional activation screening.

    PubMed

    Joung, Julia; Konermann, Silvana; Gootenberg, Jonathan S; Abudayyeh, Omar O; Platt, Randall J; Brigham, Mark D; Sanjana, Neville E; Zhang, Feng

    2017-04-01

    Forward genetic screens are powerful tools for the unbiased discovery and functional characterization of specific genetic elements associated with a phenotype of interest. Recently, the RNA-guided endonuclease Cas9 from the microbial CRISPR (clustered regularly interspaced short palindromic repeats) immune system has been adapted for genome-scale screening by combining Cas9 with pooled guide RNA libraries. Here we describe a protocol for genome-scale knockout and transcriptional activation screening using the CRISPR-Cas9 system. Custom- or ready-made guide RNA libraries are constructed and packaged into lentiviral vectors for delivery into cells for screening. As each screen is unique, we provide guidelines for determining screening parameters and maintaining sufficient coverage. To validate candidate genes identified by the screen, we further describe strategies for confirming the screening phenotype, as well as genetic perturbation, through analysis of indel rate and transcriptional activation. Beginning with library design, a genome-scale screen can be completed in 9-15 weeks, followed by 4-5 weeks of validation.

  16. Agricultural biodiversity in the post-genomics era

    USDA-ARS?s Scientific Manuscript database

    The toolkit available for assessing and utilizing biological diversity within agricultural systems is rapidly expanding. In particular, genome and transcriptome re-sequencing as well as genome complexity reduction techniques are gaining popularity as the cost of generating short read sequence data d...

  17. Discovering functions of unannotated genes from a transcriptome survey of wild fungal isolates.

    PubMed

    Ellison, Christopher E; Kowbel, David; Glass, N Louise; Taylor, John W; Brem, Rachel B

    2014-04-01

    Most fungal genomes are poorly annotated, and many fungal traits of industrial and biomedical relevance are not well suited to classical genetic screens. Assigning genes to phenotypes on a genomic scale thus remains an urgent need in the field. We developed an approach to infer gene function from expression profiles of wild fungal isolates, and we applied our strategy to the filamentous fungus Neurospora crassa. Using transcriptome measurements in 70 strains from two well-defined clades of this microbe, we first identified 2,247 cases in which the expression of an unannotated gene rose and fell across N. crassa strains in parallel with the expression of well-characterized genes. We then used image analysis of hyphal morphologies, quantitative growth assays, and expression profiling to test the functions of four genes predicted from our population analyses. The results revealed two factors that influenced regulation of metabolism of nonpreferred carbon and nitrogen sources, a gene that governed hyphal architecture, and a gene that mediated amino acid starvation resistance. These findings validate the power of our population-transcriptomic approach for inference of novel gene function, and we suggest that this strategy will be of broad utility for genome-scale annotation in many fungal systems. IMPORTANCE Some fungal species cause deadly infections in humans or crop plants, and other fungi are workhorses of industrial chemistry, including the production of biofuels. Advances in medical and industrial mycology require an understanding of the genes that control fungal traits. We developed a method to infer functions of uncharacterized genes by observing correlated expression of their mRNAs with those of known genes across wild fungal isolates. We applied this strategy to a filamentous fungus and predicted functions for thousands of unknown genes. In four cases, we experimentally validated the predictions from our method, discovering novel genes involved in the

  18. GIGGLE: a search engine for large-scale integrated genome analysis.

    PubMed

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-02-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.

  19. Overcoming Barriers to Progress in Exercise Genomics

    PubMed Central

    Bouchard, Claude

    2011-01-01

    This commentary focuses on the issues of statistical power, the usefulness of hypothesis-free approaches such as in genome-wide association explorations, the necessity of expanding the research beyond common DNA variants, the advantage of combining transcriptomics with genomics, and the complexities inherent to the search for links between genotype and phenotype in exercise genomics research. PMID:21697717

  20. Picking Cell Lines for High-Throughput Transcriptomic Toxicity Screening (SOT)

    EPA Science Inventory

    High throughput, whole genome transcriptomic profiling is a promising approach to comprehensively evaluate chemicals for potential biological effects. To be useful for in vitro toxicity screening, gene expression must be quantified in a set of representative cell types that captu...

  1. AmpuBase: a transcriptome database for eight species of apple snails (Gastropoda: Ampullariidae).

    PubMed

    Ip, Jack C H; Mu, Huawei; Chen, Qian; Sun, Jin; Ituarte, Santiago; Heras, Horacio; Van Bocxlaer, Bert; Ganmanee, Monthon; Huang, Xin; Qiu, Jian-Wen

    2018-03-05

    Gastropoda, with approximately 80,000 living species, is the largest class of Mollusca. Among gastropods, apple snails (family Ampullariidae) are globally distributed in tropical and subtropical freshwater ecosystems and many species are ecologically and economically important. Ampullariids exhibit various morphological and physiological adaptations to their respective habitats, which make them ideal candidates for studying adaptation, population divergence, speciation, and larger-scale patterns of diversity, including the biogeography of native and invasive populations. The limited availability of genomic data, however, hinders in-depth ecological and evolutionary studies of these non-model organisms. Using Illumina Hiseq platforms, we sequenced 1220 million reads for seven species of apple snails. Together with the previously published RNA-Seq data of two apple snails, we conducted de novo transcriptome assembly of eight species that belong to five genera of Ampullariidae, two of which represent Old World lineages and the other three New World lineages. There were 20,730 to 35,828 unigenes with predicted open reading frames for the eight species, with N50 (shortest sequence length at 50% of the unigenes) ranging from 1320 to 1803 bp. 69.7% to 80.2% of these unigenes were functionally annotated by searching against NCBI's non-redundant, Gene Ontology database and the Kyoto Encyclopaedia of Genes and Genomes. With these data we developed AmpuBase, a relational database that features online BLAST functionality for DNA/protein sequences, keyword searching for unigenes/functional terms, and download functions for sequences and whole transcriptomes. In summary, we have generated comprehensive transcriptome data for multiple ampullariid genera and species, and created a publicly accessible database with a user-friendly interface to facilitate future basic and applied studies on ampullariids, and comparative molecular studies with other invertebrates.

  2. Combined Large-Scale Phenotyping and Transcriptomics in Maize Reveals a Robust Growth Regulatory Network.

    PubMed

    Baute, Joke; Herman, Dorota; Coppens, Frederik; De Block, Jolien; Slabbinck, Bram; Dell'Acqua, Matteo; Pè, Mario Enrico; Maere, Steven; Nelissen, Hilde; Inzé, Dirk

    2016-03-01

    Leaves are vital organs for biomass and seed production because of their role in the generation of metabolic energy and organic compounds. A better understanding of the molecular networks underlying leaf development is crucial to sustain global requirements for food and renewable energy. Here, we combined transcriptome profiling of proliferative leaf tissue with in-depth phenotyping of the fourth leaf at later stages of development in 197 recombinant inbred lines of two different maize (Zea mays) populations. Previously, correlation analysis in a classical biparental mapping population identified 1,740 genes correlated with at least one of 14 traits. Here, we extended these results with data from a multiparent advanced generation intercross population. As expected, the phenotypic variability was found to be larger in the latter population than in the biparental population, although general conclusions on the correlations among the traits are comparable. Data integration from the two diverse populations allowed us to identify a set of 226 genes that are robustly associated with diverse leaf traits. This set of genes is enriched for transcriptional regulators and genes involved in protein synthesis and cell wall metabolism. In order to investigate the molecular network context of the candidate gene set, we integrated our data with publicly available functional genomics data and identified a growth regulatory network of 185 genes. Our results illustrate the power of combining in-depth phenotyping with transcriptomics in mapping populations to dissect the genetic control of complex traits and present a set of candidate genes for use in biomass improvement. © 2016 American Society of Plant Biologists. All Rights Reserved.

  3. An Alternative Strategy for Trypanosome Survival in the Mammalian Bloodstream Revealed through Genome and Transcriptome Analysis of the Ubiquitous Bovine Parasite Trypanosoma (Megatrypanum) theileri

    PubMed Central

    Kelly, Steven; Ivens, Alasdair; Mott, G. Adam; O’Neill, Ellis; Emms, David; Macleod, Olivia; Voorheis, Paul; Tyler, Kevin; Clark, Matthew; Matthews, Jacqueline

    2017-01-01

    Abstract There are hundreds of Trypanosoma species that live in the blood and tissue spaces of their vertebrate hosts. The vast majority of these do not have the ornate system of antigenic variation that has evolved in the small number of African trypanosome species, but can still maintain long-term infections in the face of the vertebrate adaptive immune system. Trypanosoma theileri is a typical example, has a restricted host range of cattle and other Bovinae, and is only occasionally reported to cause patent disease although no systematic survey of the effect of infection on agricultural productivity has been performed. Here, a detailed genome sequence and a transcriptome analysis of gene expression in bloodstream form T. theileri have been performed. Analysis of the genome sequence and expression showed that T. theileri has a typical kinetoplastid genome structure and allowed a prediction that it is capable of meiotic exchange, gene silencing via RNA interference and, potentially, density-dependent growth control. In particular, the transcriptome analysis has allowed a comparison of two distinct trypanosome cell surfaces, T. brucei and T. theileri, that have each evolved to enable the maintenance of a long-term extracellular infection in cattle. The T. theileri cell surface can be modeled to contain a mixture of proteins encoded by four novel large and divergent gene families and by members of a major surface protease gene family. This surface composition is distinct from the uniform variant surface glycoprotein coat on African trypanosomes providing an insight into a second mechanism used by trypanosome species that proliferate in an extracellular milieu in vertebrate hosts to avoid the adaptive immune response. PMID:28903536

  4. Genome resequencing and transcriptome profiling reveal structural diversity and expression patterns of constitutive disease resistance genes in Huanglongbing-tolerant Poncirus trifoliata and its hybrids

    PubMed Central

    Rawat, Nidhi; Kumar, Brajendra; Albrecht, Ute; Du, Dongliang; Huang, Ming; Yu, Qibin; Zhang, Yi; Duan, Yong-Ping; Bowman, Kim D; Gmitter, Fred G; Deng, Zhanao

    2017-01-01

    Huanglongbing (HLB) is the most destructive bacterial disease of citrus worldwide. While most citrus varieties are susceptible to HLB, Poncirus trifoliata, a close relative of Citrus, and some of its hybrids with Citrus are tolerant to HLB. No specific HLB tolerance genes have been identified in P. trifoliata but recent studies have shown that constitutive disease resistance (CDR) genes were expressed at much higher levels in HLB-tolerant Poncirus hybrids and the expression of CDR genes was modulated by Candidatus Liberibacter asiaticus (CLas), the pathogen of HLB. The current study was undertaken to mine and characterize the CDR gene family in Citrus and Poncirus and to understand its association with HLB tolerance in Poncirus. We identified 17 CDR genes in two citrus genomes, deduced their structures, and investigated their phylogenetic relationships. We revealed that the expansion of the CDR family in Citrus seems to be due to segmental and tandem duplication events. Through genome resequencing and transcriptome sequencing, we identified eight CDR genes in the Poncirus genome (PtCDR1-PtCDR8). The number of SNPs was the highest in PtCDR2 and the lowest in PtCDR7. Most of the deletion and insertion events were observed in the UTR regions of Citrus and Poncirus CDR genes. PtCDR2 and PtCDR8 were in abundance in the leaf transcriptomes of two HLB-tolerant Poncirus genotypes and were also upregulated in HLB-tolerant, Poncirus hybrids as revealed by real-time PCR analysis. These two CDR genes seem to be good candidate genes for future studies of their role in citrus-CLas interactions. PMID:29152310

  5. A genome-wide longitudinal transcriptome analysis of the aging model Podospora anserina.

    PubMed

    Philipp, Oliver; Hamann, Andrea; Servos, Jörg; Werner, Alexandra; Koch, Ina; Osiewacz, Heinz D

    2013-01-01

    Aging of biological systems is controlled by various processes which have a potential impact on gene expression. Here we report a genome-wide transcriptome analysis of the fungal aging model Podospora anserina. Total RNA of three individuals of defined age were pooled and analyzed by SuperSAGE (serial analysis of gene expression). A bioinformatics analysis identified different molecular pathways to be affected during aging. While the abundance of transcripts linked to ribosomes and to the proteasome quality control system were found to decrease during aging, those associated with autophagy increase, suggesting that autophagy may act as a compensatory quality control pathway. Transcript profiles associated with the energy metabolism including mitochondrial functions were identified to fluctuate during aging. Comparison of wild-type transcripts, which are continuously down-regulated during aging, with those down-regulated in the long-lived, copper-uptake mutant grisea, validated the relevance of age-related changes in cellular copper metabolism. Overall, we (i) present a unique age-related data set of a longitudinal study of the experimental aging model P. anserina which represents a reference resource for future investigations in a variety of organisms, (ii) suggest autophagy to be a key quality control pathway that becomes active once other pathways fail, and (iii) present testable predictions for subsequent experimental investigations.

  6. Genomic basis for the convergent evolution of electric organs

    PubMed Central

    Gallant, Jason R.; Traeger, Lindsay L.; Volkening, Jeremy D.; Moffett, Howell; Chen, Po-Hao; Novina, Carl D.; Phillips, George N.; Anand, Rene; Wells, Gregg B.; Pinch, Matthew; Güth, Robert; Unguez, Graciela A.; Albert, James S.; Zakon, Harold H.; Samanta, Manoj P.; Sussman, Michael R.

    2017-01-01

    Little is known about the genetic basis of convergent traits that originate repeatedly over broad taxonomic scales. The myogenic electric organ has evolved six times in fishes to produce electric fields used in communication, navigation, predation, or defense. We have examined the genomic basis of the convergent anatomical and physiological origins of these organs by assembling the genome of the electric eel (Electrophorus electricus) and sequencing electric organ and skeletal muscle transcriptomes from three lineages that have independently evolved electric organs. Our results indicate that, despite millions of years of evolution and large differences in the morphology of electric organ cells, independent lineages have leveraged similar transcription factors and developmental and cellular pathways in the evolution of electric organs. PMID:24970089

  7. Evolution of the unspliced transcriptome.

    PubMed

    Engelhardt, Jan; Stadler, Peter F

    2015-08-20

    Despite their abundance, unspliced EST data have received little attention as a source of information on non-coding RNAs. Very little is know, therefore, about the genomic distribution of unspliced non-coding transcripts and their relationship with the much better studied regularly spliced products. In particular, their evolution has remained virtually unstudied. We systematically study the evidence on unspliced transcripts available in EST annotation tracks for human and mouse, comprising 104,980 and 66,109 unspliced EST clusters, respectively. Roughly one third of these are located totally inside introns of known genes (TINs) and another third overlaps exonic regions (PINs). Eleven percent are "intergenic", far away from any annotated gene. Direct evidence for the independent transcription of many PINs and TINs is obtained from CAGE tag and chromatin data. We predict more than 2000 3'UTR-associated RNA candidates for each human and mouse. Fifteen to twenty percent of the unspliced EST cluster are conserved between human and mouse. With the exception of TINs, the sequences of unspliced EST clusters evolve significantly slower than genomic background. Furthermore, like spliced lincRNAs, they show highly tissue-specific expression patterns. Unspliced long non-coding RNAs are an important, rapidly evolving, component of mammalian transcriptomes. Their analysis is complicated by their preferential association with complex transcribed loci that usually also harbor a plethora of spliced transcripts. Unspliced EST data, although typically disregarded in transcriptome analysis, can be used to gain insights into this rarely investigated transcriptome component. The frequently postulated connection between lack of splicing and nuclear retention and the surprising overlap of chromatin-associated transcripts suggests that this class of transcripts might be involved in chromatin organization and possibly other mechanisms of epigenetic control.

  8. Genome and transcriptome adaptation accompanying emergence of the definitive type 2 host-restricted Salmonella enterica serovar Typhimurium pathovar.

    PubMed

    Kingsley, Robert A; Kay, Sally; Connor, Thomas; Barquist, Lars; Sait, Leanne; Holt, Kathryn E; Sivaraman, Karthi; Wileman, Thomas; Goulding, David; Clare, Simon; Hale, Christine; Seshasayee, Aswin; Harris, Simon; Thomson, Nicholas R; Gardner, Paul; Rabsch, Wolfgang; Wigley, Paul; Humphrey, Tom; Parkhill, Julian; Dougan, Gordon

    2013-08-27

    Salmonella enterica serovar Typhimurium definitive type 2 (DT2) is host restricted to Columba livia (rock or feral pigeon) but is also closely related to S. Typhimurium isolates that circulate in livestock and cause a zoonosis characterized by gastroenteritis in humans. DT2 isolates formed a distinct phylogenetic cluster within S. Typhimurium based on whole-genome-sequence polymorphisms. Comparative genome analysis of DT2 94-213 and S. Typhimurium SL1344, DT104, and D23580 identified few differences in gene content with the exception of variations within prophages. However, DT2 94-213 harbored 22 pseudogenes that were intact in other closely related S. Typhimurium strains. We report a novel in silico approach to identify single amino acid substitutions in proteins that have a high probability of a functional impact. One polymorphism identified using this method, a single-residue deletion in the Tar protein, abrogated chemotaxis to aspartate in vitro. DT2 94-213 also exhibited an altered transcriptional profile in response to culture at 42°C compared to that of SL1344. Such differentially regulated genes included a number involved in flagellum biosynthesis and motility. IMPORTANCE Whereas Salmonella enterica serovar Typhimurium can infect a wide range of animal species, some variants within this serovar exhibit a more limited host range and altered disease potential. Phylogenetic analysis based on whole-genome sequences can identify lineages associated with specific virulence traits, including host adaptation. This study represents one of the first to link pathogen-specific genetic signatures, including coding capacity, genome degradation, and transcriptional responses to host adaptation within a Salmonella serovar. We performed comparative genome analysis of reference and pigeon-adapted definitive type 2 (DT2) S. Typhimurium isolates alongside phenotypic and transcriptome analyses, to identify genetic signatures linked to host adaptation within the DT2 lineage.

  9. MEMOSys: Bioinformatics platform for genome-scale metabolic models

    PubMed Central

    2011-01-01

    Background Recent advances in genomic sequencing have enabled the use of genome sequencing in standard biological and biotechnological research projects. The challenge is how to integrate the large amount of data in order to gain novel biological insights. One way to leverage sequence data is to use genome-scale metabolic models. We have therefore designed and implemented a bioinformatics platform which supports the development of such metabolic models. Results MEMOSys (MEtabolic MOdel research and development System) is a versatile platform for the management, storage, and development of genome-scale metabolic models. It supports the development of new models by providing a built-in version control system which offers access to the complete developmental history. Moreover, the integrated web board, the authorization system, and the definition of user roles allow collaborations across departments and institutions. Research on existing models is facilitated by a search system, references to external databases, and a feature-rich comparison mechanism. MEMOSys provides customizable data exchange mechanisms using the SBML format to enable analysis in external tools. The web application is based on the Java EE framework and offers an intuitive user interface. It currently contains six annotated microbial metabolic models. Conclusions We have developed a web-based system designed to provide researchers a novel application facilitating the management and development of metabolic models. The system is freely available at http://www.icbi.at/MEMOSys. PMID:21276275

  10. iMETHYL: an integrative database of human DNA methylation, gene expression, and genomic variation.

    PubMed

    Komaki, Shohei; Shiwa, Yuh; Furukawa, Ryohei; Hachiya, Tsuyoshi; Ohmomo, Hideki; Otomo, Ryo; Satoh, Mamoru; Hitomi, Jiro; Sobue, Kenji; Sasaki, Makoto; Shimizu, Atsushi

    2018-01-01

    We launched an integrative multi-omics database, iMETHYL (http://imethyl.iwate-megabank.org). iMETHYL provides whole-DNA methylation (~24 million autosomal CpG sites), whole-genome (~9 million single-nucleotide variants), and whole-transcriptome (>14 000 genes) data for CD4 + T-lymphocytes, monocytes, and neutrophils collected from approximately 100 subjects. These data were obtained from whole-genome bisulfite sequencing, whole-genome sequencing, and whole-transcriptome sequencing, making iMETHYL a comprehensive database.

  11. De novo assembly and transcriptomic profiling of the grazing response in Stipa grandis.

    PubMed

    Wan, Dongli; Wan, Yongqing; Hou, Xiangyang; Ren, Weibo; Ding, Yong; Sa, Rula

    2015-01-01

    Stipa grandis (Poaceae) is one of the dominant species in a typical steppe of the Inner Mongolian Plateau. However, primarily due to heavy grazing, the grasslands have become seriously degraded, and S. grandis has developed a special growth-inhibition phenotype against the stressful habitat. Because of the lack of transcriptomic and genomic information, the understanding of the molecular mechanisms underlying the grazing response of S. grandis has been prohibited. Using the Illumina HiSeq 2000 platform, two libraries prepared from non-grazing (FS) and overgrazing samples (OS) were sequenced. De novo assembly produced 94,674 unigenes, of which 65,047 unigenes had BLAST hits in the National Center for Biotechnology Information (NCBI) non-redundant (nr) database (E-value < 10-5). In total, 47,747, 26,156 and 40,842 unigenes were assigned to the Gene Ontology (GO), Clusters of Orthologous Group (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. A total of 13,221 unigenes showed significant differences in expression under the overgrazing condition, with a threshold false discovery rate ≤ 0.001 and an absolute value of log2Ratio ≥ 1. These differentially expressed genes (DEGs) were assigned to 43,257 GO terms and were significantly enriched in 32 KEGG pathways (q-value ≤ 0.05). The alterations in the wound-, drought- and defense-related genes indicate that stressors have an additive effect on the growth inhibition of this species. This first large-scale transcriptome study will provide important information for further gene expression and functional genomics studies, and it facilitated our investigation of the molecular mechanisms of the S. grandis grazing response and the associated morphological and physiological characteristics.

  12. GIGGLE: a search engine for large-scale integrated genome analysis

    PubMed Central

    Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R

    2018-01-01

    GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061

  13. Assembly of the Lactuca sativa, L. cv. Tizian draft genome sequence reveals differences within major resistance complex 1 as compared to the cv. Salinas reference genome.

    PubMed

    Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2018-02-10

    Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Genome-Scale Model and Omics Analysis of Metabolic Capacities of Akkermansia muciniphila Reveal a Preferential Mucin-Degrading Lifestyle

    PubMed Central

    Suarez-Diez, Maria; Boeren, Sjef; Schaap, Peter J.; Martins dos Santos, Vitor A. P.; Smidt, Hauke; Belzer, Clara

    2017-01-01

    ABSTRACT The composition and activity of the microbiota in the human gastrointestinal tract are primarily shaped by nutrients derived from either food or the host. Bacteria colonizing the mucus layer have evolved to use mucin as a carbon and energy source. One of the members of the mucosa-associated microbiota is Akkermansia muciniphila, which is capable of producing an extensive repertoire of mucin-degrading enzymes. To further study the substrate utilization abilities of A. muciniphila, we constructed a genome-scale metabolic model to test amino acid auxotrophy, vitamin biosynthesis, and sugar-degrading capacities. The model-supported predictions were validated by in vitro experiments, which showed A. muciniphila to be able to utilize the mucin-derived monosaccharides fucose, galactose, and N-acetylglucosamine. Growth was also observed on N-acetylgalactosamine, even though the metabolic model did not predict this. The uptake of these sugars, as well as the nonmucin sugar glucose, was enhanced in the presence of mucin, indicating that additional mucin-derived components are needed for optimal growth. An analysis of whole-transcriptome sequencing (RNA-Seq) comparing the gene expression of A. muciniphila grown on mucin with that of the same bacterium grown on glucose confirmed the activity of the genes involved in mucin degradation and revealed most of these to be upregulated in the presence of mucin. The transcriptional response was confirmed by a proteome analysis, altogether revealing a hierarchy in the use of sugars and reflecting the adaptation of A. muciniphila to the mucosal environment. In conclusion, these findings provide molecular insights into the lifestyle of A. muciniphila and further confirm its role as a mucin specialist in the gut. IMPORTANCE Akkermansia muciniphila is among the most abundant mucosal bacteria in humans and in a wide range of other animals. Recently, A. muciniphila has attracted considerable attention because of its capacity to

  15. Optimizing Hybrid de Novo Transcriptome Assembly and Extending Genomic Resources for Giant Freshwater Prawns (Macrobrachium rosenbergii): The Identification of Genes and Markers Associated with Reproduction.

    PubMed

    Jung, Hyungtaek; Yoon, Byung-Ha; Kim, Woo-Jin; Kim, Dong-Wook; Hurwood, David A; Lyons, Russell E; Salin, Krishna R; Kim, Heui-Soo; Baek, Ilseon; Chand, Vincent; Mather, Peter B

    2016-05-07

    The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean is currently the world's most economically important cultured freshwater crustacean species. Despite its economic importance, there is currently a lack of genomic resources available for this species, and this has limited exploration of the molecular mechanisms that control the M. rosenbergii sex-differentiation system more widely in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii applying RNA-Seq technologies directed at identifying genes that have potential functional roles in reproductive-related traits. A total of 13,733,210 combined raw reads (1720 Mbp) were obtained from Ion-Torrent PGM and 454 FLX. Bioinformatic analyses based on three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, that use single and multiple k-mer methods respectively, were used to analyse the data. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were further functionally annotated to detect single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns from ovary and testis tissue libraries to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources ever developed for reproduction traits in M. rosenbergii, and the large number of genetic markers predicted should constitute an invaluable resource for future genetic research studies on M. rosenbergii and can be applied more widely on other freshwater prawn species in the genus Macrobrachium.

  16. Optimizing Hybrid de Novo Transcriptome Assembly and Extending Genomic Resources for Giant Freshwater Prawns (Macrobrachium rosenbergii): The Identification of Genes and Markers Associated with Reproduction

    PubMed Central

    Jung, Hyungtaek; Yoon, Byung-Ha; Kim, Woo-Jin; Kim, Dong-Wook; Hurwood, David A.; Lyons, Russell E.; Salin, Krishna R.; Kim, Heui-Soo; Baek, Ilseon; Chand, Vincent; Mather, Peter B.

    2016-01-01

    The giant freshwater prawn, Macrobrachium rosenbergii, a sexually dimorphic decapod crustacean is currently the world’s most economically important cultured freshwater crustacean species. Despite its economic importance, there is currently a lack of genomic resources available for this species, and this has limited exploration of the molecular mechanisms that control the M. rosenbergii sex-differentiation system more widely in freshwater prawns. Here, we present the first hybrid transcriptome from M. rosenbergii applying RNA-Seq technologies directed at identifying genes that have potential functional roles in reproductive-related traits. A total of 13,733,210 combined raw reads (1720 Mbp) were obtained from Ion-Torrent PGM and 454 FLX. Bioinformatic analyses based on three state-of-the-art assemblers, the CLC Genomic Workbench, Trans-ABySS, and Trinity, that use single and multiple k-mer methods respectively, were used to analyse the data. The influence of multiple k-mers on assembly performance was assessed to gain insight into transcriptome assembly from short reads. After optimisation, de novo assembly resulted in 44,407 contigs with a mean length of 437 bp, and the assembled transcripts were further functionally annotated to detect single nucleotide polymorphisms and simple sequence repeat motifs. Gene expression analysis was also used to compare expression patterns from ovary and testis tissue libraries to identify genes with potential roles in reproduction and sex differentiation. The large transcript set assembled here represents the most comprehensive set of transcriptomic resources ever developed for reproduction traits in M. rosenbergii, and the large number of genetic markers predicted should constitute an invaluable resource for future genetic research studies on M. rosenbergii and can be applied more widely on other freshwater prawn species in the genus Macrobrachium. PMID:27164098

  17. GST-PRIME: an algorithm for genome-wide primer design.

    PubMed

    Leister, Dario; Varotto, Claudio

    2007-01-01

    The profiling of mRNA expression based on DNA arrays has become a powerful tool to study genome-wide transcription of genes in a number of organisms. GST-PRIME is a software package created to facilitate large-scale primer design for the amplification of probes to be immobilized on arrays for transcriptome analyses, even though it can be also applied in low-throughput approaches. GST-PRIME allows highly efficient, direct amplification of gene-sequence tags (GSTs) from genomic DNA (gDNA), starting from annotated genome or transcript sequences. GST-PRIME provides a customer-friendly platform for automatic primer design, and despite the relative simplicity of the algorithm, experimental tests in the model plant species Arabidopsis thaliana confirmed the reliability of the software. This chapter describes the algorithm used for primer design, its input and output files, and the installation of the standalone package and its use.

  18. Analysis of embryonic development in the unsequenced axolotl: waves of transcriptomic upheaval and stability

    PubMed Central

    Jiang, Peng; Nelson, Jeffrey D.; Leng, Ning; Collins, Michael; Swanson, Scott; Dewey, Colin N.; Thomson, James A.; Stewart, Ron

    2016-01-01

    The axolotl (Ambystoma mexicanum) has long been the subject of biological research, primarily owing to its outstanding regenerative capabilities. However, the gene expression programs governing its embryonic development are particularly underexplored, especially when compared to other amphibian model species. Therefore, we performed whole transcriptome polyA+ RNA sequencing experiments on 17 stages of embryonic development. As the axolotl genome is unsequenced and its gene annotation is incomplete, we built de novo transcriptome assemblies for each stage and garnered functional annotation by comparing expressed contigs with known genes in other organisms. In evaluating the number of differentially expressed genes over time, we identify three waves of substantial transcriptome upheaval each followed by a period of relative transcriptome stability. The first wave of upheaval is between the one and two cell stage. We show that the number of differentially expressed genes per unit time is higher between the one and two cell stage than it is across the mid-blastula transition (MBT), the period of zygotic genome activation. We use total RNA sequencing to demonstrate that the vast majority of genes with increasing polyA+ signal between the one and two cell stage result from polyadenylation rather than de novo transcription. The first stable phase begins after the two cell stage and continues until the mid-blastula transition, corresponding with the pre-MBT phase of transcriptional quiescence in amphibian development. Following this is a peak of differential gene expression corresponding with the activation of the zygotic genome and a phase of transcriptomic stability from stages 9 to 11. We observe a third wave of transcriptomic change between stages 11 and 14, followed by a final stable period. The last two stable phases have not been documented in amphibians previously and correspond to times of major morphogenic change in the axolotl embryo: gastrulation and

  19. Reciprocal genomic evolution in the ant–fungus agricultural symbiosis

    PubMed Central

    Nygaard, Sanne; Hu, Haofu; Li, Cai; Schiøtt, Morten; Chen, Zhensheng; Yang, Zhikai; Xie, Qiaolin; Ma, Chunyu; Deng, Yuan; Dikow, Rebecca B.; Rabeling, Christian; Nash, David R.; Wcislo, William T.; Brady, Seán G.; Schultz, Ted R.; Zhang, Guojie; Boomsma, Jacobus J.

    2016-01-01

    The attine ant–fungus agricultural symbiosis evolved over tens of millions of years, producing complex societies with industrial-scale farming analogous to that of humans. Here we document reciprocal shifts in the genomes and transcriptomes of seven fungus-farming ant species and their fungal cultivars. We show that ant subsistence farming probably originated in the early Tertiary (55–60 MYA), followed by further transitions to the farming of fully domesticated cultivars and leaf-cutting, both arising earlier than previously estimated. Evolutionary modifications in the ants include unprecedented rates of genome-wide structural rearrangement, early loss of arginine biosynthesis and positive selection on chitinase pathways. Modifications of fungal cultivars include loss of a key ligninase domain, changes in chitin synthesis and a reduction in carbohydrate-degrading enzymes as the ants gradually transitioned to functional herbivory. In contrast to human farming, increasing dependence on a single cultivar lineage appears to have been essential to the origin of industrial-scale ant agriculture. PMID:27436133

  20. The Transcriptome of the Reference Potato Genome Solanum tuberosum Group Phureja Clone DM1-3 516R44

    PubMed Central

    Massa, Alicia N.; Childs, Kevin L.; Lin, Haining; Bryan, Glenn J.; Giuliano, Giovanni; Buell, C. Robin

    2011-01-01

    Advances in molecular breeding in potato have been limited by its complex biological system, which includes vegetative propagation, autotetraploidy, and extreme heterozygosity. The availability of the potato genome and accompanying gene complement with corresponding gene structure, location, and functional annotation are powerful resources for understanding this complex plant and advancing molecular breeding efforts. Here, we report a reference for the potato transcriptome using 32 tissues and growth conditions from the doubled monoploid Solanum tuberosum Group Phureja clone DM1-3 516R44 for which a genome sequence is available. Analysis of greater than 550 million RNA-Seq reads permitted the detection and quantification of expression levels of over 22,000 genes. Hierarchical clustering and principal component analyses captured the biological variability that accounts for gene expression differences among tissues suggesting tissue-specific gene expression, and genes with tissue or condition restricted expression. Using gene co-expression network analysis, we identified 18 gene modules that represent tissue-specific transcriptional networks of major potato organs and developmental stages. This information provides a powerful resource for potato research as well as studies on other members of the Solanaceae family. PMID:22046362

  1. Comparative Transcriptomics to Identify Novel Genes and Pathways in Dinoflagellates

    NASA Astrophysics Data System (ADS)

    Ryan, D.

    2016-02-01

    The unarmored dinoflagellate Karenia brevis is among the most prominent harmful, bloom-forming phytoplankton species in the Gulf of Mexico. During blooms, the polyketides PbTx-1 and PbTx-2 (brevetoxins) are produced by K. brevis. Brevetoxins negatively impact human health and the Gulf shellfish harvest. However, the genes underlying brevetoxin synthesis are currently unknown. Because the K. brevis genome is extremely large ( 1 × 1011 base pairs long), and with a high proportion of repetitive, non-coding DNA, it has not been sequenced. In fact, large, repetitive genomes are common among the dinoflagellate group. High-throughput RNA sequencing technology enabled us to assemble Karenia transcriptomes de novo and investigate potential genes in the brevetoxin pathway through comparative transcriptomics. The brevetoxin profile varies among K. brevis clonal cultures. For example, well-documented Wilson-CCFWC268 typically produces 8-10 pg PbTx per cell, whereas SP1 produces < 2 pg PbTx/cell, and the mutant low-toxin Wilson clone produces undetectable to low (<0.05 pg/cell) amounts. Further, PbTx-2 has been measured in Karenia papilionacea but not Karenia mikimotoi. We compared the transcriptomes of four K. brevis clones (Wilson-CCFWC268, SP3, SP1, and mutant low-toxin Wilson) with K. papilionacea and K. mikimotoi to investigate nucleotide-level genetic variations and differences in gene expression. Of the 85,000 transcripts in the K. brevis transcriptome, 4,600 transcripts, including novel unannotated orthologs and putative polyketide synthases (PKSs), were only expressed by brevetoxin-producing K. brevis and K. papilionacea, not K. mikimotoi. Examination of gene expression between the typical- and low-toxin Wilson clones identified about 3,500 genes with significantly different expression levels, including 2 putative PKSs. One of the 2 PKSs was only found in the brevetoxin-producing Karenia species. These transcriptomes could not have been characterized without high

  2. Conservation genetics and genomics of amphibians and reptiles.

    PubMed

    Shaffer, H Bradley; Gidiş, Müge; McCartney-Melstad, Evan; Neal, Kevin M; Oyamaguchi, Hilton M; Tellez, Marisa; Toffelmier, Erin M

    2015-01-01

    Amphibians and reptiles as a group are often secretive, reach their greatest diversity often in remote tropical regions, and contain some of the most endangered groups of organisms on earth. Particularly in the past decade, genetics and genomics have been instrumental in the conservation biology of these cryptic vertebrates, enabling work ranging from the identification of populations subject to trade and exploitation, to the identification of cryptic lineages harboring critical genetic variation, to the analysis of genes controlling key life history traits. In this review, we highlight some of the most important ways that genetic analyses have brought new insights to the conservation of amphibians and reptiles. Although genomics has only recently emerged as part of this conservation tool kit, several large-scale data sources, including full genomes, expressed sequence tags, and transcriptomes, are providing new opportunities to identify key genes, quantify landscape effects, and manage captive breeding stocks of at-risk species.

  3. Pichia stipitis genomics, transcriptomics, and gene clusters

    Treesearch

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  4. Deep insight into the Ganoderma lucidum by comprehensive analysis of its transcriptome.

    PubMed

    Yu, Guo-Jun; Wang, Man; Huang, Jie; Yin, Ya-Lin; Chen, Yi-Jie; Jiang, Shuai; Jin, Yan-Xia; Lan, Xian-Qing; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui

    2012-01-01

    Ganoderma lucidum is a basidiomycete white rot fungus and is of medicinal importance in China, Japan and other countries in the Asiatic region. To date, much research has been performed in identifying the medicinal ingredients in Ganoderma lucidum. Despite its important therapeutic effects in disease, little is known about Ganoderma lucidum at the genomic level. In order to gain a molecular understanding of this fungus, we utilized Illumina high-throughput technology to sequence and analyze the transcriptome of Ganoderma lucidum. We obtained 6,439,690 and 6,416,670 high-quality reads from the mycelium and fruiting body of Ganoderma lucidum, and these were assembled to form 18,892 and 27,408 unigenes, respectively. A similarity search was performed against the NCBI non-redundant nucleotide database and a customized database composed of five fungal genomes. 11,098 and 8, 775 unigenes were matched to the NCBI non-redundant nucleotide database and our customized database, respectively. All unigenes were subjected to annotation by Gene Ontology, Eukaryotic Orthologous Group terms and Kyoto Encyclopedia of Genes and Genomes. Differentially expressed genes from the Ganoderma lucidum mycelium and fruiting body stage were analyzed, resulting in the identification of 13 unigenes which are involved in the terpenoid backbone biosynthesis pathway. Quantitative real-time PCR was used to confirm the expression levels of these unigenes. Ganoderma lucidum was also studied for wood degrading activity and a total of 22 putative FOLymes (fungal oxidative lignin enzymes) and 120 CAZymes (carbohydrate-active enzymes) were predicted from our Ganoderma lucidum transcriptome. Our study provides comprehensive gene expression information on Ganoderma lucidum at the transcriptional level, which will form the foundation for functional genomics studies in this fungus. The use of Illumina sequencing technology has made de novo transcriptome assembly and gene expression analysis possible in

  5. Deep Insight into the Ganoderma lucidum by Comprehensive Analysis of Its Transcriptome

    PubMed Central

    Yu, Guo-Jun; Wang, Man; Huang, Jie; Yin, Ya-Lin; Chen, Yi-Jie; Jiang, Shuai; Jin, Yan-Xia; Lan, Xian-Qing; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui

    2012-01-01

    Background Ganoderma lucidum is a basidiomycete white rot fungus and is of medicinal importance in China, Japan and other countries in the Asiatic region. To date, much research has been performed in identifying the medicinal ingredients in Ganoderma lucidum. Despite its important therapeutic effects in disease, little is known about Ganoderma lucidum at the genomic level. In order to gain a molecular understanding of this fungus, we utilized Illumina high-throughput technology to sequence and analyze the transcriptome of Ganoderma lucidum. Methodology/Principal Findings We obtained 6,439,690 and 6,416,670 high-quality reads from the mycelium and fruiting body of Ganoderma lucidum, and these were assembled to form 18,892 and 27,408 unigenes, respectively. A similarity search was performed against the NCBI non-redundant nucleotide database and a customized database composed of five fungal genomes. 11,098 and 8, 775 unigenes were matched to the NCBI non-redundant nucleotide database and our customized database, respectively. All unigenes were subjected to annotation by Gene Ontology, Eukaryotic Orthologous Group terms and Kyoto Encyclopedia of Genes and Genomes. Differentially expressed genes from the Ganoderma lucidum mycelium and fruiting body stage were analyzed, resulting in the identification of 13 unigenes which are involved in the terpenoid backbone biosynthesis pathway. Quantitative real-time PCR was used to confirm the expression levels of these unigenes. Ganoderma lucidum was also studied for wood degrading activity and a total of 22 putative FOLymes (fungal oxidative lignin enzymes) and 120 CAZymes (carbohydrate-active enzymes) were predicted from our Ganoderma lucidum transcriptome. Conclusions Our study provides comprehensive gene expression information on Ganoderma lucidum at the transcriptional level, which will form the foundation for functional genomics studies in this fungus. The use of Illumina sequencing technology has made de novo transcriptome

  6. Genome-scale engineering of Saccharomyces cerevisiae with single-nucleotide precision.

    PubMed

    Bao, Zehua; HamediRad, Mohammad; Xue, Pu; Xiao, Han; Tasan, Ipek; Chao, Ran; Liang, Jing; Zhao, Huimin

    2018-07-01

    We developed a CRISPR-Cas9- and homology-directed-repair-assisted genome-scale engineering method named CHAnGE that can rapidly output tens of thousands of specific genetic variants in yeast. More than 98% of target sequences were efficiently edited with an average frequency of 82%. We validate the single-nucleotide resolution genome-editing capability of this technology by creating a genome-wide gene disruption collection and apply our method to improve tolerance to growth inhibitors.

  7. A comprehensive crop genome research project: the Superhybrid Rice Genome Project in China.

    PubMed

    Yu, Jun; Wong, Gane Ka-Shu; Liu, Siqi; Wang, Jian; Yang, Huanming

    2007-06-29

    In May 2000, the Beijing Institute of Genomics formally announced the launch of a comprehensive crop genome research project on rice genomics, the Chinese Superhybrid Rice Genome Project. SRGP is not simply a sequencing project targeted to a single rice (Oryza sativa L.) genome, but a full-swing research effort with an ultimate goal of providing inclusive basic genomic information and molecular tools not only to understand biology of the rice, both as an important crop species and a model organism of cereals, but also to focus on a popular superhybrid rice landrace, LYP9. We have completed the first phase of SRGP and provide the rice research community with a finished genome sequence of an indica variety, 93-11 (the paternal cultivar of LYP9), together with ample data on subspecific (between subspecies) polymorphisms, transcriptomes and proteomes, useful for within-species comparative studies. In the second phase, we have acquired the genome sequence of the maternal cultivar, PA64S, together with the detailed catalogues of genes uniquely expressed in the parental cultivars and the hybrid as well as allele-specific markers that distinguish parental alleles. Although SRGP in China is not an open-ended research programme, it has been designed to pave a way for future plant genomics research and application, such as to interrogate fundamentals of plant biology, including genome duplication, polyploidy and hybrid vigour, as well as to provide genetic tools for crop breeding and to carry along a social burden-leading a fight against the world's hunger. It began with genomics, the newly developed and industry-scale research field, and from the world's most populous country. In this review, we summarize our scientific goals and noteworthy discoveries that exploit new territories of systematic investigations on basic and applied biology of rice and other major cereal crops.

  8. Transcriptome analysis of sika deer in China.

    PubMed

    Jia, Bo-Yin; Ba, Heng-Xing; Wang, Gui-Wu; Yang, Ying; Cui, Xue-Zhe; Peng, Ying-Hua; Zheng, Jun-Jun; Xing, Xiu-Mei; Yang, Fu-He

    2016-10-01

    Sika deer is of great commercial value because their antlers are used in tonics and alternative medicine and their meat is healthy and delicious. The goal of this study was to generate transcript sequences from sika deer for functional genomic analyses and to identify the transcripts that demonstrate tissue-specific, age-dependent differential expression patterns. These sequences could enhance our understanding of the molecular mechanisms underlying sika deer growth and development. In the present study, we performed de novo transcriptome assembly and profiling analysis across ten tissue types and four developmental stages (juvenile, adolescent, adult, and aged) of sika deer, using Illumina paired-end tag (PET) sequencing technology. A total of 1,752,253 contigs with an average length of 799 bp were generated, from which 1,348,618 unigenes with an average length of 590 bp were defined. Approximately 33.2 % of these (447,931 unigenes) were then annotated in public protein databases. Many sika deer tissue-specific, age-dependent unigenes were identified. The testes have the largest number of tissue-enriched unigenes, and some of them were prone to develop new functions for other tissues. Additionally, our transcriptome revealed that the juvenile-adolescent transition was the most complex and important stage of the sika deer life cycle. The present work represents the first multiple tissue transcriptome analysis of sika deer across four developmental stages. The generated data not only provide a functional genomics resource for future biological research on sika deer but also guide the selection and manipulation of genes controlling growth and development.

  9. Chætognath transcriptome reveals ancestral and unique features among bilaterians

    PubMed Central

    Marlétaz, Ferdinand; Gilles, André; Caubit, Xavier; Perez, Yvan; Dossat, Carole; Samain, Sylvie; Gyapay, Gabor; Wincker, Patrick; Le Parco, Yannick

    2008-01-01

    Background The chætognaths (arrow worms) have puzzled zoologists for years because of their astonishing morphological and developmental characteristics. Despite their deuterostome-like development, phylogenomic studies recently positioned the chætognath phylum in protostomes, most likely in an early branching. This key phylogenetic position and the peculiar characteristics of chætognaths prompted further investigation of their genomic features. Results Transcriptomic and genomic data were collected from the chætognath Spadella cephaloptera through the sequencing of expressed sequence tags and genomic bacterial artificial chromosome clones. Transcript comparisons at various taxonomic scales emphasized the conservation of a core gene set and phylogenomic analysis confirmed the basal position of chætognaths among protostomes. A detailed survey of transcript diversity and individual genotyping revealed a past genome duplication event in the chætognath lineage, which was, surprisingly, followed by a high retention rate of duplicated genes. Moreover, striking genetic heterogeneity was detected within the sampled population at the nuclear and mitochondrial levels but cannot be explained by cryptic speciation. Finally, we found evidence for trans-splicing maturation of transcripts through splice-leader addition in the chætognath phylum and we further report that this processing is associated with operonic transcription. Conclusion These findings reveal both shared ancestral and unique derived characteristics of the chætognath genome, which suggests that this genome is likely the product of a very original evolutionary history. These features promote chætognaths as a pivotal model for comparative genomics, which could provide new clues for the investigation of the evolution of animal genomes. PMID:18533022

  10. Genome-scale CRISPR-Cas9 Knockout and Transcriptional Activation Screening

    PubMed Central

    Joung, Julia; Konermann, Silvana; Gootenberg, Jonathan S.; Abudayyeh, Omar O.; Platt, Randall J.; Brigham, Mark D.; Sanjana, Neville E.; Zhang, Feng

    2017-01-01

    Forward genetic screens are powerful tools for the unbiased discovery and functional characterization of specific genetic elements associated with a phenotype of interest. Recently, the RNA-guided endonuclease Cas9 from the microbial CRISPR (clustered regularly interspaced short palindromic repeats) immune system has been adapted for genome-scale screening by combining Cas9 with pooled guide RNA libraries. Here we describe a protocol for genome-scale knockout and transcriptional activation screening using the CRISPR-Cas9 system. Custom- or ready-made guide RNA libraries are constructed and packaged into lentiviral vectors for delivery into cells for screening. As each screen is unique, we provide guidelines for determining screening parameters and maintaining sufficient coverage. To validate candidate genes identified from the screen, we further describe strategies for confirming the screening phenotype as well as genetic perturbation through analysis of indel rate and transcriptional activation. Beginning with library design, a genome-scale screen can be completed in 9–15 weeks followed by 4–5 weeks of validation. PMID:28333914

  11. ChlamyNET: a Chlamydomonas gene co-expression network reveals global properties of the transcriptome and the early setup of key co-expression patterns in the green lineage.

    PubMed

    Romero-Campero, Francisco J; Perez-Hurtado, Ignacio; Lucas-Reina, Eva; Romero, Jose M; Valverde, Federico

    2016-03-12

    Chlamydomonas reinhardtii is the model organism that serves as a reference for studies in algal genomics and physiology. It is of special interest in the study of the evolution of regulatory pathways from algae to higher plants. Additionally, it has recently gained attention as a potential source for bio-fuel and bio-hydrogen production. The genome of Chlamydomonas is available, facilitating the analysis of its transcriptome by RNA-seq data. This has produced a massive amount of data that remains fragmented making necessary the application of integrative approaches based on molecular systems biology. We constructed a gene co-expression network based on RNA-seq data and developed a web-based tool, ChlamyNET, for the exploration of the Chlamydomonas transcriptome. ChlamyNET exhibits a scale-free and small world topology. Applying clustering techniques, we identified nine gene clusters that capture the structure of the transcriptome under the analyzed conditions. One of the most central clusters was shown to be involved in carbon/nitrogen metabolism and signalling, whereas one of the most peripheral clusters was involved in DNA replication and cell cycle regulation. The transcription factors and regulators in the Chlamydomonas genome have been identified in ChlamyNET. The biological processes potentially regulated by them as well as their putative transcription factor binding sites were determined. The putative light regulated transcription factors and regulators in the Chlamydomonas genome were analyzed in order to provide a case study on the use of ChlamyNET. Finally, we used an independent data set to cross-validate the predictive power of ChlamyNET. The topological properties of ChlamyNET suggest that the Chlamydomonas transcriptome posseses important characteristics related to error tolerance, vulnerability and information propagation. The central part of ChlamyNET constitutes the core of the transcriptome where most authoritative hub genes are located

  12. De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

    PubMed Central

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689

  13. De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

    PubMed

    Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

    2013-01-01

    Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum.

  14. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  15. The transcriptome of the novel dinoflagellate Oxyrrhis marina (Alveolata: Dinophyceae): response to salinity examined by 454 sequencing

    PubMed Central

    2011-01-01

    Background The heterotrophic dinoflagellate Oxyrrhis marina is increasingly studied in experimental, ecological and evolutionary contexts. Its basal phylogenetic position within the dinoflagellates make O. marina useful for understanding the origin of numerous unusual features of the dinoflagellate lineage; its broad distribution has lent O. marina to the study of protist biogeography; and nutritive flexibility and eurytopy have made it a common lab rat for the investigation of physiological responses of marine heterotrophic flagellates. Nevertheless, genome-scale resources for O. marina are scarce. Here we present a 454-based transcriptome survey for this organism. In addition, we assess sequence read abundance, as a proxy for gene expression, in response to salinity, an environmental factor potentially important in determining O. marina spatial distributions. Results Sequencing generated ~57 Mbp of data which assembled into 7, 398 contigs. Approximately 24% of contigs were nominally identified by BLAST. A further clustering of contigs (at ≥ 90% identity) revealed 164 transcript variant clusters, the largest of which (Phosphoribosylaminoimidazole-succinocarboxamide synthase) was composed of 28 variants displaying predominately synonymous variation. In a genomic context, a sample of 5 different genes were demonstrated to occur as tandem repeats, separated by short (~200-340 bp) inter-genic regions. For HSP90 several intergenic variants were detected suggesting a potentially complex genomic arrangement. In response to salinity, analysis of 454 read abundance highlighted 9 and 20 genes over or under expressed at 50 PSU, respectively. However, 454 read abundance and subsequent qPCR validation did not correlate well - suggesting that measures of gene expression via ad hoc analysis of sequence read abundance require careful interpretation. Conclusion Here we indicate that tandem gene arrangements and the occurrence of multiple transcribed gene variants are common and

  16. Rapid Prototyping of Microbial Cell Factories via Genome-scale Engineering

    PubMed Central

    Si, Tong; Xiao, Han; Zhao, Huimin

    2014-01-01

    Advances in reading, writing and editing genetic materials have greatly expanded our ability to reprogram biological systems at the resolution of a single nucleotide and on the scale of a whole genome. Such capacity has greatly accelerated the cycles of design, build and test to engineer microbes for efficient synthesis of fuels, chemicals and drugs. In this review, we summarize the emerging technologies that have been applied, or are potentially useful for genome-scale engineering in microbial systems. We will focus on the development of high-throughput methodologies, which may accelerate the prototyping of microbial cell factories. PMID:25450192

  17. Genomic and transcriptomic analyses reveal distinct biological functions for cold shock proteins (VpaCspA and VpaCspD) in Vibrio parahaemolyticus CHN25 during low-temperature survival.

    PubMed

    Zhu, Chunhua; Sun, Boyi; Liu, Taigang; Zheng, Huajun; Gu, Wenyi; He, Wei; Sun, Fengjiao; Wang, Yaping; Yang, Meicheng; Bei, Weicheng; Peng, Xu; She, Qunxin; Xie, Lu; Chen, Lanming

    2017-06-05

    Vibrio parahaemolyticus causes serious seafood-borne gastroenteritis and death in humans. Raw seafood is often subjected to post-harvest processing and low-temperature storage. To date, very little information is available regarding the biological functions of cold shock proteins (CSPs) in the low-temperature survival of the bacterium. In this study, we determined the complete genome sequence of V. parahaemolyticus CHN25 (serotype: O5:KUT). The two main CSP-encoding genes (VpacspA and VpacspD) were deleted from the bacterial genome, and comparative transcriptomic analysis between the mutant and wild-type strains was performed to dissect the possible molecular mechanisms that underlie low-temperature adaptation by V. parahaemolyticus. The 5,443,401-bp V. parahaemolyticus CHN25 genome (45.2% G + C) consisted of two circular chromosomes and three plasmids with 4,724 predicted protein-encoding genes. One dual-gene and two single-gene deletion mutants were generated for VpacspA and VpacspD by homologous recombination. The growth of the ΔVpacspA mutant was strongly inhibited at 10 °C, whereas the VpacspD gene deletion strongly stimulated bacterial growth at this low temperature compared with the wild-type strain. The complementary phenotypes were observed in the reverse mutants (ΔVpacspA-com, and ΔVpacspD-com). The transcriptome data revealed that 12.4% of the expressed genes in V. parahaemolyticus CHN25 were significantly altered in the ΔVpacspA mutant when it was grown at 10 °C. These included genes that were involved in amino acid degradation, secretion systems, sulphur metabolism and glycerophospholipid metabolism along with ATP-binding cassette transporters. However, a low temperature elicited significant expression changes for 10.0% of the genes in the ΔVpacspD mutant, including those involved in the phosphotransferase system and in the metabolism of nitrogen and amino acids. The major metabolic pathways that were altered by the dual-gene deletion

  18. Advances in Exercise, Fitness, and Performance Genomics in 2011

    PubMed Central

    Roth, Stephen M.; Rankinen, Tuomo; Hagberg, James M.; Loos, Ruth J. F.; Pérusse, Louis; Sarzynski, Mark A.; Wolfarth, Bernd; Bouchard, Claude

    2014-01-01

    This review of the exercise genomics literature emphasizes the highest quality papers published in 2011. Given this emphasis on the best publications, only a small number of published papers are reviewed. One study found that physical activity levels were significantly lower in patients with mitochondrial DNA mutations compared to controls. A two-stage fine mapping follow-up of a previous linkage peak found strong associations between sequence variation in the activin A receptor, type-1B (ACVR1B) gene and knee extensor strength, with rs2854464 emerging as the most promising candidate polymorphism. The association of higher muscular strength with the rs2854464 A-allele was confirmed in two separate cohorts. A study using a combination of transcriptomic and genomic data identified a comprehensive map of the transcriptomic features important for aerobic exercise training-induced improvements in maximal oxygen consumption, but no genetic variants derived from candidate transcripts were associated with trainability. A large-scale de novo meta-analysis confirmed that the effect of sequence variation in the fat mass and obesity-associated (FTO) gene on the risk of obesity differs between sedentary and physically active adults. Evidence for gene-physical activity interactions on type 2 diabetes risk was found in two separate studies. A large study of women found that physical activity modified the effect of polymorphisms in the lipoprotein lipase (LPL), hepatic lipase (LIPC), and cholesteryl ester transfer protein (CETP) genes, identified in previous genome-wide association study (GWAS) reports, on HDL-C. We conclude that a strong exercise genomics corpus of evidence would not only translate into powerful genomic predictors but would also have a major impact on exercise biology and exercise behavior research. PMID:22330029

  19. Short communication: development and characterization of novel transcriptome-derived microsatellites for genetic analysis of persimmon.

    PubMed

    Luo, C; Zhang, Q L; Luo, Z R

    2014-04-16

    Oriental persimmon (Diospyros kaki Thunb.) (2n = 6x = 90) is a major commercial and deciduous fruit tree that is believed to have originated in China. However, rare transcriptomic and genomic information on persimmon is available. Using Roche 454 sequencing technology, the transcriptome from RNA of the flowers of D. kaki was analyzed. A total of 1,250,893 reads were generated and 83,898 unigenes were assembled. A total of 42,711 SSR loci were identified from 23,494 unigenes and 289 polymerase chain reaction primer pairs were designed. Of these 289 primers, 155 (53.6%) showed robust PCR amplification and 98 revealed polymorphism between 15 persimmon genotypes, indicating a polymorphic rate of 63.23% of the productive primers for characterization and genotyping of the genus Diospyros. Transcriptome sequence data generated from next-generation sequencing technology to identify microsatellite loci appears to be rapid and cost-efficient, particularly for species with no genomic sequence information available.

  20. AlgaGEM – a genome-scale metabolic reconstruction of algae based on the Chlamydomonas reinhardtii genome

    PubMed Central

    2011-01-01

    Background Microalgae have the potential to deliver biofuels without the associated competition for land resources. In order to realise the rates and titres necessary for commercial production, however, system-level metabolic engineering will be required. Genome scale metabolic reconstructions have revolutionized microbial metabolic engineering and are used routinely for in silico analysis and design. While genome scale metabolic reconstructions have been developed for many prokaryotes and model eukaryotes, the application to less well characterized eukaryotes such as algae is challenging not at least due to a lack of compartmentalization data. Results We have developed a genome-scale metabolic network model (named AlgaGEM) covering the metabolism for a compartmentalized algae cell based on the Chlamydomonas reinhardtii genome. AlgaGEM is a comprehensive literature-based genome scale metabolic reconstruction that accounts for the functions of 866 unique ORFs, 1862 metabolites, 2249 gene-enzyme-reaction-association entries, and 1725 unique reactions. The reconstruction was compartmentalized into the cytoplasm, mitochondrion, plastid and microbody using available data for algae complemented with compartmentalisation data for Arabidopsis thaliana. AlgaGEM describes a functional primary metabolism of Chlamydomonas and significantly predicts distinct algal behaviours such as the catabolism or secretion rather than recycling of phosphoglycolate in photorespiration. AlgaGEM was validated through the simulation of growth and algae metabolic functions inferred from literature. Using efficient resource utilisation as the optimality criterion, AlgaGEM predicted observed metabolic effects under autotrophic, heterotrophic and mixotrophic conditions. AlgaGEM predicts increased hydrogen production when cyclic electron flow is disrupted as seen in a high producing mutant derived from mutational studies. The model also predicted the physiological pathway for H2 production and

  1. The coffee genome hub: a resource for coffee genomes

    PubMed Central

    Dereeper, Alexis; Bocs, Stéphanie; Rouard, Mathieu; Guignon, Valentin; Ravel, Sébastien; Tranchant-Dubreuil, Christine; Poncet, Valérie; Garsmeur, Olivier; Lashermes, Philippe; Droc, Gaëtan

    2015-01-01

    The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager. PMID:25392413

  2. [Genome-scale sequence data processing and epigenetic analysis of DNA methylation].

    PubMed

    Wang, Ting-Zhang; Shan, Gao; Xu, Jian-Hong; Xue, Qing-Zhong

    2013-06-01

    A new approach recently developed for detecting cytosine DNA methylation (mC) and analyzing the genome-scale DNA methylation profiling, is called BS-Seq which is based on bisulfite conversion of genomic DNA combined with next-generation sequencing. The method can not only provide an insight into the difference of genome-scale DNA methylation among different organisms, but also reveal the conservation of DNA methylation in all contexts and nucleotide preference for different genomic regions, including genes, exons, and repetitive DNA sequences. It will be helpful to under-stand the epigenetic impacts of cytosine DNA methylation on the regulation of gene expression and maintaining silence of repetitive sequences, such as transposable elements. In this paper, we introduce the preprocessing steps of DNA methylation data, by which cytosine (C) and guanine (G) in the reference sequence are transferred to thymine (T) and adenine (A), and cytosine in reads is transferred to thymine, respectively. We also comprehensively review the main content of the DNA methylation analysis on the genomic scale: (1) the cytosine methylation under the context of different sequences; (2) the distribution of genomic methylcytosine; (3) DNA methylation context and the preference for the nucleotides; (4) DNA- protein interaction sites of DNA methylation; (5) degree of methylation of cytosine in the different structural elements of genes. DNA methylation analysis technique provides a powerful tool for the epigenome study in human and other species, and genes and environment interaction, and founds the theoretical basis for further development of disease diagnostics and therapeutics in human.

  3. Plant functional genomics

    NASA Astrophysics Data System (ADS)

    Holtorf, Hauke; Guitton, Marie-Christine; Reski, Ralf

    2002-04-01

    Functional genome analysis of plants has entered the high-throughput stage. The complete genome information from key species such as Arabidopsis thaliana and rice is now available and will further boost the application of a range of new technologies to functional plant gene analysis. To broadly assign functions to unknown genes, different fast and multiparallel approaches are currently used and developed. These new technologies are based on known methods but are adapted and improved to accommodate for comprehensive, large-scale gene analysis, i.e. such techniques are novel in the sense that their design allows researchers to analyse many genes at the same time and at an unprecedented pace. Such methods allow analysis of the different constituents of the cell that help to deduce gene function, namely the transcripts, proteins and metabolites. Similarly the phenotypic variations of entire mutant collections can now be analysed in a much faster and more efficient way than before. The different methodologies have developed to form their own fields within the functional genomics technological platform and are termed transcriptomics, proteomics, metabolomics and phenomics. Gene function, however, cannot solely be inferred by using only one such approach. Rather, it is only by bringing together all the information collected by different functional genomic tools that one will be able to unequivocally assign functions to unknown plant genes. This review focuses on current technical developments and their impact on the field of plant functional genomics. The lower plant Physcomitrella is introduced as a new model system for gene function analysis, owing to its high rate of homologous recombination.

  4. Transcriptome sequences resolve deep relationships of the grape family.

    PubMed

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M; Gerrath, Jean; Zimmer, Elizabeth A; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated.

  5. Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian

    PubMed Central

    2014-01-01

    Background The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. Description We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215–364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs and revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. Conclusions The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics

  6. Production of a reference transcriptome and transcriptomic database (EdwardsiellaBase) for the lined sea anemone, Edwardsiella lineata, a parasitic cnidarian.

    PubMed

    Stefanik, Derek J; Lubinski, Tristan J; Granger, Brian R; Byrd, Allyson L; Reitzel, Adam M; DeFilippo, Lukas; Lorenc, Allison; Finnerty, John R

    2014-01-28

    The lined sea anemone Edwardsiella lineata is an informative model system for evolutionary-developmental studies of parasitism. In this species, it is possible to compare alternate developmental pathways leading from a larva to either a free-living polyp or a vermiform parasite that inhabits the mesoglea of a ctenophore host. Additionally, E. lineata is confamilial with the model cnidarian Nematostella vectensis, providing an opportunity for comparative genomic, molecular and organismal studies. We generated a reference transcriptome for E. lineata via high-throughput sequencing of RNA isolated from five developmental stages (parasite; parasite-to-larva transition; larva; larva-to-adult transition; adult). The transcriptome comprises 90,440 contigs assembled from >15 billion nucleotides of DNA sequence. Using a molecular clock approach, we estimated the divergence between E. lineata and N. vectensis at 215-364 million years ago. Based on gene ontology and metabolic pathway analyses and gene family surveys (bHLH-PAS, deiodinases, Fox genes, LIM homeodomains, minicollagens, nuclear receptors, Sox genes, and Wnts), the transcriptome of E. lineata is comparable in depth and completeness to N. vectensis. Analyses of protein motifs and revealed extensive conservation between the proteins of these two edwardsiid anemones, although we show the NF-κB protein of E. lineata reflects the ancestral structure, while the NF-κB protein of N. vectensis has undergone a split that separates the DNA-binding domain from the inhibitory domain. All contigs have been deposited in a public database (EdwardsiellaBase), where they may be searched according to contig ID, gene ontology, protein family motif (Pfam), enzyme commission number, and BLAST. The alignment of the raw reads to the contigs can also be visualized via JBrowse. The transcriptomic data and database described here provide a platform for studying the evolutionary developmental genomics of a derived parasitic life cycle. In

  7. Ocean biogeochemistry modeled with emergent trait-based genomics

    NASA Astrophysics Data System (ADS)

    Coles, V. J.; Stukel, M. R.; Brooks, M. T.; Burd, A.; Crump, B. C.; Moran, M. A.; Paul, J. H.; Satinsky, B. M.; Yager, P. L.; Zielinski, B. L.; Hood, R. R.

    2017-12-01

    Marine ecosystem models have advanced to incorporate metabolic pathways discovered with genomic sequencing, but direct comparisons between models and “omics” data are lacking. We developed a model that directly simulates metagenomes and metatranscriptomes for comparison with observations. Model microbes were randomly assigned genes for specialized functions, and communities of 68 species were simulated in the Atlantic Ocean. Unfit organisms were replaced, and the model self-organized to develop community genomes and transcriptomes. Emergent communities from simulations that were initialized with different cohorts of randomly generated microbes all produced realistic vertical and horizontal ocean nutrient, genome, and transcriptome gradients. Thus, the library of gene functions available to the community, rather than the distribution of functions among specific organisms, drove community assembly and biogeochemical gradients in the model ocean.

  8. Genome-wide transcriptomic analysis of BR-deficient Micro-Tom reveals correlations between drought stress tolerance and brassinosteroid signaling in tomato.

    PubMed

    Lee, Jinsu; Shim, Donghwan; Moon, Suyun; Kim, Hyemin; Bae, Wonsil; Kim, Kyunghwan; Kim, Yang-Hoon; Rhee, Sung-Keun; Hong, Chang Pyo; Hong, Suk-Young; Lee, Ye-Jin; Sung, Jwakyung; Ryu, Hojin

    2018-06-01

    Brassinosteroids (BRs) are plant steroid hormones that play crucial roles in a range of growth and developmental processes. Although BR signal transduction and biosynthetic pathways have been well characterized in model plants, their biological roles in an important crop, tomato (Solanum lycopersicum), remain unknown. Here, cultivated tomato (WT) and a BR synthesis mutant, Micro-Tom (MT), were compared using physiological and transcriptomic approaches. The cultivated tomato showed higher tolerance to drought and osmotic stresses than the MT tomato. However, BR-defective phenotypes of MT, including plant growth and stomatal closure defects, were completely recovered by application of exogenous BR or complementation with a SlDWARF gene. Using genome-wide transcriptome analysis, 619 significantly differentially expressed genes (DEGs) were identified between WT and MT plants. Several DEGs were linked to known signaling networks, including those related to biotic/abiotic stress responses, lignification, cell wall development, and hormone responses. Consistent with the higher susceptibility of MT to drought stress, several gene sets involved in responses to drought and osmotic stress were differentially regulated between the WT and MT tomato plants. Our data suggest that BR signaling pathways are involved in mediating the response to abiotic stress via fine-tuning of abiotic stress-related gene networks in tomato plants. Copyright © 2018. Published by Elsevier Masson SAS.

  9. Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Catfish Genome Consortium; Wang, Shaolin; Peatman, Eric

    2010-03-23

    Background-Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. Results-A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35percent of the unique sequences had significant similarities tomore » known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. Conclusions-This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.« less

  10. Principle considerations for the use of transcriptomics in doping research.

    PubMed

    Neuberger, Elmo W I; Moser, Dirk A; Simon, Perikles

    2011-10-01

    Over the course of the past decade, technical progress has enabled scientists to investigate genome-wide RNA expression using microarray platforms. This transcriptomic approach represents a promising tool for the discovery of basic gene expression patterns and for identification of cellular signalling pathways under various conditions. Since doping substances have been shown to influence mRNA expression, it has been suggested that these changes can be detected by screening the blood transcriptome. In this review, we critically discuss the potential but also the pitfalls of this application as a tool in doping research. Transcriptomic approaches were considered to potentially provide researchers with a unique gene expression signature or with a specific biomarker for various physiological and pathophysiological conditions. Since transcriptomic approaches are considerably prone to biological and technical confounding factors that act on study subjects or samples, very strict guidelines for the use of transcriptomics in human study subjects have been developed. Typical field conditions associated with doping controls limit the feasibility of following these strict guidelines as there are too many variables counteracting a standardized procedure. After almost a decade of research using transcriptomic tools, it still remains a matter of future technological progress to identify the ultimate biomarker using technologies and/or methodologies that are sufficiently robust against typical biological and technical bias and that are valid in a court of law. Copyright © 2011 John Wiley & Sons, Ltd.

  11. T-IDBA: A de novo Iterative de Bruijn Graph Assembler for Transcriptome

    NASA Astrophysics Data System (ADS)

    Peng, Yu; Leung, Henry C. M.; Yiu, S. M.; Chin, Francis Y. L.

    RNA-seq data produced by next-generation sequencing technology is a useful tool for analyzing transcriptomes. However, existing de novo transcriptome assemblers do not fully utilize the properties of transcriptomes and may result in short contigs because of the splicing nature (shared exons) of the genes. We propose the T-IDBA algorithm to reconstruct expressed isoforms without reference genome. By using pair-end information to solve the problem of long repeats in different genes and branching in the same gene due to alternative splicing, the graph can be decomposed into small components, each corresponds to a gene. The most possible isoforms with sufficient support from the pair-end reads will be found heuristically. In practice, our de novo transcriptome assembler, T-IDBA, outperforms Abyss substantially in terms of sensitivity and precision for both simulated and real data. T-IDBA is available at http://www.cs.hku.hk/~alse/tidba/

  12. Genomics and transcriptomics in drug discovery.

    PubMed

    Dopazo, Joaquin

    2014-02-01

    The popularization of genomic high-throughput technologies is causing a revolution in biomedical research and, particularly, is transforming the field of drug discovery. Systems biology offers a framework to understand the extensive human genetic heterogeneity revealed by genomic sequencing in the context of the network of functional, regulatory and physical protein-drug interactions. Thus, approaches to find biomarkers and therapeutic targets will have to take into account the complex system nature of the relationships of the proteins with the disease. Pharmaceutical companies will have to reorient their drug discovery strategies considering the human genetic heterogeneity. Consequently, modeling and computational data analysis will have an increasingly important role in drug discovery. Copyright © 2013 Elsevier Ltd. All rights reserved.

  13. Genome, transcriptome and methylome sequencing of a primitively eusocial wasp reveal a greatly reduced DNA methylation system in a social insect.

    PubMed

    Standage, Daniel S; Berens, Ali J; Glastad, Karl M; Severin, Andrew J; Brendel, Volker P; Toth, Amy L

    2016-04-01

    Comparative genomics of social insects has been intensely pursued in recent years with the goal of providing insights into the evolution of social behaviour and its underlying genomic and epigenomic basis. However, the comparative approach has been hampered by a paucity of data on some of the most informative social forms (e.g. incipiently and primitively social) and taxa (especially members of the wasp family Vespidae) for studying social evolution. Here, we provide a draft genome of the primitively eusocial model insect Polistes dominula, accompanied by analysis of caste-related transcriptome and methylome sequence data for adult queens and workers. Polistes dominula possesses a fairly typical hymenopteran genome, but shows very low genomewide GC content and some evidence of reduced genome size. We found numerous caste-related differences in gene expression, with evidence that both conserved and novel genes are related to caste differences. Most strikingly, these -omics data reveal a major reduction in one of the major epigenetic mechanisms that has been previously suggested to be important for caste differences in social insects: DNA methylation. Along with a conspicuous loss of a key gene associated with environmentally responsive DNA methylation (the de novo DNA methyltransferase Dnmt3), these wasps have greatly reduced genomewide methylation to almost zero. In addition to providing a valuable resource for comparative analysis of social insect evolution, our integrative -omics data for this important behavioural and evolutionary model system call into question the general importance of DNA methylation in caste differences and evolution in social insects. © 2016 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  14. The draft genome of tropical fruit durian (Durio zibethinus).

    PubMed

    Teh, Bin Tean; Lim, Kevin; Yong, Chern Han; Ng, Cedric Chuan Young; Rao, Sushma Ramesh; Rajasegaran, Vikneswari; Lim, Weng Khong; Ong, Choon Kiat; Chan, Ki; Cheng, Vincent Kin Yuen; Soh, Poh Sheng; Swarup, Sanjay; Rozen, Steven G; Nagarajan, Niranjan; Tan, Patrick

    2017-11-01

    Durian (Durio zibethinus) is a Southeast Asian tropical plant known for its hefty, spine-covered fruit and sulfury and onion-like odor. Here we present a draft genome assembly of D. zibethinus, representing the third plant genus in the Malvales order and first in the Helicteroideae subfamily to be sequenced. Single-molecule sequencing and chromosome contact maps enabled assembly of the highly heterozygous durian genome at chromosome-scale resolution. Transcriptomic analysis showed upregulation of sulfur-, ethylene-, and lipid-related pathways in durian fruits. We observed paleopolyploidization events shared by durian and cotton and durian-specific gene expansions in MGL (methionine γ-lyase), associated with production of volatile sulfur compounds (VSCs). MGL and the ethylene-related gene ACS (aminocyclopropane-1-carboxylic acid synthase) were upregulated in fruits concomitantly with their downstream metabolites (VSCs and ethylene), suggesting a potential association between ethylene biosynthesis and methionine regeneration via the Yang cycle. The durian genome provides a resource for tropical fruit biology and agronomy.

  15. Characterization and analysis of a transcriptome from the boreal spider crab Hyas araneus.

    PubMed

    Harms, Lars; Frickenhaus, Stephan; Schiffer, Melanie; Mark, Felix C; Storch, Daniela; Pörtner, Hans-Otto; Held, Christoph; Lucassen, Magnus

    2013-12-01

    Research investigating the genetic basis of physiological responses has significantly broadened our understanding of the mechanisms underlying organismic response to environmental change. However, genomic data are currently available for few taxa only, thus excluding physiological model species from this approach. In this study we report the transcriptome of the model organism Hyas araneus from Spitsbergen (Arctic). We generated 20,479 transcripts, using the 454 GS FLX sequencing technology in combination with an Illumina HiSeq sequencing approach. Annotation by Blastx revealed 7159 blast hits in the NCBI non-redundant protein database. The comparison between the spider crab H. araneus transcriptome and EST libraries of the European lobster Homarus americanus and the porcelain crab Petrolisthes cinctipes yielded 3229/2581 sequences with a significant hit, respectively. The clustering by the Markov Clustering Algorithm (MCL) revealed a common core of 1710 clusters present in all three species and 5903 unique clusters for H. araneus. The combined sequencing approaches generated transcripts that will greatly expand the limited genomic data available for crustaceans. We introduce the MCL clustering for transcriptome comparisons as a simple approach to estimate similarities between transcriptomic libraries of different size and quality and to analyze homologies within the selected group of species. In particular, we identified a large variety of reverse transcriptase (RT) sequences not only in the H. araneus transcriptome and other decapod crustaceans, but also sea urchin, supporting the hypothesis of a heritable, anti-viral immunity and the proposed viral fragment integration by host-derived RTs in marine invertebrates. © 2013.

  16. New Markers for Predicting Fertility of the Male Gametes in the Post Genomic Age.

    PubMed

    Dipresa, Savina; De Toni, Luca; Foresta, Carlo; Garolla, Andrea

    2018-04-18

    A number of test have been proposed to assess male fertility potential, ranging from routine testing by light microscopic method for evaluating semen samples, to screening test for DNA integrity aimed to look at sperm chromatin abnormalities. Spermatozoa are an extremely differentiated cell, they have critical functions for embryo development and heredity, in addiction to delivering a haploid paternal genome to the oocyte. Towards this goal certain requirements must always be met. The ability of spermatozoa to perform its reproductive function taking place in the spermatogenesis, a highly specialized process depending on multiple factors with effect on male fertility. In the past 30 years, large-scale analyses of transcriptomic and genome expression in mammals have generated a large amount of informations on numberless biomolecules involved in spermatogenesis and male germ cell reproductive function. Sperm proteome represents the protein content that spermatozoa needs to survive and work correctly and modifications of sperm proteome play a role in determining functional changes leading to a decrease of reproductive competence into affected spermatozoa. The post-genomic approach consists of different methodologies for concurrently testicular transcriptome studies, protein compositional analysis and metabolomics findings of the spermatozoa in humans. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  17. Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan L.).

    PubMed

    Dubey, Anuja; Farmer, Andrew; Schlueter, Jessica; Cannon, Steven B; Abernathy, Brian; Tuteja, Reetu; Woodward, Jimmy; Shah, Trushar; Mulasmanovic, Benjamin; Kudapa, Himabindu; Raju, Nikku L; Gothalwal, Ragini; Pande, Suresh; Xiao, Yongli; Town, Chris D; Singh, Nagendra K; May, Gregory D; Jackson, Scott; Varshney, Rajeev K

    2011-06-01

    This study reports generation of large-scale genomic resources for pigeonpea, a so-called 'orphan crop species' of the semi-arid tropic regions. FLX/454 sequencing carried out on a normalized cDNA pool prepared from 31 tissues produced 494 353 short transcript reads (STRs). Cluster analysis of these STRs, together with 10 817 Sanger ESTs, resulted in a pigeonpea trancriptome assembly (CcTA) comprising of 127 754 tentative unique sequences (TUSs). Functional analysis of these TUSs highlights several active pathways and processes in the sampled tissues. Comparison of the CcTA with the soybean genome showed similarity to 10 857 and 16 367 soybean gene models (depending on alignment methods). Additionally, Illumina 1G sequencing was performed on Fusarium wilt (FW)- and sterility mosaic disease (SMD)-challenged root tissues of 10 resistant and susceptible genotypes. More than 160 million sequence tags were used to identify FW- and SMD-responsive genes. Sequence analysis of CcTA and the Illumina tags identified a large new set of markers for use in genetics and breeding, including 8137 simple sequence repeats, 12 141 single-nucleotide polymorphisms and 5845 intron-spanning regions. Genomic resources developed in this study should be useful for basic and applied research, not only for pigeonpea improvement but also for other related, agronomically important legumes.

  18. Genome-scale approaches to the epigenetics of common human disease

    PubMed Central

    2011-01-01

    Traditionally, the pathology of human disease has been focused on microscopic examination of affected tissues, chemical and biochemical analysis of biopsy samples, other available samples of convenience, such as blood, and noninvasive or invasive imaging of varying complexity, in order to classify disease and illuminate its mechanistic basis. The molecular age has complemented this armamentarium with gene expression arrays and selective analysis of individual genes. However, we are entering a new era of epigenomic profiling, i.e., genome-scale analysis of cell-heritable nonsequence genetic change, such as DNA methylation. The epigenome offers access to stable measurements of cellular state and to biobanked material for large-scale epidemiological studies. Some of these genome-scale technologies are beginning to be applied to create the new field of epigenetic epidemiology. PMID:19844740

  19. Transcriptome profiling analysis of cultivar-specific apple fruit ripening and texture attributes

    USDA-ARS?s Scientific Manuscript database

    Molecular events regulating cultivar-specific apple fruit ripening and sensory quality are largely unknown. Such knowledge is essential for genomic-assisted apple breeding and postharvest quality management. In this study, transcriptome profile analysis, scanning electron microscopic examination an...

  20. rnaQUAST: a quality assessment tool for de novo transcriptome assemblies.

    PubMed

    Bushmanova, Elena; Antipov, Dmitry; Lapidus, Alla; Suvorov, Vladimir; Prjibelski, Andrey D

    2016-07-15

    Ability to generate large RNA-Seq datasets created a demand for both de novo and reference-based transcriptome assemblers. However, while many transcriptome assemblers are now available, there is still no unified quality assessment tool for RNA-Seq assemblies. We present rnaQUAST-a tool for evaluating RNA-Seq assembly quality and benchmarking transcriptome assemblers using reference genome and gene database. rnaQUAST calculates various metrics that demonstrate completeness and correctness levels of the assembled transcripts, and outputs them in a user-friendly report. rnaQUAST is implemented in Python and is freely available at http://bioinf.spbau.ru/en/rnaquast ap@bioinf.spbau.ru Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. De novo Assembly and Transcriptomic Profiling of the Grazing Response in Stipa grandis

    PubMed Central

    Hou, Xiangyang; Ren, Weibo; Ding, Yong; Sa, Rula

    2015-01-01

    Background Stipa grandis (Poaceae) is one of the dominant species in a typical steppe of the Inner Mongolian Plateau. However, primarily due to heavy grazing, the grasslands have become seriously degraded, and S. grandis has developed a special growth-inhibition phenotype against the stressful habitat. Because of the lack of transcriptomic and genomic information, the understanding of the molecular mechanisms underlying the grazing response of S. grandis has been prohibited. Results Using the Illumina HiSeq 2000 platform, two libraries prepared from non-grazing (FS) and overgrazing samples (OS) were sequenced. De novo assembly produced 94,674 unigenes, of which 65,047 unigenes had BLAST hits in the National Center for Biotechnology Information (NCBI) non-redundant (nr) database (E-value < 10-5). In total, 47,747, 26,156 and 40,842 unigenes were assigned to the Gene Ontology (GO), Clusters of Orthologous Group (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, respectively. A total of 13,221 unigenes showed significant differences in expression under the overgrazing condition, with a threshold false discovery rate ≤ 0.001 and an absolute value of log2Ratio ≥ 1. These differentially expressed genes (DEGs) were assigned to 43,257 GO terms and were significantly enriched in 32 KEGG pathways (q-value ≤ 0.05). The alterations in the wound-, drought- and defense-related genes indicate that stressors have an additive effect on the growth inhibition of this species. Conclusions This first large-scale transcriptome study will provide important information for further gene expression and functional genomics studies, and it facilitated our investigation of the molecular mechanisms of the S. grandis grazing response and the associated morphological and physiological characteristics. PMID:25875617

  2. Joint scaling laws in functional and evolutionary categories in prokaryotic genomes

    PubMed Central

    Grilli, J.; Bassetti, B.; Maslov, S.; Cosentino Lagomarsino, M.

    2012-01-01

    We propose and study a class-expansion/innovation/loss model of genome evolution taking into account biological roles of genes and their constituent domains. In our model, numbers of genes in different functional categories are coupled to each other. For example, an increase in the number of metabolic enzymes in a genome is usually accompanied by addition of new transcription factors regulating these enzymes. Such coupling can be thought of as a proportional ‘recipe’ for genome composition of the type ‘a spoonful of sugar for each egg yolk’. The model jointly reproduces two known empirical laws: the distribution of family sizes and the non-linear scaling of the number of genes in certain functional categories (e.g. transcription factors) with genome size. In addition, it allows us to derive a novel relation between the exponents characterizing these two scaling laws, establishing a direct quantitative connection between evolutionary and functional categories. It predicts that functional categories that grow faster-than-linearly with genome size to be characterized by flatter-than-average family size distributions. This relation is confirmed by our bioinformatics analysis of prokaryotic genomes. This proves that the joint quantitative trends of functional and evolutionary classes can be understood in terms of evolutionary growth with proportional recipes. PMID:21937509

  3. Genome-Wide Identification and Transcriptome-Based Expression Profiling of the Sox Gene Family in the Nile Tilapia (Oreochromis niloticus)

    PubMed Central

    Wei, Ling; Yang, Chao; Tao, Wenjing; Wang, Deshou

    2016-01-01

    The Sox transcription factor family is characterized with the presence of a Sry-related high-mobility group (HMG) box and plays important roles in various biological processes in animals, including sex determination and differentiation, and the development of multiple organs. In this study, 27 Sox genes were identified in the genome of the Nile tilapia (Oreochromis niloticus), and were classified into seven groups. The members of each group of the tilapia Sox genes exhibited a relatively conserved exon-intron structure. Comparative analysis showed that the Sox gene family has undergone an expansion in tilapia and other teleost fishes following their whole genome duplication, and group K only exists in teleosts. Transcriptome-based analysis demonstrated that most of the tilapia Sox genes presented stage-specific and/or sex-dimorphic expressions during gonadal development, and six of the group B Sox genes were specifically expressed in the adult brain. Our results provide a better understanding of gene structure and spatio-temporal expression of the Sox gene family in tilapia, and will be useful for further deciphering the roles of the Sox genes during sex determination and gonadal development in teleosts. PMID:26907269

  4. Genome-Wide Identification and Transcriptome-Based Expression Profiling of the Sox Gene Family in the Nile Tilapia (Oreochromis niloticus).

    PubMed

    Wei, Ling; Yang, Chao; Tao, Wenjing; Wang, Deshou

    2016-02-23

    The Sox transcription factor family is characterized with the presence of a Sry-related high-mobility group (HMG) box and plays important roles in various biological processes in animals, including sex determination and differentiation, and the development of multiple organs. In this study, 27 Sox genes were identified in the genome of the Nile tilapia (Oreochromis niloticus), and were classified into seven groups. The members of each group of the tilapia Sox genes exhibited a relatively conserved exon-intron structure. Comparative analysis showed that the Sox gene family has undergone an expansion in tilapia and other teleost fishes following their whole genome duplication, and group K only exists in teleosts. Transcriptome-based analysis demonstrated that most of the tilapia Sox genes presented stage-specific and/or sex-dimorphic expressions during gonadal development, and six of the group B Sox genes were specifically expressed in the adult brain. Our results provide a better understanding of gene structure and spatio-temporal expression of the Sox gene family in tilapia, and will be useful for further deciphering the roles of the Sox genes during sex determination and gonadal development in teleosts.

  5. Genome-scale rates of evolutionary change in bacteria

    PubMed Central

    Duchêne, Sebastian; Holt, Kathryn E.; Weill, François-Xavier; Le Hello, Simon; Hawkey, Jane; Edwards, David J.; Fourment, Mathieu

    2016-01-01

    Estimating the rates at which bacterial genomes evolve is critical to understanding major evolutionary and ecological processes such as disease emergence, long-term host–pathogen associations and short-term transmission patterns. The surge in bacterial genomic data sets provides a new opportunity to estimate these rates and reveal the factors that shape bacterial evolutionary dynamics. For many organisms estimates of evolutionary rate display an inverse association with the time-scale over which the data are sampled. However, this relationship remains unexplored in bacteria due to the difficulty in estimating genome-wide evolutionary rates, which are impacted by the extent of temporal structure in the data and the prevalence of recombination. We collected 36 whole genome sequence data sets from 16 species of bacterial pathogens to systematically estimate and compare their evolutionary rates and assess the extent of temporal structure in the absence of recombination. The majority (28/36) of data sets possessed sufficient clock-like structure to robustly estimate evolutionary rates. However, in some species reliable estimates were not possible even with ‘ancient DNA’ data sampled over many centuries, suggesting that they evolve very slowly or that they display extensive rate variation among lineages. The robustly estimated evolutionary rates spanned several orders of magnitude, from approximately 10−5 to 10−8 nucleotide substitutions per site year−1. This variation was negatively associated with sampling time, with this relationship best described by an exponential decay curve. To avoid potential estimation biases, such time-dependency should be considered when inferring evolutionary time-scales in bacteria. PMID:28348834

  6. Integrated analyses using RNA-Seq data reveal viral genomes, single nucleotide variations, the phylogenetic relationship, and recombination for Apple stem grooving virus.

    PubMed

    Jo, Yeonhwa; Choi, Hoseong; Kim, Sang-Min; Kim, Sun-Lim; Lee, Bong Choon; Cho, Won Kyong

    2016-08-09

    Next-generation sequencing (NGS) provides many possibilities for plant virology research. In this study, we performed integrated analyses using plant transcriptome data for plant virus identification using Apple stem grooving virus (ASGV) as an exemplar virus. We used 15 publicly available transcriptome libraries from three different studies, two mRNA-Seq studies and a small RNA-Seq study. We de novo assembled nearly complete genomes of ASGV isolates Fuji and Cuiguan from apple and pear transcriptomes, respectively, and identified single nucleotide variations (SNVs) of ASGV within the transcriptomes. We demonstrated the application of NGS raw data to confirm viral infections in the plant transcriptomes. In addition, we compared the usability of two de novo assemblers, Trinity and Velvet, for virus identification and genome assembly. A phylogenetic tree revealed that ASGV and Citrus tatter leaf virus (CTLV) are the same virus, which was divided into two clades. Recombination analyses identified six recombination events from 21 viral genomes. Taken together, our in silico analyses using NGS data provide a successful application of plant transcriptomes to reveal extensive information associated with viral genome assembly, SNVs, phylogenetic relationships, and genetic recombination.

  7. Functional Genomics Approaches to Studying Symbioses between Legumes and Nitrogen-Fixing Rhizobia.

    PubMed

    Lardi, Martina; Pessi, Gabriella

    2018-05-18

    Biological nitrogen fixation gives legumes a pronounced growth advantage in nitrogen-deprived soils and is of considerable ecological and economic interest. In exchange for reduced atmospheric nitrogen, typically given to the plant in the form of amides or ureides, the legume provides nitrogen-fixing rhizobia with nutrients and highly specialised root structures called nodules. To elucidate the molecular basis underlying physiological adaptations on a genome-wide scale, functional genomics approaches, such as transcriptomics, proteomics, and metabolomics, have been used. This review presents an overview of the different functional genomics approaches that have been performed on rhizobial symbiosis, with a focus on studies investigating the molecular mechanisms used by the bacterial partner to interact with the legume. While rhizobia belonging to the alpha-proteobacterial group (alpha-rhizobia) have been well studied, few studies to date have investigated this process in beta-proteobacteria (beta-rhizobia).

  8. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project.

    PubMed

    Gerstein, Mark B; Lu, Zhi John; Van Nostrand, Eric L; Cheng, Chao; Arshinoff, Bradley I; Liu, Tao; Yip, Kevin Y; Robilotto, Rebecca; Rechtsteiner, Andreas; Ikegami, Kohta; Alves, Pedro; Chateigner, Aurelien; Perry, Marc; Morris, Mitzi; Auerbach, Raymond K; Feng, Xin; Leng, Jing; Vielle, Anne; Niu, Wei; Rhrissorrakrai, Kahn; Agarwal, Ashish; Alexander, Roger P; Barber, Galt; Brdlik, Cathleen M; Brennan, Jennifer; Brouillet, Jeremy Jean; Carr, Adrian; Cheung, Ming-Sin; Clawson, Hiram; Contrino, Sergio; Dannenberg, Luke O; Dernburg, Abby F; Desai, Arshad; Dick, Lindsay; Dosé, Andréa C; Du, Jiang; Egelhofer, Thea; Ercan, Sevinc; Euskirchen, Ghia; Ewing, Brent; Feingold, Elise A; Gassmann, Reto; Good, Peter J; Green, Phil; Gullier, Francois; Gutwein, Michelle; Guyer, Mark S; Habegger, Lukas; Han, Ting; Henikoff, Jorja G; Henz, Stefan R; Hinrichs, Angie; Holster, Heather; Hyman, Tony; Iniguez, A Leo; Janette, Judith; Jensen, Morten; Kato, Masaomi; Kent, W James; Kephart, Ellen; Khivansara, Vishal; Khurana, Ekta; Kim, John K; Kolasinska-Zwierz, Paulina; Lai, Eric C; Latorre, Isabel; Leahey, Amber; Lewis, Suzanna; Lloyd, Paul; Lochovsky, Lucas; Lowdon, Rebecca F; Lubling, Yaniv; Lyne, Rachel; MacCoss, Michael; Mackowiak, Sebastian D; Mangone, Marco; McKay, Sheldon; Mecenas, Desirea; Merrihew, Gennifer; Miller, David M; Muroyama, Andrew; Murray, John I; Ooi, Siew-Loon; Pham, Hoang; Phippen, Taryn; Preston, Elicia A; Rajewsky, Nikolaus; Rätsch, Gunnar; Rosenbaum, Heidi; Rozowsky, Joel; Rutherford, Kim; Ruzanov, Peter; Sarov, Mihail; Sasidharan, Rajkumar; Sboner, Andrea; Scheid, Paul; Segal, Eran; Shin, Hyunjin; Shou, Chong; Slack, Frank J; Slightam, Cindie; Smith, Richard; Spencer, William C; Stinson, E O; Taing, Scott; Takasaki, Teruaki; Vafeados, Dionne; Voronina, Ksenia; Wang, Guilin; Washington, Nicole L; Whittle, Christina M; Wu, Beijing; Yan, Koon-Kiu; Zeller, Georg; Zha, Zheng; Zhong, Mei; Zhou, Xingliang; Ahringer, Julie; Strome, Susan; Gunsalus, Kristin C; Micklem, Gos; Liu, X Shirley; Reinke, Valerie; Kim, Stuart K; Hillier, LaDeana W; Henikoff, Steven; Piano, Fabio; Snyder, Michael; Stein, Lincoln; Lieb, Jason D; Waterston, Robert H

    2010-12-24

    We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

  9. Genome-wide transcriptome analysis of the transition from primary to secondary stem development in Populus trichocarpa

    PubMed Central

    2010-01-01

    Background With its genome sequence and other experimental attributes, Populus trichocarpa has become the model species for genomic studies of wood development. Wood is derived from secondary growth of tree stems, and begins with the development of a ring of vascular cambium in the young developing stem. The terminal region of the developing shoot provides a steep developmental gradient from primary to secondary growth that facilitates identification of genes that play specialized functions during each of these phases of growth. Results Using a genomic microarray representing the majority of the transcriptome, we profiled gene expression in stem segments that spanned primary to secondary growth. We found 3,016 genes that were differentially expressed during stem development (Q-value ≤ 0.05; >2-fold expression variation), and 15% of these genes encode proteins with no significant identities to known genes. We identified all gene family members putatively involved in secondary growth for carbohydrate active enzymes, tubulins, actins, actin depolymerizing factors, fasciclin-like AGPs, and vascular development-associated transcription factors. Almost 70% of expressed transcription factors were upregulated during the transition to secondary growth. The primary shoot elongation region of the stem contained specific carbohydrate active enzyme and expansin family members that are likely to function in primary cell wall synthesis and modification. Genes involved in plant defense and protective functions were also dominant in the primary growth region. Conclusion Our results describe the global patterns of gene expression that occur during the transition from primary to secondary stem growth. We were able to identify three major patterns of gene expression and over-represented gene ontology categories during stem development. The new regulatory factors and cell wall biogenesis genes that we identified provide candidate genes for further functional characterization, as well

  10. Rapid prototyping of microbial cell factories via genome-scale engineering.

    PubMed

    Si, Tong; Xiao, Han; Zhao, Huimin

    2015-11-15

    Advances in reading, writing and editing genetic materials have greatly expanded our ability to reprogram biological systems at the resolution of a single nucleotide and on the scale of a whole genome. Such capacity has greatly accelerated the cycles of design, build and test to engineer microbes for efficient synthesis of fuels, chemicals and drugs. In this review, we summarize the emerging technologies that have been applied, or are potentially useful for genome-scale engineering in microbial systems. We will focus on the development of high-throughput methodologies, which may accelerate the prototyping of microbial cell factories. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. Characterization of the genome and transcriptome of the blue tit Cyanistes caeruleus: polymorphisms, sex-biased expression and selection signals.

    PubMed

    Mueller, Jakob C; Kuhl, Heiner; Timmermann, Bernd; Kempenaers, Bart

    2016-03-01

    Decoding genomic sequences and determining their variation within populations has potential to reveal adaptive processes and unravel the genetic basis of ecologically relevant trait variation within a species. The blue tit Cyanistes caeruleus--a long-time ecological model species--has been used to investigate fitness consequences of variation in mating and reproductive behaviour. However, very little is known about the underlying genetic changes due to natural and sexual selection in the genome of this songbird. As a step to bridge this gap, we assembled the first draft genome of a single blue tit, mapped the transcriptome of five females and five males to this reference, identified genomewide variants and performed sex-differential expression analysis in the gonads, brain and other tissues. In the gonads, we found a high number of sex-biased genes, and of those, a similar proportion were sex-limited (genes only expressed in one sex) in males and females. However, in the brain, the proportion of female-limited genes within the female-biased gene category (82%) was substantially higher than the proportion of male-limited genes within the male-biased category (6%). This suggests a predominant on-off switching mechanism for the female-limited genes. In addition, most male-biased genes were located on the Z-chromosome, indicating incomplete dosage compensation for the male-biased genes. We called more than 500,000 SNPs from the RNA-seq data. Heterozygote detection in the single reference individual was highly congruent between DNA-seq and RNA-seq calling. Using information from these polymorphisms, we identified potential selection signals in the genome. We list candidate genes which can be used for further sequencing and detailed selection studies, including genes potentially related to meiotic drive evolution. A public genome browser of the blue tit with the described information is available at http://public-genomes-ngs.molgen.mpg.de. © 2015 John Wiley & Sons Ltd.

  12. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing.

    PubMed

    Chen, Shi-Yi; Deng, Feilong; Jia, Xianbo; Li, Cao; Lai, Song-Jia

    2017-08-09

    It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, it still remains a huge challenge for obtaining full-length transcripts because of difficulties in the short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). We totally obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not been annotated yet within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome, respectively. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of rabbit genome.

  13. De novo assembly, characterization and functional annotation of pineapple fruit transcriptome through massively parallel sequencing.

    PubMed

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.

  14. De Novo Assembly, Characterization and Functional Annotation of Pineapple Fruit Transcriptome through Massively Parallel Sequencing

    PubMed Central

    Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

    2012-01-01

    Background Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. Methodology/Principal Findings To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. Conclusions The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple. PMID:23091603

  15. RNA-Seq Transcriptome Profiling of Upland Cotton (Gossypium hirsutum L.) Root Tissue under Water-Deficit Stress

    PubMed Central

    Bowman, Megan J.; Park, Wonkeun; Bauer, Philip J.; Udall, Joshua A.; Page, Justin T.; Raney, Joshua; Scheffler, Brian E.; Jones, Don. C.; Campbell, B. Todd

    2013-01-01

    An RNA-Seq experiment was performed using field grown well-watered and naturally rain fed cotton plants to identify differentially expressed transcripts under water-deficit stress. Our work constitutes the first application of the newly published diploid D5 Gossypium raimondii sequence in the study of tetraploid AD1 upland cotton RNA-seq transcriptome analysis. A total of 1,530 transcripts were differentially expressed between well-watered and water-deficit stressed root tissues, in patterns that confirm the accuracy of this technique for future studies in cotton genomics. Additionally, putative sequence based genome localization of differentially expressed transcripts detected A2 genome specific gene expression under water-deficit stress. These data will facilitate efforts to understand the complex responses governing transcriptomic regulatory mechanisms and to identify candidate genes that may benefit applied plant breeding programs. PMID:24324815

  16. Examination of Triacylglycerol Biosynthetic Pathways via De Novo Transcriptomic and Proteomic Analyses in an Unsequenced Microalga

    PubMed Central

    Guarnieri, Michael T.; Nag, Ambarish; Smolinski, Sharon L.; Darzins, Al; Seibert, Michael; Pienkos, Philip T.

    2011-01-01

    Biofuels derived from algal lipids represent an opportunity to dramatically impact the global energy demand for transportation fuels. Systems biology analyses of oleaginous algae could greatly accelerate the commercialization of algal-derived biofuels by elucidating the key components involved in lipid productivity and leading to the initiation of hypothesis-driven strain-improvement strategies. However, higher-level systems biology analyses, such as transcriptomics and proteomics, are highly dependent upon available genomic sequence data, and the lack of these data has hindered the pursuit of such analyses for many oleaginous microalgae. In order to examine the triacylglycerol biosynthetic pathway in the unsequenced oleaginous microalga, Chlorella vulgaris, we have established a strategy with which to bypass the necessity for genomic sequence information by using the transcriptome as a guide. Our results indicate an upregulation of both fatty acid and triacylglycerol biosynthetic machinery under oil-accumulating conditions, and demonstrate the utility of a de novo assembled transcriptome as a search model for proteomic analysis of an unsequenced microalga. PMID:22043295

  17. Genome-scale metabolic modeling of responses to polymyxins in Pseudomonas aeruginosa.

    PubMed

    Zhu, Yan; Czauderna, Tobias; Zhao, Jinxin; Klapperstueck, Matthias; Maifiah, Mohd Hafidz Mahamad; Han, Mei-Ling; Lu, Jing; Sommer, Björn; Velkov, Tony; Lithgow, Trevor; Song, Jiangning; Schreiber, Falk; Li, Jian

    2018-04-01

    Pseudomonas aeruginosa often causes multidrug-resistant infections in immunocompromised patients, and polymyxins are often used as the last-line therapy. Alarmingly, resistance to polymyxins has been increasingly reported worldwide recently. To rescue this last-resort class of antibiotics, it is necessary to systematically understand how P. aeruginosa alters its metabolism in response to polymyxin treatment, thereby facilitating the development of effective therapies. To this end, a genome-scale metabolic model (GSMM) was used to analyze bacterial metabolic changes at the systems level. A high-quality GSMM iPAO1 was constructed for P. aeruginosa PAO1 for antimicrobial pharmacological research. Model iPAO1 encompasses an additional periplasmic compartment and contains 3022 metabolites, 4265 reactions, and 1458 genes in total. Growth prediction on 190 carbon and 95 nitrogen sources achieved an accuracy of 89.1%, outperforming all reported P. aeruginosa models. Notably, prediction of the essential genes for growth achieved a high accuracy of 87.9%. Metabolic simulation showed that lipid A modifications associated with polymyxin resistance exert a limited impact on bacterial growth and metabolism but remarkably change the physiochemical properties of the outer membrane. Modeling with transcriptomics constraints revealed a broad range of metabolic responses to polymyxin treatment, including reduced biomass synthesis, upregulated amino acid catabolism, induced flux through the tricarboxylic acid cycle, and increased redox turnover. Overall, iPAO1 represents the most comprehensive GSMM constructed to date for Pseudomonas. It provides a powerful systems pharmacology platform for the elucidation of complex killing mechanisms of antibiotics.

  18. Salt-Responsive Transcriptome Profiling of Suaeda glauca via RNA Sequencing

    PubMed Central

    Jin, Hangxia; Dong, Dekun; Yang, Qinghua; Zhu, Danhua

    2016-01-01

    Background Suaeda glauca, a succulent halophyte of the Chenopodiaceae family, is widely distributed in coastal areas of China. Suaeda glauca is highly resistant to salt and alkali stresses. In the present study, the salt-responsive transcriptome of Suaeda glauca was analyzed to identify genes involved in salt tolerance and study halophilic mechanisms in this halophyte. Results Illumina HiSeq 2500 was used to sequence cDNA libraries from salt-treated and control samples with three replicates each treatment. De novo assembly of the six transcriptomes identified 75,445 unigenes. A total of 23,901 (31.68%) unigenes were annotated. Compared with transcriptomes from the three salt-treated and three salt-free samples, 231 differentially expressed genes (DEGs) were detected (including 130 up-regulated genes and 101 down-regulated genes), and 195 unigenes were functionally annotated. Based on the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) classifications of the DEGs, more attention should be paid to transcripts associated with signal transduction, transporters, the cell wall and growth, defense metabolism and transcription factors involved in salt tolerance. Conclusions This report provides a genome-wide transcriptional analysis of a halophyte, Suaeda glauca, under salt stress. Further studies of the genetic basis of salt tolerance in halophytes are warranted. PMID:26930632

  19. Transcriptome Sequences Resolve Deep Relationships of the Grape Family

    PubMed Central

    Wen, Jun; Xiong, Zhiqiang; Nie, Ze-Long; Mao, Likai; Zhu, Yabing; Kan, Xian-Zhao; Ickert-Bond, Stefanie M.; Gerrath, Jean; Zimmer, Elizabeth A.; Fang, Xiao-Dong

    2013-01-01

    Previous phylogenetic studies of the grape family (Vitaceae) yielded poorly resolved deep relationships, thus impeding our understanding of the evolution of the family. Next-generation sequencing now offers access to protein coding sequences very easily, quickly and cost-effectively. To improve upon earlier work, we extracted 417 orthologous single-copy nuclear genes from the transcriptomes of 15 species of the Vitaceae, covering its phylogenetic diversity. The resulting transcriptome phylogeny provides robust support for the deep relationships, showing the phylogenetic utility of transcriptome data for plants over a time scale at least since the mid-Cretaceous. The pros and cons of transcriptome data for phylogenetic inference in plants are also evaluated. PMID:24069307

  20. Comparative genomics and transcriptomics of Escherichia coli isolates carrying virulence factors of both enteropathogenic and enterotoxigenic E. coli.

    PubMed

    Hazen, Tracy H; Michalski, Jane; Luo, Qingwei; Shetty, Amol C; Daugherty, Sean C; Fleckenstein, James M; Rasko, David A

    2017-06-14

    Escherichia coli that are capable of causing human disease are often classified into pathogenic variants (pathovars) based on their virulence gene content. However, disease-associated hybrid E. coli, containing unique combinations of multiple canonical virulence factors have also been described. Such was the case of the E. coli O104:H4 outbreak in 2011, which caused significant morbidity and mortality. Among the pathovars of diarrheagenic E. coli that cause significant human disease are the enteropathogenic E. coli (EPEC) and enterotoxigenic E. coli (ETEC). In the current study we use comparative genomics, transcriptomics, and functional studies to characterize isolates that contain virulence factors of both EPEC and ETEC. Based on phylogenomic analysis, these hybrid isolates are more genomically-related to EPEC, but appear to have acquired ETEC virulence genes. Global transcriptional analysis using RNA sequencing, demonstrated that the EPEC and ETEC virulence genes of these hybrid isolates were differentially-expressed under virulence-inducing laboratory conditions, similar to reference isolates. Immunoblot assays further verified that the virulence gene products were produced and that the T3SS effector EspB of EPEC, and heat-labile toxin of ETEC were secreted. These findings document the existence and virulence potential of an E. coli pathovar hybrid that blurs the distinction between E. coli pathovars.

  1. Transcriptome and proteomic analysis of mango (Mangifera indica Linn) fruits.

    PubMed

    Wu, Hong-xia; Jia, Hui-min; Ma, Xiao-wei; Wang, Song-biao; Yao, Quan-sheng; Xu, Wen-tian; Zhou, Yi-gang; Gao, Zhong-shan; Zhan, Ru-lin

    2014-06-13

    Here we used Illumina RNA-seq technology for transcriptome sequencing of a mixed fruit sample from 'Zill' mango (Mangifera indica Linn) fruit pericarp and pulp during the development and ripening stages. RNA-seq generated 68,419,722 sequence reads that were assembled into 54,207 transcripts with a mean length of 858bp, including 26,413 clusters and 27,794 singletons. A total of 42,515(78.43%) transcripts were annotated using public protein databases, with a cut-off E-value above 10(-5), of which 35,198 and 14,619 transcripts were assigned to gene ontology terms and clusters of orthologous groups respectively. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes database identified 23,741(43.79%) transcripts which were mapped to 128 pathways. These pathways revealed many previously unknown transcripts. We also applied mass spectrometry-based transcriptome data to characterize the proteome of ripe fruit. LC-MS/MS analysis of the mango fruit proteome was using tandem mass spectrometry (MS/MS) in an LTQ Orbitrap Velos (Thermo) coupled online to the HPLC. This approach enabled the identification of 7536 peptides that matched 2754 proteins. Our study provides a comprehensive sequence for a systemic view of transcriptome during mango fruit development and the most comprehensive fruit proteome to date, which are useful for further genomics research and proteomic studies. Our study provides a comprehensive sequence for a systemic view of both the transcriptome and proteome of mango fruit, and a valuable reference for further research on gene expression and protein identification. This article is part of a Special Issue entitled: Proteomics of non-model organisms. Copyright © 2014 Elsevier B.V. All rights reserved.

  2. Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martinez, Diego; Challacombe, Jean; Morgenstern, Ingo

    2009-02-04

    Brown-rot fungi such as Postia placenta are common inhabitants of forest ecosystems and are also largely responsible for the destructive decay of wooden structures. Rapid depolymerization of cellulose is a distinguishing feature of brown-rot, but the biochemical mechanisms and underlying genetics are poorly understood. Systematic examination of the P. placenta genome, transcriptome, and secretome revealed unique extracellular enzyme systems, including an unusual repertoire of extracellular glycoside hydrolases. Genes encoding exocellobiohydrolases and cellulose-binding domains, typical of cellulolytic microbes, are absent in this efficient cellulose-degrading fungus. When P. placenta was grown in media containing cellulose as sole carbon source, transcripts corresponding tomore » many hemicellulases and to a single putative β-1-4 endoglucanase were expressed at high levels relative to glucose grown cultures. These transcript profiles were confirmed by direct identification of peptides by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Also upregulated under cellulolytic culture conditions were putative iron reductases, quinone reductase, and structurally divergent oxidases potentially involved in extracellular generation of Fe(II) and H2O2. These observations are consistent with a biodegradative role for Fenton chemistry in which Fe(II) and H2O2 react to form hydroxyl radicals, highly reactive oxidants capable of depolymerizing cellulose. The P. placenta genome resources provide unparalleled opportunities for investigating such unusual mechanisms of cellulose conversion. More broadly, the genome offers insight into the diversification of lignocellulose degrading mechanisms in fungi. In particular, comparisons between P. placenta and the closely related white-rot fungus, Phanerochaete chrysosporium support an evolutionary shift from white-rot to brown-rot during which efficient depolymerization of lignin was lost.« less

  3. Genome, transcriptome, and secretome analysis of wood decay fungus postia placenta supports unique mechanisms of lignocellulose conversion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martinez, Diego; Challacombe, Jean F; Misra, Monica

    2008-01-01

    Brown-rot fungi such as Postia placenta are common inhabitants of forest ecosystems and are also largely responsible for the destructive decay of wooden structures. Rapid depolymerization of cellulose is a distinguishing feature of brown-rot, but the biochemical mechanisms and underlying genetics are poorly understood. Systematic examination of the P. placenta genome, transcriptome and secretome revealed unique extracellular enzyme systems, including an unusual repertoire of extracellular glycoside hydrolases. Genes encoding exocellobiohydrolases and cellulose-binding domains, typical of cellulolytic microbes, are absent in this efficient cellulose-degrading fungus. When P. placenta was grown in medium containing cellulose as sole carbon source, transcripts corresponding tomore » many hemicellulases and to a single putative {beta}-1-4 endoglucanase were expressed at high levels relative to glucose grown cultures. These transcript profiles were confirmed by direct identification of peptides by liquid chromatography-tandem mass spectrometry (LC{center_dot}MSIMS). Also upregulated during growth on cellulose medium were putative iron reductases, quinone reductase, and structurally divergent oxidases potentially involved in extracellular generation of Fe(II) and H202. These observations are consistent with a biodegradative role for Fenton chemistry in which Fe(II) and H202 react to form hydroxyl radicals, highly reactive oxidants capable of depolymerizing cellulose. The P. placenta genome resources provide unparalleled opportunities for investigating such unusual mechanisms of cellulose conversion. More broadly, the genome offers insight into the diversification of lignocellulose degrading mechanisms in fungi. Comparisons to the closely related white-rot fungus Phanerochaete chrysosporium support an evolutionary shift from white-rot to brown-rot during which the capacity for efficient depolymerization of lignin was lost.« less

  4. Genome, transcriptome, and secretome analysis of wood decay fungus Postia placenta supports unique mechanisms of lignocellulose conversion

    PubMed Central

    Martinez, Diego; Challacombe, Jean; Morgenstern, Ingo; Hibbett, David; Schmoll, Monika; Kubicek, Christian P.; Ferreira, Patricia; Ruiz-Duenas, Francisco J.; Martinez, Angel T.; Kersten, Phil; Hammel, Kenneth E.; Vanden Wymelenberg, Amber; Gaskell, Jill; Lindquist, Erika; Sabat, Grzegorz; Splinter BonDurant, Sandra; Larrondo, Luis F.; Canessa, Paulo; Vicuna, Rafael; Yadav, Jagjit; Doddapaneni, Harshavardhan; Subramanian, Venkataramanan; Pisabarro, Antonio G.; Lavín, José L.; Oguiza, José A.; Master, Emma; Henrissat, Bernard; Coutinho, Pedro M.; Harris, Paul; Magnuson, Jon Karl; Baker, Scott E.; Bruno, Kenneth; Kenealy, William; Hoegger, Patrik J.; Kües, Ursula; Ramaiya, Preethi; Lucas, Susan; Salamov, Asaf; Shapiro, Harris; Tu, Hank; Chee, Christine L.; Misra, Monica; Xie, Gary; Teter, Sarah; Yaver, Debbie; James, Tim; Mokrejs, Martin; Pospisek, Martin; Grigoriev, Igor V.; Brettin, Thomas; Rokhsar, Dan; Berka, Randy; Cullen, Dan

    2009-01-01

    Brown-rot fungi such as Postia placenta are common inhabitants of forest ecosystems and are also largely responsible for the destructive decay of wooden structures. Rapid depolymerization of cellulose is a distinguishing feature of brown-rot, but the biochemical mechanisms and underlying genetics are poorly understood. Systematic examination of the P. placenta genome, transcriptome, and secretome revealed unique extracellular enzyme systems, including an unusual repertoire of extracellular glycoside hydrolases. Genes encoding exocellobiohydrolases and cellulose-binding domains, typical of cellulolytic microbes, are absent in this efficient cellulose-degrading fungus. When P. placenta was grown in medium containing cellulose as sole carbon source, transcripts corresponding to many hemicellulases and to a single putative β-1–4 endoglucanase were expressed at high levels relative to glucose-grown cultures. These transcript profiles were confirmed by direct identification of peptides by liquid chromatography-tandem mass spectrometry (LC-MS/MS). Also up-regulated during growth on cellulose medium were putative iron reductases, quinone reductase, and structurally divergent oxidases potentially involved in extracellular generation of Fe(II) and H2O2. These observations are consistent with a biodegradative role for Fenton chemistry in which Fe(II) and H2O2 react to form hydroxyl radicals, highly reactive oxidants capable of depolymerizing cellulose. The P. placenta genome resources provide unparalleled opportunities for investigating such unusual mechanisms of cellulose conversion. More broadly, the genome offers insight into the diversification of lignocellulose degrading mechanisms in fungi. Comparisons with the closely related white-rot fungus Phanerochaete chrysosporium support an evolutionary shift from white-rot to brown-rot during which the capacity for efficient depolymerization of lignin was lost. PMID:19193860

  5. Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

    PubMed

    Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

    2012-01-01

    Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST-and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to

  6. De novo assembly and transcriptome characterization of the freshwater prawn Palaemonetes argentinus: Implications for a detoxification response.

    PubMed

    García, C Fernando; Pedrini, Nicolas; Sánchez-Paz, Arturo; Reyna-Blanco, Carlos S; Lavarias, Sabrina; Muhlia-Almazán, Adriana; Fernández-Giménez, Analía; Laino, Aldana; de-la-Re-Vega, Enrique; Lukaszewicz, German; López-Zavala, Alonso A; Brieba, Luis G; Criscitello, Michael F; Carrasco-Miranda, Jesús S; García-Orozco, Karina D; Ochoa-Leyva, Adrian; Rudiño-Piñera, Enrique; Sanchez-Flores, Alejandro; Sotelo-Mundo, Rogerio R

    2018-02-01

    Palaemonetes argentinus, an abundant freshwater prawn species in the northern and central region of Argentina, has been used as a bioindicator of environmental pollutants as it displays a very high sensitivity to pollutants exposure. Despite their extraordinary ecological relevance, a lack of genomic information has hindered a more thorough understanding of the molecular mechanisms potentially involved in detoxification processes of this species. Thus, transcriptomic profiling studies represent a promising approach to overcome the limitations imposed by the lack of extensive genomic resources for P. argentinus, and may improve the understanding of its physiological and molecular response triggered by pollutants. This work represents the first comprehensive transcriptome-based characterization of the non-model species P. argentinus to generate functional genomic annotations and provides valuable resources for future genetic studies. Trinity de novo assembly consisted of 24,738 transcripts with high representation of detoxification (phase I and II), anti-oxidation, osmoregulation pathways and DNA replication and bioenergetics. This crustacean transcriptome provides valuable molecular information about detoxification and biochemical processes that could be applied as biomarkers in further ecotoxicology studies. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Intra-tumor heterogeneity in breast cancer has limited impact on transcriptomic-based molecular profiling.

    PubMed

    Karthik, Govindasamy-Muralidharan; Rantalainen, Mattias; Stålhammar, Gustav; Lövrot, John; Ullah, Ikram; Alkodsi, Amjad; Ma, Ran; Wedlund, Lena; Lindberg, Johan; Frisell, Jan; Bergh, Jonas; Hartman, Johan

    2017-11-29

    Transcriptomic profiling of breast tumors provides opportunity for subtyping and molecular-based patient stratification. In diagnostic applications the specimen profiled should be representative of the expression profile of the whole tumor and ideally capture properties of the most aggressive part of the tumor. However, breast cancers commonly exhibit intra-tumor heterogeneity at molecular, genomic and in phenotypic level, which can arise during tumor evolution. Currently it is not established to what extent a random sampling approach may influence molecular breast cancer diagnostics. In this study we applied RNA-sequencing to quantify gene expression in 43 pieces (2-5 pieces per tumor) from 12 breast tumors (Cohort 1). We determined molecular subtype and transcriptomic grade for all tumor pieces and analysed to what extent pieces originating from the same tumors are concordant or discordant with each other. Additionally, we validated our finding in an independent cohort consisting of 19 pieces (2-6 pieces per tumor) from 6 breast tumors (Cohort 2) profiled using microarray technique. Exome sequencing was also performed on this cohort, to investigate the extent of intra-tumor genomic heterogeneity versus the intra-tumor molecular subtype classifications. Molecular subtyping was consistent in 11 out of 12 tumors and transcriptomic grade assignments were consistent in 11 out of 12 tumors as well. Molecular subtype predictions revealed consistent subtypes in four out of six patients in this cohort 2. Interestingly, we observed extensive intra-tumor genomic heterogeneity in these tumor pieces but not in their molecular subtype classifications. Our results suggest that macroscopic intra-tumoral transcriptomic heterogeneity is limited and unlikely to have an impact on molecular diagnostics for most patients.

  8. Genomic Organization, Transcriptomic Analysis, and Functional Characterization of Avian α- and β-Keratins in Diverse Feather Forms

    PubMed Central

    Fan, Wen-Lang; Yan, Jie; Chen, Chih-Kuan; Lai, Yu-Ting; Wu, Siao-Man; Mao, Chi-Tang; Chen, Jun-Jie; Lu, Mei-Yeh Jade; Ho, Meng-Ru; Widelitz, Randall B.; Chen, Chih-Feng; Chuong, Cheng-Ming; Li, Wen-Hsiung

    2014-01-01

    Feathers are hallmark avian integument appendages, although they were also present on theropods. They are composed of flexible corneous materials made of α- and β-keratins, but their genomic organization and their functional roles in feathers have not been well studied. First, we made an exhaustive search of α- and β-keratin genes in the new chicken genome assembly (Galgal4). Then, using transcriptomic analysis, we studied α- and β-keratin gene expression patterns in five types of feather epidermis. The expression patterns of β-keratin genes were different in different feather types, whereas those of α-keratin genes were less variable. In addition, we obtained extensive α- and β-keratin mRNA in situ hybridization data, showing that α-keratins and β-keratins are preferentially expressed in different parts of the feather components. Together, our data suggest that feather morphological and structural diversity can largely be attributed to differential combinations of α- and β-keratin genes in different intrafeather regions and/or feather types from different body parts. The expression profiles provide new insights into the evolutionary origin and diversification of feathers. Finally, functional analysis using mutant chicken keratin forms based on those found in the human α-keratin mutation database led to abnormal phenotypes. This demonstrates that the chicken can be a convenient model for studying the molecular biology of human keratin-based diseases. PMID:25152353

  9. The Draft Genome and Transcriptome of Panagrellus redivivus Are Shaped by the Harsh Demands of a Free-Living Lifestyle

    PubMed Central

    Srinivasan, Jagan; Dillman, Adler R.; Macchietto, Marissa G.; Heikkinen, Liisa; Lakso, Merja; Fracchia, Kelley M.; Antoshechkin, Igor; Mortazavi, Ali; Wong, Garry; Sternberg, Paul W.

    2013-01-01

    Nematodes compose an abundant and diverse invertebrate phylum with members inhabiting nearly every ecological niche. Panagrellus redivivus (the “microworm”) is a free-living nematode frequently used to understand the evolution of developmental and behavioral processes given its phylogenetic distance to Caenorhabditis elegans. Here we report the de novo sequencing of the genome, transcriptome, and small RNAs of P. redivivus. Using a combination of automated gene finders and RNA-seq data, we predict 24,249 genes and 32,676 transcripts. Small RNA analysis revealed 248 microRNA (miRNA) hairpins, of which 63 had orthologs in other species. Fourteen miRNA clusters containing 42 miRNA precursors were found. The RNA interference, dauer development, and programmed cell death pathways are largely conserved. Analysis of protein family domain abundance revealed that P. redivivus has experienced a striking expansion of BTB domain-containing proteins and an unprecedented expansion of the cullin scaffold family of proteins involved in multi-subunit ubiquitin ligases, suggesting proteolytic plasticity and/or tighter regulation of protein turnover. The eukaryotic release factor protein family has also been dramatically expanded and suggests an ongoing evolutionary arms race with viruses and transposons. The P. redivivus genome provides a resource to advance our understanding of nematode evolution and biology and to further elucidate the genomic architecture leading to free-living lineages, taking advantage of the many fascinating features of this worm revealed by comparative studies. PMID:23410827

  10. Sex and parasites: genomic and transcriptomic analysis of Microbotryum lychnidis-dioicae, the biotrophic and plant-castrating anther smut fungus.

    PubMed

    Perlin, Michael H; Amselem, Joelle; Fontanillas, Eric; Toh, Su San; Chen, Zehua; Goldberg, Jonathan; Duplessis, Sebastien; Henrissat, Bernard; Young, Sarah; Zeng, Qiandong; Aguileta, Gabriela; Petit, Elsa; Badouin, Helene; Andrews, Jared; Razeeq, Dominique; Gabaldón, Toni; Quesneville, Hadi; Giraud, Tatiana; Hood, Michael E; Schultz, David J; Cuomo, Christina A

    2015-06-16

    The genus Microbotryum includes plant pathogenic fungi afflicting a wide variety of hosts with anther smut disease. Microbotryum lychnidis-dioicae infects Silene latifolia and replaces host pollen with fungal spores, exhibiting biotrophy and necrosis associated with altering plant development. We determined the haploid genome sequence for M. lychnidis-dioicae and analyzed whole transcriptome data from plant infections and other stages of the fungal lifecycle, revealing the inventory and expression level of genes that facilitate pathogenic growth. Compared to related fungi, an expanded number of major facilitator superfamily transporters and secretory lipases were detected; lipase gene expression was found to be altered by exposure to lipid compounds, which signaled a switch to dikaryotic, pathogenic growth. In addition, while enzymes to digest cellulose, xylan, xyloglucan, and highly substituted forms of pectin were absent, along with depletion of peroxidases and superoxide dismutases that protect the fungus from oxidative stress, the repertoire of glycosyltransferases and of enzymes that could manipulate host development has expanded. A total of 14% of the genome was categorized as repetitive sequences. Transposable elements have accumulated in mating-type chromosomal regions and were also associated across the genome with gene clusters of small secreted proteins, which may mediate host interactions. The unique absence of enzyme classes for plant cell wall degradation and maintenance of enzymes that break down components of pollen tubes and flowers provides a striking example of biotrophic host adaptation.

  11. DNA Data Bank of Japan (DDBJ) for genome scale research in life science

    PubMed Central

    Tateno, Y.; Imanishi, T.; Miyazaki, S.; Fukami-Kobayashi, K.; Saitou, N.; Sugawara, H.; Gojobori, T.

    2002-01-01

    The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) has made an effort to collect as much data as possible mainly from Japanese researchers. The increase rates of the data we collected, annotated and released to the public in the past year are 43% for the number of entries and 52% for the number of bases. The increase rates are accelerated even after the human genome was sequenced, because sequencing technology has been remarkably advanced and simplified, and research in life science has been shifted from the gene scale to the genome scale. In addition, we have developed the Genome Information Broker (GIB, http://gib.genes.nig.ac.jp) that now includes more than 50 complete microbial genome and Arabidopsis genome data. We have also developed a database of the human genome, the Human Genomics Studio (HGS, http://studio.nig.ac.jp). HGS provides one with a set of sequences being as continuous as possible in any one of the 24 chromosomes. Both GIB and HGS have been updated incorporating newly available data and retrieval tools. PMID:11752245

  12. Transcriptome of the adult female malaria mosquito vector Anopheles albimanus.

    PubMed

    Martínez-Barnetche, Jesús; Gómez-Barreto, Rosa E; Ovilla-Muñoz, Marbella; Téllez-Sosa, Juan; García López, David E; Dinglasan, Rhoel R; Ubaida Mohien, Ceereena; MacCallum, Robert M; Redmond, Seth N; Gibbons, John G; Rokas, Antonis; Machado, Carlos A; Cazares-Raga, Febe E; González-Cerón, Lilia; Hernández-Martínez, Salvador; Rodríguez López, Mario H

    2012-05-30

    Human Malaria is transmitted by mosquitoes of the genus Anopheles. Transmission is a complex phenomenon involving biological and environmental factors of humans, parasites and mosquitoes. Among more than 500 anopheline species, only a few species from different branches of the mosquito evolutionary tree transmit malaria, suggesting that their vectorial capacity has evolved independently. Anopheles albimanus (subgenus Nyssorhynchus) is an important malaria vector in the Americas. The divergence time between Anopheles gambiae, the main malaria vector in Africa, and the Neotropical vectors has been estimated to be 100 My. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to explore the mosquito biology beyond the An. gambiae complex. We sequenced the transcriptome of the An. albimanus adult female. By combining Sanger, 454 and Illumina sequences from cDNA libraries derived from the midgut, cuticular fat body, dorsal vessel, salivary gland and whole body, we generated a single, high-quality assembly containing 16,669 transcripts, 92% of which mapped to the An. darlingi genome and covered 90% of the core eukaryotic genome. Bidirectional comparisons between the An. gambiae, An. darlingi and An. albimanus predicted proteomes allowed the identification of 3,772 putative orthologs. More than half of the transcripts had a match to proteins in other insect vectors and had an InterPro annotation. We identified several protein families that may be relevant to the study of Plasmodium-mosquito interaction. An open source transcript annotation browser called GDAV (Genome-Delinked Annotation Viewer) was developed to facilitate public access to the data generated by this and future transcriptome projects. We have explored the adult female transcriptome of one important New World malaria vector, An. albimanus. We identified protein-coding transcripts involved in biological processes that may

  13. Genome-wide transcriptome analysis of soybean primary root under varying water-deficit conditions.

    PubMed

    Song, Li; Prince, Silvas; Valliyodan, Babu; Joshi, Trupti; Maldonado dos Santos, Joao V; Wang, Jiaojiao; Lin, Li; Wan, Jinrong; Wang, Yongqin; Xu, Dong; Nguyen, Henry T

    2016-01-15

    Soybean is a major crop that provides an important source of protein and oil to humans and animals, but its production can be dramatically decreased by the occurrence of drought stress. Soybeans can survive drought stress if there is a robust and deep root system at the early vegetative growth stage. However, little is known about the genome-wide molecular mechanisms contributing to soybean root system architecture. This study was performed to gain knowledge on transcriptome changes and related molecular mechanisms contributing to soybean root development under water limited conditions. The soybean Williams 82 genotype was subjected to very mild stress (VMS), mild stress (MS) and severe stress (SS) conditions, as well as recovery from the severe stress after re-watering (SR). In total, 6,609 genes in the roots showed differential expression patterns in response to different water-deficit stress levels. Genes involved in hormone (Auxin/Ethylene), carbohydrate, and cell wall-related metabolism (XTH/lipid/flavonoids/lignin) pathways were differentially regulated in the soybean root system. Several transcription factors (TFs) regulating root growth and responses under varying water-deficit conditions were identified and the expression patterns of six TFs were found to be common across the stress levels. Further analysis on the whole plant level led to the finding of tissue-specific or water-deficit levels specific regulation of transcription factors. Analysis of the over-represented motif of different gene groups revealed several new cis-elements associated with different levels of water deficit. The expression patterns of 18 genes were confirmed byquantitative reverse transcription polymerase chain reaction method and demonstrated the accuracy and effectiveness of RNA-Seq. The primary root specific transcriptome in soybean can enable a better understanding of the root response to water deficit conditions. The genes detected in root tissues that were associated with

  14. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    PubMed

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  15. Comparative functional genomics of salt stress in related model and cultivated plants identifies and overcomes limitations to translational genomics.

    PubMed

    Sanchez, Diego H; Pieckenstain, Fernando L; Szymanski, Jedrzey; Erban, Alexander; Bromke, Mariusz; Hannah, Matthew A; Kraemer, Ute; Kopka, Joachim; Udvardi, Michael K

    2011-02-14

    One of the objectives of plant translational genomics is to use knowledge and genes discovered in model species to improve crops. However, the value of translational genomics to plant breeding, especially for complex traits like abiotic stress tolerance, remains uncertain. Using comparative genomics (ionomics, transcriptomics and metabolomics) we analyzed the responses to salinity of three model and three cultivated species of the legume genus Lotus. At physiological and ionomic levels, models responded to salinity in a similar way to crop species, and changes in the concentration of shoot Cl(-) correlated well with tolerance. Metabolic changes were partially conserved, but divergence was observed amongst the genotypes. Transcriptome analysis showed that about 60% of expressed genes were responsive to salt treatment in one or more species, but less than 1% was responsive in all. Therefore, genotype-specific transcriptional and metabolic changes overshadowed conserved responses to salinity and represent an impediment to simple translational genomics. However, 'triangulation' from multiple genotypes enabled the identification of conserved and tolerant-specific responses that may provide durable tolerance across species.

  16. RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations

    PubMed Central

    Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.

    2014-01-01

    Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449

  17. Enumeration of Smallest Intervention Strategies in Genome-Scale Metabolic Networks

    PubMed Central

    von Kamp, Axel; Klamt, Steffen

    2014-01-01

    One ultimate goal of metabolic network modeling is the rational redesign of biochemical networks to optimize the production of certain compounds by cellular systems. Although several constraint-based optimization techniques have been developed for this purpose, methods for systematic enumeration of intervention strategies in genome-scale metabolic networks are still lacking. In principle, Minimal Cut Sets (MCSs; inclusion-minimal combinations of reaction or gene deletions that lead to the fulfilment of a given intervention goal) provide an exhaustive enumeration approach. However, their disadvantage is the combinatorial explosion in larger networks and the requirement to compute first the elementary modes (EMs) which itself is impractical in genome-scale networks. We present MCSEnumerator, a new method for effective enumeration of the smallest MCSs (with fewest interventions) in genome-scale metabolic network models. For this we combine two approaches, namely (i) the mapping of MCSs to EMs in a dual network, and (ii) a modified algorithm by which shortest EMs can be effectively determined in large networks. In this way, we can identify the smallest MCSs by calculating the shortest EMs in the dual network. Realistic application examples demonstrate that our algorithm is able to list thousands of the most efficient intervention strategies in genome-scale networks for various intervention problems. For instance, for the first time we could enumerate all synthetic lethals in E.coli with combinations of up to 5 reactions. We also applied the new algorithm exemplarily to compute strain designs for growth-coupled synthesis of different products (ethanol, fumarate, serine) by E.coli. We found numerous new engineering strategies partially requiring less knockouts and guaranteeing higher product yields (even without the assumption of optimal growth) than reported previously. The strength of the presented approach is that smallest intervention strategies can be quickly

  18. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data

    PubMed Central

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-01-01

    Abstract Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets. PMID:29618048

  19. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data.

    PubMed

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-03-01

    Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets.

  20. Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana.

    PubMed

    Liu, Yanan; Wang, Baoju; Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia

    2016-01-01

    The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information of both Marmota models strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to decide the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development